The Monte Carlo Method vs. the Normal Distribution: Approximating Uncertainty in the Absence of “Headwinds”

Disclaimer: This post is rather wonkish with statistical and metallurgical discussions.

In a previous post I outlined the application of Bill James' (founder of Sabermetrics and Moneyball influencer) similarity index to a metallurgical engineering project (Link). In that example, a statistical model was developed to project the strength of a particular alloy with respect to changes in a processing parameter. A Monte Carlo simulation was used to evaluate the distribution of the projected strength resulting from thousands of changes in the model inputs. The result of the simulation gives you the approximate probability of the various outcomes.

The Monte Carlo Method

The Monte Carlo method, as you may have guessed, derives its name from the Monte Carlo casino in Monaco. The approach was invented by the Polish mathematician Stanislaw Ulam as part of the Manhattan Project (Source). Ulam's inspiration came from playing solitaire and wondering whether there was an easy way to calculate the probability of winning the game, a line of thinking he eventually applied to neutron diffusion (Source). The method involves taking a model, feeding in distributions of the various inputs, and recording the outputs over hundreds or thousands of iterations. An example distribution generated from a Monte Carlo simulation, from work on developing an improved age practice for 7068 aluminum, is provided below. The simulation was performed by varying the time and temperature of the two-step age practice performed following solution heat treatment.

[Figure: Monte Carlo simulation results for the 7068 aluminum two-step age practice]
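For readers who prefer code to prose, the general recipe looks something like the sketch below. The strength function and the input distributions are made-up placeholders, not the actual 7068 age-practice model, but the structure (sample the inputs, run the model, collect the outputs) is the whole trick.

```python
import numpy as np

rng = np.random.default_rng(42)

def projected_strength(time_hr, temp_f):
    # Placeholder for a fitted strength model; the real 7068 age-practice
    # model is not reproduced here.
    return 90.0 + 2.5 * np.log(time_hr) - 0.04 * (temp_f - 250.0)

n_iter = 10_000
# Feed in distributions of the inputs (assumed ranges, not actual process limits)
time_hr = rng.normal(loc=8.0, scale=0.5, size=n_iter)    # aging time, hours
temp_f = rng.normal(loc=250.0, scale=5.0, size=n_iter)   # aging temperature, deg F

# Record the output of every iteration; the collection of outputs
# approximates the probability distribution of outcomes
strengths = projected_strength(time_hr, temp_f)
print(f"Mean projected strength: {strengths.mean():.1f} ksi")
print(f"1st / 99th percentiles: {np.percentile(strengths, [1, 99]).round(1)} ksi")
```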

The Monte Carlo method, as a tool for generating a distribution of probable outcomes, differs from the classical example taught in Stats 101 courses. Classical empirical modeling is typically introduced using linear regression (think Excel and the linear trend line). Linear regression models are developed such that a line is drawn through the average or expected outcome for an input variable or set of input variables. The residuals, or differences between the actual and projected (expected) values at a given point, are assumed to be normally distributed, and any residual more than 2 standard deviations from the projected value is considered an “outlier” (Source). This “lazy” approach to modeling the distribution of outcomes can be effective; however, it comes with the risk of greatly underestimating the probability of “unlikely” outcomes.
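For comparison, here is what the “lazy” approach amounts to in code, using made-up data: fit the trend line, assume the residuals are normal, and flag anything more than 2σ from the fit as an outlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up process data: one input variable x and a measured response y
x = np.linspace(0, 10, 50)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=x.size)

# Ordinary least-squares fit (the Excel trend line)
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept

# Residuals are assumed normally distributed; anything beyond 2 standard
# deviations from the fit gets labeled an "outlier"
residuals = y - predicted
sigma = residuals.std(ddof=2)
outliers = np.abs(residuals) > 2 * sigma
print(f"Flagged {outliers.sum()} of {x.size} points as outliers")
```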

When the Normal Distribution Fails

The defining characteristic of the normal distribution is its central tendency, or in layman's terms, the fact that the majority of the data is clustered around the mean. The image below outlines this concept by highlighting the percentage of the data in each band with respect to the number of standard deviations (σ) from the mean (μ). From the image you can see that only about 0.1% of the data lies beyond the 3σ point in each tail. This feature makes the normal distribution easy to illustrate and drives its use as the basis for tools such as control charts.

[Figure: Normal distribution showing the percentage of data within each standard deviation band (image source)]

Nassim Nicholas Taleb (NNT), in his book The Black Swan, hammers home the point that using the Normal (Gaussian) distribution to approximate the likelihood of seemingly low-probability outcomes is dangerous. NNT states that things that are normally distributed face “headwinds” which make probabilities drop faster and faster as you move away from the mean (e.g. height, IQ, etc.). If the “headwinds” are removed, the resulting outcomes become significantly asymmetrical (think 80/20 Pareto principle). NNT illustrates this point by presenting the wealth distribution in Europe and contrasting it with what the distribution would look like if wealth were normally distributed.

Wealth Distribution in Europe:

  • People with wealth greater than €1 million: 1 in 63
  • Higher than  €2 million: 1 in 125
  • Higher than  €4 million: 1 in 250
  • Higher than  €8 million: 1 in 500
  • Higher than  €16 million: 1 in 1,000
  • Higher than  €32 million: 1 in 2,000
  • Higher than €320 million: 1 in 20,000
  • Higher than €640 million: 1 in 40,000

Normal Wealth Distribution:

  • People with wealth greater than €1 million: 1 in 63
  • Higher than  €2 million: 1 in 127,000
  • Higher than  €4 million: 1 in 886,000,000,000,000,000
  • Higher than  €8 million: 1 in 16,000,000,000,000,000,000,000,000,000,000,000


The above example demonstrates that if wealth were normally distributed, the likelihood of a Bill Gates or a Warren Buffett would be essentially incomputable, and it provides a simple lesson in the fragility of the normal distribution when it comes to approximating the probability of unlikely outcomes.
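The “headwinds” are easy to see numerically: under a normal distribution, each additional standard deviation away from the mean cuts the odds by a larger and larger factor, which is exactly why the Gaussian wealth numbers above explode so quickly. A quick illustration (my own calculation, not Taleb's):

```python
from scipy.stats import norm

# Odds of landing more than k standard deviations above the mean of a
# normal distribution. Note how the "1 in N" figure grows faster and
# faster with each step -- the Gaussian "headwinds".
for k in range(1, 7):
    p = norm.sf(k)  # upper-tail probability P(X > mu + k*sigma)
    print(f"beyond {k} sigma: about 1 in {1 / p:,.0f}")
```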

Wrapping it Up

“All models are wrong, but some are useful”

-George Box, Industrial Statistician

In process engineering, the “headwinds,” to borrow the term from NNT, are the controls imposed on the process inputs. These controls form the basis for the Y = f(x) philosophy touted in Six Sigma books, which holds that if the inputs to a process are “in control,” the resulting outputs will be as well. The problem with this logic is that it implies the organization attempting to control the process has identified all the necessary input variables and deployed adequate controls (i.e. “headwinds”).

Recently, I fell victim to this oversimplification after resurrecting a model that used the “lazy” approach to modeling uncertainty discussed above and applying it to a process where the “headwinds” (i.e. controls on the raw material) had been removed. The result was a drastic underestimation of the probability of an undesirable outcome (production of material outside of the specification limits). The actual likelihood of nonconformity ended up being an order of magnitude higher than the “lazy” (±3σ) approach had projected. D'oh!
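To make the failure mode concrete, here is a toy version of the mistake with entirely made-up numbers: the “lazy” estimate assumes the output stays normal with the old, tightly controlled input distribution, while the Monte Carlo estimate samples the wider, skewed raw-material distribution that actually exists once the controls are removed. The spec limit, model, and distributions below are illustrative assumptions, not the actual process.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

usl = 120.0  # hypothetical upper specification limit, ksi

def process(raw):
    # Toy process model: output depends on a raw-material property plus noise
    return 100.0 + 0.5 * raw + rng.normal(scale=2.0, size=raw.size)

# "Lazy" estimate: assume the raw material is still tightly controlled, treat
# the output as normal, and read the nonconformance rate off the tail.
raw_mean, raw_sigma = 20.0, 2.0
out_mean = 100.0 + 0.5 * raw_mean
out_sigma = np.sqrt((0.5 * raw_sigma) ** 2 + 2.0 ** 2)
print(f"Lazy (+/- 3 sigma mindset) estimate: {norm.sf(usl, out_mean, out_sigma):.4%}")

# Monte Carlo estimate with the "headwinds" removed: the raw-material
# distribution is wider and skewed once the controls are gone.
raw = rng.lognormal(mean=np.log(20.0), sigma=0.25, size=100_000)
outputs = process(raw)
print(f"Monte Carlo estimate: {(outputs > usl).mean():.4%}")
```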

Lesson Learned: Avoid the “lazy” approach and embrace the Monte Carlo!


How Nate Silver made me a better Metallurgist

Nate Silver is the founder of FiveThirtyEight.com, creator of the PECOTA baseball forecasting system used by Baseball Prospectus, and a renowned political forecaster. In his book, The Signal and the Noise, Nate outlines the creation of the PECOTA system and the lessons learned from Bill James (founder of Sabermetrics), along with a look at other forecasting problems and opportunities. Silver's PECOTA system relies on a metric resembling the similarity index proposed by Bill James in his 1986 Baseball Abstract. James developed the similarity index as a tool for comparing any two major league players. In James' system the index starts at 1,000 points and deducts points based on a set of guidelines; highly similar players will have indexes as high as 950 or 975. Similarly, the PECOTA system uses an index to evaluate a player against a multitude of former major and minor leaguers to project the player's performance.
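To give a feel for the mechanics, a toy James-style index might look like the sketch below. The stat categories and penalty weights are invented for illustration; they are not James' actual guidelines.

```python
def similarity_index(player_a, player_b):
    # Start at 1,000 and deduct points for differences between the two
    # players. Categories and weights here are invented, not James' actual rules.
    score = 1000
    score -= abs(player_a["hr"] - player_b["hr"]) // 2           # career home runs
    score -= int(abs(player_a["avg"] - player_b["avg"]) * 1000)  # batting average
    score -= abs(player_a["games"] - player_b["games"]) // 20    # games played
    return score

ruth = {"hr": 714, "avg": 0.342, "games": 2503}
aaron = {"hr": 755, "avg": 0.305, "games": 3298}
print(similarity_index(ruth, aaron))  # higher scores indicate more similar careers
```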

For a young metallurgist whose livelihood depends on projecting the results of varying parameters across an assortment of metallurgical processes to achieve a desired result, how could the lessons of a sabermetrician help? The opportunity presented itself with the need to develop a high-strength product in Alloy 825, an austenitic iron-nickel-chromium alloy commonly used in environments where enhanced corrosion performance is required. The product was to be cold worked (i.e. deformed at room temperature) to a desired size and strength level. The challenge was that none of this data was readily available for Alloy 825!

A simple Google search showed that data for other austenitic alloys, such as Alloy 625 (a Ni-based alloy) and 316 stainless steel (Fe-based), could readily be obtained from sources like ATI and Special Metals, so a simple curve could be fitted to the results for these two alloys. Following Silver's first principle, Think Probabilistically, a Monte Carlo simulation was developed, feeding several distributions into the model to generate a distribution of results at each cold-working level. The simulation inputs were a similarity index varying uniformly (0.5–0.9), a normal distribution of fully annealed Alloy 825 yield strengths, and a normal distribution of residuals from the cold-working curves fitted to the Alloy 625 and 316 data. An outline of the model is provided in the figure below.

[Figure: Outline of the Alloy 825 Monte Carlo model]
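A condensed sketch of the model structure is shown below. The work-hardening curve coefficients and distribution parameters are placeholders (the actual fitted values are not reproduced here); the point is how the uniform similarity index, the annealed yield strength distribution, and the curve-fit residuals combine in each iteration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_iter = 10_000
pct_cold_work = 30.0

def reference_strength_gain(pct_cw):
    # Placeholder work-hardening curve fitted to published 625 / 316 data:
    # strength gain (ksi) as a function of percent cold work.
    return 4.5 * pct_cw - 0.03 * pct_cw ** 2

# Model inputs as described above (distribution parameters are assumptions)
similarity = rng.uniform(0.5, 0.9, n_iter)    # similarity of 825 to the 625/316 curves
annealed_ys = rng.normal(45.0, 3.0, n_iter)   # annealed Alloy 825 yield strength, ksi
residual = rng.normal(0.0, 5.0, n_iter)       # residuals from the fitted curves, ksi

projected_ys = annealed_ys + similarity * reference_strength_gain(pct_cold_work) + residual

# Summarize the distribution at this cold-work level, as in the plot below
p1, p50, p99 = np.percentile(projected_ys, [1, 50, 99])
print(f"At {pct_cold_work:.0f}% cold work: 1% = {p1:.0f}, median = {p50:.0f}, 99% = {p99:.0f} ksi")
```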

The Monte Carlo simulation results are provided in the graph below, with the blue line representing the mean result with respect to degree of deformation (i.e. percent cold work / area reduction), the red line representing the 99% probability, and the bottom line representing the 1% probability. The customer upper and lower specification limits (USL & LSL) are also plotted for reference. The work-hardening curve shows that at about 30% cold work the product is nearly assured to meet the tensile strength requirements. These results were subsequently validated with actual experiments, with a percent error of less than 3%. Eureka!

[Figure: Alloy 825 Monte Carlo simulation results with mean, 1%, and 99% curves and customer specification limits]
