The Fallacy of Extrapolating with Computer Models

I have vocally and repeatedly proclaimed that computer models cannot prove anything. As I was working on updating my 5-day engineering course, I came across a perfect example of what I'm talking about.

This model started with the best dataset ever assembled. Really. The best ever. The underlying data was all "coincident" to this model. By that I mean that the people collecting the data had a strong motive to make it as accurate and complete as it could possibly be. Those folks got paid, and they paid partners and mineral owners, based on the data (so they had no significant incentive to illegally adulterate it). The data is also required by law to be complete and accurate, and has been since the 1940s. Further, the data was collected each month on upwards of 400,000 discrete entities operated by nearly 100,000 business entities, all with an explicit license to operate that is not trivial to acquire. In other words, the data entering this model has had financial and legal incentives to be accurate and complete. Of course, the dataset is monthly U.S. gas production by well.

If you start with this high-quality data and bring in:
[ul]
[li]Historical wellhead price and consumer price data sets along with an independent forecast of those prices into the future[/li]
[li]A detailed data set containing historical new-well permits that can be compared to the price data over time[/li]
[li]A detailed data set containing new facility permits that can be compared to new well permits and the price data[/li]
[li]A detailed list of issued permits for facilities (that take up to 10 years to build after the permit is issued) with their projected completion dates and projected capacities (see the big uptick in the attachment in the Alaska data in 2019 representing the pipeline coming on line)[/li]
[li]Independent forecasts of inflation[/li]
[li]Historical and (independently) projected steel pipe worldwide manufacturing tonnage and prices[/li]
[li]A team of very talented, very experienced Engineers, Economists, Statisticians, and Computer Modelers[/li]
[li]A project deadline that the team felt was very liberal[/li]
[li]No limits on budget for manpower, computing equipment, or software[/li]
[/ul]

It really doesn't get any better than this. They published the attached forecast in the 2007 Energy Outlook; there were some glitches in the first version, so they updated it and published the attached version in the 2008 Energy Outlook. This chart was reprinted hundreds of times over the next few years. I haven't seen it much since 2011. I pulled in the data yesterday and added the actual production between 2006 and 2011 (the last data available that breaks out the various gas types).

So at year 5 you find:
[ul]
[li]Unconventional gas was underpredicted by 92% (using the unconventional gas forecast as the denominator; see the sketch after this list)[/li]
[li]Onshore conventional was overpredicted by 41%[/li]
[li]Offshore was overpredicted by 60%[/li]
[li]Alaska was overpredicted by 55%[/li]
[li]Total gas was underpredicted by 19%[/li]
[li]If you remove the unconventional component, the total would be overpredicted by 55%[/li]
[/ul]
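To be explicit about the error convention above, here is a minimal sketch of the calculation with the forecast as the denominator (the example numbers are hypothetical placeholders, not the actual Energy Outlook figures):

[code]
# Percent error with the forecast as the denominator, per the list above.
# Example values are hypothetical, NOT the actual EIA numbers.
def pct_error(forecast, actual):
    """Positive = overpredicted, negative = underpredicted."""
    return 100.0 * (forecast - actual) / forecast

# A category forecast of 10 units against an actual of 19.2 units:
print(pct_error(10.0, 19.2))  # -92.0, i.e. underpredicted by 92%
[/code]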

This is at year 5 of a 24-year forecast. In early 2008, gas prices were over $10/MSCF, drilling in the Marcellus, Haynesville, and Fayetteville shales had accelerated, and rig counts were approaching all-time highs. All of this data was readily available to the modelers, but they didn't quite believe it and tweaked the model back to a slight increase followed by flattish production, with offshore taking up the slack.

I don't mean to ridicule these guys; they did a workmanlike job. I made a similar blunder in 1990, when I failed to include a group of wells (that I had already built pipe to) in a forecast of the value of a company that was on the market. A competitor did include those wells and offered $15 million more than we did--the group of wells I excluded produced that much profit in the first 6 months, and 23 years later they are still on production.

My point is this: if a team with a superb set of clean data, unlimited time and budget, all of the requisite skills, and no incentive to reach a particular conclusion couldn't predict something as "simple" as gas production to within 55%, how can anyone put any credence in climate models built on questionable data, under intense time pressure, intense budget pressure, and intense pressure to reach a specific conclusion? Hell, they could even be "right", but I won't be willing to accept that until we can look back at a body of predictions that have the same shape as the actual (raw) data for that period. So far we are not even close.

David Simpson, PE
MuleShoe Engineering

"Belief" is the acceptance of an hypotheses in the absence of data.
"Prejudice" is having an opinion not supported by the preponderance of the data.
"Knowledge" is only found through the accumulation and analysis of data.
The plural of anecdote is not "data"
 
MintJulep,
Of course you are right. It has been a while since I last saw a model, purporting to project some non-trivial data set, that didn't brag about using bits and pieces from others to show that the biases couldn't possibly be theirs. That may be cynical; they are probably bragging about using the other models to show the vast quality of the results. There were probably 15 forecast data sets (model output) used as input to the model.

Of course permits do not equal facilities, but the argument (which I kind of support) is that it is so expensive to get a permit to build oil & gas facilities that once you have it in hand, it is an asset, and you will either build the facility or sell the permit to someone who will (in a way that doesn't invalidate the permit).

David Simpson, PE
MuleShoe Engineering

"Belief" is the acceptance of an hypotheses in the absence of data.
"Prejudice" is having an opinion not supported by the preponderance of the data.
"Knowledge" is only found through the accumulation and analysis of data.
The plural of anecdote is not "data"
 
But you have to ask: would things be better if there were NO 'modeling' whatsoever? In other words, isn't having a 'model' which is based on current and relevant data, but is recognized as being potentially inaccurate due to so many unpredictable variables, still better than just shooting in the dark?

John R. Baker, P.E.
Product 'Evangelist'
Product Engineering Software
Siemens PLM Software Inc.
Industry Sector
Cypress, CA

To an Engineer, the glass is twice as big as it needs to be.
 
Shouldn't we define what we mean by a 'model'?

Which of these is a computer model?

1 A program calculating the force on an object given its mass and acceleration as inputs.
2 A program calculating the numerical value of an integral that doesn't have a closed-form solution.
3 A PID feedback controller, designed around a particular system model, implemented in a computer.
4 A control system on a modern aircraft that stabilizes a plane a human could not fly unaided, implemented in a computer.
5 A hydrological model of a lake's level given an initial state and recent heavy rainfall.

All physical relationships are just models.
F=ma
V=IR
Now, I don't think zdas04 would object to calculating F=ma with his handheld calculator or a spreadsheet, even though these are just models implemented in a computer. So somewhere in zdas04's mind, between these and the Navier-Stokes equations, models become the subject of scorn. Maybe zdas04 (who has more experience than most with computer models) could tell me where models go bad and earn his scorn.


 
The title of the thread kind of covers it. I use pipeline models. I calibrate a model against the best set of data I can get my hands on: I pull whatever levers are available to me to get the output of the model to match measured conditions. Then I bring in another data set and leave all the adjustments as they were for the first dataset. If the model matches the pressures on the second set (without adjustments), then I have some confidence that when I put a pipeline modification into the model, it will indicate actual performance. I put in a very limited number of modifications and evaluate what those piping modifications might do to the pressure distribution on the system. Eventually I get to a point of recommending physical modifications to the pipe based on the model results. Then I build the new pipe and calibrate a new model. Basically, I am willing to extrapolate one time step with some confidence. A second time step would use model results as input, and I have far less confidence that that step would relate to reality in a meaningful way. A 20,000th step is just random numbers.
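A minimal sketch of that calibrate-then-validate discipline, with a made-up one-parameter pressure-drop model and hypothetical data (the real pipeline software has many more levers, but the workflow is the same):

[code]
import numpy as np

# Toy pressure-drop model: dP = k * Q**1.85. The single calibration
# "lever" is k. All numbers below are hypothetical.
def predicted_dp(k, flow):
    return k * flow**1.85

def calibrate(flow, measured_dp):
    # Least-squares best k (the model is linear in k, so it's closed form)
    basis = flow**1.85
    return float(basis @ measured_dp / (basis @ basis))

# Dataset A: pull the lever. Dataset B: hands off the lever.
flow_a = np.array([10.0, 20.0, 30.0]); dp_a = np.array([72.0, 260.0, 550.0])
flow_b = np.array([15.0, 25.0]);       dp_b = np.array([150.0, 400.0])

k = calibrate(flow_a, dp_a)
residuals = predicted_dp(k, flow_b) - dp_b
print(f"k = {k:.3f}, validation residuals = {residuals}")
# Only if the dataset-B residuals are small does the model earn ONE
# extrapolation step: evaluating a proposed piping modification.
[/code]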

In my mind, complex computer models are fantastic tools to evaluate potential modifications to the physical world. They are also very good at illuminating areas for further experimentation. They do not "prove" things, since a model cannot be anything more than the knowledge of the person who wrote it. If someone were to say "AGW is a fact because I know it is a fact", you wouldn't have much faith in that proof. That is exactly what the computer modelers are saying: they wrote a model; it shows a different temperature rise with man-generated CO2 than without man-generated CO2; so AGW is "proven". The proof in this case has no more validity than the statement that someone "knows" it to be true.

David Simpson, PE
MuleShoe Engineering

"Belief" is the acceptance of an hypotheses in the absence of data.
"Prejudice" is having an opinion not supported by the preponderance of the data.
"Knowledge" is only found through the accumulation and analysis of data.
The plural of anecdote is not "data"
 
I'm reminded of financial modeling and the widespread use of Gaussian-copula-centered models by hedge funds and larger intermediaries as mortgage-based [plus commodity-backed, etc.] structured finance ramped up during the '90s and '00s. Surely worked well, didn't it?

But, on topic[?] -

Climate change impact on available water resources obtained using multiple global climate and hydrology models

"...future climate change impact assessments are highly uncertain. For the first time, multiple global climate (three) and hydrological models (eight) were used to systematically assess the hydrological response to climate change and project the future state of global water resources. This multi-model ensemble allows us to investigate how the hydrology models contribute to the uncertainty in projected hydrological changes compared to the climate models. ..."

Earth Syst. Dynam., 4, 129-144, 2013
doi:10.5194/esd-4-129-2013
© Author(s) 2013. This work is distributed
under the Creative Commons Attribution 3.0 License.

Effective models - accurate, predictive ones - are, I believe, possible.
 
From my perspective...

Many people think of models as being designed to predict the future based on initial conditions and a set of governing physical equations. Except for very simple systems, we all know that this isn't generally possible, beyond a very small advance in time. Otherwise we'd all be able to plan for future weather.

For other models, there may be boundary conditions that constrain the evolution of the system being simulated. I work in the engine performance simulation industry. It is very rare for a performance model to be dependent on initial conditions, since it has a strong, cyclic boundary condition applied to it. Nearly all engine flow/combustion models converge to a cyclically repeating solution (and if they don't, that's probably OK too). We can make changes to the model and believe the changes to the solution.

- Steve
 
I haven't read any serious climate scientists who claim that a computer model 'proves' anything.

But, like politics, it's more fun to claim people say something, and then disparage them for what you pretend they say.

 
Computer models don't even have to be based on any physical laws to get things right. Google "Artificial Neural Networks" and spend a couple of days reading. All you need to create a model that has great predictive power is lots of historical data to use as inputs and outputs. Then train the model to that data.

The idea that models can't be used to predict is silly. Of course they can be used to predict. Otherwise they'd be pointless. And they don't even need to be based on science if you have enough data. You can use ANNs to predict football games.
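As a minimal sketch of what that looks like (made-up data, and a real application would want far more history and a proper train/test protocol):

[code]
# pip install numpy scikit-learn
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical history: 500 samples of 3 inputs -> 1 output. The
# underlying "physics" (y = 2*x0 - x1 + noise) is unknown to the net.
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=500)

net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X[:400], y[:400])            # train on the first 400 samples

print("held-out R^2:", net.score(X[400:], y[400:]))
# A high R^2 says the net interpolates the history well. It says
# nothing about causality, and nothing about conditions outside the
# range it was trained on.
[/code]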

What they CANNOT do is prove causality. This is always the fun example, with which most engineers will have some familiarity:

[Figure: the oft-reprinted chart plotting global average temperature against the number of pirates]


Now, sloppy graphing aside, one can show an inverse relationship between the number of pirates and global temperature. One can even tune a computer model to show that, and get a correlation between the two that's not terrible. What you can't do is use that model to prove that one caused the other, when there may be entirely different factors in play that could be causing both.

In the global climate discussion, it's not just carbon that's increasing; it's everything else that mankind does to manipulate our environment that's also increasing. The things tracking 'upwards' are temperature and people. That anthropogenic carbon is on the rise is incidental to the fact that *everything* humans do is on the rise. The model trained to carbon proves nothing. You could train a model to the length of roads, or the number of buildings, or the number of printed books, or CFCs (apparently), or who knows what, and still get a high correlation. But correlation does not mean causality.

A model specifically trained to follow a correlation is not proof. It can't be proof, by its very nature. Lots of scientists are forgetting that these days.
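A minimal sketch of how cheap such a correlation is to manufacture (entirely made-up series, chosen only so that each one trends over time):

[code]
import numpy as np

years = np.arange(1950, 2011)
rng = np.random.default_rng(1)

# Hypothetical trending series -- anything monotonic will do.
temperature = 14.0 + 0.012 * (years - 1950) + 0.05 * rng.normal(size=years.size)
pirates = 5000.0 * np.exp(-0.05 * (years - 1950))     # declining
road_miles = 1.0e6 + 2.5e4 * (years - 1950)           # increasing

print(np.corrcoef(temperature, pirates)[0, 1])     # strongly negative
print(np.corrcoef(temperature, road_miles)[0, 1])  # strongly positive
# Near-perfect correlations either way: a shared time trend, not causation.
[/code]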


Hydrology, Drainage Analysis, Flood Studies, and Complex Stormwater Litigation for Atlanta and the South East
 
I think the problem with extrapolating historical trends is that things change in the future.

Sure, you can predict the statistical chance that one team will beat another. Would you use the same methods to predict the next ten meetings of those teams? Or just the next game?

But in complex and chaotic systems things change unpredictably (there might be another outbreak of Somali pirates, and no doubt another cooling, because more guys and boats and guns become available).

Quando Omni Flunkus Moritati
 
"would you use the same methods to predict the next ten meetings of those teams ? or just the next game ??"

Or would you not try to predict the individual games at all, and instead predict the overall record at the end of the 10 games?
 
To clarify.

Open-loop (predictive) models are all rubbish.

- Steve
 
To take the sports game analogy a little further...

brad1979 makes an interesting point - what resolution do you want in the predictions? I suppose that's why not only are there climate models, but ensembles of climate models - supposedly none of the individual climate models are exceedingly good, but the multi-model mean is somehow very good.

Since fluid mechanics is not only an initial-value problem but also a boundary-value problem, it is conceivable that one could be fooled into thinking that it doesn't matter what the outcomes of the next 10 games are, only the record at the end of the 10 games. However, when the outcome of each individual game is the seeding for the next game (an initial-value problem here - all sporting games have an initial value of identically zero), then the results of EACH game are important.

That brings me to an interesting issue of averaging. If I had a winter and spring that were exceptionally below average, and a summer and fall that were equally above average, what does averaging tell me about my year? Nothing. How about Phoenix, AZ being hotter than average while Miami, FL is below average? What does that tell me about CONUS temperatures? Nothing. Or, even more dramatically, how about Miami being 1°F below average (at 89°F and 90% RH) vs. Tuktoyaktuk, NWT being 5°F above average (at 30°F and 25% RH)? Can you even average temperature? What does that mean?
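One way to make that last question concrete: the energy content of moist air at those two conditions differs by two orders of magnitude, so a plain average of the dry-bulb temperatures is averaging physically incommensurable numbers. A rough sketch using standard psychrometric approximations (sea-level pressure and the Magnus formula assumed):

[code]
import math

def moist_enthalpy(t_f, rh, p_hpa=1013.25):
    """Approximate enthalpy of moist air, kJ per kg of dry air.
    rh is a fraction; Magnus formula for saturation vapor pressure."""
    t_c = (t_f - 32.0) * 5.0 / 9.0
    e_sat = 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))  # hPa
    w = 0.622 * rh * e_sat / (p_hpa - rh * e_sat)           # kg/kg
    return 1.006 * t_c + w * (2501.0 + 1.86 * t_c)

print(f"Miami, 89F/90% RH:       {moist_enthalpy(89, 0.90):6.1f} kJ/kg")
print(f"Tuktoyaktuk, 30F/25% RH: {moist_enthalpy(30, 0.25):6.1f} kJ/kg")
# Roughly 100 vs 1 kJ/kg -- nearly all of the gap is latent heat, which
# an average of the two temperatures never sees.
[/code]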
 
Interesting column on Watts Up With That.


Basically, rather than looking at the actual temperature record, the author looked at the various computer models that have been used.

Rather interestingly, for all their complexity, and the claims that they were developed independently, they all behave in accordance with a simple one-line equation with a couple of tuning factors in it.

In my little patch of the world this happens quite often - for example, I can spend months building and correlating a complex multi-body dynamics model that predicts when or if an SUV will roll over, or I can multiply the height of the CG by the coefficient of friction of the tires, divide it by the vehicle's track, and make a prediction with much the same accuracy.
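My reading of that back-of-envelope rule, as a minimal sketch (the tip-versus-slide threshold of 1/2 is the usual static criterion mu > track/(2*h_cg), rearranged; treat it and the numbers as assumptions, not Greg's exact procedure):

[code]
# Back-of-envelope rollover screen. All numbers hypothetical; a real
# assessment uses the correlated multi-body dynamics model.
def tips_before_sliding(mu, h_cg_m, track_m):
    """True if the static criterion says roll precedes slide."""
    return mu * h_cg_m / track_m > 0.5

print(tips_before_sliding(mu=0.9, h_cg_m=0.75, track_m=1.30))  # True: tall SUV
print(tips_before_sliding(mu=0.9, h_cg_m=0.50, track_m=1.55))  # False: low sedan
[/code]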

This doesn't mean more complex models aren't worth having, necessarily, but it does indicate that, in the case of climate models, their apparent complexity is not doing accuracy any real favors.



Cheers

Greg Locock


New here? Try reading these, they might help FAQ731-376
 
It depends on what kind of system we are trying to model. Nonlinear systems can be difficult to model and you could also be dealing with chaos. For nonlinear systems there is often a time horizon beyond which prediction breaks down. Nonlinear systems are also very sensitive to small changes in initial conditions.

The time horizon is related to the error in the initial conditions for the model. There is no such thing as 100% accuracy in any measurement device. There will always be some small amount of error. This error represents the difference between our measurements of the initial conditions and what the initial conditions actually are. As time progresses in the model, this discrepancy will grow. The time horizon is the period of time over which the model remains accurate to within some tolerance. The time horizon has a logarithmic dependence on the discrepancy between the measured initial conditions and the actual initial conditions. In grad school I took a course on chaos theory. Absolutely zero practical application in my career thus far, but some very mind-blowing stuff. One example I remember from class on this time-horizon business is that if you were to invent the most accurate measurement equipment in the world - instruments ONE MILLION times more accurate than what is currently available - to measure the initial conditions of a chaotic system, the time horizon of the model's extrapolation would only increase by 2.5 times.
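That logarithmic dependence is usually written with the system's largest Lyapunov exponent lambda: the horizon T over which an initial error eps stays inside a tolerance Delta is roughly T ~ (1/lambda) * ln(Delta/eps). A quick check of the classroom figure (the 1e-4 starting error is my assumption, chosen because it reproduces the 2.5x result):

[code]
import math

def horizon(eps, delta=1.0, lyapunov=1.0):
    """Prediction horizon T ~ (1/lambda) * ln(delta/eps)."""
    return math.log(delta / eps) / lyapunov

t_now = horizon(eps=1e-4)       # assumed current measurement error
t_better = horizon(eps=1e-10)   # instruments a million times better
print(t_better / t_now)         # -> 2.5: six orders of magnitude buys 2.5x
[/code]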


In nonlinear models that are solved iteratively, round-off error can also grow very rapidly over successive iterations. One very simple nonlinear iterator can demonstrate the sensitive dependence on initial conditions. Let's say we have a model that takes an input, squares it, and then feeds the result back into the model. Take three numbers: 0.99999, 1, and 1.00001, and use them as initial conditions for the model. What happens? All three numbers are essentially 1. The difference between the largest and the smallest is 2e-5. However, the results from the iterator will be shockingly different after a few iterations. When you start with 1, 1^2 = 1; you can iterate forever and the result will always be 1. Squaring 1.00001 gives 1.0000200001, and after 20 iterations you have 35800.15749030058; for 0.99999, after 20 iterations you have 2.79299091957e-5. Even the difference between 1.00001 and 1.000006 will be large after 20 iterations (35800 vs 539). Minuscule differences in initial conditions, huge differences in results after 20 iterations.
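The whole experiment fits in a few lines if you want to reproduce the numbers above (after n squarings the iterate is x0**(2**n), so 20 steps raises the input to the 1,048,576th power):

[code]
# x -> x**2, iterated 20 times
for x0 in (0.99999, 1.0, 1.000006, 1.00001):
    x = x0
    for _ in range(20):
        x = x * x
    print(x0, "->", x)
# prints ~2.79299e-05, 1.0, ~539, and ~35800.157 respectively,
# matching the figures quoted in the text.
[/code]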
 
How about an outbreak of Nigerian Pirates?

If you create a model, can't you predict how accurate it is?

All I know is that the local model of traffic flows seems to ensure I get every red light, and I do, every morning. I know it's a model because lights turn green when no cars are present while I sit at the intersection with a red light.

 
Actually - that was kinda the point of the wattsupwiththat.com post - that the super-duper-complicated GCMs were no different from a simple zero-dimensional equation. And the deviation of the actual numbers from the prediction (from 1981) falls well outside the 2σ bands from the Hansen81 paper, especially when you consider actual CO2 concentrations.

Conclusion - the GCMs can be reduced to a simple equation. Neither the simple equation nor the GCMs are particularly accurate. Also, could it be that, however complicated the GCMs are, they have the simple equation hard-wired into them, which would mean they will likely do no better than the simple equation?

Model fail.
 
Well, I think the heat flux effect of GHGs is very well established; what the complicated GCMs are trying to do is estimate how this heat flux will affect the variables that affect the amount of heat flux. In other words, what are the feedback ramifications?

And how does this excess heat flow into the Earth (mainly the oceans)?

So simple heat flux equations tell part, but not all, of the story.
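For reference, the "simple heat flux equation" usually meant here is the zero-dimensional forcing relation dF = 5.35 * ln(C/C0) W/m^2 (Myhre et al., 1998), times a climate sensitivity lambda. A minimal sketch (the lambda below is roughly the no-feedback Planck response; choosing it is exactly the feedback question the GCMs are wrestling with):

[code]
import math

def delta_t(co2_ppm, co2_ref=280.0, sensitivity=0.3):
    """Zero-dimensional estimate: dT = lambda * 5.35 * ln(C/C0).
    sensitivity ~0.3 K per W/m^2 is the no-feedback value; the feedbacks
    debated in the GCMs are hidden inside this one number."""
    forcing = 5.35 * math.log(co2_ppm / co2_ref)  # W/m^2
    return sensitivity * forcing                   # K

print(f"{delta_t(400):.2f} K at 400 ppm")   # ~0.6 K with no feedbacks
print(f"{delta_t(560):.2f} K at doubling")  # ~1.1 K with no feedbacks
[/code]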

 
Overall heat flux is one thing. Spatial variability and transport (convection, advection, ocean currents, Rossby-waves, jet-stream, tropical cyclones, etc), as well as timing (daily onset of tropical thunderstorms, for example) should be the areas of interest in GCMs. If energy is flowing into earth's system mainly in the tropics, but is flowing outward in the tropics and the poles, the transport issues are critical.

I am of the opinion (I cannot confirm this) that the feedback mechanisms should be left to the physics of the simulation. However, they appear instead to be hard-coded to better hindcast existing temperature histories. Of course, that leads to the hypothesized "tropical tropospheric hot-spot" that isn't measured, hypothesized changes to humidity, etc.

Since the regional resolution and success of the GCMs is poor-to-fair, we get phenomena such as mis-estimating the seasonal Arctic sea ice area/volume and variability in one direction while mis-estimating the Antarctic sea ice area/volume and variability in the other direction. We still have no idea why phenomena such as blocking highs occur and persist, or even why one emergent low-pressure system in Ethiopia can grow into a Cat 5 Atlantic hurricane while others fizzle over the Sahara.

Since the 0D equation does not hindcast or forecast even remotely closely, and the current GCMs are no better than the 0D equation, what does that really say about the GCMs?
 
TGS4 said:
Actually - that was kinda the point of the wattsupwiththat.com post - that the super-duper-complicated GCMs were no different from a simple zero-dimensional equation. And the deviation of the actual numbers from the prediction (from 1981) falls well outside the 2σ bands from the Hansen81 paper, especially when you consider actual CO2 concentrations.

Conclusion - the GCMs can be reduced to a simple equation. Neither the simple equation nor the GCMs are particularly accurate. Also, could it be that, however complicated the GCMs are, they have the simple equation hard-wired into them, which would mean they will likely do no better than the simple equation?

Model fail.

In other words, they could have gotten the same results by plotting CO2 vs. global temperature in Excel and telling it to fit a best-fit curve to the results.

And if that's the case, it's quite likely they could get similar results by plotting human population vs. global temperature in Excel.



Hydrology, Drainage Analysis, Flood Studies, and Complex Stormwater Litigation for Atlanta and the South East
 