Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

Correlation Coefficient

Status
Not open for further replies.

BlackestKnight

Industrial
Aug 20, 2012
11
0
0
US
Hello all,

I am working for a plant and we have executed a green initiative that starts with the monitoring of our total usage in water, electricity, fuel etc. We have a table to do this, but it requires way too much data entry and time to complete so we are trying to use excel to automate it as much as possible.

If I have three column sets of data that are ordered like this (rough example of the table that has more data)

P= Pressure vessel cooling water, S= Sanitary water, and T = total water

P + S =T
20 10 30

30 15 45

40 20 60


And they are tabulated over the span of a few years, is it possible to create a formula or use a constant number that I can multiply or divide the total water consumption by, that can give me the pressure vessel cooling water with a slight but negligible margin of error?

What I did was with two of the columns (pressure vessel cooling water and total water consumption) calculate the correlation coefficient. It was about .9 which is pretty good. with that I set the

[Pressure Vessel cooling water amount] = [Total Water Consumption]*x

Solved for x and got say 0.7, can I use this number within the last few years of data sets in order to predict the amount of cooling water? The goal is to be able to do it for the remaining months of this year with again as I mentioned, a reasonable margin of error but to run the experiment on the last few years in order to get a better coefficient.

I experimented with the value and I recieved variations between the calculated values vs the original values of +/- no more than 10%. I don't know if there is a more formal formula for doing this but any help would be much appreciated!
 
Replies continue below

Recommended for you

Yes thats how I got 0.7 but this is just a sample for for one year, 2011. I want to know that if I can use this same 0.7 moving forward to predict the values as long as our production doesn't vary too much. I'm not sure if there is a more sophisticated way to explain or do this but we want to be able to predict the amount of cooling water we will have to use with a reasonable amount of error.
 
Seems to me that you are going at this backwards. If you have a known and operational process, then you should already know how much cooling water and other water is required to run the process, which should analytically give you what you want.

TTFN
faq731-376
7ofakss
 
IRstuff,

I understand what you mean and that is how I approached it from the beginning but the main question I would say is how to ensure the repeatability of the number I solve for. That is why I tried the correlation coefficient first between the two sets of data. This showed that they typically rely on one another. Then I solved for the constant variable that I would use in the future to get a good prediction on what either the amount of cooling water will be or sanitary water etc etc. Again these are just a few values from several columns that include pH adjusted water and so on.

In another sense, if I have a constant value that relates two of the variables, I can predict what one will be within a reasonable margin of error depending on their correlation.

I just want to be able to create a type of standardized formula. I know there will be error but I am sure we can neglect it if it is reasonable say 5%.
 
*In another sense, if I have a constant value that relates two of the variables, I can predict what one will be within a reasonable margin of error during the next few years (I would be able to forecast the amount of sanitary or cooling water in the next few years give or take a few gallons) depending on their correlation.
 
Correlation does not equal to cause. I was trying to get you to ask someone how the sanitary water relates to the cooling water. It's your own plant and your own processes, so someone already knows the answer.

TTFN
faq731-376
7ofakss
 
Will this work? The top half has randomly created values for P & S that must be replaced with actual values. The bottom half is for entering day by day P values to obtain the estimated S value and to total for value T. The statistical information on the top half is to allow you to see how accurate your estimated values for S are likely to be.

Warning about implementing this. If you think you may need to change something, the first step is to measure where you are so that the effect of changes may also be measured. Estimating S maybe time saving for now, but you can probably expect that once you have collected enough data to show the present state of the system, someone is going to want to make system changes to "improve" the water usage and that may well be measured by changes in the ration between P and S. If that is the case, you need actual data, not estimated data and you will need to measure each water usage separately and do the additional data entry.
 
Status
Not open for further replies.
Back
Top