Lucaslondon
Computer
- Sep 3, 2007
- 1
Hi,
First of all apologies for the long post. Hope someone can offer some advice.
I have about 200 columns of time series data that I need to perform a correlation analysis on in terms calculating lagged cross correlations between all the variables.
Currently the data is stored in Excel. Variable 1 is in Column A, variable 2 in Column B etc, the data ends in column GR which contains the last variable. I need to calculate the cross correlations between variable 1 in Column A and all other variables in the other remaining columns. The key thing here is that I need to calculate lagged as well as contemperaous correlations between each of the variables.
Then I want to do the same for the next variable in Column B, i.e. calculate contemperaous and lagged correlations between variable 2 and all other variables.
Is there anyway to automate this so that it work out the correlations up to say 10 lags of for each variable against all other variables and paste the results in some kind of readable grid in an excel sheet for each variable:
For example, for Column A - Variable 1 the results could be shown like this (for the first five variables):
B C D E F
Lag0 X X X X X
Lag1 X X X X X
Lag2 X X X X X
Lag3 X X X X X
Lag4 X X X X X
Lag5 X X X X X
Lag6 X X X X X
Lag7 X X X X X
Lag8 X X X X X
Lag9 X X X X X
Lag10 X X X X X
Where X represents the correlation coefficient between variable 1 and all the other variables (col B, C, D, E, F to Col GR) at different lags. The output doesn't have to be exactly like this though.
I can do this manually in excel using the Correl function, adjusting the series range in the formula every time I want to work out the lagged correlation. The correlation the function takes two arguments:
Array1, Array2
So to calculate correlations for Column A against all other columns I would input:
correl (A2:A100, B2:B100) to give the contemperous correlation between Col A and Col B, correl (A2:A100, c2:c100) to give correlation for col C vs. Col A etc.
Then to work out the correlation at 1 Lag I change the formula to (A3:A100, B2:B99), at lag 2 the formula becomes (A4:A100, B2:B98) etc. Then I do this for all the other columns/Lags but to do this for 200 variables, would take a huge amount of time (probably several days)! Is there a way of doing it using VBA perhaps? Even if it takes a few hours that's OK, as I can leave it to run in the background.
The way I'm thinking of doing this is setting up a workbook with the following sheets:
1) Source Sheet - this would contain all the raw data (Columns 1-200)as described in the top of the post.
2) Calculation sheet - Here I would have a duplicate of the above sheet but with one extra column (say column A) and so the dataset would end in column 201). Beneath this matrix I would set up formulas using the Excel’s correl function to work out the lagged correlation between the variable in column 1 and all other variables to say ten lags. Effectively column one would be variable column whereas all the data in the other columns would be fixed. I could then copy each variable from the source sheet and paste into column 1, work the correlations and then move to the next variable
3) Output Sheet - this would store the results of the correlations for each variable (effectively replicating the output grid I described earlier).
Would this be the best way of doing what I need? If so how easy is it to write the code? Basically I need code that will take the first variable from the source sheet, paste it into column A in the calculation sheet. Then copy the values from the correlation grid and paste it into the output sheet.
Then it needs to repeat the process for next column in the source sheet, and paste the values beneath the results for the previous column in the output sheet. I'm guess I need some kind of loop function that will loop through each variable in the source sheet and apply the code.
Any help much appreciated!
Thanks
Lucas
First of all apologies for the long post. Hope someone can offer some advice.
I have about 200 columns of time series data that I need to perform a correlation analysis on in terms calculating lagged cross correlations between all the variables.
Currently the data is stored in Excel. Variable 1 is in Column A, variable 2 in Column B etc, the data ends in column GR which contains the last variable. I need to calculate the cross correlations between variable 1 in Column A and all other variables in the other remaining columns. The key thing here is that I need to calculate lagged as well as contemperaous correlations between each of the variables.
Then I want to do the same for the next variable in Column B, i.e. calculate contemperaous and lagged correlations between variable 2 and all other variables.
Is there anyway to automate this so that it work out the correlations up to say 10 lags of for each variable against all other variables and paste the results in some kind of readable grid in an excel sheet for each variable:
For example, for Column A - Variable 1 the results could be shown like this (for the first five variables):
B C D E F
Lag0 X X X X X
Lag1 X X X X X
Lag2 X X X X X
Lag3 X X X X X
Lag4 X X X X X
Lag5 X X X X X
Lag6 X X X X X
Lag7 X X X X X
Lag8 X X X X X
Lag9 X X X X X
Lag10 X X X X X
Where X represents the correlation coefficient between variable 1 and all the other variables (col B, C, D, E, F to Col GR) at different lags. The output doesn't have to be exactly like this though.
I can do this manually in excel using the Correl function, adjusting the series range in the formula every time I want to work out the lagged correlation. The correlation the function takes two arguments:
Array1, Array2
So to calculate correlations for Column A against all other columns I would input:
correl (A2:A100, B2:B100) to give the contemperous correlation between Col A and Col B, correl (A2:A100, c2:c100) to give correlation for col C vs. Col A etc.
Then to work out the correlation at 1 Lag I change the formula to (A3:A100, B2:B99), at lag 2 the formula becomes (A4:A100, B2:B98) etc. Then I do this for all the other columns/Lags but to do this for 200 variables, would take a huge amount of time (probably several days)! Is there a way of doing it using VBA perhaps? Even if it takes a few hours that's OK, as I can leave it to run in the background.
The way I'm thinking of doing this is setting up a workbook with the following sheets:
1) Source Sheet - this would contain all the raw data (Columns 1-200)as described in the top of the post.
2) Calculation sheet - Here I would have a duplicate of the above sheet but with one extra column (say column A) and so the dataset would end in column 201). Beneath this matrix I would set up formulas using the Excel’s correl function to work out the lagged correlation between the variable in column 1 and all other variables to say ten lags. Effectively column one would be variable column whereas all the data in the other columns would be fixed. I could then copy each variable from the source sheet and paste into column 1, work the correlations and then move to the next variable
3) Output Sheet - this would store the results of the correlations for each variable (effectively replicating the output grid I described earlier).
Would this be the best way of doing what I need? If so how easy is it to write the code? Basically I need code that will take the first variable from the source sheet, paste it into column A in the calculation sheet. Then copy the values from the correlation grid and paste it into the output sheet.
Then it needs to repeat the process for next column in the source sheet, and paste the values beneath the results for the previous column in the output sheet. I'm guess I need some kind of loop function that will loop through each variable in the source sheet and apply the code.
Any help much appreciated!
Thanks
Lucas