#1
Posted to microsoft.public.excel.programming
Hi all
Does anyone have a quick routine for running multiple regressions over a range of files in Excel? I have a whole load of .xls files. Each contains data in columns D and E: D is always the independent variable (x axis) and E is always the dependent (y axis), but each file has a different number of rows. I've never been too fond of .xlDown in VBA, and a clumsy routine that counts down to the first blank row would take a while with over 5,000 files!

I need a routine to open each file, feed the ranges in D and E into the Regression routine in the Data Analysis ToolPak, create the outputs and (crucially) dump the output somewhere. With 5,000 files, that means 5,000 regressions and therefore 5,000 worksheets in the results workbook. I'm not sure what the limit is nowadays - it used to be 256 sheets per book when I last got anywhere near it.

What I really need is a table showing, for each regression, the "important" results, i.e. the intercept, the coefficient and the associated goodness of fit. If anyone can help, that would be great.

Thx
lk
#2
lk has brought this to us:
<snip>
Firstly, to find the last row of data you should start at the last cell of the column and use .End(xlUp).Row to find the last entry, regardless of whether the data is contiguous.

Secondly, I don't recommend opening over 5,000 Excel workbooks just to grab the contents of columns D:E. You might want to consider using ADODB to grab the data (from the closed workbooks) into a recordset that you can manipulate however you like. Examples of how to do this can be downloaded here...
http://www.appspro.com/conference/Da...rogramming.zip

I'm not sure why anyone would use an Excel file as a storage container for output data. Normally, outputs are written to DAT, TXT, or CSV files so the data can be utilized by any program via its normal file I/O functions. Excel files take up far more storage space than plain text files, so they are not an efficient format for any program to output raw data to.
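A minimal sketch of the ADODB approach mentioned above, for reading columns D:E out of a closed .xls. The sheet name (Sheet1), the row cap, and the Jet 4.0 provider string are assumptions, since the thread gives no code:

```vba
' Pull columns D:E from a closed .xls via ADO, without opening it in Excel.
Sub ReadClosedWorkbook(ByVal path As String)
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    ' Assumption: classic .xls files, so the Jet 4.0 provider applies.
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & path & _
            ";Extended Properties=""Excel 8.0;HDR=No"""
    ' Assumption: data lives on Sheet1; 65536 is the .xls row limit.
    Set rs = cn.Execute("SELECT * FROM [Sheet1$D1:E65536]")
    Do While Not rs.EOF
        ' Blank trailing cells come back as Null, so test before using.
        If Not IsNull(rs.Fields(0).Value) Then
            Debug.Print rs.Fields(0).Value, rs.Fields(1).Value
        End If
        rs.MoveNext
    Loop
    rs.Close: cn.Close
End Sub
```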
Not saying exporting to Excel is a bad practice for analysis purposes, just that it's not an efficient way to store data for use by other software.

So, if the files actually are CSVs, then the specific data you need to pass to the Analysis ToolPak functions can be manipulated fairly easily: use VB's normal file I/O functions to load the data into an array, where it can also be easily and quickly parsed for further use. Since all of this happens in memory (as opposed to physically opening each file), it's blazingly fast compared with working with Excel workbooks, or even with using ADODB to grab data from closed workbooks.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
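The file-I/O-into-an-array idea above could be sketched like this. The column positions (fields 4 and 5 of each record, i.e. D and E) and the 65,536-row cap are assumptions:

```vba
' Load two numeric columns from a CSV into arrays using plain VB file
' I/O, so no workbook is ever opened. n returns the row count.
Sub LoadCsvColumns(ByVal path As String, xs() As Double, ys() As Double, n As Long)
    Dim f As Integer, rec As String, parts() As String
    f = FreeFile
    n = 0
    ReDim xs(1 To 65536): ReDim ys(1 To 65536)  ' assumption: .xls row cap
    Open path For Input As #f
    Do While Not EOF(f)
        Line Input #f, rec
        parts = Split(rec, ",")
        If UBound(parts) >= 4 Then
            n = n + 1
            xs(n) = CDbl(parts(3))  ' column D (0-based field index 3)
            ys(n) = CDbl(parts(4))  ' column E
        End If
    Loop
    Close #f
    ' Trim the arrays down to the rows actually read.
    ReDim Preserve xs(1 To n): ReDim Preserve ys(1 To n)
End Sub
```

The arrays can then be fed straight to WorksheetFunction calls or to hand-rolled regression sums, with no sheet I/O at all.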
#3
"GS" wrote in message ...
<snip>

If it's any help - I have the whole dataset in a 15,000,000-row CSV file. I'm quite happy to manipulate it from there, but I couldn't see how to get Excel to read a chunk of it, analyse that chunk, and then move on to the next chunk.
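A 15,000,000-row CSV won't fit on a sheet, but the regression statistics only need running sums, so the file can be streamed one chunk at a time. This is a sketch of that approach using the standard least-squares formulas (slope = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²)); the chunk size, the x/y column positions, and the results sheet are all assumptions:

```vba
' Stream a huge CSV and write slope/intercept/R^2 for each chunk of
' rows, accumulating sums so the whole file never sits in memory.
Sub RegressCsvInChunks(ByVal path As String, ByVal chunkRows As Long)
    Dim f As Integer, rec As String, parts() As String
    Dim n As Long, sx As Double, sy As Double
    Dim sxx As Double, syy As Double, sxy As Double
    Dim x As Double, y As Double, slope As Double
    Dim outRow As Long, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets(1)  ' assumption: results on sheet 1
    ws.Range("A1:D1").Value = Array("Chunk", "Slope", "Intercept", "R2")
    outRow = 2
    f = FreeFile
    Open path For Input As #f
    Do While Not EOF(f)
        Line Input #f, rec
        parts = Split(rec, ",")
        x = CDbl(parts(0)): y = CDbl(parts(1))  ' assumption: x,y first two fields
        n = n + 1
        sx = sx + x: sy = sy + y
        sxx = sxx + x * x: syy = syy + y * y: sxy = sxy + x * y
        If n = chunkRows Then
            slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
            ws.Cells(outRow, 1).Value = outRow - 1
            ws.Cells(outRow, 2).Value = slope
            ws.Cells(outRow, 3).Value = (sy - slope * sx) / n
            ws.Cells(outRow, 4).Value = (n * sxy - sx * sy) ^ 2 / _
                ((n * sxx - sx * sx) * (n * syy - sy * sy))
            outRow = outRow + 1
            n = 0: sx = 0: sy = 0: sxx = 0: syy = 0: sxy = 0
        End If
    Loop
    Close #f
End Sub
```

Each chunk contributes one summary row, which gives exactly the intercept/coefficient/goodness-of-fit table the original post asked for.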
#4
On Dec 23, 2:58 am, "lk" wrote:
<snip>

If it makes things easier, you don't need to use the Data Analysis ToolPak. You can use the built-in Excel functions SLOPE and INTERCEPT over whole columns. In a spare spot you can put something like

Range("Z1") = "=SLOPE(E:E,D:D)"
Range("Z2") = "=INTERCEPT(E:E,D:D)"

in your VBA. Then, when the file is opened, it copies the formulas into Z1:Z2 and the slope and intercept appear, ready to be copied off somewhere else. It's probably quicker too.

xt
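Extending that suggestion into the summary table the original post wants: loop over the .xls files in a folder, evaluate SLOPE, INTERCEPT, and RSQ over D:E via WorksheetFunction, and write one row per file. The source folder path and the assumption that the data sits on the first sheet are placeholders, not details from the thread:

```vba
' Open each .xls in a folder, compute slope/intercept/R^2 over D:E
' with built-in worksheet functions, and log one summary row per file.
Sub SummariseRegressions()
    Dim folder As String, fname As String
    Dim wb As Workbook, src As Worksheet, out As Worksheet
    Dim lastRow As Long, r As Long
    Dim xs As Range, ys As Range
    folder = "C:\data\"                        ' assumption: source folder
    Set out = ThisWorkbook.Worksheets(1)
    out.Range("A1:D1").Value = Array("File", "Intercept", "Slope", "R2")
    r = 2
    fname = Dir(folder & "*.xls")
    Do While fname <> ""
        Set wb = Workbooks.Open(folder & fname, ReadOnly:=True)
        Set src = wb.Worksheets(1)             ' assumption: data on sheet 1
        ' Work up from the bottom so internal blank rows don't stop us.
        lastRow = src.Cells(src.Rows.Count, "D").End(xlUp).Row
        Set xs = src.Range("D1:D" & lastRow)
        Set ys = src.Range("E1:E" & lastRow)
        out.Cells(r, 1).Value = fname
        out.Cells(r, 2).Value = Application.WorksheetFunction.Intercept(ys, xs)
        out.Cells(r, 3).Value = Application.WorksheetFunction.Slope(ys, xs)
        out.Cells(r, 4).Value = Application.WorksheetFunction.RSq(ys, xs)
        r = r + 1
        wb.Close SaveChanges:=False
        fname = Dir
    Loop
End Sub
```

This avoids the 5,000-worksheet problem entirely: one results sheet, one row per regression.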