![]() |
How to deal with a growing dataset...
I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
Quick fix to this problem that i could think of would be using the =MAX
function coupled with the =LOOKUP function (assuming the dates are in assending order). cant quite understand what you are asking. Max obviosly finds the most recent date and the lookup can reference what cells you need "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
You could use the LARGE function to find the two largest dates in row 1.
LARGE(1:1,1) finds first largest date LARGE(1:1,2) finds second largest date Then, you do something like: =(HLOOKUP(LARGE(1:1,1),A1:Z10,2)-HLOOKUP(LARGE(1:1,2),A1:Z10,2))*something Adjust the ranges to meet your needs. HTH, Elkar "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
Joe -
You are running into 2 separate issues: The first is trying to use a spreadsheet to track data that screams out to be in a DATABASE. The second is that your data is structured awkwardly. Based on your sample data, you are far better off to have 3 and only 3 columns: UserID, Date, and Value Once Your data is in that format, you can use a simple SUMPRODUCT() formula to calculate the value you are seeking. Another point about the suggested layout is that it allows you to do pivot tables and other analyses easy as anything, while your current structure makes analysis very difficult. "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
Hey that sounds like the right approach...
You are right - eventually I do want a database, but in the meantime I want to get something working a little sooner. I don't understand how SUMPRODUCT does the computation necessary. Given the table of 3 columns (user, date, currentData), what I would want is (in real life the data would be sorted by date, not user, but that shouldn't matter here at all). A B C 1 User Date Reading 2 1 1/10/2007 0 3 1 1/17/2007 21 4 1 1/21/2007 44 5 2 1/10/2007 0 6 2 1/17/2007 45 7 2 1/21/2007 66 8 3 1/10/2007 0 9 3 1/17/2007 22 10 3 1/21/2007 76 11 4 1/10/2007 0 12 4 1/17/2007 13 13 4 1/21/2007 45 14 5 1/10/2007 0 15 5 1/17/2007 45 16 5 1/21/2007 90 For each unique user select 2 most recent rows and subtract currentData. So I would end up with data that looks like: user Delta Comment 1 23 "C4-c3" 2 21 "c7-c6" 3 54 "c10-c9" 4 32 5 45 "Duke Carey" wrote: Joe - You are running into 2 separate issues: The first is trying to use a spreadsheet to track data that screams out to be in a DATABASE. The second is that your data is structured awkwardly. Based on your sample data, you are far better off to have 3 and only 3 columns: UserID, Date, and Value Once Your data is in that format, you can use a simple SUMPRODUCT() formula to calculate the value you are seeking. Another point about the suggested layout is that it allows you to do pivot tables and other analyses easy as anything, while your current structure makes analysis very difficult. "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
With the user ID in E1, the latest date in F1, and the next latest date in G1
=SUMPRODUCT(--(A2:A16=E1),--(B2:B16=F1),C2:C16)-SUMPRODUCT(--(A2:A16=E1),--(B2:B16=G1),C2:C16) "Joe Porkka [MSFT]" wrote: Hey that sounds like the right approach... You are right - eventually I do want a database, but in the meantime I want to get something working a little sooner. I don't understand how SUMPRODUCT does the computation necessary. Given the table of 3 columns (user, date, currentData), what I would want is (in real life the data would be sorted by date, not user, but that shouldn't matter here at all). A B C 1 User Date Reading 2 1 1/10/2007 0 3 1 1/17/2007 21 4 1 1/21/2007 44 5 2 1/10/2007 0 6 2 1/17/2007 45 7 2 1/21/2007 66 8 3 1/10/2007 0 9 3 1/17/2007 22 10 3 1/21/2007 76 11 4 1/10/2007 0 12 4 1/17/2007 13 13 4 1/21/2007 45 14 5 1/10/2007 0 15 5 1/17/2007 45 16 5 1/21/2007 90 For each unique user select 2 most recent rows and subtract currentData. So I would end up with data that looks like: user Delta Comment 1 23 "C4-c3" 2 21 "c7-c6" 3 54 "c10-c9" 4 32 5 45 "Duke Carey" wrote: Joe - You are running into 2 separate issues: The first is trying to use a spreadsheet to track data that screams out to be in a DATABASE. The second is that your data is structured awkwardly. Based on your sample data, you are far better off to have 3 and only 3 columns: UserID, Date, and Value Once Your data is in that format, you can use a simple SUMPRODUCT() formula to calculate the value you are seeking. Another point about the suggested layout is that it allows you to do pivot tables and other analyses easy as anything, while your current structure makes analysis very difficult. "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
How to deal with a growing dataset...
While this works, I'm not thrilled with the way it ends up looking in the
workbook. I did find "structured references" - which seems to be a better match for what I'm looking for. I've only just started learning about them - but it seems like a great new feature in Excel 2007. "Duke Carey" wrote: With the user ID in E1, the latest date in F1, and the next latest date in G1 =SUMPRODUCT(--(A2:A16=E1),--(B2:B16=F1),C2:C16)-SUMPRODUCT(--(A2:A16=E1),--(B2:B16=G1),C2:C16) "Joe Porkka [MSFT]" wrote: Hey that sounds like the right approach... You are right - eventually I do want a database, but in the meantime I want to get something working a little sooner. I don't understand how SUMPRODUCT does the computation necessary. Given the table of 3 columns (user, date, currentData), what I would want is (in real life the data would be sorted by date, not user, but that shouldn't matter here at all). A B C 1 User Date Reading 2 1 1/10/2007 0 3 1 1/17/2007 21 4 1 1/21/2007 44 5 2 1/10/2007 0 6 2 1/17/2007 45 7 2 1/21/2007 66 8 3 1/10/2007 0 9 3 1/17/2007 22 10 3 1/21/2007 76 11 4 1/10/2007 0 12 4 1/17/2007 13 13 4 1/21/2007 45 14 5 1/10/2007 0 15 5 1/17/2007 45 16 5 1/21/2007 90 For each unique user select 2 most recent rows and subtract currentData. So I would end up with data that looks like: user Delta Comment 1 23 "C4-c3" 2 21 "c7-c6" 3 54 "c10-c9" 4 32 5 45 "Duke Carey" wrote: Joe - You are running into 2 separate issues: The first is trying to use a spreadsheet to track data that screams out to be in a DATABASE. The second is that your data is structured awkwardly. Based on your sample data, you are far better off to have 3 and only 3 columns: UserID, Date, and Value Once Your data is in that format, you can use a simple SUMPRODUCT() formula to calculate the value you are seeking. Another point about the suggested layout is that it allows you to do pivot tables and other analyses easy as anything, while your current structure makes analysis very difficult. "Joe Porkka [MSFT]" wrote: I have a sheet that looks like User# 1/10/2007 1/17/2007 1/21/2007 1 0 21 44 2 0 45 66 3 0 22 76 4 0 13 45 5 0 45 90 So basically, every week the number of columns will increase as I collect more data -- my table continues getting bigger and bigger. Someplace then I will have an column of equations for the current data for each user that looks like = (D2-c2) * something Where "D2" and "C2" should be the two most recent columns in the above table This seems like such an obvious thing, yet I've rarely seen a sample spreadsheet that deals with a constantly growing table. I have struggled with this problem in various forms for a long time and have never found a good solution. The alternative here would be to manually shift the data over each period when entering the new data - or manually edit those equations. |
All times are GMT +1. The time now is 08:11 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com