ExcelBanter - How to deal with a growing dataset...

ExcelBanter

ExcelBanter (https://www.excelbanter.com/)

- Excel Discussion (Misc queries) (https://www.excelbanter.com/excel-discussion-misc-queries/)

- - How to deal with a growing dataset... (https://www.excelbanter.com/excel-discussion-misc-queries/154717-how-deal-growing-dataset.html)

Joe Porkka [MSFT]

How to deal with a growing dataset...

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Dave

How to deal with a growing dataset...

Quick fix to this problem that i could think of would be using the =MAX
function coupled with the =LOOKUP function (assuming the dates are in
assending order). cant quite understand what you are asking. Max obviosly
finds the most recent date and the lookup can reference what cells you need

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Elkar

How to deal with a growing dataset...

You could use the LARGE function to find the two largest dates in row 1.

LARGE(1:1,1) finds first largest date
LARGE(1:1,2) finds second largest date

Then, you do something like:

=(HLOOKUP(LARGE(1:1,1),A1:Z10,2)-HLOOKUP(LARGE(1:1,2),A1:Z10,2))*something

Adjust the ranges to meet your needs.

HTH,
Elkar

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Duke Carey

How to deal with a growing dataset...

Joe -

You are running into 2 separate issues:

The first is trying to use a spreadsheet to track data that screams out to
be in a DATABASE.

The second is that your data is structured awkwardly.

Based on your sample data, you are far better off to have 3 and only 3
columns:

UserID, Date, and Value

Once Your data is in that format, you can use a simple SUMPRODUCT() formula
to calculate the value you are seeking.

Another point about the suggested layout is that it allows you to do pivot
tables and other analyses easy as anything, while your current structure
makes analysis very difficult.

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Joe Porkka [MSFT]

How to deal with a growing dataset...

Hey that sounds like the right approach...
You are right - eventually I do want a database, but in the meantime I want
to get something working a little sooner.

I don't understand how SUMPRODUCT does the computation necessary.
Given the table of 3 columns (user, date, currentData), what I would want is
(in real life the data would be sorted by date, not user, but that shouldn't
matter here at all).
A B C
1 User Date Reading
2 1 1/10/2007 0
3 1 1/17/2007 21
4 1 1/21/2007 44
5 2 1/10/2007 0
6 2 1/17/2007 45
7 2 1/21/2007 66
8 3 1/10/2007 0
9 3 1/17/2007 22
10 3 1/21/2007 76
11 4 1/10/2007 0
12 4 1/17/2007 13
13 4 1/21/2007 45
14 5 1/10/2007 0
15 5 1/17/2007 45
16 5 1/21/2007 90

For each unique user
select 2 most recent rows and subtract currentData.
So I would end up with data that looks like:
user Delta Comment
1 23 "C4-c3"
2 21 "c7-c6"
3 54 "c10-c9"
4 32
5 45

"Duke Carey" wrote:

Joe -

You are running into 2 separate issues:

The first is trying to use a spreadsheet to track data that screams out to
be in a DATABASE.

The second is that your data is structured awkwardly.

Based on your sample data, you are far better off to have 3 and only 3
columns:

UserID, Date, and Value

Once Your data is in that format, you can use a simple SUMPRODUCT() formula
to calculate the value you are seeking.

Another point about the suggested layout is that it allows you to do pivot
tables and other analyses easy as anything, while your current structure
makes analysis very difficult.

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Duke Carey

How to deal with a growing dataset...

With the user ID in E1, the latest date in F1, and the next latest date in G1

=SUMPRODUCT(--(A2:A16=E1),--(B2:B16=F1),C2:C16)-SUMPRODUCT(--(A2:A16=E1),--(B2:B16=G1),C2:C16)

"Joe Porkka [MSFT]" wrote:

Hey that sounds like the right approach...
You are right - eventually I do want a database, but in the meantime I want
to get something working a little sooner.

I don't understand how SUMPRODUCT does the computation necessary.
Given the table of 3 columns (user, date, currentData), what I would want is
(in real life the data would be sorted by date, not user, but that shouldn't
matter here at all).
A B C
1 User Date Reading
2 1 1/10/2007 0
3 1 1/17/2007 21
4 1 1/21/2007 44
5 2 1/10/2007 0
6 2 1/17/2007 45
7 2 1/21/2007 66
8 3 1/10/2007 0
9 3 1/17/2007 22
10 3 1/21/2007 76
11 4 1/10/2007 0
12 4 1/17/2007 13
13 4 1/21/2007 45
14 5 1/10/2007 0
15 5 1/17/2007 45
16 5 1/21/2007 90

For each unique user
select 2 most recent rows and subtract currentData.
So I would end up with data that looks like:
user Delta Comment
1 23 "C4-c3"
2 21 "c7-c6"
3 54 "c10-c9"
4 32
5 45

"Duke Carey" wrote:

Joe -

You are running into 2 separate issues:

The first is trying to use a spreadsheet to track data that screams out to
be in a DATABASE.

The second is that your data is structured awkwardly.

Based on your sample data, you are far better off to have 3 and only 3
columns:

UserID, Date, and Value

Once Your data is in that format, you can use a simple SUMPRODUCT() formula
to calculate the value you are seeking.

Another point about the suggested layout is that it allows you to do pivot
tables and other analyses easy as anything, while your current structure
makes analysis very difficult.

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

Joe Porkka [MSFT]

How to deal with a growing dataset...

While this works, I'm not thrilled with the way it ends up looking in the
workbook.

I did find "structured references" - which seems to be a better match for
what I'm looking for. I've only just started learning about them - but it
seems like a great new feature in Excel 2007.

"Duke Carey" wrote:

With the user ID in E1, the latest date in F1, and the next latest date in G1

=SUMPRODUCT(--(A2:A16=E1),--(B2:B16=F1),C2:C16)-SUMPRODUCT(--(A2:A16=E1),--(B2:B16=G1),C2:C16)

"Joe Porkka [MSFT]" wrote:

Hey that sounds like the right approach...
You are right - eventually I do want a database, but in the meantime I want
to get something working a little sooner.

I don't understand how SUMPRODUCT does the computation necessary.
Given the table of 3 columns (user, date, currentData), what I would want is
(in real life the data would be sorted by date, not user, but that shouldn't
matter here at all).
A B C
1 User Date Reading
2 1 1/10/2007 0
3 1 1/17/2007 21
4 1 1/21/2007 44
5 2 1/10/2007 0
6 2 1/17/2007 45
7 2 1/21/2007 66
8 3 1/10/2007 0
9 3 1/17/2007 22
10 3 1/21/2007 76
11 4 1/10/2007 0
12 4 1/17/2007 13
13 4 1/21/2007 45
14 5 1/10/2007 0
15 5 1/17/2007 45
16 5 1/21/2007 90

For each unique user
select 2 most recent rows and subtract currentData.
So I would end up with data that looks like:
user Delta Comment
1 23 "C4-c3"
2 21 "c7-c6"
3 54 "c10-c9"
4 32
5 45

"Duke Carey" wrote:

Joe -

You are running into 2 separate issues:

The first is trying to use a spreadsheet to track data that screams out to
be in a DATABASE.

The second is that your data is structured awkwardly.

Based on your sample data, you are far better off to have 3 and only 3
columns:

UserID, Date, and Value

Once Your data is in that format, you can use a simple SUMPRODUCT() formula
to calculate the value you are seeking.

Another point about the suggested layout is that it allows you to do pivot
tables and other analyses easy as anything, while your current structure
makes analysis very difficult.

"Joe Porkka [MSFT]" wrote:

I have a sheet that looks like
User# 1/10/2007 1/17/2007 1/21/2007
1 0 21 44
2 0 45 66
3 0 22 76
4 0 13 45
5 0 45 90

So basically, every week the number of columns will increase as I collect
more data -- my table continues getting bigger and bigger.
Someplace then I will have an column of equations for the current data
for each user that looks like
= (D2-c2) * something

Where "D2" and "C2" should be the two most recent columns in the above table

This seems like such an obvious thing, yet I've rarely seen a sample
spreadsheet that deals with a constantly growing table. I have struggled with
this problem in various forms for a long time and have never found a good
solution.
The alternative here would be to manually shift the data over each period
when
entering the new data - or manually edit those equations.

All times are GMT +1. The time now is 08:11 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com