Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Automatically substituting missing data

I'm doing a project for a client where I have inherited someone else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 68
Default Automatically substituting missing data

This is very possible in Excel but it will require a VBA programme. A
couple of observations from your rules.

1. The number of periods pre and post,as you say, are dependent on the
number of missing values. What happens if there are insufficient pre and
post data values to provide the average?

2. The answer to the first question might resolve the difficulty I am having
with the fact that data comes every hour. When do you need to calculate and
know there are missing values? Is this driven by some internal clock or at
the end of the day or some other control? Since you would not know if data
is missing or if there are sufficient real values to calculate the missing
entries every hour.

Perhaps if you can help shed some light on how you want to address this
conundrum, we can provide some code.

Cheers
Nigel

"Michael DiCostanzo" wrote in message
m...
I'm doing a project for a client where I have inherited someone else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance





----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,885
Default Automatically substituting missing data

Hi Michael
try the following approach (using helper columns to make the formulas a
little bit less complex)
- Add a column B as helper column and enter the following array formula
in B2 (this is your second data row as row 1 always should contain
data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2 (also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" &
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone

else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance


  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,885
Default Automatically substituting missing data

Hi Nigel
no need for VBA, can be done with formulas :-)
Though there're some missing spots in the OP's specification

--
Regards
Frank Kabel
Frankfurt, Germany

Nigel wrote:
This is very possible in Excel but it will require a VBA programme.

A
couple of observations from your rules.

1. The number of periods pre and post,as you say, are dependent on

the
number of missing values. What happens if there are insufficient pre
and
post data values to provide the average?

2. The answer to the first question might resolve the difficulty I am
having
with the fact that data comes every hour. When do you need to
calculate and
know there are missing values? Is this driven by some internal clock
or at
the end of the day or some other control? Since you would not know
if data
is missing or if there are sufficient real values to calculate the
missing entries every hour.

Perhaps if you can help shed some light on how you want to address
this conundrum, we can provide some code.

Cheers
Nigel

"Michael DiCostanzo" wrote in message
m...
I'm doing a project for a client where I have inherited someone
else's data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for

hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have

substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done

in
Excel, that's OK. I want to transition to a newer system as

smoothly
as possible.

Thanks in advance





----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet
News==---- http://www.newsfeed.com The #1 Newsgroup Service in the
World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized
Servers - Total Privacy via Encryption =---


  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 218
Default Automatically substituting missing data

This was rather rushed but I think it does what you want.

Note that there is a potential complication not accounted
for, specifically: If a period of missing data exceeds
the gap between the next period of missing data then the
forward averaging will include blank cells thus returning
an inappropriate result. This can be corrected for but it
will make the code substantially more complicated. Your
description didn't take this into account so I assume it's
not very likely. If you need this corrected for you'll
have to specify the rules.

You may have to correct for wordwrap corruption of the
code.

Assumptions:
1) Hourly data is in Column A
2) Raw data is in Column B
3) Corrected data is to be placed in Column C

Sub CorrectData()
Dim BlankRng As Range, AvgRng As Range
Dim Rng As Range, C As Range, CC As Range
Dim Rw As Long, i As Integer
Dim Avg As Single
i = 0
Rw = Cells(Rows.Count, "A").End(xlUp).Row
Set Rng = Range(Cells(1, 2), Cells(Rw, 2))
For Each C In Rng
If C = "" Then
i = i + 1
ElseIf i 0 Then
C.Offset(, 1) = C
Set AvgRng = Range(Range(C.Offset(-i - 1), _
C.Offset(-2 * i)), Range(C, C.Offset(i - 1)))
Avg = Application.Average(AvgRng)
Set BlankRng = Range(C.Offset(-i), C.Offset(-1))
For Each CC In BlankRng
CC.Offset(, 1) = Avg
Next
i = 0
Else
C.Offset(, 1) = C
End If
Next

End Sub

Be advised I'm just a VBA student.

Regards,
Greg



  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 68
Default Automatically substituting missing data

Nice one Frank! - but the missing data elements would cause me a problem.
You still end up with missing data and the objective was to resolve this. I
think we need to get the OP to clarify how to deal with this.

Cheers
Nigel

"Frank Kabel" wrote in message
...
Hi Michael
try the following approach (using helper columns to make the formulas a
little bit less complex)
- Add a column B as helper column and enter the following array formula
in B2 (this is your second data row as row 1 always should contain
data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2 (also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" &
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone

else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance






----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,885
Default Automatically substituting missing data

Hi Nigel
I agree that the OP has to clarify his specification :-)
But what do you mean with 'Still end with missing data elements'. At
least for his example this works (though I'm also wondering how such
source data could be created...)

Greetings
Frank


--
Regards
Frank Kabel
Frankfurt, Germany

Nigel wrote:
Nice one Frank! - but the missing data elements would cause me a
problem.
You still end up with missing data and the objective was to resolve
this. I think we need to get the OP to clarify how to deal with

this.

Cheers
Nigel

"Frank Kabel" wrote in message
...
Hi Michael
try the following approach (using helper columns to make the
formulas a little bit less complex)
- Add a column B as helper column and enter the following array
formula in B2 (this is your second data row as row 1 always should
contain data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2

(also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with

your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A"

&
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone

else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad

data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are

the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for
hours 5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have
substituted for the missing data by taking the average of the data
three hours before and three hours after the period when the data
is missing. I currently do the substitution manually. I really
want to append the raw data into a spreadsheet and then calculate
the Corrected_Data automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done
in Excel, that's OK. I want to transition to a newer system as
smoothly as possible.

Thanks in advance






----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet
News==---- http://www.newsfeed.com The #1 Newsgroup Service in the
World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized
Servers - Total Privacy via Encryption =---


  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 68
Default Automatically substituting missing data

Hi Frank,
Is it not the case that if there insufficient entries pre and post the
missing entries that you cannot (actually should not) compute the
intermediate values, hence you still having missing data. That was my
reference to 'Still end with missing data elements'

If data remains missing or is incalculable by some form of rule induction
then this should be declared as such, alternatively new rules for this
induction need to be prescribed. The OP needs to provide these rules based
on his analytical process.

Cheers
Nigel


"Frank Kabel" wrote in message
...
Hi Nigel
I agree that the OP has to clarify his specification :-)
But what do you mean with 'Still end with missing data elements'. At
least for his example this works (though I'm also wondering how such
source data could be created...)

Greetings
Frank


--
Regards
Frank Kabel
Frankfurt, Germany

Nigel wrote:
Nice one Frank! - but the missing data elements would cause me a
problem.
You still end up with missing data and the objective was to resolve
this. I think we need to get the OP to clarify how to deal with

this.

Cheers
Nigel

"Frank Kabel" wrote in message
...
Hi Michael
try the following approach (using helper columns to make the
formulas a little bit less complex)
- Add a column B as helper column and enter the following array
formula in B2 (this is your second data row as row 1 always should
contain data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2

(also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with

your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A"

&
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone
else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad

data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are

the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for
hours 5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have
substituted for the missing data by taking the average of the data
three hours before and three hours after the period when the data
is missing. I currently do the substitution manually. I really
want to append the raw data into a spreadsheet and then calculate
the Corrected_Data automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done
in Excel, that's OK. I want to transition to a newer system as
smoothly as possible.

Thanks in advance





----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet
News==---- http://www.newsfeed.com The #1 Newsgroup Service in the
World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized
Servers - Total Privacy via Encryption =---






----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,885
Default Automatically substituting missing data

Hi Nigel
ah, now I understand and yes you're corect (e.g. if the first entry is
missing or the second and third)

--
Regards
Frank Kabel
Frankfurt, Germany

Nigel wrote:
Hi Frank,
Is it not the case that if there insufficient entries pre and post

the
missing entries that you cannot (actually should not) compute the
intermediate values, hence you still having missing data. That was

my
reference to 'Still end with missing data elements'

If data remains missing or is incalculable by some form of rule
induction
then this should be declared as such, alternatively new rules for

this
induction need to be prescribed. The OP needs to provide these rules
based
on his analytical process.

Cheers
Nigel


"Frank Kabel" wrote in message
...
Hi Nigel
I agree that the OP has to clarify his specification :-)
But what do you mean with 'Still end with missing data elements'. At
least for his example this works (though I'm also wondering how such
source data could be created...)

Greetings
Frank


--
Regards
Frank Kabel
Frankfurt, Germany

Nigel wrote:
Nice one Frank! - but the missing data elements would cause me a
problem.
You still end up with missing data and the objective was to resolve
this. I think we need to get the OP to clarify how to deal with

this.

Cheers
Nigel

"Frank Kabel" wrote in message
...
Hi Michael
try the following approach (using helper columns to make the
formulas a little bit less complex)
- Add a column B as helper column and enter the following array
formula in B2 (this is your second data row as row 1 always should
contain data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2

(also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with

your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" &

C2-(B2-C2-2)),INDIRECT("A"
&
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone
else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad

data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have
two hours worth of missing data, we are required to substitute
for the missing data by taking the average of the data two hours
before and two hours after the period when the data is missing.
(These are

the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for
hours 5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have
substituted for the missing data by taking the average of the

data
three hours before and three hours after the period when the data
is missing. I currently do the substitution manually. I really
want to append the raw data into a spreadsheet and then calculate
the Corrected_Data automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be

done
in Excel, that's OK. I want to transition to a newer system as
smoothly as possible.

Thanks in advance





----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet
News==---- http://www.newsfeed.com The #1 Newsgroup Service in the
World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized
Servers - Total Privacy via Encryption =---






----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet
News==---- http://www.newsfeed.com The #1 Newsgroup Service in the
World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized
Servers - Total Privacy via Encryption =---


  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Automatically substituting missing data

Thanks all of you for your quick responses. Let me answer some of the
questions you've raised.

First, there is plenty of historical data (2 years worth), so we're
never going to have a situation where there is no data at the start.
For the purposes of this exercise, you can assume that we have plenty
of data at the beginning.

Second, I need to clarify the rules of substitution better. For
periods of missing data, N [where N is the number of hours of missing
data (< 24 hours)] we take the average of the readings N hours before
and N hours after the period of missing data. We can also have more
than one period of missing data in a 24 hour period.

So in the example I gave, we could have the following situation:

Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50
9 7.25 7.25
10 7.30 7.30
11 7.45 7.45
12 7.30 7.30
13 7.30 7.30
14 7.30 7.30
15 7.30 7.30
16 7.45 7.40
17 7.50 7.50
18 7.475
19 7.475
20 7.50 7.50
21 7.45 7.40
22 7.30 7.30
23 7.30 7.30
24 7.30 7.30

For hours 18-19, the missing data would be calculated as (7.45 + 7.5 +
7.5 + 7.45)/4 = 7.475.

In this situation, the calculation has to be done twice. I usually
substitute the missing data a day or two later. In this way I assure
myself that I have sufficient data after the periods of missing data
to do the calculation.

Third, you're going to ask, what about periods of missing data that
are longer than 24 hours? The rules that we must follow state that in
this case we must use the maximum 1 hour value from the previous 30
days, not counting the current day. That value is substituted for all
periods of missing data greater than 24 hours.

Obviously, we try to avoid that situation. I don't expect that we'll
get periods of missing data longer than 24 hours now because I'm
paying very close attention to the quality of the data now.

What I'd like to do is at least handle periods of missing data less
than 24 hours. I'd like to put the raw data into an Excel file and
then copy the formula down to handle the missing data for me.

Thanks again for your response, the formula you provided does work.


Michael



"Frank Kabel" wrote in message ...
Hi Michael
try the following approach (using helper columns to make the formulas a
little bit less complex)
- Add a column B as helper column and enter the following array formula
in B2 (this is your second data row as row 1 always should contain
data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2 (also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" &
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone

else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance



  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 27,285
Default Automatically substituting missing data

So the question is answered or are you still looking for an answer?

--
Regards,
Tom Ogilvy



"Michael DiCostanzo" wrote in message
om...
Thanks all of you for your quick responses. Let me answer some of the
questions you've raised.

First, there is plenty of historical data (2 years worth), so we're
never going to have a situation where there is no data at the start.
For the purposes of this exercise, you can assume that we have plenty
of data at the beginning.

Second, I need to clarify the rules of substitution better. For
periods of missing data, N [where N is the number of hours of missing
data (< 24 hours)] we take the average of the readings N hours before
and N hours after the period of missing data. We can also have more
than one period of missing data in a 24 hour period.

So in the example I gave, we could have the following situation:

Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50
9 7.25 7.25
10 7.30 7.30
11 7.45 7.45
12 7.30 7.30
13 7.30 7.30
14 7.30 7.30
15 7.30 7.30
16 7.45 7.40
17 7.50 7.50
18 7.475
19 7.475
20 7.50 7.50
21 7.45 7.40
22 7.30 7.30
23 7.30 7.30
24 7.30 7.30

For hours 18-19, the missing data would be calculated as (7.45 + 7.5 +
7.5 + 7.45)/4 = 7.475.

In this situation, the calculation has to be done twice. I usually
substitute the missing data a day or two later. In this way I assure
myself that I have sufficient data after the periods of missing data
to do the calculation.

Third, you're going to ask, what about periods of missing data that
are longer than 24 hours? The rules that we must follow state that in
this case we must use the maximum 1 hour value from the previous 30
days, not counting the current day. That value is substituted for all
periods of missing data greater than 24 hours.

Obviously, we try to avoid that situation. I don't expect that we'll
get periods of missing data longer than 24 hours now because I'm
paying very close attention to the quality of the data now.

What I'd like to do is at least handle periods of missing data less
than 24 hours. I'd like to put the raw data into an Excel file and
then copy the formula down to handle the missing data for me.

Thanks again for your response, the formula you provided does work.


Michael



"Frank Kabel" wrote in message

...
Hi Michael
try the following approach (using helper columns to make the formulas a
little bit less complex)
- Add a column B as helper column and enter the following array formula
in B2 (this is your second data row as row 1 always should contain
data). Formula ist entered with CTRL+SHFT+ENTER
=IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ")
copy down for all rows

- add the helper column C and enter the following formula in C2 (also
array entered)
=IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"")
and copy down

- if you like you can hide columns B+C
- now enter the following formula in D2 (this is the column with your
corrected data)
=IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" &
B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"")
and copy down




Notes:
this approach won't work in all circumstances. e.g.
- the first row is empty (as there is no data before
- the last row ist empty
- There are not enough rows before or after

--
Regards
Frank Kabel
Frankfurt, Germany

Michael DiCostanzo wrote:
I'm doing a project for a client where I have inherited someone

else's
data which is kept in Excel files. Here's what I want to do.

The data is updated each hour. Sometimes we get no data or bad data
for short periods of time. A sample of the data is shown below.

In this example, there is no data for hours 5-6. Since we have two
hours worth of missing data, we are required to substitute for the
missing data by taking the average of the data two hours before and
two hours after the period when the data is missing. (These are the
rules we have to live by.)

In this case, the numbers would be calculated as follows:

(7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525

The value of 7.525 is then substituted into the empty cells for hours
5-6 as shown in the Corrected_Data column.


Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 7.525
6 7.525
7 7.45 7.45
8 7.50 7.50

If there were 3 hours of missing data, then we would have substituted
for the missing data by taking the average of the data three hours
before and three hours after the period when the data is missing. I
currently do the substitution manually. I really want to append the
raw data into a spreadsheet and then calculate the Corrected_Data
automatically.

Can anyone tell me how to do that in Excel? As I mentioned, I've
inherited this system, so I have to live with it for now. We're
looking to replace it and if the answer is that this can't be done in
Excel, that's OK. I want to transition to a newer system as smoothly
as possible.

Thanks in advance



  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 218
Default Automatically substituting missing data

What is the rule when N is greater than the nonblank hours
following a blank period. In this case, averaging forward
N values would include blank values:-

Hour Raw_Data Corrected_Data
1 7.51 7.51
2 7.45 7.45
3 7.60 7.60
4 7.55 7.55
5 ?
6 ?
7 ?
8 ?
9 7.25 7.25
10 7.30 7.30
11 7.45 7.45
12 7.375
13 7.30 7.30
14 7.30 7.30
15 7.30 7.30
16 7.45 7.40
17 7.50 7.50
18 7.475
19 7.475
20 7.50 7.50
21 7.45 7.40
22 7.30 7.30
23 7.30 7.30
24 7.30 7.30

The macro approach doesn't require hidden columns if you
prefer. I updated my macro to include the scenario where
there is blank data for more than 24 hours. In this case
it will substitute the maximum value for the 720 hrs (i.e.
24 * 30) preceeding the blank period. Tested under highly
simplified conditions. Paste to a standard code module
and correct for wordwrap if necessary.

Sub CorrectData()
Dim BlankRng As Range, SrcRng As Range
Dim Rng As Range, C As Range, CC As Range
Dim Rw As Long, i As Integer
Dim SubsVal As Single
i = 0
Rw = Cells(Rows.Count, "A").End(xlUp).Row
Set Rng = Range(Cells(1, 2), Cells(Rw, 2))
For Each C In Rng
If C = "" Then
i = i + 1
ElseIf i 24 Then
C.Offset(, 1) = C
Set SrcRng = Range(C.Offset(-i - 1), _
C.Offset(-i - 720)) '720 = 24 * 30 days
SubsVal = Application.Max(SrcRng)
Set BlankRng = Range(C.Offset(-i), C.Offset(-1))
For Each CC In BlankRng
CC.Offset(, 1) = SubsVal
Next
i = 0
ElseIf i 0 Then
C.Offset(, 1) = C
Set SrcRng = Range(Range(C.Offset(-i - 1), _
C.Offset(-2 * i)), Range(C, C.Offset(i - 1)))
SubsVal = Application.Average(SrcRng)
Set BlankRng = Range(C.Offset(-i), C.Offset(-1))
For Each CC In BlankRng
CC.Offset(, 1) = SubsVal
Next
i = 0
Else
C.Offset(, 1) = C
End If
Next

Regards,
Greg
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Substituting TODAY() for DATE(a,b,c) Bad_Shot[_2_] New Users to Excel 3 January 15th 09 11:26 PM
Substituting for special characters JPyle Excel Discussion (Misc queries) 4 March 10th 08 06:04 PM
Substituting values TLC Excel Discussion (Misc queries) 1 October 22nd 07 07:38 PM
Substituting for cell Dave Excel Discussion (Misc queries) 3 August 24th 07 11:31 PM
Replacing or Substituting Text Naya Excel Worksheet Functions 1 September 22nd 05 02:55 AM


All times are GMT +1. The time now is 05:09 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"