Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
I'm doing a project for a client where I have inherited someone else's
data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
This is very possible in Excel but it will require a VBA programme. A
couple of observations from your rules. 1. The number of periods pre and post,as you say, are dependent on the number of missing values. What happens if there are insufficient pre and post data values to provide the average? 2. The answer to the first question might resolve the difficulty I am having with the fact that data comes every hour. When do you need to calculate and know there are missing values? Is this driven by some internal clock or at the end of the day or some other control? Since you would not know if data is missing or if there are sufficient real values to calculate the missing entries every hour. Perhaps if you can help shed some light on how you want to address this conundrum, we can provide some code. Cheers Nigel "Michael DiCostanzo" wrote in message m... I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Hi Michael
try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Hi Nigel
no need for VBA, can be done with formulas :-) Though there're some missing spots in the OP's specification -- Regards Frank Kabel Frankfurt, Germany Nigel wrote: This is very possible in Excel but it will require a VBA programme. A couple of observations from your rules. 1. The number of periods pre and post,as you say, are dependent on the number of missing values. What happens if there are insufficient pre and post data values to provide the average? 2. The answer to the first question might resolve the difficulty I am having with the fact that data comes every hour. When do you need to calculate and know there are missing values? Is this driven by some internal clock or at the end of the day or some other control? Since you would not know if data is missing or if there are sufficient real values to calculate the missing entries every hour. Perhaps if you can help shed some light on how you want to address this conundrum, we can provide some code. Cheers Nigel "Michael DiCostanzo" wrote in message m... I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
This was rather rushed but I think it does what you want.
Note that there is a potential complication not accounted for, specifically: If a period of missing data exceeds the gap between the next period of missing data then the forward averaging will include blank cells thus returning an inappropriate result. This can be corrected for but it will make the code substantially more complicated. Your description didn't take this into account so I assume it's not very likely. If you need this corrected for you'll have to specify the rules. You may have to correct for wordwrap corruption of the code. Assumptions: 1) Hourly data is in Column A 2) Raw data is in Column B 3) Corrected data is to be placed in Column C Sub CorrectData() Dim BlankRng As Range, AvgRng As Range Dim Rng As Range, C As Range, CC As Range Dim Rw As Long, i As Integer Dim Avg As Single i = 0 Rw = Cells(Rows.Count, "A").End(xlUp).Row Set Rng = Range(Cells(1, 2), Cells(Rw, 2)) For Each C In Rng If C = "" Then i = i + 1 ElseIf i 0 Then C.Offset(, 1) = C Set AvgRng = Range(Range(C.Offset(-i - 1), _ C.Offset(-2 * i)), Range(C, C.Offset(i - 1))) Avg = Application.Average(AvgRng) Set BlankRng = Range(C.Offset(-i), C.Offset(-1)) For Each CC In BlankRng CC.Offset(, 1) = Avg Next i = 0 Else C.Offset(, 1) = C End If Next End Sub Be advised I'm just a VBA student. Regards, Greg |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Nice one Frank! - but the missing data elements would cause me a problem.
You still end up with missing data and the objective was to resolve this. I think we need to get the OP to clarify how to deal with this. Cheers Nigel "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Hi Nigel
I agree that the OP has to clarify his specification :-) But what do you mean with 'Still end with missing data elements'. At least for his example this works (though I'm also wondering how such source data could be created...) Greetings Frank -- Regards Frank Kabel Frankfurt, Germany Nigel wrote: Nice one Frank! - but the missing data elements would cause me a problem. You still end up with missing data and the objective was to resolve this. I think we need to get the OP to clarify how to deal with this. Cheers Nigel "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Hi Frank,
Is it not the case that if there insufficient entries pre and post the missing entries that you cannot (actually should not) compute the intermediate values, hence you still having missing data. That was my reference to 'Still end with missing data elements' If data remains missing or is incalculable by some form of rule induction then this should be declared as such, alternatively new rules for this induction need to be prescribed. The OP needs to provide these rules based on his analytical process. Cheers Nigel "Frank Kabel" wrote in message ... Hi Nigel I agree that the OP has to clarify his specification :-) But what do you mean with 'Still end with missing data elements'. At least for his example this works (though I'm also wondering how such source data could be created...) Greetings Frank -- Regards Frank Kabel Frankfurt, Germany Nigel wrote: Nice one Frank! - but the missing data elements would cause me a problem. You still end up with missing data and the objective was to resolve this. I think we need to get the OP to clarify how to deal with this. Cheers Nigel "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Hi Nigel
ah, now I understand and yes you're corect (e.g. if the first entry is missing or the second and third) -- Regards Frank Kabel Frankfurt, Germany Nigel wrote: Hi Frank, Is it not the case that if there insufficient entries pre and post the missing entries that you cannot (actually should not) compute the intermediate values, hence you still having missing data. That was my reference to 'Still end with missing data elements' If data remains missing or is incalculable by some form of rule induction then this should be declared as such, alternatively new rules for this induction need to be prescribed. The OP needs to provide these rules based on his analytical process. Cheers Nigel "Frank Kabel" wrote in message ... Hi Nigel I agree that the OP has to clarify his specification :-) But what do you mean with 'Still end with missing data elements'. At least for his example this works (though I'm also wondering how such source data could be created...) Greetings Frank -- Regards Frank Kabel Frankfurt, Germany Nigel wrote: Nice one Frank! - but the missing data elements would cause me a problem. You still end up with missing data and the objective was to resolve this. I think we need to get the OP to clarify how to deal with this. Cheers Nigel "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! 100,000 Newsgroups ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =--- |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
Thanks all of you for your quick responses. Let me answer some of the
questions you've raised. First, there is plenty of historical data (2 years worth), so we're never going to have a situation where there is no data at the start. For the purposes of this exercise, you can assume that we have plenty of data at the beginning. Second, I need to clarify the rules of substitution better. For periods of missing data, N [where N is the number of hours of missing data (< 24 hours)] we take the average of the readings N hours before and N hours after the period of missing data. We can also have more than one period of missing data in a 24 hour period. So in the example I gave, we could have the following situation: Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 9 7.25 7.25 10 7.30 7.30 11 7.45 7.45 12 7.30 7.30 13 7.30 7.30 14 7.30 7.30 15 7.30 7.30 16 7.45 7.40 17 7.50 7.50 18 7.475 19 7.475 20 7.50 7.50 21 7.45 7.40 22 7.30 7.30 23 7.30 7.30 24 7.30 7.30 For hours 18-19, the missing data would be calculated as (7.45 + 7.5 + 7.5 + 7.45)/4 = 7.475. In this situation, the calculation has to be done twice. I usually substitute the missing data a day or two later. In this way I assure myself that I have sufficient data after the periods of missing data to do the calculation. Third, you're going to ask, what about periods of missing data that are longer than 24 hours? The rules that we must follow state that in this case we must use the maximum 1 hour value from the previous 30 days, not counting the current day. That value is substituted for all periods of missing data greater than 24 hours. Obviously, we try to avoid that situation. I don't expect that we'll get periods of missing data longer than 24 hours now because I'm paying very close attention to the quality of the data now. What I'd like to do is at least handle periods of missing data less than 24 hours. I'd like to put the raw data into an Excel file and then copy the formula down to handle the missing data for me. Thanks again for your response, the formula you provided does work. Michael "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
So the question is answered or are you still looking for an answer?
-- Regards, Tom Ogilvy "Michael DiCostanzo" wrote in message om... Thanks all of you for your quick responses. Let me answer some of the questions you've raised. First, there is plenty of historical data (2 years worth), so we're never going to have a situation where there is no data at the start. For the purposes of this exercise, you can assume that we have plenty of data at the beginning. Second, I need to clarify the rules of substitution better. For periods of missing data, N [where N is the number of hours of missing data (< 24 hours)] we take the average of the readings N hours before and N hours after the period of missing data. We can also have more than one period of missing data in a 24 hour period. So in the example I gave, we could have the following situation: Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 9 7.25 7.25 10 7.30 7.30 11 7.45 7.45 12 7.30 7.30 13 7.30 7.30 14 7.30 7.30 15 7.30 7.30 16 7.45 7.40 17 7.50 7.50 18 7.475 19 7.475 20 7.50 7.50 21 7.45 7.40 22 7.30 7.30 23 7.30 7.30 24 7.30 7.30 For hours 18-19, the missing data would be calculated as (7.45 + 7.5 + 7.5 + 7.45)/4 = 7.475. In this situation, the calculation has to be done twice. I usually substitute the missing data a day or two later. In this way I assure myself that I have sufficient data after the periods of missing data to do the calculation. Third, you're going to ask, what about periods of missing data that are longer than 24 hours? The rules that we must follow state that in this case we must use the maximum 1 hour value from the previous 30 days, not counting the current day. That value is substituted for all periods of missing data greater than 24 hours. Obviously, we try to avoid that situation. I don't expect that we'll get periods of missing data longer than 24 hours now because I'm paying very close attention to the quality of the data now. What I'd like to do is at least handle periods of missing data less than 24 hours. I'd like to put the raw data into an Excel file and then copy the formula down to handle the missing data for me. Thanks again for your response, the formula you provided does work. Michael "Frank Kabel" wrote in message ... Hi Michael try the following approach (using helper columns to make the formulas a little bit less complex) - Add a column B as helper column and enter the following array formula in B2 (this is your second data row as row 1 always should contain data). Formula ist entered with CTRL+SHFT+ENTER =IF(A2="",MIN(IF(A3:$A$20<"",ROW(A3:$A$20),20))," ") copy down for all rows - add the helper column C and enter the following formula in C2 (also array entered) =IF(A2="",MAX(IF(A$1:A1<"",ROW($A$1:A1),0)),"") and copy down - if you like you can hide columns B+C - now enter the following formula in D2 (this is the column with your corrected data) =IF(A2="",SUM(INDIRECT("A" & C2 & ":A" & C2-(B2-C2-2)),INDIRECT("A" & B2 & ":A" & B2 + (B2-C2 - 2)))/(B2-C2-1),"") and copy down Notes: this approach won't work in all circumstances. e.g. - the first row is empty (as there is no data before - the last row ist empty - There are not enough rows before or after -- Regards Frank Kabel Frankfurt, Germany Michael DiCostanzo wrote: I'm doing a project for a client where I have inherited someone else's data which is kept in Excel files. Here's what I want to do. The data is updated each hour. Sometimes we get no data or bad data for short periods of time. A sample of the data is shown below. In this example, there is no data for hours 5-6. Since we have two hours worth of missing data, we are required to substitute for the missing data by taking the average of the data two hours before and two hours after the period when the data is missing. (These are the rules we have to live by.) In this case, the numbers would be calculated as follows: (7.60 + 7.55 + 7.45 + 7.50) / 4 = 7.525 The value of 7.525 is then substituted into the empty cells for hours 5-6 as shown in the Corrected_Data column. Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 7.525 6 7.525 7 7.45 7.45 8 7.50 7.50 If there were 3 hours of missing data, then we would have substituted for the missing data by taking the average of the data three hours before and three hours after the period when the data is missing. I currently do the substitution manually. I really want to append the raw data into a spreadsheet and then calculate the Corrected_Data automatically. Can anyone tell me how to do that in Excel? As I mentioned, I've inherited this system, so I have to live with it for now. We're looking to replace it and if the answer is that this can't be done in Excel, that's OK. I want to transition to a newer system as smoothly as possible. Thanks in advance |
#12
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automatically substituting missing data
What is the rule when N is greater than the nonblank hours
following a blank period. In this case, averaging forward N values would include blank values:- Hour Raw_Data Corrected_Data 1 7.51 7.51 2 7.45 7.45 3 7.60 7.60 4 7.55 7.55 5 ? 6 ? 7 ? 8 ? 9 7.25 7.25 10 7.30 7.30 11 7.45 7.45 12 7.375 13 7.30 7.30 14 7.30 7.30 15 7.30 7.30 16 7.45 7.40 17 7.50 7.50 18 7.475 19 7.475 20 7.50 7.50 21 7.45 7.40 22 7.30 7.30 23 7.30 7.30 24 7.30 7.30 The macro approach doesn't require hidden columns if you prefer. I updated my macro to include the scenario where there is blank data for more than 24 hours. In this case it will substitute the maximum value for the 720 hrs (i.e. 24 * 30) preceeding the blank period. Tested under highly simplified conditions. Paste to a standard code module and correct for wordwrap if necessary. Sub CorrectData() Dim BlankRng As Range, SrcRng As Range Dim Rng As Range, C As Range, CC As Range Dim Rw As Long, i As Integer Dim SubsVal As Single i = 0 Rw = Cells(Rows.Count, "A").End(xlUp).Row Set Rng = Range(Cells(1, 2), Cells(Rw, 2)) For Each C In Rng If C = "" Then i = i + 1 ElseIf i 24 Then C.Offset(, 1) = C Set SrcRng = Range(C.Offset(-i - 1), _ C.Offset(-i - 720)) '720 = 24 * 30 days SubsVal = Application.Max(SrcRng) Set BlankRng = Range(C.Offset(-i), C.Offset(-1)) For Each CC In BlankRng CC.Offset(, 1) = SubsVal Next i = 0 ElseIf i 0 Then C.Offset(, 1) = C Set SrcRng = Range(Range(C.Offset(-i - 1), _ C.Offset(-2 * i)), Range(C, C.Offset(i - 1))) SubsVal = Application.Average(SrcRng) Set BlankRng = Range(C.Offset(-i), C.Offset(-1)) For Each CC In BlankRng CC.Offset(, 1) = SubsVal Next i = 0 Else C.Offset(, 1) = C End If Next Regards, Greg |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Substituting TODAY() for DATE(a,b,c) | New Users to Excel | |||
Substituting for special characters | Excel Discussion (Misc queries) | |||
Substituting values | Excel Discussion (Misc queries) | |||
Substituting for cell | Excel Discussion (Misc queries) | |||
Replacing or Substituting Text | Excel Worksheet Functions |