![]() |
Automating Extrapolation
Dear all, I have a large dataset of global climate data, sourced from copious meteorological stations throughout the world. However, due to equipment failure (etc etc), the dataset is not entirely complete. I have a macro which will interpolate parameter values where there is a value either side of a blank cell (ie. a linear interpolation based on the surrounding values), yet this is unable to be performed on cells at the start or end of a time series (ie. the value for the first or last cell of a particular station). Just to clarify, a simplfied example is as follows: Station 1: A = GAP (=10) Station 1: B = 15 Station 1: C = 20 Station 1: D = 25 Station 1: E = 30 Station 1: F = 35 Station 1: G = GAP (=40) Station 2: A = GAP (=4) Station 2: B = 5 Station 2: C = 6 Station 2: D = 7 Station 2: E = 8 Station 2: F = 9 Station 2: G = GAP (=10) As shown above, for the inter-station gap between station 1 and 2, the values cannot be linearly interpolated based simply on the values surrounding the gap. I'm not a statistical/mathamatical expert, but I would guess that some form of regression formula could be used to fill the beginning and end values, based on the trend of the existing data for each individual station? Is there a way of designing (a macro in Visual Basic?) to automate this procedure in Excel 2002? - as doing it manually would be unfeasible due to the nature of the dataset size. Incidently, data which requires interpolation is in columns F to M (inclusive), and is represented as a blank cell; station reference is in column A in my dataset. Just a couple of other bits of information: extrapolation is required vertically down the columns, and not between rows. The time-step is consistant in all cases, and is on a yearly basis. If possible, would it be feasible to include a line (in a macro, say) that states that the operation should only be performed on blank cells at the beginning and end of the station data series (and not for any blanks in between, so as not to risk the filling of unintended gaps which should ideally be tackled by the interpolation macro). Thanks very much for your help and time - if you have any questions, please don't be afraid to get back in touch. Thanks again, Steve M -- smurray444 ------------------------------------------------------------------------ smurray444's Profile: http://www.excelforum.com/member.php...o&userid=28956 View this thread: http://www.excelforum.com/showthread...hreadid=506070 |
Automating Extrapolation
Check out:
TREND Show All Hide All Returns values along a linear trend. Fits a straight line (using the method of least squares) to the arrays known_y's and known_x's. Returns the y-values along that line for the array of new_x's that you specify. Syntax TREND(known_y's,known_x's,new_x's,const) -- | +-- Julian | "smurray444" wrote in message ... Dear all, I have a large dataset of global climate data, sourced from copious meteorological stations throughout the world. However, due to equipment failure (etc etc), the dataset is not entirely complete. I have a macro which will interpolate parameter values where there is a value either side of a blank cell (ie. a linear interpolation based on the surrounding values), yet this is unable to be performed on cells at the start or end of a time series (ie. the value for the first or last cell of a particular station). Just to clarify, a simplfied example is as follows: Station 1: A = GAP (=10) Station 1: B = 15 Station 1: C = 20 Station 1: D = 25 Station 1: E = 30 Station 1: F = 35 Station 1: G = GAP (=40) Station 2: A = GAP (=4) Station 2: B = 5 Station 2: C = 6 Station 2: D = 7 Station 2: E = 8 Station 2: F = 9 Station 2: G = GAP (=10) As shown above, for the inter-station gap between station 1 and 2, the values cannot be linearly interpolated based simply on the values surrounding the gap. I'm not a statistical/mathamatical expert, but I would guess that some form of regression formula could be used to fill the beginning and end values, based on the trend of the existing data for each individual station? Is there a way of designing (a macro in Visual Basic?) to automate this procedure in Excel 2002? - as doing it manually would be unfeasible due to the nature of the dataset size. Incidently, data which requires interpolation is in columns F to M (inclusive), and is represented as a blank cell; station reference is in column A in my dataset. Just a couple of other bits of information: extrapolation is required vertically down the columns, and not between rows. The time-step is consistant in all cases, and is on a yearly basis. If possible, would it be feasible to include a line (in a macro, say) that states that the operation should only be performed on blank cells at the beginning and end of the station data series (and not for any blanks in between, so as not to risk the filling of unintended gaps which should ideally be tackled by the interpolation macro). Thanks very much for your help and time - if you have any questions, please don't be afraid to get back in touch. Thanks again, Steve M -- smurray444 ------------------------------------------------------------------------ smurray444's Profile: http://www.excelforum.com/member.php...o&userid=28956 View this thread: http://www.excelforum.com/showthread...hreadid=506070 |
Automating Extrapolation
A single estimate ("imputation") of each missing value may simplify the
analysis, if all you are interested in is the equation coefficients. However, your approach will invalidate any tests of the model or estimates of its precision, since the analysis will be carried out under the assumption that you have more data (and more critical data at the ends) than you really do. Better approaches would include multiple imputation http://www.stat.psu.edu/~jls/mifaq.html (or any of a number of sites suggested by a Google search) or some kind of maximum likelihood analysis that takes account of the missing values. Do you have access to a statistician to help guide you through this literature? Jerry "smurray444" wrote: Dear all, I have a large dataset of global climate data, sourced from copious meteorological stations throughout the world. However, due to equipment failure (etc etc), the dataset is not entirely complete. I have a macro which will interpolate parameter values where there is a value either side of a blank cell (ie. a linear interpolation based on the surrounding values), yet this is unable to be performed on cells at the start or end of a time series (ie. the value for the first or last cell of a particular station). Just to clarify, a simplfied example is as follows: Station 1: A = GAP (=10) Station 1: B = 15 Station 1: C = 20 Station 1: D = 25 Station 1: E = 30 Station 1: F = 35 Station 1: G = GAP (=40) Station 2: A = GAP (=4) Station 2: B = 5 Station 2: C = 6 Station 2: D = 7 Station 2: E = 8 Station 2: F = 9 Station 2: G = GAP (=10) As shown above, for the inter-station gap between station 1 and 2, the values cannot be linearly interpolated based simply on the values surrounding the gap. I'm not a statistical/mathamatical expert, but I would guess that some form of regression formula could be used to fill the beginning and end values, based on the trend of the existing data for each individual station? Is there a way of designing (a macro in Visual Basic?) to automate this procedure in Excel 2002? - as doing it manually would be unfeasible due to the nature of the dataset size. Incidently, data which requires interpolation is in columns F to M (inclusive), and is represented as a blank cell; station reference is in column A in my dataset. Just a couple of other bits of information: extrapolation is required vertically down the columns, and not between rows. The time-step is consistant in all cases, and is on a yearly basis. If possible, would it be feasible to include a line (in a macro, say) that states that the operation should only be performed on blank cells at the beginning and end of the station data series (and not for any blanks in between, so as not to risk the filling of unintended gaps which should ideally be tackled by the interpolation macro). Thanks very much for your help and time - if you have any questions, please don't be afraid to get back in touch. Thanks again, Steve M -- smurray444 ------------------------------------------------------------------------ smurray444's Profile: http://www.excelforum.com/member.php...o&userid=28956 View this thread: http://www.excelforum.com/showthread...hreadid=506070 |
Automating Extrapolation
Thanks for your reply - I'm assuming that the "TREND(known_y's,known_x's,new_x's,const)" line is a formula to be copied down the column. However, I'm finding it difficult to implement successfully, and thus I believe a Visual Basic macro would (hopefully) do the trick. Does anyone have any suggested macros designed to tackle this? Thanks again, Steve Julian Wrote: Check out: TREND Show All Hide All Returns values along a linear trend. Fits a straight line (using the method of least squares) to the arrays known_y's and known_x's. Returns the y-values along that line for the array of new_x's that you specify. Syntax TREND(known_y's,known_x's,new_x's,const) -- | +-- Julian | -- smurray444 ------------------------------------------------------------------------ smurray444's Profile: http://www.excelforum.com/member.php...o&userid=28956 View this thread: http://www.excelforum.com/showthread...hreadid=506070 |
Automating Extrapolation
Thanks for your reply - unfortunately I do not have access to someone to guide me through the statistical literature. Advanced statistcal techniques isn't my strongest point by a long way - I was hoping that the solution would be a lot simpler then the suggested multiple imputation... ideally more along the lines of fitting a linear trend to the existing data in order to estimate (via extrapolation) any gaps at the beginning and/or end of a sequence. Would anyone be able to design a VB macro tailored to the example described in my original post? Thanks again, Steve Jerry W. Lewis Wrote: A single estimate ("imputation") of each missing value may simplify the analysis, if all you are interested in is the equation coefficients. However, your approach will invalidate any tests of the model or estimates of its precision, since the analysis will be carried out under the assumption that you have more data (and more critical data at the ends) than you really do. Better approaches would include multiple imputation http://www.stat.psu.edu/~jls/mifaq.html (or any of a number of sites suggested by a Google search) or some kind of maximum likelihood analysis that takes account of the missing values. Do you have access to a statistician to help guide you through this literature? Jerry -- smurray444 ------------------------------------------------------------------------ smurray444's Profile: http://www.excelforum.com/member.php...o&userid=28956 View this thread: http://www.excelforum.com/showthread...hreadid=506070 |
All times are GMT +1. The time now is 08:25 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com