![]() |
counting elements of two arrays
Hello
I have two arrays that contain data that I am using for correlations. The two arrays are dynamic (part of a larer data set) Each array will ahve a different number of missing values. The two arrays are a time series, the position in the array is related to specific time stamp. I need to be able to count the number of times there is a value in both arrays for each timestamp and report that value. This will help in determining the validity of the correlations. The data set is a workbook with 20 sheets each contianing 200 columns and 8700 rows. The data is arranged in columns for variables and the rows represent an individual timestamp. I am currently selecting one column from the workbook and then looping through the workbook to find columns that have significant correlations. The problem is that I am getting high correlations because excel does a pairwise deletion and so if there are two values that are common and both change positively then I get a high correlation but the value of the correlation in interpreting the data is a waste. as most the values are being ignored because they are not common. I am stuck at how to start the process. Coudl some one please suggest a place to start. Kind regards Geoff |
counting elements of two arrays
What math function(s) are you using to get your correlation. How many bit
accuracy are you look at. "Geoff" wrote: Hello I have two arrays that contain data that I am using for correlations. The two arrays are dynamic (part of a larer data set) Each array will ahve a different number of missing values. The two arrays are a time series, the position in the array is related to specific time stamp. I need to be able to count the number of times there is a value in both arrays for each timestamp and report that value. This will help in determining the validity of the correlations. The data set is a workbook with 20 sheets each contianing 200 columns and 8700 rows. The data is arranged in columns for variables and the rows represent an individual timestamp. I am currently selecting one column from the workbook and then looping through the workbook to find columns that have significant correlations. The problem is that I am getting high correlations because excel does a pairwise deletion and so if there are two values that are common and both change positively then I get a high correlation but the value of the correlation in interpreting the data is a waste. as most the values are being ignored because they are not common. I am stuck at how to start the process. Coudl some one please suggest a place to start. Kind regards Geoff |
counting elements of two arrays
I am using the intenal correlation function within excel. I am not
looking for an incredible accuracy. What I need to know is how many common elements are in the two arrays that I am producing the correlation for. I think it is a Pearson function but that is entirely speculative :-) Tha data is froma real world data collection system for a paper machine. There are large gaps in the data after removing text and erroneous data. For the data that is left I need some idea of what correlates with key parameters and am now realising that I need to know if I can believe the correlation. I thought a simple way of doing this would be to check that there is at least 100 or more values that are common to the two arrays. 100 may not be the absolute number but it can be a starting point. I am hoping there is a simple way to do this. I could just go through the two arrays and check if there is a value for each row in the array and once it reaches 100 check the correlation. This would be very slow for a macro that has such a large body of data to check. Regards, Geoff On Aug 21, 4:40 pm, Joel wrote: What math function(s) are you using to get your correlation. How many bit accuracy are you look at. |
counting elements of two arrays
The Correlation function will give you a 1 when the two columns are exacttly
the same. When they don't match, the results closestt to one (absolute) is the best results. Correlattion function in Excel ignores empty cells. If you first put a value like 0 into the empty cells then you would get better results. I think the best result would be to use correlation but when a cell is empty place the same value in both columns. This way correlation will count these cells You should go to the excel worksheet and from Inset Menu - Function - Correl. the select help (lowerr left corner of window). This will show you the calculation that is made for Correl. The formula basical is calculatting the difference between the cells in the two columns, and then dividing the product of the squares. It ignores blank cells in the calculation. Putting the same value in both columns will not chbange the results of the Numerator of the function, but does change the value of the denominator. I don't think it will take a macro very long to automatically put the same value in both columns. iff you tell me how your columns are aranged I will write the macro. It take me only 5 minutes. Are your columns A & B, the D & E leavving one empty column between pairs, or some other arrangement. "Geoff" wrote: I am using the intenal correlation function within excel. I am not looking for an incredible accuracy. What I need to know is how many common elements are in the two arrays that I am producing the correlation for. I think it is a Pearson function but that is entirely speculative :-) Tha data is froma real world data collection system for a paper machine. There are large gaps in the data after removing text and erroneous data. For the data that is left I need some idea of what correlates with key parameters and am now realising that I need to know if I can believe the correlation. I thought a simple way of doing this would be to check that there is at least 100 or more values that are common to the two arrays. 100 may not be the absolute number but it can be a starting point. I am hoping there is a simple way to do this. I could just go through the two arrays and check if there is a value for each row in the array and once it reaches 100 check the correlation. This would be very slow for a macro that has such a large body of data to check. Regards, Geoff On Aug 21, 4:40 pm, Joel wrote: What math function(s) are you using to get your correlation. How many bit accuracy are you look at. |
counting elements of two arrays
Hello Joel
I do not want to put in a value in the empty cells. I don't want to force a correlation I want to ignore columns that have correlations based on only a few values. I have 20 sheets each with 200 columns and 8700 rows. I want to find out which columns have a real correlation with a particular columns of interest. Here is the macro that I am currently using. The tagname is allows me to choose from the data the column that I want to find out what correlates so that I can then have acloser look at the data to see what is the value of the correlation. Some columns will have 6000+ data points and others will ahve 100. But if only 10 of the data points overlap with the 6000 then any correlation is not significant as there is insufficiant data for my purposes. To force the correlation to happen will not help me identify which items to look at more closely. I want to identify which columns contain more than say 100 common data points. Thank you for your help. Sub CorrelSearch() Dim startrow As Integer, endrow As Integer, startcol As Integer, endcol As Integer Dim r As Long, c As Long Dim StartTime As Double Dim myArray() As Variant, rsltarray() As Variant Dim myRange As Range Dim clcell As Long Dim aSheet As Object Dim dupval As Variant, rsltcount As Variant Dim colarray() As Variant, corelarray() As Variant Dim colave As Double, colsd As Double Dim testval As Double Dim tagname As Variant Dim srchrange As Variant Dim corelval As Double Dim coreltest As Double Dim rsltrange As Range StartTime = Timer ' find from lookup table tag name tagname = InputBox(prompt:="Enter the tagname", Title:="Correlation selection") coreltest = InputBox(prompt:="Enter the correlation value") ' copy selected tag data to an array For Each aSheet In ActiveWorkbook.Sheets Worksheets(aSheet.Name).Select Set srchrange = Range(Cells(1, 1), Cells(1, 256)) ' srchrange = Range(Selection) If WorksheetFunction.CountIf(srchrange, tagname) = 1 Then Cells.Find(What:=tagname, LookIn:=xlFormulas, LookAt:=xlPart).Activate corelarray = Range(Cells(7, ActiveCell.Column), Cells(ActiveSheet.Cells.SpecialCells(xlLastCell).R ow, ActiveCell.Column)) End If Next aSheet ' cycle through all pages in workbook and check correlation rsltcount = 0 ReDim rsltarray(2000, 8) For Each aSheet In ActiveWorkbook.Sheets Application.ScreenUpdating = True Worksheets(aSheet.Name).Select Application.ScreenUpdating = False Set myRange = Range("B7", ActiveCell.SpecialCells(xlLastCell)) startcol = myRange.Column endcol = myRange.Columns(myRange.Columns.Count).Column startrow = myRange.Row endrow = myRange.Rows(myRange.Rows.Count).Row ReDim myArray(startrow To endrow, startcol To endcol) myArray() = myRange For c = 1 To endcol - startcol + 1 Application.StatusBar = ActiveSheet.Name & ", column " & c colarray = Application.Index(myArray, 0, c) On Error Resume Next If ArrayCount(colarray) ArrayCount(corelarray) * 0.1 Then corelval = WorksheetFunction.Correl(corelarray, colarray) If Abs(corelval) coreltest Then colave = WorksheetFunction.Average(colarray) colsd = WorksheetFunction.StDev(colarray) rsltcount = rsltcount + 1 rsltarray(rsltcount, 1) = Cells(1, c + 1) rsltarray(rsltcount, 2) = Cells(2, c + 1) rsltarray(rsltcount, 3) = colave rsltarray(rsltcount, 4) = colsd rsltarray(rsltcount, 5) = colsd / colave * 100 rsltarray(rsltcount, 6) = corelval rsltarray(rsltcount, 7) = corelval * corelval rsltarray(rsltcount, 8) = WorksheetFunction.Count(colarray) End If End If Next c Next aSheet Sheets.Add after:=ActiveSheet Set rsltrange = Range(Cells(1, 1), Cells(rsltcount, 8)) rsltrange.Value = rsltarray ActiveSheet.Name = tagname ' Display elapsed time Application.ScreenUpdating = True MsgBox Format(Timer - StartTime, "00.00") & " seconds" MsgBox rsltcount End Sub On Aug 23, 1:00 pm, Joel wrote: The Correlation function will give you a 1 when the two columns are exacttly the same. When they don't match, the results closestt to one (absolute) is the best results. Correlattion function in Excel ignores empty cells. If you first put a value like 0 into the empty cells then you would get better results. I think the best result would be to use correlation but when a cell is empty place the same value in both columns. This way correlation will count these cells You should go to the excel worksheet and from Inset Menu - Function - Correl. the select help (lowerr left corner of window). This will show you the calculation that is made for Correl. The formula basical is calculatting the difference between the cells in the two columns, and then dividing the product of the squares. It ignores blank cells in the calculation. Putting the same value in both columns will not chbange the results of the Numerator of the function, but does change the value of the denominator. I don't think it will take a macro very long to automatically put the same value in both columns. iff you tell me how your columns are aranged I will write the macro. It take me only 5 minutes. Are your columns A & B, the D & E leavving one empty column between pairs, or some other arrangement. "Geoff" wrote: I am using the intenal correlation function within excel. I am not looking for an incredible accuracy. What I need to know is how many common elements are in the two arrays that I am producing the correlation for. I think it is a Pearson function but that is entirely speculative :-) Tha data is froma real world data collection system for a paper machine. There are large gaps in the data after removing text and erroneous data. For the data that is left I need some idea of what correlates with key parameters and am now realising that I need to know if I can believe the correlation. I thought a simple way of doing this would be to check that there is at least 100 or more values that are common to the two arrays. 100 may not be the absolute number but it can be a starting point. I am hoping there is a simple way to do this. I could just go through the two arrays and check if there is a value for each row in the array and once it reaches 100 check the correlation. This would be very slow for a macro that has such a large body of data to check. Regards, Geoff On Aug 21, 4:40 pm, Joel wrote: What math function(s) are you using to get your correlation. How many bit accuracy are you look at. |
counting elements of two arrays
If you want to try using countif 0 which will get the non-blank dates it
requires only one line change. This method doesn't look at bad data, just cares if there is data or is'nt data. chage from: corelval = WorksheetFunction.Correl(corelarray, colarray) change to: corelval = WorksheetFunction.Countif(corelarray, "0") If ArrayCount(colarray) ArrayCount(corelarray) * 0.1 Then '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''''''''''''''' corelval = WorksheetFunction.Correl(corelarray, colarray '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''''''''''''''''''''''''''''''''''''' If Abs(corelval) coreltest Then colave = WorksheetFunction.Average(colarray) colsd = WorksheetFunction.StDev(colarray) rsltcount = rsltcount + 1 rsltarray(rsltcount, 1) = Cells(1, c + 1) rsltarray(rsltcount, 2) = Cells(2, c + 1) rsltarray(rsltcount, 3) = colave rsltarray(rsltcount, 4) = colsd rsltarray(rsltcount, 5) = colsd / colave * 100 rsltarray(rsltcount, 6) = corelval rsltarray(rsltcount, 7) = corelval * corelval rsltarray(rsltcount, 8) = WorksheetFunction.Count(colarray) End If End If Next c "Geoff" wrote: Hello Joel I do not want to put in a value in the empty cells. I don't want to force a correlation I want to ignore columns that have correlations based on only a few values. I have 20 sheets each with 200 columns and 8700 rows. I want to find out which columns have a real correlation with a particular columns of interest. Here is the macro that I am currently using. The tagname is allows me to choose from the data the column that I want to find out what correlates so that I can then have acloser look at the data to see what is the value of the correlation. Some columns will have 6000+ data points and others will ahve 100. But if only 10 of the data points overlap with the 6000 then any correlation is not significant as there is insufficiant data for my purposes. To force the correlation to happen will not help me identify which items to look at more closely. I want to identify which columns contain more than say 100 common data points. Thank you for your help. Sub CorrelSearch() Dim startrow As Integer, endrow As Integer, startcol As Integer, endcol As Integer Dim r As Long, c As Long Dim StartTime As Double Dim myArray() As Variant, rsltarray() As Variant Dim myRange As Range Dim clcell As Long Dim aSheet As Object Dim dupval As Variant, rsltcount As Variant Dim colarray() As Variant, corelarray() As Variant Dim colave As Double, colsd As Double Dim testval As Double Dim tagname As Variant Dim srchrange As Variant Dim corelval As Double Dim coreltest As Double Dim rsltrange As Range StartTime = Timer ' find from lookup table tag name tagname = InputBox(prompt:="Enter the tagname", Title:="Correlation selection") coreltest = InputBox(prompt:="Enter the correlation value") ' copy selected tag data to an array For Each aSheet In ActiveWorkbook.Sheets Worksheets(aSheet.Name).Select Set srchrange = Range(Cells(1, 1), Cells(1, 256)) ' srchrange = Range(Selection) If WorksheetFunction.CountIf(srchrange, tagname) = 1 Then Cells.Find(What:=tagname, LookIn:=xlFormulas, LookAt:=xlPart).Activate corelarray = Range(Cells(7, ActiveCell.Column), Cells(ActiveSheet.Cells.SpecialCells(xlLastCell).R ow, ActiveCell.Column)) End If Next aSheet ' cycle through all pages in workbook and check correlation rsltcount = 0 ReDim rsltarray(2000, 8) For Each aSheet In ActiveWorkbook.Sheets Application.ScreenUpdating = True Worksheets(aSheet.Name).Select Application.ScreenUpdating = False Set myRange = Range("B7", ActiveCell.SpecialCells(xlLastCell)) startcol = myRange.Column endcol = myRange.Columns(myRange.Columns.Count).Column startrow = myRange.Row endrow = myRange.Rows(myRange.Rows.Count).Row ReDim myArray(startrow To endrow, startcol To endcol) myArray() = myRange For c = 1 To endcol - startcol + 1 Application.StatusBar = ActiveSheet.Name & ", column " & c colarray = Application.Index(myArray, 0, c) On Error Resume Next If ArrayCount(colarray) ArrayCount(corelarray) * 0.1 Then corelval = WorksheetFunction.Correl(corelarray, colarray) If Abs(corelval) coreltest Then colave = WorksheetFunction.Average(colarray) colsd = WorksheetFunction.StDev(colarray) rsltcount = rsltcount + 1 rsltarray(rsltcount, 1) = Cells(1, c + 1) rsltarray(rsltcount, 2) = Cells(2, c + 1) rsltarray(rsltcount, 3) = colave rsltarray(rsltcount, 4) = colsd rsltarray(rsltcount, 5) = colsd / colave * 100 rsltarray(rsltcount, 6) = corelval rsltarray(rsltcount, 7) = corelval * corelval rsltarray(rsltcount, 8) = WorksheetFunction.Count(colarray) End If End If Next c Next aSheet Sheets.Add after:=ActiveSheet Set rsltrange = Range(Cells(1, 1), Cells(rsltcount, 8)) rsltrange.Value = rsltarray ActiveSheet.Name = tagname ' Display elapsed time Application.ScreenUpdating = True MsgBox Format(Timer - StartTime, "00.00") & " seconds" MsgBox rsltcount End Sub On Aug 23, 1:00 pm, Joel wrote: The Correlation function will give you a 1 when the two columns are exacttly the same. When they don't match, the results closestt to one (absolute) is the best results. Correlattion function in Excel ignores empty cells. If you first put a value like 0 into the empty cells then you would get better results. I think the best result would be to use correlation but when a cell is empty place the same value in both columns. This way correlation will count these cells You should go to the excel worksheet and from Inset Menu - Function - Correl. the select help (lowerr left corner of window). This will show you the calculation that is made for Correl. The formula basical is calculatting the difference between the cells in the two columns, and then dividing the product of the squares. It ignores blank cells in the calculation. Putting the same value in both columns will not chbange the results of the Numerator of the function, but does change the value of the denominator. I don't think it will take a macro very long to automatically put the same value in both columns. iff you tell me how your columns are aranged I will write the macro. It take me only 5 minutes. Are your columns A & B, the D & E leavving one empty column between pairs, or some other arrangement. "Geoff" wrote: I am using the intenal correlation function within excel. I am not looking for an incredible accuracy. What I need to know is how many common elements are in the two arrays that I am producing the correlation for. I think it is a Pearson function but that is entirely speculative :-) Tha data is froma real world data collection system for a paper machine. There are large gaps in the data after removing text and erroneous data. For the data that is left I need some idea of what correlates with key parameters and am now realising that I need to know if I can believe the correlation. I thought a simple way of doing this would be to check that there is at least 100 or more values that are common to the two arrays. 100 may not be the absolute number but it can be a starting point. I am hoping there is a simple way to do this. I could just go through the two arrays and check if there is a value for each row in the array and once it reaches 100 check the correlation. This would be very slow for a macro that has such a large body of data to check. Regards, Geoff On Aug 21, 4:40 pm, Joel wrote: What math function(s) are you using to get your correlation. How many bit accuracy are you look at. |
counting elements of two arrays
Thank you Joel I will try that.
Kind regards Geoff On Aug 23, 2:52 pm, Joel wrote: If you want to try using countif 0 which will get the non-blank dates it requires only one line change. This method doesn't look at bad data, just cares if there is data or is'nt data. chage from: corelval = WorksheetFunction.Correl(corelarray, colarray) change to: corelval = WorksheetFunction.Countif(corelarray, "0") If ArrayCount(colarray) ArrayCount(corelarray) * 0.1 Then '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''*'''''''''''''''''''''''' ''''''''''''''''''''''''''''''''''''''''''' corelval = WorksheetFunction.Correl(corelarray, colarray) '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''*'''''''''''''''''''''''' '''''''''''''''''''''''''''''''''''''''''''' If Abs(corelval) coreltest Then colave = WorksheetFunction.Average(colarray) colsd = WorksheetFunction.StDev(colarray) rsltcount = rsltcount + 1 rsltarray(rsltcount, 1) = Cells(1, c + 1) rsltarray(rsltcount, 2) = Cells(2, c + 1) rsltarray(rsltcount, 3) = colave rsltarray(rsltcount, 4) = colsd rsltarray(rsltcount, 5) = colsd / colave * 100 rsltarray(rsltcount, 6) = corelval rsltarray(rsltcount, 7) = corelval * corelval rsltarray(rsltcount, 8) = WorksheetFunction.Count(colarray) End If End If Next c |
All times are GMT +1. The time now is 02:23 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com