Home |
Search |
Today's Posts |
#1
|
|||
|
|||
Frequency for Histograms in Excel
I am using Excel 2002. There is a problem with the frequency function, which
is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? -- Jim |
#2
|
|||
|
|||
In my initial question/posting, I forgot to add that the bins used for my
example data set a 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. "Jim" wrote: I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? -- Jim |
#4
|
|||
|
|||
Perhaps I was not clear in my original posting, but the data is the following
set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9, 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0, 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the problem. The problem is that theoretically, the calculated bin frequencies (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead, these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10} "Tushar Mehta" wrote: Since you don't share the raw data, it is hard to know what the frequency function is doing wrong. However, it, by *definition* considers the bin value as the upper bound for the bin and not the mid- point. So, if that is the problem you allude to, it is an XL design feature. If it is something else, maybe you can share the data for 1- to-5 so that it becomes easier to understand how the FREQUENCY function is getting it wrong. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? |
#5
|
|||
|
|||
My best guess (and it is only a guess) is that the problem occurs
because of floating point limitations and is exacerbated by something in the FREQUENCY function. I did the following tests: Generated 101 numbers in A1:A101 between 0 and 10. Entered 0 in A1, =A1+0.1 in A2 and copied A2 down as needed. In C1 entered the formula =A1+1. C2 contained =C1+1 and copied C2 to C3:C10. D1 contained the array formula =FREQUENCY(A1:A101,C1:C10). E1 contained the formula =COUNTIF($A$1:$A$101,"<="&C1)-COUNTIF($A$1:$A $101,"<="&(C1-1)). E1 was copied down to E2:E10. When A1 contains a zero, the COUNTIF formulas returned the correct results, whereas FREQUENCY returned the error you found. Changing A1 to 60 caused the FREQUENCY function to report 10 in the 61 bin and 11 in the 65 bin. But, now COUNTIF function reported 9 in the 64 bin and 11 in the 65 bin. Setting the value in A1 to different starting values results in a shifting error pattern in the results of the 2 functions. However, change the A column so that the numbers are generated with the formula =$A$1+0.1*(ROW()-1) and all the results from both FREQUENCY and the COUNTIF are always correct. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... Perhaps I was not clear in my original posting, but the data is the following set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9, 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0, 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the problem. The problem is that theoretically, the calculated bin frequencies (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead, these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10} "Tushar Mehta" wrote: Since you don't share the raw data, it is hard to know what the frequency function is doing wrong. However, it, by *definition* considers the bin value as the upper bound for the bin and not the mid- point. So, if that is the problem you allude to, it is an XL design feature. If it is something else, maybe you can share the data for 1- to-5 so that it becomes easier to understand how the FREQUENCY function is getting it wrong. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? |
#6
|
|||
|
|||
If your 101 data elements were the values that you claim, then the bin
frequencies would be the values that you expect. Instead, I suspect that your data elements were created by successively adding 0.1 to the previous value. That algorithm will not produce exactly the values that you claim, because computers do binary math, and ..1 has no exact binary representation. The result of accumulating these approximations are that instad of 9.9, you get 9.89999999999998, etc. In A1:A101, put the formula =(ROW()-1)/10 =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10}) returns {11,10,10,10,10,10,10,10,10,10} In B1 put 0, in B2 put =B1+0.1, and copy the B2 formula and paste over B3:B101 =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10}) returns {11,9,10,10,11,10,10,10,10,10} The difference is that the formula in A1:A101 avoids accumulating approximations. Jerry Jim wrote: Perhaps I was not clear in my original posting, but the data is the following set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9, 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0, 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the problem. The problem is that theoretically, the calculated bin frequencies (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead, these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10} "Tushar Mehta" wrote: Since you don't share the raw data, it is hard to know what the frequency function is doing wrong. However, it, by *definition* considers the bin value as the upper bound for the bin and not the mid- point. So, if that is the problem you allude to, it is an XL design feature. If it is something else, maybe you can share the data for 1- to-5 so that it becomes easier to understand how the FREQUENCY function is getting it wrong. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? |
#7
|
|||
|
|||
Alternately, you could use
=ROUND(B1+0.1,1) etc. to avoid accumulating binary approximations. Jerry Jerry W. Lewis wrote: If your 101 data elements were the values that you claim, then the bin frequencies would be the values that you expect. Instead, I suspect that your data elements were created by successively adding 0.1 to the previous value. That algorithm will not produce exactly the values that you claim, because computers do binary math, and .1 has no exact binary representation. The result of accumulating these approximations are that instad of 9.9, you get 9.89999999999998, etc. In A1:A101, put the formula =(ROW()-1)/10 =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10}) returns {11,10,10,10,10,10,10,10,10,10} In B1 put 0, in B2 put =B1+0.1, and copy the B2 formula and paste over B3:B101 =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10}) returns {11,9,10,10,11,10,10,10,10,10} The difference is that the formula in A1:A101 avoids accumulating approximations. Jerry Jim wrote: Perhaps I was not clear in my original posting, but the data is the following set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9, 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0, 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the problem. The problem is that theoretically, the calculated bin frequencies (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead, these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10} "Tushar Mehta" wrote: Since you don't share the raw data, it is hard to know what the frequency function is doing wrong. However, it, by *definition* considers the bin value as the upper bound for the bin and not the mid- point. So, if that is the problem you allude to, it is an XL design feature. If it is something else, maybe you can share the data for 1- to-5 so that it becomes easier to understand how the FREQUENCY function is getting it wrong. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? |
#8
|
|||
|
|||
Thanks! Tushar Mehta initially suggested that the problem was perhaps caused
by the internal representation of decimal numbers, and Jerry W. Lewis specifically suggested that it was my method of creating data (successive increments of 0.1), rather than the Frequency function itself, that caused the problem that I noted in my initial posting. Now I no longer view the Frequency function with skepticism, because I know that I alone was the culprit. One suggestion: My original concern resulted from Googling "histogram" + "Excel" and reading some earlier email correspondence regarding the Excel Frequency function and the data set (0, 0.1, 0.2, ..., 9.9, 10.0). Others could easily run across this or similar electronic discussions and might not be as lucky as I was in finding a comforting answer. Is there perhaps a communication mechanism that Microsoft could use to correct any similar misunderstandings, in the community of Excel users, regarding the Frequency function? Just a thought! "Jim" wrote: I am using Excel 2002. There is a problem with the frequency function, which is used for histograms. The frequency counts for some (not all) bins is incorrect. The case that I learned of in another posting involves 101 data values: 0, 10, and values in between that are created by using an increment of 0.1. The frequency value for the first bin is (correctly) 11, and most of the other bins have the correct frequency (10). However, the bin representing the interval from 1 to 2 has a frequency of 9, which the bin representing the interval from 4 to 5 has a frequency of 11! I have also used the "Better Histrogram" downloadable file for this same problem, but apparently the "Better Histogram" procedure starts with the Frequency function, as it ends up with a histogram with the same results. Any suggestions for correctly calculating all bin frequencies? -- Jim |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Excel 2003 FAILS, but Excel 2000 SUCCEEDS ??? | Excel Discussion (Misc queries) | |||
Difference in number of Excel NewsGroups | Excel Discussion (Misc queries) | |||
sharing/using/saving Excel 2002 files in Excel 2003 | Excel Discussion (Misc queries) | |||
Displaying frequency polygons in Excel (Statistics) | Charts and Charting in Excel | |||
Excel 2002 and 2000 co-install. Control Which Starts ? | Excel Discussion (Misc queries) |