Home |
Search |
Today's Posts |
#1
|
|||
|
|||
Histograms: how change number of bins
This is a question about the Histogram Data Analysis tool.
By default, Excel seems to apply some (unclear) heuristic for determining the number of bins. Is there an easy way for me to force the number of bins to be N? The operative word is "easy". I know that the tool allows me to specify a "bin range", a range of cells that specify the lower bound of each bin. But that seems "hard" for me use because the range of data cells change, based on the number of data points I want to include in the histogram. I might (please confirm) be able to assign an array name to a data cell range, then constuct an N-cell bin range that uses the min() and max() of the array name. By reassigning the array name to different data cell ranges, the bin range values should change automatically. But it would be "easier" if the histogram tool allowed me to simply specify the number of bins, and the tool computed the lower bound based on the data range, as I would. I am looking for something that "easy". Since the need seems so "obvious" and "common", I suspect I am simply failing to see how to do it. |
#2
|
|||
|
|||
joeu2004 -
This is a question about the Histogram Data Analysis tool. By default, Excel seems to apply some (unclear) heuristic for determining the number of bins. < Seems to be approximately from MIN(data) to MAX(data) with INT(SQRT(n)) bins. Is there an easy way for me to force the number of bins to be N? The operative word is "easy". < I don't think so. I think you need to use your judgment. I know that the tool allows me to specify a "bin range", a range of cells that specify the lower bound of each bin. < No, I think each bin value is the inclusive upper bound of a value range. But that seems "hard" for me use because the range of data cells change, based on the number of data points I want to include in the histogram. I might (please confirm) be able to assign an array name to a data cell range, then constuct an N-cell bin range that uses the min() and max() of the array name. By reassigning the array name to different data cell ranges, the bin range values should change automatically. But it would be "easier" if the histogram tool allowed me to simply specify the number of bins, and the tool computed the lower bound based on the data range, as I would. I am looking for something that "easy". Since the need seems so "obvious" and "common", I suspect I am simply failing to see how to do it. < I don't think it's "easy" at all. As I said, I think you need to use your judgment. Most people find histograms easier to interpret if the bin values are multiples of 2, 4, or 5. And most people prefer to have more bins when the number of values in the data set is larger. An algorithm that incorporates "nice" intervals is not trivial. You might be interested in looking at the "Better Histogram" page at www.treeplan.com to see a solution to a related histogram issue. - Mike www.mikemiddleton.com |
#3
|
|||
|
|||
Michael R Middleton wrote:
[joeu2004 wrote:] Is there an easy way for me to force the number of bins to be N? The operative word is "easy". I don't think so. I think you need to use your judgment. Sorry to hear that. I know that the tool allows me to specify a "bin range", a range of cells that specify the lower bound of each bin. No, I think each bin value is the inclusive upper bound of a value range. You are probably right. When I reread the Histogram Help description, I realize that it is not written in my native language, English. First, it says: "Microsoft Excel counts the number of data points between the current bin number and the adjoining higher bin, if any." I think that supports my interpretation. It would not make sense to count a number in one bin, if the bin boundary value is the upper bound for the bin and the number is "between the current [upper] bin [value] and the [upper bin value of] the adjoining higher bin". But in fact, then it says: "A number is counted in a particular bin if it is equal to or less than the bin number down to the last bin." I think the English translation is: "A number is counted in the highest-valued bin for which the number is less than or equal to the bin's boundary value." I had trouble gronking "DOWN to the LAST bin", especially since the following sentance says: "All values below the first bin value are counted together, as are the values above the last bin value." If the bin values are in "ascending order", I think "down" would be in the direction of descending values. Yet, "last bin" clearly refers to the highest-valued bin. Oh well, thanks so much for clarifying. You might be interested in looking at the "Better Histogram" page at www.treeplan.com to see a solution to a related histogram issue. Interesting. The Excel's banal histogram chart never bothered me because, frankly, I did not know about it. I always chart the data that is generated by the Histogram tool. In fact, I choose an XY Scatter diagram. I do wish that the X-axis values were the midpoint of the bins. Of course, it is "easy" enough for me to add a column to compute that. Thanks again for confirming my suspicion that there is no easy way to control the number of bins. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do I change a large number of cells from a postive to a neget. | Excel Discussion (Misc queries) | |||
How do you change the date to a Number | Excel Discussion (Misc queries) | |||
How do i change numbers in text format to number format? | New Users to Excel | |||
Change number of columns mid page? | Excel Worksheet Functions | |||
EZ Q 4 U: How do I change a number to text, based on the number | Excel Worksheet Functions |