#1
Posted to microsoft.public.excel.misc
How to sample data without returning duplicates?
I have installed the Analysis ToolPak and am using the Data Analysis - Sampling feature. I have two issues I am trying to resolve:

1) Most important: when I run a sample of my range, the process returns duplicate values in the sample. For example, if I have the values 1 - 100 and I take a sample of 10 items, it may return the number 45 several times. Is there a way to prevent this, so that every value returned appears only once in the sample?

2) The data I want to sample is alpha, not numeric. However, the Sampling feature apparently only works with numeric input data. How can I get around this limitation?

To sum up, I need a sampling method that works on text fields and only selects an item once for inclusion in the sample.

Thanks to anyone who can help!
Ralph
#2
Posted to microsoft.public.excel.misc
You are probably using the random sample portion. A random sample cannot be taken from data while ignoring what has already been taken; it wouldn't be random. You will have to use another method to get what you want.
--
-John
Please rate when your question is answered to help us and others know what is helpful.
#3
Posted to microsoft.public.excel.misc
On Aug 7, 8:38 am, Ralph E Brown <Ralph E wrote:
> 1) Most important is when I run a sample of my range, the process will return duplicate values in the sample. [....] Is there a way to prevent this, so that every value returned appears only once in the sample?
> 2) The data I want to sample is alpha, not numeric. However the Sampling feature apparently only works with numeric input data. How can I get around this limitation.

There are at least two common approaches. Arguably, the simplest one is as follows.

Assume your data is in one column. In each cell of an adjacent column, put the formula =RAND().

Note: The values of those cells will change every time you modify the worksheet. Sigh. No matter: the actual values do not matter, only that they are random.

Now select the range that includes your data and the adjacent column of random values. Click Data, then Sort, and sort by the random column. This will reorder your data as well.

If you select the first "n" cells of the data column, the selection will be random without duplication (assuming all of your data are unique).
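
For readers who want to see the same idea outside a worksheet, here is a minimal sketch in Python. It is not part of the original reply; the function name `sample_without_duplicates`, the `seed` parameter, and the sample names are made up for illustration. It attaches a random key to every item (the =RAND() column), sorts on that key (the Data, Sort step), and keeps the first n items. It works on text just as well as on numbers.

```python
import random

def sample_without_duplicates(items, n, seed=None):
    """Pick n items by attaching a random key to each one and sorting on it."""
    if n > len(items):
        raise ValueError("sample size exceeds the number of items")
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    # Counterpart of filling the adjacent column with =RAND().
    keyed = [(rng.random(), item) for item in items]
    # Counterpart of sorting the range by the random column.
    keyed.sort(key=lambda pair: pair[0])
    # The first n rows form the sample; no row can be taken twice.
    return [item for _, item in keyed[:n]]

# Hypothetical text data, standing in for the "alpha" column.
names = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]
picked = sample_without_duplicates(names, 3, seed=1)
print(picked)
```

As in the worksheet version, the sample is free of duplicates only if the input items are themselves unique. In Python the standard library's `random.sample` does the same job in one call; the longer form above is shown only because it mirrors the worksheet steps.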
#4
Posted to microsoft.public.excel.misc
On Aug 7, 11:14 am, John Bundy (remove) wrote:
> You are probably using the random sample portion, a random sample cannot be taken from data while ignoring what has been taken, it wouldn't be random.

So by your definition, a Powerball-like lottery does not do random selection?(!)

Of course that's wrong. You can do random sampling with and without replacement. Arguably, sampling without replacement is the most common form of random sampling.

Can you imagine a political survey where the opinions of one person might be counted more than once? Can you imagine a jury pool where one person might go through voir dire twice for the same jury panel?
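
To make the distinction concrete, here is a small Python sketch (an illustration added for this thread, not something from the posts; the fixed seed 42 and the 1 - 100 population are arbitrary choices that echo the original question). `random.choices` draws with replacement, so a value may repeat; `random.sample` draws without replacement, so it cannot.

```python
import random

rng = random.Random(42)            # arbitrary seed, only for repeatable runs
population = list(range(1, 101))   # the values 1 - 100 from the question

# With replacement: every draw sees the full population, repeats are possible.
with_replacement = rng.choices(population, k=10)

# Without replacement: a value that has been drawn cannot be drawn again.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

The second list never contains a repeated value, which is the behaviour the original question asks for.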
#5
Posted to microsoft.public.excel.misc
Not to split hairs, as I think I answered the question that it could not be done with this formula, but the definition of random is: "Statistics. of or characterizing a process of selection in which each item of a set has an equal probability of being chosen."

So by definition, no, Powerball is NOT random. The machine randomly selects a number, but each number does not have the same chance of being chosen. When choosing 5 out of 10 numbers excluding duplicates, the first one picked is picked with 1:10 odds and the fifth with 1:6, since only six numbers remain by then.
--
-John
#6
Posted to microsoft.public.excel.misc
On Aug 7, 3:02 pm, John Bundy (remove) wrote:
> Not to split hairs, as i think i answered the question that it could not be done with this formula....the definition of random is: Statistics. of or characterizing a process of selection in which each item of a set has an equal probability of being chosen. So by definition, no powerball is NOT random, the machine randomly selects a number, but each number does not have the same chance of being chosen, chosing 5 out of 10 numbers excluding duplicates, the first person picked was picked with 1:10 odds and the 5th 1:5.

Of course it is a random selection. Nowhere in the definition does it say that the probability is equal for all selections; merely that for each selection, the probability is equally distributed.

That is, for each selection, the size of the set has changed. So as you pointed out, the probability is 1 in 49 for any ball in the first selection and 1 in 48 for any remaining ball in the second selection (assuming there are 49 balls to begin with). Each selection is (uniformly) random.
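
The arithmetic behind this exchange can be checked exactly. The sketch below is an added illustration, not code from either poster, and the function names are invented for it. It computes, with exact fractions, the chance that one particular item is taken on a given draw when sampling without replacement. Seen from before any draw is made, that chance is the same on every draw, because the chance of surviving the earlier draws offsets the shrinking pool.

```python
from fractions import Fraction

def prob_picked_on_draw(total, draw):
    """Exact chance that one particular item is taken on the given draw (1-based)."""
    if not 1 <= draw <= total:
        raise ValueError("draw must be between 1 and total")
    p_still_available = Fraction(1)
    for already_drawn in range(draw - 1):
        remaining = total - already_drawn
        # The item survives this earlier draw with chance (remaining - 1) / remaining.
        p_still_available *= Fraction(remaining - 1, remaining)
    # On the target draw it is one of (total - draw + 1) equally likely items.
    return p_still_available * Fraction(1, total - (draw - 1))

def prob_in_sample(total, sample_size):
    """Exact chance that one particular item ends up anywhere in the sample."""
    return sum(prob_picked_on_draw(total, d) for d in range(1, sample_size + 1))

print(prob_picked_on_draw(10, 1))   # 1/10
print(prob_picked_on_draw(10, 5))   # 1/10
print(prob_in_sample(10, 5))        # 1/2
```

So the "1 in 6" figure for the fifth draw is the conditional chance given that the item is still in the pool, while every item's overall chance of being included in a sample of 5 from 10 is the same, 1/2.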
#7
Posted to microsoft.public.excel.misc
Thanks! That solved both problems at the same time. I know about the RAND() function, but it never occurred to me to use it in this way. Thanks again!