  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1
Default How to sample data without returning duplicates?

I have installed the analysis toolpak and am using the Data Analysis -
Sampling feature. I have two issues I am trying to resolve:

1) Most important: when I run a sample of my range, the process can
return duplicate values in the sample. For example, if I have values of 1 -
100 and I take a sample of 10 items, it may return the number 45 several
times. Is there a way to prevent this, so that every value returned appears
only once in the sample?

2) The data I want to sample is alpha, not numeric. However the Sampling
feature apparently only works with numeric input data. How can I get around
this limitation?

To sum up, I need a sampling method that works on text fields and only
selects an item once for inclusion in the sample.

Thanks to anyone who can help!

Ralph
  #2   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 772
Default How to sample data without returning duplicates?

You are probably using the random sampling option. A random sample cannot be
taken from data while ignoring what has already been taken; it wouldn't be
random. You will have to use another method to get what you want.
--
-John
Please rate when your question is answered to help us and others know what
is helpful.


  #3   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 2,059
Default How to sample data without returning duplicates?

On Aug 7, 8:38 am, Ralph E Brown wrote:

1) Most important is when I run a sample of my range, the process will
return duplicate values in the sample.
[....] Is there a way to prevent this, so that every value returned appears
only once in the sample?

2) The data I want to sample is alpha, not numeric. However the Sampling
feature apparently only works with numeric input data. How can I get around
this limitation.


There are at least two common approaches. Arguably, the simplest one
is as follows....

Assume your data is in one column. In each cell in an adjacent
column, put the formula =RAND(). Note: The value of those cells will
change every time you modify the worksheet. Sigh. No matter: the
actual values do not matter, only that they are random.

Now select the range that includes your data and the adjacent column
of random values. Click on Data > Sort and sort by the random column.
This will reorder your data as well. The first "n" cells of the data
column are then a random sample without duplication (assuming all
of your data are unique).
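
For anyone who wants the same idea outside of Excel, here is a minimal Python sketch of the helper-column approach. The function name `sample_without_replacement` and the list `names` are made up for illustration; `random.random()` stands in for =RAND(), and the sort stands in for Data > Sort. Note that it works on text just as well as on numbers.

```python
import random

def sample_without_replacement(items, n, seed=None):
    """Pick n items by sorting on a random key, like the =RAND() helper column."""
    rng = random.Random(seed)
    # Pair each item with a random key (the "adjacent column").
    keyed = [(rng.random(), item) for item in items]
    # Sort on the random key only; the items are carried along.
    keyed.sort(key=lambda pair: pair[0])
    # The first n rows are the sample; no row can appear twice.
    return [item for _, item in keyed[:n]]

# Hypothetical text data, since the original data is alpha rather than numeric.
names = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]
picked = sample_without_replacement(names, 3, seed=1)
print(picked)
```

As in the worksheet version, the result has no duplicates only if the input list itself has none.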

  #4   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 2,059
Default How to sample data without returning duplicates?

On Aug 7, 11:14 am, John Bundy wrote:
You are probably using the random sample portion, a random sample
can not be taken from data while ignoring what has been taken, it wouldn't
be random.


So by your definition, a Powerball-like lottery does not do random
selection?(!)

Of course that's wrong. You can do random sampling with and without
replacement.

Arguably, without replacement is the most common form of random
sampling. Can you imagine a political survey where the opinions of
one person might be counted more than once? Can you imagine a jury
pool where one person might go through voir dire twice for the same
jury panel?
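
To make the distinction concrete, here is a small Python sketch (not part of the original exchange) using the standard library: `random.choices` draws with replacement, `random.sample` draws without. The population of 1 to 100 and the sample size of 10 mirror the example in the original question; the seed value is arbitrary.

```python
import random

rng = random.Random(42)          # arbitrary seed, only for repeatability
population = list(range(1, 101)) # the values 1 - 100 from the question

# With replacement: the same value may come up more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every value appears at most once.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
```

Both are random samples; they differ only in whether a value goes back into the pool after it is drawn.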

  #5   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 772
Default How to sample data without returning duplicates?

Not to split hairs, as I think I answered the question that it could not be
done with this formula....the definition of random is:
Statistics. of or characterizing a process of selection in which each item
of a set has an equal probability of being chosen.

So by definition, no, Powerball is NOT random. The machine randomly selects a
number, but each number does not have the same chance of being chosen. When
choosing 5 out of 10 numbers excluding duplicates, the first person picked was
picked with 1:10 odds and the 5th with 1:5.

--
-John
Please rate when your question is answered to help us and others know what
is helpful.


  #6   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 2,059
Default How to sample data without returning duplicates?

On Aug 7, 3:02 pm, John Bundy wrote:
Not to split hairs, as I think I answered the question that it could not
be done with this formula....the definition of random is:
Statistics. of or characterizing a process of selection in which each
item of a set has an equal probability of being chosen.

So by definition, no, Powerball is NOT random. The machine randomly
selects a number, but each number does not have the same chance
of being chosen. When choosing 5 out of 10 numbers excluding duplicates,
the first person picked was picked with 1:10 odds and the 5th with 1:5.


Of course it is a random selection. Nowhere in the definition does it
say that the probability is equal for all selections; merely that for
each selection, the probability is equally distributed. That is, for
each selection, the size of the set has changed. So as you pointed
out, the probability is 1 in 49 for any ball in the first selection
and 1 in 48 for any ball in the second selection (assuming there are
49 balls to begin with). Each selection is (uniformly) random.
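
One way to see why the changing odds do not favor any ball, offered as a sketch and not as something stated earlier in the thread: with N balls drawn one at a time without replacement, a particular ball comes out on draw k only if it survives the first k - 1 draws and is then picked from the N - k + 1 balls that remain. The symbols N, k and n are introduced here for illustration.

```latex
P(\text{ball drawn on draw } k)
  = \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-k+1}{N-k+2}\cdot\frac{1}{N-k+1}
  = \frac{1}{N},
\qquad
P(\text{ball is in a sample of } n) = \sum_{k=1}^{n}\frac{1}{N} = \frac{n}{N}.
```

So with 49 balls, every ball has the same 1 in 49 chance of being the first pick, the second pick, or any later pick, even though the conditional odds at each step go 1 in 49, 1 in 48, and so on.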

  #7   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1
Default How to sample data without returning duplicates?

Thanks! That solved both problems at the same time. I know about the rand()
function, but it never occurred to me to use it in this way. Thanks again!
