Home |
Search |
Today's Posts |
|
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
I have a column of numbers. I want to eliminate all data in the
column that is greater than 2.5 standard deviations from the mean, leaving only the data that is within 2.5 StdDevs. I have a lot of data, so I need to do this as quickly as possible so I can then get statistical quartiles for the data. Any help on one or both issues greatly appreciated! James |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
James,
There are many "sexy" ways to do it, but if you only have to do it once or twice.... 1) Calculate the mean and standard deviation 2) Calculate the upper and lower cutoff values 3) Sort the data 4) Manually delete those above/below the cutoff values If this is something you will do on a regular basis, then it may be worth pursing a more automated method. HTH, John |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
John, Thank you for your reply. I am well-versed in VBA programming,
but a complete dunce at advanced mathematics like this. The problem is that the data is very "raw" -- about 12,000 datapoints ranging from 0.22 to over 100,000. You probably know that this is the kind of mess one sometimes gets from clients who don't really understand the ratios that are being asked for. The objective is to pare the data down to the "reasonable" values and get the bulk of it for subsequent figuring of quartiles. May I use the Reply to Author button to reply to you directly? James On Jul 9, 1:04?pm, wrote: James, There are many "sexy" ways to do it, but if you only have to do it once or twice.... 1) Calculate the mean and standard deviation 2) Calculate the upper and lower cutoff values 3) Sort the data 4) Manually delete those above/below the cutoff values If this is something you will do on a regular basis, then it may be worth pursing a more automated method. HTH, John |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
James,
Sorry, that email doesn't exist anymore. The programming is fairly straight forward. I'll do what I can to help you, but if it isn't something you are going to do on a regular basis....manual is the way to go. John |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
John, I understand. I'll plow away at the problem in here in hopes
that you will stick with me for a while and not get irritated with my lack of mathematical expertise. I would like to use built-in Excel functions as much as possible to speed the calculation. Your first step, "Calculate the mean". Would I use Excel's AVERAGE function for this? Say column M contains 12,410 numbers, starting on row 2. Would I then use a formula such as =AVERAGE(m2:m12411) to get the mean? A friend has told me that what is really wanted is the "threshold to the second quartile", so I'm confused. James On Jul 9, 1:48?pm, wrote: James, Sorry, that email doesn't exist anymore. The programming is fairly straight forward. I'll do what I can to help you, but if it isn't something you are going to do on a regular basis....manual is the way to go. John |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
What is the basis for excluding data that is beyond 2.5 Std Dev from the mean.
Why not use the built in quartiles worksheet function against the raw data to get the quartiles from the raw data. If the data is incorrect, throwing out the data you suggest doesn't seem to have much justification to me. You need to get correct data. Nonetheless: Yes =Average($m$2:$M$121411) would give you the average You could use an adjacent empty column to do Assume the data points are in M as you show, in R2 for example =if(Abs(M2-Average($M$2:$M$12411))2.5*Stdev($m$2:$M$121411), "Delete","Keep") then do an autofilter on column R and select Delete in the dropdown. Select all the data except the first row and do Edit=Delete and select entirerow. Note that once you delete the rows (and remove the filter), the formulas will recalculate and new mean and standard deviation and more rows will show up as Delete, but you can ignore these. -- Regards, Tom Ogilvy then drag fill down to "Zone" wrote: John, I understand. I'll plow away at the problem in here in hopes that you will stick with me for a while and not get irritated with my lack of mathematical expertise. I would like to use built-in Excel functions as much as possible to speed the calculation. Your first step, "Calculate the mean". Would I use Excel's AVERAGE function for this? Say column M contains 12,410 numbers, starting on row 2. Would I then use a formula such as =AVERAGE(m2:m12411) to get the mean? A friend has told me that what is really wanted is the "threshold to the second quartile", so I'm confused. James On Jul 9, 1:48?pm, wrote: James, Sorry, that email doesn't exist anymore. The programming is fairly straight forward. I'll do what I can to help you, but if it isn't something you are going to do on a regular basis....manual is the way to go. John |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
James,
I'm thinking like Tom....what is it exactly you are trying to get to? Are you trying to eliminate outliers from your dataset?....all those beyond 2.5 std devs. Or are you trying to find the 25th and 75th quartiles? John |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
2.5 StdDevs
"Zone" wrote in message
ups.com... The problem is that the data is very "raw" -- about 12,000 datapoints ranging from 0.22 to over 100,000. You probably know that this is the kind of mess one sometimes gets from clients who don't really understand the ratios that are being asked for. Hi Zone, if you want to use the 2.5 StDev rule for removing outliers than your data should be normally distributed.Ratios are known to be notoriously skewed. 1) Do a transform ; a log- or arcsin transformation could be usefull 2) Do a Kolmogorov-Smirnov test to see if you now have a normal distribution 3) Do the outlier test on the new data Use statistics software if you can , SAS , Statistica , R , S hth Gys |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|