![]() |
Removing statistical outliers
I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
One way to discard outlier is to use an array* function similar to this:
=AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
Awesome, thanks. Will this work for the MAX function too?
"Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
I have an article here that might be useful to you:
http://www.codeforexcelandoutlook.co...lude-extremes/ --JP On Oct 6, 1:59*pm, William wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? |
Removing statistical outliers
Just to add one; check out the below link by Chip to know how time values are
stored in excel.. http://www.cpearson.com/excel/datetime.htm If this post helps click Yes --------------- Jacob Skaria "William" wrote: Luke, how do I use time in that array? My time is format hh:mm:ss do I use quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less than 2400:00:00 which should an ideal case to go with... =MAX(IF(A2:A2000<100,A2:A2000)) for finding the max If this post helps click Yes --------------- Jacob Skaria "William" wrote: Luke, how do I use time in that array? My time is format hh:mm:ss do I use quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
That kind of sucks. I have dozens and dozens of data sets, each with a
different outlier some are wierd like < 434 hours or < 820 hours. It will be a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier way, like just using the real time value???? "Jacob Skaria" wrote: 24:00 equates to 1. So in the below formula you are filtering entries which are less than 100 which means you are filtering down to hours which are less than 2400:00:00 which should an ideal case to go with... =MAX(IF(A2:A2000<100,A2:A2000)) for finding the max If this post helps click Yes --------------- Jacob Skaria "William" wrote: Luke, how do I use time in that array? My time is format hh:mm:ss do I use quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
I understand you are looking to find Average and max from large data sets...
Have you tried the formula Luke has given on average and the below max formulas (array formulas).. =MAX(IF(A2:A2000<100,A2:A2000)) It would be easy for us to suggest a solution if you can give more information on how your data is arrranged.... If this post helps click Yes --------------- Jacob Skaria "William" wrote: That kind of sucks. I have dozens and dozens of data sets, each with a different outlier some are wierd like < 434 hours or < 820 hours. It will be a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier way, like just using the real time value???? "Jacob Skaria" wrote: 24:00 equates to 1. So in the below formula you are filtering entries which are less than 100 which means you are filtering down to hours which are less than 2400:00:00 which should an ideal case to go with... =MAX(IF(A2:A2000<100,A2:A2000)) for finding the max If this post helps click Yes --------------- Jacob Skaria "William" wrote: Luke, how do I use time in that array? My time is format hh:mm:ss do I use quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
I tried the formula, and it works, however my outliers are strange numbers at times (i.e. 2224:18:40). My spreadsheet is pretty simple. Column A is the date and time when a report was received in format mm/dd/yy hh:mm:ss AM/PM and the column B is the time it was acknowledged in the same format. Column C is the difference between column A & B in time format 37:30:55. I have dozens of tabs and spreadsheets with similiar data (each one unique to itself representing various things). Some of the records have a huge time difference due to the fact that someone forgot about it for months and months. It is always different for outliers one tab has that huge one over 2224 another is 50:34:56 and another is 6:46:20. Even though the formula works great, I would have to take time out to find what number to use in the formula for the outliers instead of using a default formula in my spreadsheets I can reuse. "Jacob Skaria" wrote: I understand you are looking to find Average and max from large data sets... Have you tried the formula Luke has given on average and the below max formulas (array formulas).. =MAX(IF(A2:A2000<100,A2:A2000)) It would be easy for us to suggest a solution if you can give more information on how your data is arrranged.... If this post helps click Yes --------------- Jacob Skaria "William" wrote: That kind of sucks. I have dozens and dozens of data sets, each with a different outlier some are wierd like < 434 hours or < 820 hours. It will be a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier way, like just using the real time value???? "Jacob Skaria" wrote: 24:00 equates to 1. So in the below formula you are filtering entries which are less than 100 which means you are filtering down to hours which are less than 2400:00:00 which should an ideal case to go with... =MAX(IF(A2:A2000<100,A2:A2000)) for finding the max If this post helps click Yes --------------- Jacob Skaria "William" wrote: Luke, how do I use time in that array? My time is format hh:mm:ss do I use quotes, single quotes?? "Luke M" wrote: One way to discard outlier is to use an array* function similar to this: =AVERAGE(IF(A2:A2000<100,A2:A2000)) This formula is designed to take the average of all number sin the range A2:A2000 that are less than 100. *Array functions are confirmed using Ctrl+Shift+Enter, not just Enter. -- Best Regards, Luke M *Remember to click "yes" if this post helped you!* "William" wrote: I have a large range of data (thousands of rows) and I was performing the functions average and max on cells that had time format 37:30:55. I realized in the data there were a few records that had an abnormally large value in them and from discussion they were due to unavoidable circumstances. There are a few records in each data set I am working with which are skewing my numbers (very large values). Is there a way to tell these functions (or use a different function that returns the same results) to exclude outliers or any outlier larger/smaller then what I specify? I know you can use the average function, select a range, skip over the outlier and select another range after the value, and repeat, but now that the data has so many recrds it is getting to be a tiresome task with the new data (thousands of additional records) every month that I put on a new tab. It takes a lot of time to go through thousands of rows to find the outlier and not select them in the average/max function. |
Removing statistical outliers
Hello,
Here is a general approach: http://sulprobil.com/html/outlier_resistant_beta.html Regards, Bernd |
All times are GMT +1. The time now is 11:31 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com