Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
I have a column of text values in which I need to display the ten most
frequently occurring. I tried setting the AutoFilter and choosing "Top 10", but it doesn't seem to be doing anything. I checked into some functions that I thought might work, like Frequency and Rank, but those seem to be for numbers only. Is there a way I can accomplish this? Thanks. |
#2
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Assuming that A2:A100 contains your text values, try the following which
will take into consideration ties for 10th place... B2, copied down: =IF((A2<"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($ A$2:$A$100,A2),"") C2, copied down: =IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2)-1,"") D1: 10 This indicates that you want a Top 10 list. You can change this as necessary. For example, if you want a Top 5 list, enter 5 instead. E1: =MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)) ,C2:C100))-D1 ....confirmed with CONTROL+SHIFT+ENTER, not just ENTER. F2, copied down: =IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100,MATCH(ROW ()-ROW($F$2)+ 1,$C$2:$C$100,0)),"") Hope this helps! In article , KarenH wrote: I have a column of text values in which I need to display the ten most frequently occurring. I tried setting the AutoFilter and choosing "Top 10", but it doesn't seem to be doing anything. I checked into some functions that I thought might work, like Frequency and Rank, but those seem to be for numbers only. Is there a way I can accomplish this? Thanks. |
#3
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Here's a pretty easy way of doing it, assume the values are in A1:A100 with
a header in A1, sel;ect A1:A100, do datafileradvanced filter, select copy to another location and where you want the filtered range, select unique records only and click OK. This will give you a distinct list with the text values, assume you put them in H1, in the first adjacent cell (in my example I2) put =COUNTIF($A$2:$A$100,H2) copy down as long as needed, now select both columns (H and I) and sort descending by column I, the first 10 values in H will be the top 10 -- Regards, Peo Sjoblom -- Regards, Peo Sjoblom "KarenH" wrote in message ... I have a column of text values in which I need to display the ten most frequently occurring. I tried setting the AutoFilter and choosing "Top 10", but it doesn't seem to be doing anything. I checked into some functions that I thought might work, like Frequency and Rank, but those seem to be for numbers only. Is there a way I can accomplish this? Thanks. |
#4
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Excellent, that works great! Thanks!
"Domenic" wrote: Assuming that A2:A100 contains your text values, try the following which will take into consideration ties for 10th place... B2, copied down: =IF((A2<"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($ A$2:$A$100,A2),"") C2, copied down: =IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2)-1,"") D1: 10 This indicates that you want a Top 10 list. You can change this as necessary. For example, if you want a Top 5 list, enter 5 instead. E1: =MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)) ,C2:C100))-D1 ....confirmed with CONTROL+SHIFT+ENTER, not just ENTER. F2, copied down: =IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100,MATCH(ROW ()-ROW($F$2)+ 1,$C$2:$C$100,0)),"") Hope this helps! In article , KarenH wrote: I have a column of text values in which I need to display the ten most frequently occurring. I tried setting the AutoFilter and choosing "Top 10", but it doesn't seem to be doing anything. I checked into some functions that I thought might work, like Frequency and Rank, but those seem to be for numbers only. Is there a way I can accomplish this? Thanks. |
#5
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
What does the CONTROL+SHIFT+ENTER do that ENTER doesnt? I notice it puts
little curly brackets around the formula -- and that if I just enter, the values don't come out right -- but I'm wondering what exactly that combination does. Thanks again! "Domenic" wrote: =MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)) ,C2:C100))-D1 ....confirmed with CONTROL+SHIFT+ENTER, not just ENTER. |
#6
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
CONTROL+SHIFT+ENTER is used when dealing with array formulas. For a
detailed explanation, have a look in Excel's help menu under 'Array Formula'. In article , KarenH wrote: What does the CONTROL+SHIFT+ENTER do that ENTER doesnt? I notice it puts little curly brackets around the formula -- and that if I just enter, the values don't come out right -- but I'm wondering what exactly that combination does. Thanks again! |
#7
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Domenic wrote...
Assuming that A2:A100 contains your text values, try the following which will take into consideration ties for 10th place... B2, copied down: =IF((A2<"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF( $A$2:$A$100,A2),"") C2, copied down: =IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2 )-1,"") D1: 10 This indicates that you want a Top 10 list. You can change this as necessary. For example, if you want a Top 5 list, enter 5 instead. E1: =MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0) ),C2:C100))-D1 ...confirmed with CONTROL+SHIFT+ENTER, not just ENTER. F2, copied down: =IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100, MATCH(ROW()-ROW($F$2)+1,$C$2:$C$100,0)),"") .... Ancillary cells aren't necessary. B2 [array formula]: =INDEX($A$2:$A$100,MODE(MATCH($A$2:$A$100,$A$2:$A$ 100,0))) B3 [array formula]: =INDEX($A$2:$A$100,MODE(IF(COUNTIF(B$2:B2,$A$2:$A$ 100)=0, MATCH($A$2:$A$100,$A$2:$A$100,0)))) Fill B3 down into B4:B11. Ancillary cells can make this more efficient, but only additional one column suffices. B2: =COUNTIF(A2:A100,A2)+1 B3: =IF(COUNTIF(A$2:A2,A3)=0,COUNTIF(A3:A$100,A3) +ROWS(B3:B$100)/ROWS($A$2:$A$100)) Fill B3 down into B4:B100. C2: =INDEX($A$2:$A$100,MATCH(MAX($B$2:$B$100),$B$2:$B$ 100,0)) C3 [array formula]: =INDEX($A$2:$A$100,MATCH(MAX(IF($B$2:$B$100<VLOOKU P(C2,$A$2:$B$100,2,0), $B$2:$B$100)),$B$2:$B$100,0)) Fill C3 down into C4:C11. |
#8
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
A couple of issues...
1) It doesn't take into consideration ties for 10th place. 2) Owing to the MODE function, #N/A is returned when values occurring more than once have been exhausted and only values occurring once remain. Nevertheless, very interesting Harlan! In article . com, "Harlan Grove" wrote: Ancillary cells aren't necessary. B2 [array formula]: =INDEX($A$2:$A$100,MODE(MATCH($A$2:$A$100,$A$2:$A$ 100,0))) B3 [array formula]: =INDEX($A$2:$A$100,MODE(IF(COUNTIF(B$2:B2,$A$2:$A$ 100)=0, MATCH($A$2:$A$100,$A$2:$A$100,0)))) Fill B3 down into B4:B11. |
#9
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Domenic wrote...
A couple of issues... 1) It doesn't take into consideration ties for 10th place. Meaning all tied values should be shown? 2) Owing to the MODE function, #N/A is returned when values occurring more than once have been exhausted and only values occurring once remain. .... B1 [array formula]: =INDEX($A$1:$A$500,MATCH(MAX(COUNTIF($A$1:$A$500,$ A$1:$A$500)), COUNTIF($A$1:$A$500,$A$1:$A$500),0)) B2 [array formula]: =INDEX($A$1:$A$500,MATCH(MAX(IF(COUNTIF(B$1:B1,$A$ 1:$A$500)=0, COUNTIF($A$1:$A$500,$A$1:$A$500))),COUNTIF($A$1:$A $500, IF(COUNTIF(B$1:B1,$A$1:$A$500)=0,$A$1:$A$500)),0)) without including ties. If ties are an issue, then it's much better to use two columns of formulas to produce the needed results. |
#10
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
In article .com,
"Harlan Grove" wrote: Domenic wrote... A couple of issues... 1) It doesn't take into consideration ties for 10th place. Meaning all tied values should be shown? Yes. I believe that any solution for a Top N list must show any and all values tied for Nth place. Otherwise, which of the tied values should be returned? Personally, for a Top N list, I prefer the solution I offered. It takes into consideration ties for Nth place, it doesn't use array formulas, and is very efficient. By the way, the solution I offered is courtesy of Aladin Akyurek. |
#11
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
Domenic wrote...
"Harlan Grove" wrote: Domenic wrote... .... 1) It doesn't take into consideration ties for 10th place. Meaning all tied values should be shown? Yes. I believe that any solution for a Top N list must show any and all values tied for Nth place. Otherwise, which of the tied values should be returned? Fair point. Personally, for a Top N list, I prefer the solution I offered. It takes into consideration ties for Nth place, it doesn't use array formulas, and is very efficient. By the way, the solution I offered is courtesy of Aladin Akyurek. Source is irrelevant. Very little in these ngs is original. OK, I hadn't considered ties, but your solution still uses one column too many and gains no efficiency by doing so. Given the original data in A2:A501, B2: =COUNTIF(A$2:A$501,A$2:A$501)+(ROWS(A2:A$501)-1)/ROWS(A$2:A$501) B3: =IF(ISNA(MATCH(A3,A$2:A2,0)),COUNTIF(A3:A$501,A3) +(ROWS(A3:A$501)-1)/ROWS(A$2:A$501)) Fill B3 down into B4:B501. Compare my B3 formula to yours. Yours (B3): =IF((A3<"")*(ISNA(MATCH(A3,$A$1:A2,0))),COUNTIF($ A$2:$A$501,A3),"") You're only including non-empty cells in col A. You're also matching against a cell, A1, that isn't specified just so you can use the same formula in B2 as in B3. There are arguments either way, but I tend to avoid needing to leave some cells around my data ranges blank. Next, if the ISNA call returns TRUE, then the current col A cell's value doesn't appear in previous rows in col A, so there's NO POINT to including those rows in the COUNTIF call unless your goal is an INEFFICIENT solution. Again, the motivation seems to be formula uniformity, but if partly and fully relative references are OK inside the MATCH call and as the 2nd arg to COUNTIF, why not as the 1st arg to COUNTIF? As for my (ROWS(x)-1)/ROWS(y) terms, they should be more or less directly calculated from the references without having to access data. Next, I'd enter the desired number of results in C1, then use a formula in B1 to calculate the effective number of results. B1: =COUNTIF($B$2:$B$501,"="&INT(LARGE($B$2:$B$501,C1 ))) This compares to your E1 formula Yours (E1) [array formula]: =MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)) ,C2:C501))-D1 The LARGE call may make this somewhat less efficient, but it's a mystery why you subtract D1 in your E1 formula when all your col F formulas add $D$1 and $E$1. Seems inefficient to me. Also note your use of an array formula. One or two unavoidable array formulas OK, but many not OK even if they can be filled into the needed range from a single template formula? My revised results would be in C2:C501. C2: =IF(B$1,INDEX(A$2:A$501,MATCH(MAX(B$2:B$501),B$2:B $501,0)),"") C3 [array formula]: =IF(ROWS(B$2:B2)<B$1,INDEX(A$2:A$501, MATCH(MAX(IF(B$2:B$501<VLOOKUP(C2,A$2:B$501,2,0),B $2:B$501)),B$2:B$501,0)), "") Compare my C3 formula to your C3 and F3 formulas. Yours (C3): =IF(N(B2),RANK(B2,$B$2:$B$501)+COUNTIF($B$2:B2,B2)-1,"") Yours (F3): =IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$501, MATCH(ROW()-ROW($F$2)+1,$C$2:$C$501,0)),"") Your C3 COUNTIF call corresponds to my B3 (ROWS(x)-1)/ROWS(y) term. It's a tie breaker. The difference is that your COUNTIF call has to work with values in col B while my ratio or ROWS calls can be calculated straight from the references without having to access range data. Your C3 RANK call correspond to my C3 MAX(IF(.<VLOOKUP(..),.)) term. Mine is probably a bit slower in calculation speed, but it uses less memory. You could tighten up your formulas a bit. B2: =IF((A2<"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($ A2:$A$501,A2),"") E1 [array formula]: =MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)) ,C2:C501)) F2: =IF(ROWS(F$2:F2)<=$E$1,INDEX($A$2:$A$501, MATCH(ROWS(F$2:F2),$C$2:$C$501,0)),"") |
#12
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
In article . com,
"Harlan Grove" wrote: OK, I hadn't considered ties, but your solution still uses one column too many and gains no efficiency by doing so. Given the original data in A2:A501, B2: =COUNTIF(A$2:A$501,A$2:A$501)+(ROWS(A2:A$501)-1)/ROWS(A$2:A$501) I take it the following would suffice, right? =COUNTIF(A2:A501,A2)+(ROWS(A2:A501)-1)/ROWS(A2:A501) B3: =IF(ISNA(MATCH(A3,A$2:A2,0)),COUNTIF(A3:A$501,A3) +(ROWS(A3:A$501)-1)/ROWS(A$2:A$501)) Fill B3 down into B4:B501. Compare my B3 formula to yours. Yours (B3): =IF((A3<"")*(ISNA(MATCH(A3,$A$1:A2,0))),COUNTIF($ A$2:$A$501,A3),"") You're only including non-empty cells in col A. You're also matching against a cell, A1, that isn't specified just so you can use the same formula in B2 as in B3. There are arguments either way, but I tend to avoid needing to leave some cells around my data ranges blank. Although, A1 doesn't necessarily need to be blank. It could contain a header, which in all likelihood would be the case, and wouldn't affect the result. Next, if the ISNA call returns TRUE, then the current col A cell's value doesn't appear in previous rows in col A, so there's NO POINT to including those rows in the COUNTIF call unless your goal is an INEFFICIENT solution. Again, the motivation seems to be formula uniformity, but if partly and fully relative references are OK inside the MATCH call and as the 2nd arg to COUNTIF, why not as the 1st arg to COUNTIF? Excellent point, Harlan!!! As for my (ROWS(x)-1)/ROWS(y) terms, they should be more or less directly calculated from the references without having to access data. Next, I'd enter the desired number of results in C1, then use a formula in B1 to calculate the effective number of results. B1: =COUNTIF($B$2:$B$501,"="&INT(LARGE($B$2:$B$501,C1 ))) This compares to your E1 formula Yours (E1) [array formula]: =MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)) ,C2:C501))-D1 The LARGE call may make this somewhat less efficient, but it's a mystery why you subtract D1 in your E1 formula when all your col F formulas add $D$1 and $E$1. Seems inefficient to me. Doing it this way enables a user to clearly see the number of tied values that exist, indicated by E1. I suspect any difference in efficiency is negligible. In the end, it probably comes down to a matter of preference. Also note your use of an array formula. One or two unavoidable array formulas OK, but many not OK even if they can be filled into the needed range from a single template formula? Here I'm somewhat confused. You seem to be arguing against your solution. My solution only contains one array formula, whereas yours contains array formulas in C3:C501. My revised results would be in C2:C501. C2: =IF(B$1,INDEX(A$2:A$501,MATCH(MAX(B$2:B$501),B$2:B $501,0)),"") C3 [array formula]: =IF(ROWS(B$2:B2)<B$1,INDEX(A$2:A$501, MATCH(MAX(IF(B$2:B$501<VLOOKUP(C2,A$2:B$501,2,0),B $2:B$501)),B$2:B$501,0)), "") Compare my C3 formula to your C3 and F3 formulas. Yours (C3): =IF(N(B2),RANK(B2,$B$2:$B$501)+COUNTIF($B$2:B2,B2)-1,"") Yours (F3): =IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$501, MATCH(ROW()-ROW($F$2)+1,$C$2:$C$501,0)),"") Your C3 COUNTIF call corresponds to my B3 (ROWS(x)-1)/ROWS(y) term. It's a tie breaker. The difference is that your COUNTIF call has to work with values in col B while my ratio or ROWS calls can be calculated straight from the references without having to access range data. Interesting! Your C3 RANK call correspond to my C3 MAX(IF(.<VLOOKUP(..),.)) term. Mine is probably a bit slower in calculation speed, but it uses less memory. You could tighten up your formulas a bit. B2: =IF((A2<"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($ A2:$A$501,A2),"") Most definitely! E1 [array formula]: =MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)) ,C2:C501)) Here I'd probably stick with the original formula for the reason cited earlier. F2: =IF(ROWS(F$2:F2)<=$E$1,INDEX($A$2:$A$501, MATCH(ROWS(F$2:F2),$C$2:$C$501,0)),"") Yeah, I usually favour... ROWS(F$2:F2) ....over... ROW()-ROW($F$2)+1 |
#13
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
"Domenic" wrote...
"Harlan Grove" wrote: .... B2: =COUNTIF(A$2:A$501,A$2:A$501)+(ROWS(A2:A$501)-1)/ROWS(A$2:A$501) I take it the following would suffice, right? =COUNTIF(A2:A501,A2)+(ROWS(A2:A501)-1)/ROWS(A2:A501) .... Oops. You're right. Although, A1 doesn't necessarily need to be blank. It could contain a header, which in all likelihood would be the case, and wouldn't affect the result. .... It would affect the result if it just happened to match any of the items below it. Improbable, but the nastiest bugs arise from improbable situations. Also note your use of an array formula. One or two unavoidable array formulas OK, but many not OK even if they can be filled into the needed range from a single template formula? Here I'm somewhat confused. You seem to be arguing against your solution. My solution only contains one array formula, whereas yours contains array formulas in C3:C501. .... I mentioned that because my interpretation of what you had written previously implies that it was good to avoid using array formulas. Not always. Array formulas sometimes make for more efficient solutions. Your experience may differ from mine, but mine is that the more cells there are, the more bugs can creep into spreadsheet models. There's an efficiency trade-off. For this particular sort of task, if the data would change infrequently, macros would be most efficient. |
#14
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
In article ,
"Harlan Grove" wrote: I mentioned that because my interpretation of what you had written previously implies that it was good to avoid using array formulas. Not always. Array formulas sometimes make for more efficient solutions. That's great! Thanks Harlan! |
#15
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Top 10 Text Values
In article ,
"Harlan Grove" wrote: Although, A1 doesn't necessarily need to be blank. It could contain a header, which in all likelihood would be the case, and wouldn't affect the result. ... It would affect the result if it just happened to match any of the items below it. Improbable, but the nastiest bugs arise from improbable situations. While I usually try to eliminate any risk, the risk here is so remote that I don't see it as an issue. Also, I carried out an informal test to compare the re-calculation speed between your solution and mine. Here's what I did... 1) used separate workbooks for each solution 2) expanded the range to A2:A10000 3) copied the relevant formulas down to row 10,000 The results surprised me on two counts... 1) With regards to my solution alone, I was surprised with the time it took to copy the formulas down to row 10,000 and the difference in re-calculation speed (about 10 seconds). 2) With regards to the two solutions, I was surprised to see that there didn't seem to be any difference in re-calculation speed. Both took about 10 seconds. Maybe the results would be different if the range was expanded to A2:A65535 and formulas copied down as well. Since there doesn't seem to be any real difference between the two solutions, I think I'll stick with mine. :) |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Inserting two cell values into a new cell + text | Excel Discussion (Misc queries) | |||
How do I LOOKUP text values | Excel Worksheet Functions | |||
Help adding text values | Excel Worksheet Functions | |||
how do i detect like text and add corresponding values? | Excel Discussion (Misc queries) | |||
text and values combined in one cel | Excel Discussion (Misc queries) |