Group and Count Text Data More Efficiently
Good afternoon. I frequently need to group and count text data, and I'm hoping some clever person out there knows of an easier/more robust/more elegant way to do this. I have one or more columns of "free text" that needs to be collated into a manageable number of groups. For example, the input may have entries such as "Price," "Cost," "Pricing," etc. These should all be grouped into "Price" and counted.

This is what I've been using. I consolidate the column(s) on a separate worksheet, then use Advanced Filter to get the unique values. These go in column A. In column B I have COUNTIF functions to see how many times each original value appears. In column C I type the category it goes into. In column D I have a SUMIF, which sums column B where the value in C equals this category. This function is nested in an IF and MATCH so that each category is only counted once. This is what it looks like:

    A          B    C        D
    Aardvark   1    Mammal   13
    Bear       2    Mammal
    Cod        5    Fish     8
    Dog        10   Mammal
    Eel        3    Fish
    Fly        7    Insect   7

The formulae in column D return null when the category has already been displayed, which is why each category only appears once.

To chart this data, I copy columns C and D, paste their values, and sort descending by D. Alternatively, I'll copy and paste the categories and use VLOOKUPs to find the total for each category; this doesn't have to be updated if the raw data changes, as long as there are no new values.

It works, but is there a better way? In particular, is there some way that would let me sort this block of data in place, so I don't need to copy values to a new location? My boss has a slightly different solution with DSUMs, but it's fragile and, to my thinking, even less elegant. One idea I had would be to add a tiny fraction to the values in D based on the text value in C. This would make each value in D unique, so I could build a separate lookup table using LARGE and MATCH and not have to update it. Does this make sense? Thanks for any opinions!
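(For reference, a minimal sketch of the formulas described above, assuming the raw free text is in column A of a sheet named Raw and the summary table occupies rows 2 to 7 as in the example; the sheet name and ranges are illustrative only:

    B2:  =COUNTIF(Raw!A:A, A2)
    D2:  =IF(MATCH(C2, $C$2:$C$7, 0) = ROW() - 1, SUMIF($C$2:$C$7, C2, $B$2:$B$7), "")

Copy both down. The MATCH test shows the SUMIF total only on the first row where a category appears and returns an empty string on every later row, which reproduces the blanks in column D of the example.)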
Group and Count Text Data More Efficiently
Maybe you could create a lookup table and assign codes...
then have your original table look up the codes, then manipulate that information with your SUMIF or COUNTIF functions. Or you could just add columns for your various categories and use IF statements to populate those columns when there is a match between the cell and the column header.

-- Ted
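(A minimal sketch of the lookup-table idea, assuming the term-to-category mapping sits in Categories!A2:B50 with the raw term in column A and its category in column B; the sheet name and ranges are illustrative only:

    category next to each raw entry:  =VLOOKUP(A2, Categories!$A$2:$B$50, 2, FALSE)
    count for one category:           =COUNTIF($B$2:$B$100, "Mammal")

For the per-category helper columns, with the category names in row 1 of those columns, a formula such as =IF($B2=D$1, 1, 0) copied across and down, plus a SUM at the foot of each column, gives the same totals.)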