LinkBack Thread Tools Search this Thread Display Modes
Prev Previous Post   Next Post Next
  #4   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default word frequency counting

On Fri, 7 Mar 2008 22:56:28 -0800 (PST), LAN MIND wrote:

Hello all,

I am trying to optimize my database. I need to find the top occurring
words on my excel file.

Is there any freeware or add-ons that can perform word frequency
counting on excel files?

Thanks for any help- Lan


Here's a UDF that should get you started. There may be more efficient methods,
but I was using some "new to me" techniques here.

It returns a two-dimensional array consisting of the unique words; and the
count of each of those unique words.

By the way, a "word" is defined as a collection of word characters (bounded by
a non-word character or the beginning or end of the line). A word character is
defined as being in the class of the alphabet (A-Za-z), a digit (0-9) or the
underscore (_). If this definition of "word" gives unwanted results, it can be
changed.

To enter this, <alt-F11 opens the VBEditor. Ensure your project is
highlighted in the project explorer window, then Insert/Module and paste the
code below into the window that opens.

THEN: Select Tools/References and set a reference to Microsoft VBScript
Regular Expressions 5.5

There are several ways to display the results.

Assuming your "data" is in A1:A3, enter a formula into some cell:

First word
D1: =INDEX(uniquecount($A$1:$A$3),1,ROWS($1:1))

Count of first word
E1: =INDEX(uniquecount($A$1:$A$3),2,ROWS($1:1))

Then select D1:E1 and fill down as far as required. If you go to far, you'll
see #REF errors.

This might be better for you sorted, but I don't have time to do that right
now.

Once you have the results, you can copy/paste-special Values to some other area
of your worksheet, and then sort on the values.

If this is going to be used frequently, a sort routine can be incorporated.

===============================================
Option Explicit
Function UniqueCount(rg As Range) As Variant
'Requires reference to Microsoft VBScript Regular Expressions 5.5
'Returns a two dimensional array of unique words and count
Dim cWordList As Collection
Dim Str As String
Dim sRes() As Variant
Dim i As Long, j As Long
Dim c As Range
Dim re As RegExp
Dim mc As MatchCollection, m As Match

'Put all words into a single string
For Each c In rg
Str = Str & c.Value & " "
Next c

'get list of unique words
Set re = New RegExp
re.Global = True
re.Pattern = "\b\w+\b"
Set cWordList = New Collection
On Error Resume Next
'Add method with index=word will give error on duplicates
Set mc = re.Execute(Str)
For Each m In mc
cWordList.Add m.Value, m.Value
Next m
On Error GoTo 0
ReDim sRes(0 To 1, 1 To cWordList.Count)
For i = 1 To cWordList.Count
sRes(0, i) = cWordList(i)
Next i

'get word count for each word
re.Global = True
re.IgnoreCase = True
For i = 1 To UBound(sRes, 2)
re.Pattern = "\b" & sRes(0, i) & "\b"
Set mc = re.Execute(Str)
sRes(1, i) = mc.Count
Next i
Set re = Nothing

UniqueCount = sRes
End Function
===============================
--ron
 
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Counting Numbers of Same Frequency Rothman Excel Discussion (Misc queries) 1 February 22nd 08 03:12 AM
counting frequency of strings in cells tom ossieur Excel Worksheet Functions 4 February 28th 07 03:00 PM
Counting Frequency of Filtered Data tom Excel Worksheet Functions 1 March 1st 06 08:31 PM
Counting cells with a specified frequency(how many times it appear Kelvin Excel Discussion (Misc queries) 0 July 13th 05 08:36 AM
Counting unique values + frequency Mike Ferguson Excel Worksheet Functions 3 November 24th 04 07:31 AM


All times are GMT +1. The time now is 02:43 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"