ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Most popular word (https://www.excelbanter.com/excel-programming/284792-most-popular-word.html)

Umby

Most popular word
 
Hi all,
I need a function to get the most frequent word in a cell. The length of the
output must be at least 3.
e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick handled
things in regulation. established in 1881 ; Yards Gear for the Holidays car
yards in crisis" , the output should be "yards".

Any idea?
Thank you.

Umby



Dick Kusleika[_3_]

Most popular word
 
Umby

Try this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) = 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) / Len(arrWords(i))
If CurrCount MaxCount Then
MaxWord = arrWords(i)
MaxCount = CurrCount
End If
End If
Next i

MFW = MaxWord

End Function

Note that I had to strip out punctuation because "yards." is not the same as
"yards". I don't make any representation that the ones I elimate will be
sufficient. Also, I converted everything to caps.


--
Dick Kusleika
MVP - Excel
www.dicks-clicks.com
Post all replies to the newsgroup.

"Umby" wrote in message
.. .
Hi all,
I need a function to get the most frequent word in a cell. The length of

the
output must be at least 3.
e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick

handled
things in regulation. established in 1881 ; Yards Gear for the Holidays

car
yards in crisis" , the output should be "yards".

Any idea?
Thank you.

Umby





bdcrisp[_5_]

Most popular word
 

thats a nice little function, however, if more than word appears with
the same frequency, it only returns the last one found with the
maxvalue


------------------------------------------------
~~ Message posted from http://www.ExcelTip.com/
~~View and post usenet messages directly from http://www.ExcelForum.com/

~~Now Available: Financial Statements.xls, a step by step guide to creating financial statements

Dick Kusleika[_3_]

Most popular word
 
bdcrisp

Quite right. Except I think it returns the first one, not the last. If you
want a semicolon delimited list of the most frequent words, you would have
to adjust the function to this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) = 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) / Len(arrWords(i))
If CurrCount = MaxCount Then
If InStr(1, MaxWord, arrWords(i) & ";") = 0 Then
MaxWord = MaxWord & arrWords(i) & ";"
End If
MaxCount = CurrCount
End If
End If
Next i

MFW = Left(MaxWord, Len(MaxWord) - 1)

End Function

--
Dick Kusleika
MVP - Excel
www.dicks-clicks.com
Post all replies to the newsgroup.

"bdcrisp" wrote in message
...

thats a nice little function, however, if more than word appears with
the same frequency, it only returns the last one found with the
maxvalue


------------------------------------------------
~~ Message posted from http://www.ExcelTip.com/
~~View and post usenet messages directly from http://www.ExcelForum.com/

~~Now Available: Financial Statements.xls, a step by step guide to

creating financial statements



Umby

Most popular word
 
Thank you very much!
Umby


-----Original Message-----
Umby

Try this

Function MFW(Rng As Range) As String

Dim arrWords As Variant
Dim RangeText As String
Dim i As Long
Dim CurrCount As Long
Dim MaxCount As Long
Dim MaxWord As String

RangeText = UCase(Rng.Text)

RangeText = Replace(RangeText, ".", "")
RangeText = Replace(RangeText, ",", "")
RangeText = Replace(RangeText, ";", "")
RangeText = Replace(RangeText, ":", "")

arrWords = Split(RangeText, " ")

For i = LBound(arrWords) To UBound(arrWords)
If Len(arrWords(i)) = 3 Then
CurrCount = (Len(RangeText) - _
Len(Replace(RangeText, arrWords(i), ""))) /

Len(arrWords(i))
If CurrCount MaxCount Then
MaxWord = arrWords(i)
MaxCount = CurrCount
End If
End If
Next i

MFW = MaxWord

End Function

Note that I had to strip out punctuation because "yards."

is not the same as
"yards". I don't make any representation that the ones I

elimate will be
sufficient. Also, I converted everything to caps.


--
Dick Kusleika
MVP - Excel
www.dicks-clicks.com
Post all replies to the newsgroup.

"Umby" wrote in message
. ..
Hi all,
I need a function to get the most frequent word in a

cell. The length of
the
output must be at least 3.
e.g. if a cell contents was "Vick rushed for 145 yards.

Michael Vick
handled
things in regulation. established in 1881 ; Yards Gear

for the Holidays
car
yards in crisis" , the output should be "yards".

Any idea?
Thank you.

Umby




.


fernando cinquegrani

Most popular word
 
"Umby" ha scritto nel messaggio .. .
| Hi all,
| I need a function to get the most frequent word in a cell. The length of the
| output must be at least 3.
| e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick handled
| things in regulation. established in 1881 ; Yards Gear for the Holidays car
| yards in crisis" , the output should be "yards".
|
| Any idea?

in microsoft.public.it.office.excel
thread: cercare parole
date: dic 10, 2003

Public Function MFWord(buf1 As Range)
Dim buf2 As Object, buf3 As Object
Dim smax As String, nmax As Long
Set regexp = CreateObject("VBScript.RegExp")
regexp.Global = True
regexp.IgnoreCase = True
regexp.Pattern = "(\w{3,})"
Set buf2 = regexp.Execute(buf1.Text)
For Each occ In buf2
regexp.Pattern = occ & "\b|" & occ & "$"
Set buf3 = regexp.Execute(buf1.Text)
If buf3.Count nmax Then smax = occ: nmax = buf3.Count
Next
MsgBox smax & " (" & nmax & ")"
MFWord = smax
End Function
.f
fernando cinquegrani
Microsoft MVP
http://www.prodomosua.it




Umby

Most popular word
 
Thank you for your function, however it doesn't consider
some character.
e.g. if cell value = "mip@it mip@it mip@it win win" ,
output = "mip"

Regards
Umby



-----Original Message-----
"Umby" ha scritto nel messaggio

.. .
| Hi all,
| I need a function to get the most frequent word in a

cell. The length of the
| output must be at least 3.
| e.g. if a cell contents was "Vick rushed for 145 yards.

Michael Vick handled
| things in regulation. established in 1881 ; Yards Gear

for the Holidays car
| yards in crisis" , the output should be "yards".
|
| Any idea?

in microsoft.public.it.office.excel
thread: cercare parole
date: dic 10, 2003

Public Function MFWord(buf1 As Range)
Dim buf2 As Object, buf3 As Object
Dim smax As String, nmax As Long
Set regexp = CreateObject("VBScript.RegExp")
regexp.Global = True
regexp.IgnoreCase = True
regexp.Pattern = "(\w{3,})"
Set buf2 = regexp.Execute(buf1.Text)
For Each occ In buf2
regexp.Pattern = occ & "\b|" & occ & "$"
Set buf3 = regexp.Execute(buf1.Text)
If buf3.Count nmax Then smax = occ: nmax = buf3.Count
Next
MsgBox smax & " (" & nmax & ")"
MFWord = smax
End Function
.f
fernando cinquegrani
Microsoft MVP
http://www.prodomosua.it



.


fernando cinquegrani

Most popular word
 
"Umby" ha scritto nel messaggio ...
| Thank you for your function, however it doesn't consider
| some character.
| e.g. if cell value = "mip@it mip@it mip@it win win" ,
| output = "mip"
|

Public Function MFWord(buf1 As Range)
Dim buf2 As Variant, n as long
Dim smax As String, nmax As Long
Dim occ As Variant, acc As Variant
Set regexp = CreateObject("VBScript.RegExp")
regexp.Global = True
regexp.IgnoreCase = True
regexp.Pattern = "([a-z_&@]{3,})"
Set buf2 = regexp.Execute(buf1.Text)
For Each occ In buf2
n = 0
For Each acc In buf2
If occ = acc Then n = n + 1
If n nmax Then smax = occ: nmax = n
Next
Next
'MsgBox smax & " (" & n & ")"
MFWord = smax
End Function

the pattern now includes
the characters from a to z (a-z)
and the symbols _, & and @
.f
fernando cinquegrani
Microsoft MVP
http://www.prodomosua.it



Harlan Grove[_5_]

Most popular word
 
"Umby" wrote...
Thank you for your function, however it doesn't consider
some character.
e.g. if cell value = "mip@it mip@it mip@it win win" ,
output = "mip"

...
-----Original Message-----
"Umby" ha scritto nel messaggio

. ..
|Hi all,
|I need a function to get the most frequent word in a cell. The length of the
|output must be at least 3.
|e.g. if a cell contents was "Vick rushed for 145 yards. Michael Vick handled
|things in regulation. established in 1881 ; Yards Gear for the Holidays car
|yards in crisis" , the output should be "yards".
|
|Any idea?

in microsoft.public.it.office.excel
thread: cercare parole
date: dic 10, 2003

...

Rewrite the function to be more flexible. Also eliminate the MsgBox call in a
function - never a good idea.


Function mcw( _
textstr As String, _
Optional sep As String = " ", _
Optional matchpat As String = "\b\w{3,}\b", _
Optional caseinsens As Boolean = True _
) As String
'-------------------------------------------------------------
Dim wc() As Long, m As Object, mc As Object, re As Object
Dim k As Long, n As Long, wcmax As Long, t As String

t = textstr

Set re = CreateObject("VBScript.RegExp")
re.Pattern = matchpat
re.IgnoreCase = caseinsens
re.Global = True
Set mc = re.Execute(textstr)

If mc.Count = 0 Then Exit Function

ReDim wc(1 To mc.Count)

For Each m In mc
n = n + 1
k = Len(t)
re.Pattern = "\b" & m.Value & "\b"
t = re.Replace(t, "")
wc(n) = (k - Len(t)) / Len(m.Value)
If wc(n) wcmax Then wcmax = wc(n)
Next m

For k = 1 To n
If wc(k) = wcmax Then mcw = mcw & sep & mc.Item(k - 1).Value
Next k

Erase wc
Set m = Nothing
Set mc = Nothing
Set re = Nothing

mcw = Mid(mcw, Len(sep) + 1)
End Function


Change the second argument to use a word separator other than space. Change the
third and fourth arguments to use different definitions of 'word' as specified
by VBScript regular expressions.

--
Never attach files.
Snip unnecessary quoted text.
Never multipost (though crossposting is usually OK).
Don't change subject lines because it corrupts Google newsgroup archives.

fernando cinquegrani

Most popular word
 
"Harlan Grove" ha scritto nel messaggio ...

| Rewrite the function to be more flexible.

correct

| Also eliminate the MsgBox call in a
function - never a good idea.

'MsgBox smax & " (" & n & ")"
see the original thread in microsoft.public.it.office.excel

| Change the second argument to use a word separator other than space.

how? " ", ",",";","!","?","."......

| Change the third and fourth arguments to use different definitions of 'word' as specified
| by VBScript regular expressions.

in VBScript regular expressions there is only one definition of 'word'.
parentheses denotes a 'subexpression'.
.f



fernando cinquegrani

Most popular word
 
"Harlan Grove" ha scritto nel messaggio ...

| Change the second argument to use a word separator other than space.

oops. sorry!
now i understand.
.f



Harlan Grove

Most popular word
 
"fernando cinquegrani" wrote...
"Harlan Grove" ha scritto . . .

....
| Change the third and fourth arguments to use different definitions of
|'word' as specified by VBScript regular expressions.

in VBScript regular expressions there is only one definition of 'word'.
parentheses denotes a 'subexpression'.


I don't mean a word as VBScript would define the term (\b\w+\b), I mean a
word as the user wants to define it. My udf defaults to \b\w{3,}\b which is
clost to the VBScript definition, but if the user wants only letters, then
s/he could use mcw(somestring,,"\b[A-Za-z]{3,}\b").




All times are GMT +1. The time now is 11:32 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com