View Single Post
  #5   Report Post  
Posted to microsoft.public.excel.misc
J741 J741 is offline
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Hi Ron. Thanks for that. However I have a few more questions as follows:

1. How can I get it to include e-mail addresses which contain the symbols
'@' and '.'?

2. When I tested it with some data that included mixed case words, I got
invalid results. Specifically, three cells with "Seven seven SEVEN", "SeVeN
sEvEn", and ""Seven seven SEVEN"" returned a word count of only 2 for the
word 'seven'. Why would this be?

3. For the previously mentioned invalid results, when a word such as
' was stripped of non-letter characters, it also got counted as 0
instances of the word.

4. I am trying to understand your code, and the statements 'Set re =
CreateObject("vbscript.regexp")' and 'set mc = re.Execute' are confusing to
me. Can you explain them please (or point me to another resource that
explains them)?

Thanks.

- James.



"Ron Rosenfeld" wrote:

On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote:

Thanks Ron. That worked.


Glad to hear it. Thanks for the feedback.


Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.


I would do that work in the StripWord function. That's where we clean up and
can also easily test words. If a null string is returned to the calling
routine, it already ignores it.

So, for example, to eliminate words that are 3 or fewer characters in length:

======================
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
' eliminate words with length of three or less
If Len(StripWord) <= 3 Then StripWord = ""
Set re = Nothing
End Function
=======================

Other modifications as to unacceptable words, would be simple to do here, also.
--ron