ExcelBanter - Generating count of unique words in a cell or cells

Hari Prasadh

Hi Jeff,

What would you recommend to exclude some common words such as "a", "the",
"etc" or how would you build such a list for it to bounce against?

I am also facing a similar problem. Since this wasn't related to Excel, I
didn't broach the topic before. As for your question about excluding common
words, that is easy, because Tim W has provided -- arrReplace =
Array(vbTab, ":", ";", ".", ",", """", Chr(10), Chr(13)) -- so if we want to
remove articles, nouns, pronouns, etc., one could just add to the above list.
For me, though, the problem is where to get an authoritative list of nouns,
pronouns, etc. (in soft-copy format) that I could just add to the
arrReplace list. I searched Google (but not too hard) and couldn't find one.
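
A minimal sketch of the stop-word idea, assuming the text has already been cleaned of punctuation. One caveat: arrReplace holds characters to strip, so if whole words such as "a" were added to it and applied with Replace(), they would likely also be removed from inside longer words ("adobe" would lose its "a"). Comparing whole words against a lookup avoids that. FilterStopWords and its short stop-word list are hypothetical names for illustration, not part of Tim's code.

```vba
' Sketch only: FilterStopWords and its stop-word list are hypothetical
' and not part of the code posted earlier in the thread.
Function FilterStopWords(ByVal sText As String) As String
    Dim dictStop As Object
    Dim arrStop As Variant, arrWords As Variant
    Dim i As Long
    Dim sOut As String

    ' Late-bound Scripting.Dictionary (Windows only) used as a lookup set
    Set dictStop = CreateObject("Scripting.Dictionary")
    dictStop.CompareMode = vbTextCompare   ' case-insensitive keys

    arrStop = Array("a", "an", "the", "and", "or", "of", "etc")
    For i = LBound(arrStop) To UBound(arrStop)
        dictStop(arrStop(i)) = True
    Next i

    ' Compare whole words, so "a" is dropped but "adobe" is untouched
    arrWords = Split(Application.WorksheetFunction.Trim(sText), " ")
    For i = LBound(arrWords) To UBound(arrWords)
        If Not dictStop.Exists(arrWords(i)) Then
            sOut = sOut & arrWords(i) & " "
        End If
    Next i

    FilterStopWords = Trim$(sOut)
End Function
```

For example, FilterStopWords("the hot dog pro and a html editor") returns "hot dog pro html editor". The stop-word list could equally be read from a worksheet range, so that it can be maintained without editing code.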

To add another dimension to it (though I'm not sure whether it would
matter in your case): if the words "beautiful" and "beauty" both appear in
the target array, then for me they are one and the same. So how do I
instruct the algorithm to treat the various parts of speech of a word as the
same? I do have a solution, in the sense that a sheet (or an Access
database, though I don't know Access) could hold all the words (with their
parts of speech) and give the different forms of a word the same code. But
again, just like the previous case, I would have to get an authoritative
list of all words in the English language (words in common usage, not
esoteric or field-specific words). Is such a list available on the web?
--
Thanks a lot,
Hari
India

"Jeff Saathoff" wrote in message
...
Just wanted to let you know that this valuable post has helped me out as
well. We're experimenting with it right now. What would you recommend to
exclude some common words such as "a", "the", "etc" or how would you build
such a list for it to bounce against?

"Tim Williams" wrote:

Hari,

Modify the Find() line:

Set rngWord = rngSrch.Find(What:=tmp, MatchCase:=False, _
    LookAt:=xlWhole)

xlWhole will match the complete cell contents and not just a substring.

Tim.
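
The effect of the LookAt argument can be checked with a small stand-alone test. This is a sketch under assumptions: DemoLookAt is a hypothetical name, and the routine overwrites cells A1:A2 of the active sheet with sample words, so it should be run on a scratch sheet.

```vba
' Sketch only: the sheet layout (words in A1:A2 of the active sheet) is an
' assumption made for this demonstration. It overwrites A1:A2.
Sub DemoLookAt()
    Dim rngSrch As Range, rngWord As Range

    Set rngSrch = ActiveSheet.Range("A1:A2")
    rngSrch.Cells(1).Value = "workshop"
    rngSrch.Cells(2).Value = "powerhouse"

    ' xlPart: "works" is found inside "workshop"
    Set rngWord = rngSrch.Find(What:="works", LookIn:=xlValues, _
                               LookAt:=xlPart, MatchCase:=False)
    Debug.Print "xlPart found: "; Not rngWord Is Nothing

    ' xlWhole: the whole cell must equal "works", so nothing is found
    Set rngWord = rngSrch.Find(What:="works", LookIn:=xlValues, _
                               LookAt:=xlWhole, MatchCase:=False)
    Debug.Print "xlWhole found: "; Not rngWord Is Nothing
End Sub
```

Passing LookAt explicitly is also a safe habit in general, because Excel remembers the setting from the last search (including searches made through the Find dialog) when the argument is omitted.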

"Hari" wrote in message
...
Hi Tim,

Thanks a ton for posting the code. Just to tell you why I needed it: I
analyse market research data, and I needed a count of unique words to
analyse open-ended responses.
For example, I am studying/tracking the usage of software development tools.
I ran your code on the following 8 responses (8 rows of data).

hot dog pro
As 400 RPG
adobe photo workshop
microfocus emulators
html
ibm web sphere
vx works
powerhouse

The results I'm getting are:

hot 1
dog 1
pro 1
As 1
400 1
RPG 1
adobe 1
photo 1
workshop 2
microfocus 1
emulators 1
html 1
ibm 1
web 1
sphere 1
vx 1
powerhouse 1

What's happening is that the Sub is treating "works", which is in the 7th
row, the same as "workshop", which is in the 3rd row. Consequently, the
count of "workshop" is shown as 2, while "works" doesn't appear in the
result.
Please tell me whether it would be possible to modify the code in order to
get a count of 1 for "workshop" and a count of 1 for "works".