Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found in
those two columns, along with the number of occurrences of each word. I will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of groups of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just not
sure.

Can anyone help me with this? Can anyone point me in the right direction?
Is there anything already built in to excel to do this?

- James.
  #2   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1,071
Default How to get a numbered list of unique words in a column?

James
What version of Excel are you using? Also, do you have any idea of the
maximum number of words in these cells? Like, no more than 30. Or no more
than 300. Or no more than 3000. What columns are these 2 columns, and what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found
in
those two columns, along with the number of occurrences of each word. I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just
not
sure.

Can anyone help me with this? Can anyone point me in the right direction?
Is there anything already built in to excel to do this?

- James.



  #3   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Hi Otto.

1. The cells have a _lot_ of words, but I don't think it's any more than a
few hundred per cell.

2. The range of cells containing the words is C2:D5000

3. I am using Excel 2003 SP3


- James.



"Otto Moehrbach" wrote:

James
What version of Excel are you using? Also, do you have any idea of the
maximum number of words in these cells? Like, no more than 30. Or no more
than 300. Or no more than 3000. What columns are these 2 columns, and what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found
in
those two columns, along with the number of occurrences of each word. I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just
not
sure.

Can anyone help me with this? Can anyone point me in the right direction?
Is there anything already built in to excel to do this?

- James.




  #4   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Anyone know how to do this?


"J741" wrote:

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found in
those two columns, along with the number of occurrences of each word. I will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of groups of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just not
sure.

Can anyone help me with this? Can anyone point me in the right direction?
Is there anything already built in to excel to do this?

- James.

  #5   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1,071
Default How to get a numbered list of unique words in a column?

I'm working on it but had to stop for a couple of days because of company.
Will get on it Sunday/Monday. Otto
"J741" wrote in message
...
Anyone know how to do this?


"J741" wrote:

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found
in
those two columns, along with the number of occurrences of each word. I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not
be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of
groups of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just
not
sure.

Can anyone help me with this? Can anyone point me in the right
direction?
Is there anything already built in to excel to do this?

- James.





  #6   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1,071
Default How to get a numbered list of unique words in a column?

James

Here's some code that does what you want, I think. I wrote it
in several macros for ease of development. I assumed that the sheet that
you start with is named "Main". Change that in the code as needed. I also
assumed that you have a blank sheet named "Utility". The code first clears
the Utility sheet, then builds everything in that sheet. The end product,
in the "Utility" sheet is one or more columns consisting of all the words,
including all repeat words. Then there is one column with a header of
"Unique Words" and that's what is in that column. The last column has a
number for each unique word and that number is the number of times that that
word appears in all the words. I ran the code and it works for me. The
macro that you want to run is "GetList". All the other macros run from this
one macro. Place all you see below in a regular module

If you need more help in running this, send me an email and I'll
send you the file in which I developed this code. My email address is
. Remove the "extra" from this address. HTH
Otto

Option Explicit
Dim rColA As Range
Dim rColB As Range
Dim rTheRng As Range
Dim i As Range
Dim Dest As Range
Dim Str As String



Sub GetList()
Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp))
Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp))
Set rTheRng = Union(rColA, rColB)
Call GetAllWords
Call GetUniqueWords
End Sub



Sub GetAllWords()
Dim TheArray() As String
Dim c As Long
Sheets("Main").Select
With Sheets("Utility")
.Cells.Clear
Set Dest = .Range("A1")
For Each i In rTheRng
If IsEmpty(i.Value) Then GoTo NextCell
Call CleanEntry
TheArray = Split(Str, " ")
Dest.Resize(1 + UBound(TheArray)) =
WorksheetFunction.Transpose(TheArray)
Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)
If Dest.Row 60000 Then _
Set Dest = .Cells(1, Dest.Column + 1)
NextCell:
Next i
End With
End Sub



Sub CleanEntry()
Str = i.Value
Str = Application.Trim(Str)
Str = Replace(Str, ".", "")
Str = Replace(Str, ",", "")
Str = Replace(Str, "?", "")
Str = Replace(Str, "!", "")
End Sub

Sub GetUniqueWords()
Dim LastColumn As Long
Dim UW As Range
Sheets("Utility").Select
Set rTheRng = ActiveSheet.UsedRange
LastColumn = rTheRng(rTheRng.Count).Column + 1
Rows("1:1").Insert Shift:=xlDown
Range("A1") = "All The Words"
Cells(1, LastColumn) = "Unique Words"
Cells(1, LastColumn + 1) = "Qty"
Set Dest = Cells(2, LastColumn)
Set UW = Dest
For Each i In rTheRng
If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then
Dest = i.Value
Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value)
Set Dest = Dest.Offset(1)
Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count,
Dest.Column).End(xlUp).Offset(1))
End If
Next i
End Sub


"J741" wrote in message
...
Hi Otto.

1. The cells have a _lot_ of words, but I don't think it's any more than a
few hundred per cell.

2. The range of cells containing the words is C2:D5000

3. I am using Excel 2003 SP3


- James.



"Otto Moehrbach" wrote:

James
What version of Excel are you using? Also, do you have any idea of
the
maximum number of words in these cells? Like, no more than 30. Or no
more
than 300. Or no more than 3000. What columns are these 2 columns, and
what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of
data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words
found
in
those two columns, along with the number of occurrences of each word.
I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to
do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not
be
by themselves in their own cell (so I cant use the auto filter
functions,
pivot tables, or anything else I can think of), but will be part of
groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am
just
not
sure.

Can anyone help me with this? Can anyone point me in the right
direction?
Is there anything already built in to excel to do this?

- James.






  #7   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1,071
Default How to get a numbered list of unique words in a column?

James
I forgot to mention that I also assumed that the 2 columns that have the
long cells in question are columns A & B. Change that in the code to match
your actual data layout. Otto
"Otto Moehrbach" wrote in message
...
James

Here's some code that does what you want, I think. I wrote it
in several macros for ease of development. I assumed that the sheet that
you start with is named "Main". Change that in the code as needed. I
also assumed that you have a blank sheet named "Utility". The code first
clears the Utility sheet, then builds everything in that sheet. The end
product, in the "Utility" sheet is one or more columns consisting of all
the words, including all repeat words. Then there is one column with a
header of "Unique Words" and that's what is in that column. The last
column has a number for each unique word and that number is the number of
times that that word appears in all the words. I ran the code and it
works for me. The macro that you want to run is "GetList". All the other
macros run from this one macro. Place all you see below in a regular
module

If you need more help in running this, send me an email and
I'll send you the file in which I developed this code. My email address
is . Remove the "extra" from this address.
HTH Otto

Option Explicit
Dim rColA As Range
Dim rColB As Range
Dim rTheRng As Range
Dim i As Range
Dim Dest As Range
Dim Str As String



Sub GetList()
Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp))
Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp))
Set rTheRng = Union(rColA, rColB)
Call GetAllWords
Call GetUniqueWords
End Sub



Sub GetAllWords()
Dim TheArray() As String
Dim c As Long
Sheets("Main").Select
With Sheets("Utility")
.Cells.Clear
Set Dest = .Range("A1")
For Each i In rTheRng
If IsEmpty(i.Value) Then GoTo NextCell
Call CleanEntry
TheArray = Split(Str, " ")
Dest.Resize(1 + UBound(TheArray)) =
WorksheetFunction.Transpose(TheArray)
Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)
If Dest.Row 60000 Then _
Set Dest = .Cells(1, Dest.Column + 1)
NextCell:
Next i
End With
End Sub



Sub CleanEntry()
Str = i.Value
Str = Application.Trim(Str)
Str = Replace(Str, ".", "")
Str = Replace(Str, ",", "")
Str = Replace(Str, "?", "")
Str = Replace(Str, "!", "")
End Sub

Sub GetUniqueWords()
Dim LastColumn As Long
Dim UW As Range
Sheets("Utility").Select
Set rTheRng = ActiveSheet.UsedRange
LastColumn = rTheRng(rTheRng.Count).Column + 1
Rows("1:1").Insert Shift:=xlDown
Range("A1") = "All The Words"
Cells(1, LastColumn) = "Unique Words"
Cells(1, LastColumn + 1) = "Qty"
Set Dest = Cells(2, LastColumn)
Set UW = Dest
For Each i In rTheRng
If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then
Dest = i.Value
Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value)
Set Dest = Dest.Offset(1)
Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count,
Dest.Column).End(xlUp).Offset(1))
End If
Next i
End Sub


"J741" wrote in message
...
Hi Otto.

1. The cells have a _lot_ of words, but I don't think it's any more than
a
few hundred per cell.

2. The range of cells containing the words is C2:D5000

3. I am using Excel 2003 SP3


- James.



"Otto Moehrbach" wrote:

James
What version of Excel are you using? Also, do you have any idea of
the
maximum number of words in these cells? Like, no more than 30. Or no
more
than 300. Or no more than 3000. What columns are these 2 columns, and
what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of
data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words
found
in
those two columns, along with the number of occurrences of each word.
I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to
do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will
not be
by themselves in their own cell (so I cant use the auto filter
functions,
pivot tables, or anything else I can think of), but will be part of
groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am
just
not
sure.

Can anyone help me with this? Can anyone point me in the right
direction?
Is there anything already built in to excel to do this?

- James.







  #8   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Thu, 18 Jun 2009 08:36:02 -0700, J741
wrote:

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words found in
those two columns, along with the number of occurrences of each word. I will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to do so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not be
by themselves in their own cell (so I cant use the auto filter functions,
pivot tables, or anything else I can think of), but will be part of groups of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am just not
sure.

Can anyone help me with this? Can anyone point me in the right direction?
Is there anything already built in to excel to do this?

- James.


Here's a start.

You'll need to properly set the range to process (rSrc in the code) and the
range where you want the results (rDest in the code) to match your sheets.

For example, you might set rsrc = range("A1:B5000") to encompass 5000 lines in
two columns.

And, of course, you'll need to set rDest = some cell that is outside your data
range.

I assumed you wanted the results sorted by frequency of the word, with the most
common word being first; but this can be easily changed.

Note also that I formatted the entire first column of rDest as TEXT. Without
this, any numeric strings in the data would be changed to numbers (if we left
the format as General). So long strings might get truncated, or displayed in
scientific notation; and leading zeros would be stripped off.

Also, for this initial example, words are defined as strings containing only
letters, digits, slash or a hyphen. This is done in order to remove
punctuation. But it will also remove other substrings that might include other
characters. If this will be an issue, changes can be easily made.

To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8 opens the macro dialog box. Select the macro
by name, and <RUN.

========================================
Option Explicit
Sub UniqueWordList()
Dim rSrc As Range, rDest As Range, c As Range
Dim cWordList As Collection
Dim res() As Variant
Dim w() As String
Dim i As Long

Set cWordList = New Collection
Set rSrc = Range("A1:B22")
Set rDest = Range("M1")
rDest.EntireColumn.NumberFormat = "@"
For Each c In rSrc
w = Split(c.Value)
For i = 0 To UBound(w)
w(i) = StripWord(w(i))
If Not w(i) = "" Then
On Error Resume Next
cWordList.Add Item:=w(i), Key:=w(i)
On Error GoTo 0
End If
Next i
Next c

'transfer words to results array
ReDim res(1 To cWordList.Count, 0 To 1)
For i = 1 To cWordList.Count
res(i, 0) = cWordList(i)
Next i

'get counts
For i = LBound(res) To UBound(res)
For Each c In rSrc
res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0))
Next c
Next i

'sort alpha: d=0; sort numeric d=1
'there are various ways of sorting
BubbleSort res, 1
For i = LBound(res) To UBound(res)
rDest.Offset(i, 0).Value = res(i, 0)
rDest.Offset(i, 1).Value = res(i, 1)
Next i
End Sub
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
Set re = Nothing
End Function
Private Function CountWord(ByVal s As String, sPat) As Long
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "\b" & sPat & "\b"

Set mc = re.Execute(s)
CountWord = mc.Count
End Function
Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension
Dim temp(0, 1) As Variant
Dim i As Integer
Dim NoExchanges As Integer

' Loop until no more "exchanges" are made.
Do
NoExchanges = True

' Loop through each element in the array.
For i = LBound(TempArray) To UBound(TempArray) - 1

' If the element is less than the element
' following it, exchange the two elements.
' change "<" to "" to sort ascending
If TempArray(i, d) < TempArray(i + 1, d) Then
NoExchanges = False
temp(0, 0) = TempArray(i, 0)
temp(0, 1) = TempArray(i, 1)
TempArray(i, 0) = TempArray(i + 1, 0)
TempArray(i, 1) = TempArray(i + 1, 1)
TempArray(i + 1, 0) = temp(0, 0)
TempArray(i + 1, 1) = temp(0, 1)

End If
Next i
Loop While Not (NoExchanges)
End Sub
======================================
--ron
  #9   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

WOW. Thanks Otto.

However, that certainly does not look like instructions for using something
that is already in Excel. It looks like program code. So, how do I use it
in Excel? I'm not sure what you mean by "Place all you see below in a
regular module".

- James.


"Otto Moehrbach" wrote:

James

Here's some code that does what you want, I think. I wrote it
in several macros for ease of development. I assumed that the sheet that
you start with is named "Main". Change that in the code as needed. I also
assumed that you have a blank sheet named "Utility". The code first clears
the Utility sheet, then builds everything in that sheet. The end product,
in the "Utility" sheet is one or more columns consisting of all the words,
including all repeat words. Then there is one column with a header of
"Unique Words" and that's what is in that column. The last column has a
number for each unique word and that number is the number of times that that
word appears in all the words. I ran the code and it works for me. The
macro that you want to run is "GetList". All the other macros run from this
one macro. Place all you see below in a regular module

If you need more help in running this, send me an email and I'll
send you the file in which I developed this code. My email address is
. Remove the "extra" from this address. HTH
Otto

Option Explicit
Dim rColA As Range
Dim rColB As Range
Dim rTheRng As Range
Dim i As Range
Dim Dest As Range
Dim Str As String



Sub GetList()
Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp))
Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp))
Set rTheRng = Union(rColA, rColB)
Call GetAllWords
Call GetUniqueWords
End Sub



Sub GetAllWords()
Dim TheArray() As String
Dim c As Long
Sheets("Main").Select
With Sheets("Utility")
.Cells.Clear
Set Dest = .Range("A1")
For Each i In rTheRng
If IsEmpty(i.Value) Then GoTo NextCell
Call CleanEntry
TheArray = Split(Str, " ")
Dest.Resize(1 + UBound(TheArray)) =
WorksheetFunction.Transpose(TheArray)
Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)
If Dest.Row 60000 Then _
Set Dest = .Cells(1, Dest.Column + 1)
NextCell:
Next i
End With
End Sub



Sub CleanEntry()
Str = i.Value
Str = Application.Trim(Str)
Str = Replace(Str, ".", "")
Str = Replace(Str, ",", "")
Str = Replace(Str, "?", "")
Str = Replace(Str, "!", "")
End Sub

Sub GetUniqueWords()
Dim LastColumn As Long
Dim UW As Range
Sheets("Utility").Select
Set rTheRng = ActiveSheet.UsedRange
LastColumn = rTheRng(rTheRng.Count).Column + 1
Rows("1:1").Insert Shift:=xlDown
Range("A1") = "All The Words"
Cells(1, LastColumn) = "Unique Words"
Cells(1, LastColumn + 1) = "Qty"
Set Dest = Cells(2, LastColumn)
Set UW = Dest
For Each i In rTheRng
If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then
Dest = i.Value
Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value)
Set Dest = Dest.Offset(1)
Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count,
Dest.Column).End(xlUp).Offset(1))
End If
Next i
End Sub


"J741" wrote in message
...
Hi Otto.

1. The cells have a _lot_ of words, but I don't think it's any more than a
few hundred per cell.

2. The range of cells containing the words is C2:D5000

3. I am using Excel 2003 SP3


- James.



"Otto Moehrbach" wrote:

James
What version of Excel are you using? Also, do you have any idea of
the
maximum number of words in these cells? Like, no more than 30. Or no
more
than 300. Or no more than 3000. What columns are these 2 columns, and
what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of
data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words
found
in
those two columns, along with the number of occurrences of each word.
I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to
do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will not
be
by themselves in their own cell (so I cant use the auto filter
functions,
pivot tables, or anything else I can think of), but will be part of
groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am
just
not
sure.

Can anyone help me with this? Can anyone point me in the right
direction?
Is there anything already built in to excel to do this?

- James.






  #10   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Mon, 22 Jun 2009 16:41:46 -0400, Ron Rosenfeld
wrote:

On Thu, 18 Jun 2009 08:36:02 -0700, J741
wrote:

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.


Small change. The "count" function needs to be made case insensitive. So
change the code to:

=========================
Option Explicit
Sub UniqueWordList()
Dim rSrc As Range, rDest As Range, c As Range
Dim cWordList As Collection
Dim res() As Variant
Dim w() As String
Dim i As Long

Set cWordList = New Collection
Set rSrc = Range("A1:B22")
Set rDest = Range("M1")
rDest.EntireColumn.NumberFormat = "@"
For Each c In rSrc
w = Split(c.Value)
For i = 0 To UBound(w)
w(i) = StripWord(w(i))
If Not w(i) = "" Then
On Error Resume Next
cWordList.Add Item:=w(i), Key:=w(i)
On Error GoTo 0
End If
Next i
Next c

'transfer words to results array
ReDim res(1 To cWordList.Count, 0 To 1)
For i = 1 To cWordList.Count
res(i, 0) = cWordList(i)
Next i

'get counts
For i = LBound(res) To UBound(res)
For Each c In rSrc
res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0))
Next c
Next i

'sort alpha: d=0; sort numeric d=1
'there are various ways of sorting
BubbleSort res, 1
For i = LBound(res) To UBound(res)
rDest.Offset(i, 0).Value = res(i, 0)
rDest.Offset(i, 1).Value = res(i, 1)
Next i

End Sub
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
Set re = Nothing
End Function
Private Function CountWord(ByVal s As String, sPat) As Long
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.ignorecase = True
re.Pattern = "\b" & sPat & "\b"

Set mc = re.Execute(s)
CountWord = mc.Count
End Function
Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension
Dim temp(0, 1) As Variant
Dim i As Integer
Dim NoExchanges As Integer

' Loop until no more "exchanges" are made.
Do
NoExchanges = True

' Loop through each element in the array.
For i = LBound(TempArray) To UBound(TempArray) - 1

' If the element is less than the element
' following it, exchange the two elements.
' change "<" to "" to sort ascending
If TempArray(i, d) < TempArray(i + 1, d) Then
NoExchanges = False
temp(0, 0) = TempArray(i, 0)
temp(0, 1) = TempArray(i, 1)
TempArray(i, 0) = TempArray(i + 1, 0)
TempArray(i, 1) = TempArray(i + 1, 1)
TempArray(i + 1, 0) = temp(0, 0)
TempArray(i + 1, 1) = temp(0, 1)

End If
Next i
Loop While Not (NoExchanges)
End Sub
=================================
--ron


  #11   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Thanks Ron. That worked.

Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.

"Ron Rosenfeld" wrote:

On Mon, 22 Jun 2009 16:41:46 -0400, Ron Rosenfeld
wrote:

On Thu, 18 Jun 2009 08:36:02 -0700, J741
wrote:

Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.


Small change. The "count" function needs to be made case insensitive. So
change the code to:

=========================
Option Explicit
Sub UniqueWordList()
Dim rSrc As Range, rDest As Range, c As Range
Dim cWordList As Collection
Dim res() As Variant
Dim w() As String
Dim i As Long

Set cWordList = New Collection
Set rSrc = Range("A1:B22")
Set rDest = Range("M1")
rDest.EntireColumn.NumberFormat = "@"
For Each c In rSrc
w = Split(c.Value)
For i = 0 To UBound(w)
w(i) = StripWord(w(i))
If Not w(i) = "" Then
On Error Resume Next
cWordList.Add Item:=w(i), Key:=w(i)
On Error GoTo 0
End If
Next i
Next c

'transfer words to results array
ReDim res(1 To cWordList.Count, 0 To 1)
For i = 1 To cWordList.Count
res(i, 0) = cWordList(i)
Next i

'get counts
For i = LBound(res) To UBound(res)
For Each c In rSrc
res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0))
Next c
Next i

'sort alpha: d=0; sort numeric d=1
'there are various ways of sorting
BubbleSort res, 1
For i = LBound(res) To UBound(res)
rDest.Offset(i, 0).Value = res(i, 0)
rDest.Offset(i, 1).Value = res(i, 1)
Next i

End Sub
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
Set re = Nothing
End Function
Private Function CountWord(ByVal s As String, sPat) As Long
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.ignorecase = True
re.Pattern = "\b" & sPat & "\b"

Set mc = re.Execute(s)
CountWord = mc.Count
End Function
Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension
Dim temp(0, 1) As Variant
Dim i As Integer
Dim NoExchanges As Integer

' Loop until no more "exchanges" are made.
Do
NoExchanges = True

' Loop through each element in the array.
For i = LBound(TempArray) To UBound(TempArray) - 1

' If the element is less than the element
' following it, exchange the two elements.
' change "<" to "" to sort ascending
If TempArray(i, d) < TempArray(i + 1, d) Then
NoExchanges = False
temp(0, 0) = TempArray(i, 0)
temp(0, 1) = TempArray(i, 1)
TempArray(i, 0) = TempArray(i + 1, 0)
TempArray(i, 1) = TempArray(i + 1, 1)
TempArray(i + 1, 0) = temp(0, 0)
TempArray(i + 1, 1) = temp(0, 1)

End If
Next i
Loop While Not (NoExchanges)
End Sub
=================================
--ron

  #12   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Thank you Otto. I figured out how to use this code, and it works just fine.
Thank you for your effort.


"Otto Moehrbach" wrote:

James
I forgot to mention that I also assumed that the 2 columns that have the
long cells in question are columns A & B. Change that in the code to match
your actual data layout. Otto
"Otto Moehrbach" wrote in message
...
James

Here's some code that does what you want, I think. I wrote it
in several macros for ease of development. I assumed that the sheet that
you start with is named "Main". Change that in the code as needed. I
also assumed that you have a blank sheet named "Utility". The code first
clears the Utility sheet, then builds everything in that sheet. The end
product, in the "Utility" sheet is one or more columns consisting of all
the words, including all repeat words. Then there is one column with a
header of "Unique Words" and that's what is in that column. The last
column has a number for each unique word and that number is the number of
times that that word appears in all the words. I ran the code and it
works for me. The macro that you want to run is "GetList". All the other
macros run from this one macro. Place all you see below in a regular
module

If you need more help in running this, send me an email and
I'll send you the file in which I developed this code. My email address
is . Remove the "extra" from this address.
HTH Otto

Option Explicit
Dim rColA As Range
Dim rColB As Range
Dim rTheRng As Range
Dim i As Range
Dim Dest As Range
Dim Str As String



Sub GetList()
Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp))
Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp))
Set rTheRng = Union(rColA, rColB)
Call GetAllWords
Call GetUniqueWords
End Sub



Sub GetAllWords()
Dim TheArray() As String
Dim c As Long
Sheets("Main").Select
With Sheets("Utility")
.Cells.Clear
Set Dest = .Range("A1")
For Each i In rTheRng
If IsEmpty(i.Value) Then GoTo NextCell
Call CleanEntry
TheArray = Split(Str, " ")
Dest.Resize(1 + UBound(TheArray)) =
WorksheetFunction.Transpose(TheArray)
Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)
If Dest.Row 60000 Then _
Set Dest = .Cells(1, Dest.Column + 1)
NextCell:
Next i
End With
End Sub



Sub CleanEntry()
Str = i.Value
Str = Application.Trim(Str)
Str = Replace(Str, ".", "")
Str = Replace(Str, ",", "")
Str = Replace(Str, "?", "")
Str = Replace(Str, "!", "")
End Sub

Sub GetUniqueWords()
Dim LastColumn As Long
Dim UW As Range
Sheets("Utility").Select
Set rTheRng = ActiveSheet.UsedRange
LastColumn = rTheRng(rTheRng.Count).Column + 1
Rows("1:1").Insert Shift:=xlDown
Range("A1") = "All The Words"
Cells(1, LastColumn) = "Unique Words"
Cells(1, LastColumn + 1) = "Qty"
Set Dest = Cells(2, LastColumn)
Set UW = Dest
For Each i In rTheRng
If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then
Dest = i.Value
Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value)
Set Dest = Dest.Offset(1)
Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count,
Dest.Column).End(xlUp).Offset(1))
End If
Next i
End Sub


"J741" wrote in message
...
Hi Otto.

1. The cells have a _lot_ of words, but I don't think it's any more than
a
few hundred per cell.

2. The range of cells containing the words is C2:D5000

3. I am using Excel 2003 SP3


- James.



"Otto Moehrbach" wrote:

James
What version of Excel are you using? Also, do you have any idea of
the
maximum number of words in these cells? Like, no more than 30. Or no
more
than 300. Or no more than 3000. What columns are these 2 columns, and
what
row has the first entry in these 2 columns. HTH Otto
"J741" wrote in message
...
Hi.

I have a spreadsheet with almost 5000 rows with about 15 columns of
data,
2
columns of which contain sentences, phrases, or paragraphs of comments
entered by users.

What I need to have as an end result is a list of all unique words
found
in
those two columns, along with the number of occurrences of each word.
I
will
then use this to further analyze the data in the spreadsheet.

Right now, I do this manually and it takes a VERY long time for me to
do
so
(over 200 man-hours). So, I really need to automate this somehow.

The problem is, I have no idea how to begin, because the words will
not be
by themselves in their own cell (so I cant use the auto filter
functions,
pivot tables, or anything else I can think of), but will be part of
groups
of
words or symbols within multiple cells.

I think this will need to be done programmatically with code, scripts,
macros, or some other method with which I am not familiar, but I am
just
not
sure.

Can anyone help me with this? Can anyone point me in the right
direction?
Is there anything already built in to excel to do this?

- James.








  #13   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote:

Thanks Ron. That worked.


Glad to hear it. Thanks for the feedback.


Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.


I would do that work in the StripWord function. That's where we clean up and
can also easily test words. If a null string is returned to the calling
routine, it already ignores it.

So, for example, to eliminate words that are 3 or fewer characters in length:

======================
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
' eliminate words with length of three or less
If Len(StripWord) <= 3 Then StripWord = ""
Set re = Nothing
End Function
=======================

Other modifications as to unacceptable words, would be simple to do here, also.
--ron
  #14   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Hi Ron. Thanks for that. However I have a few more questions as follows:

1. How can I get it to include e-mail addresses which contain the symbols
'@' and '.'?

2. When I tested it with some data that included mixed case words, I got
invalid results. Specifically, three cells with "Seven seven SEVEN", "SeVeN
sEvEn", and ""Seven seven SEVEN"" returned a word count of only 2 for the
word 'seven'. Why would this be?

3. For the previously mentioned invalid results, when a word such as
' was stripped of non-letter characters, it also got counted as 0
instances of the word.

4. I am trying to understand your code, and the statements 'Set re =
CreateObject("vbscript.regexp")' and 'set mc = re.Execute' are confusing to
me. Can you explain them please (or point me to another resource that
explains them)?

Thanks.

- James.



"Ron Rosenfeld" wrote:

On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote:

Thanks Ron. That worked.


Glad to hear it. Thanks for the feedback.


Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.


I would do that work in the StripWord function. That's where we clean up and
can also easily test words. If a null string is returned to the calling
routine, it already ignores it.

So, for example, to eliminate words that are 3 or fewer characters in length:

======================
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
' eliminate words with length of three or less
If Len(StripWord) <= 3 Then StripWord = ""
Set re = Nothing
End Function
=======================

Other modifications as to unacceptable words, would be simple to do here, also.
--ron

  #15   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

O.K. I figured out the 'seven' issue. I resolved it by using an LCase
statement in the 'get counts section of the UniqueWordList() routine as
follows:

res(i, 1) = res(i, 1) + CountWord(LCase(c.Value), res(i, 0))

I also changed the StripWords functions to only return LCased text (to match
the counting).


There's still the other 2 questions though...


"J741" wrote:

Hi Ron. Thanks for that. However I have a few more questions as follows:

1. How can I get it to include e-mail addresses which contain the symbols
'@' and '.'?

2. When I tested it with some data that included mixed case words, I got
invalid results. Specifically, three cells with "Seven seven SEVEN", "SeVeN
sEvEn", and ""Seven seven SEVEN"" returned a word count of only 2 for the
word 'seven'. Why would this be?

3. For the previously mentioned invalid results, when a word such as
' was stripped of non-letter characters, it also got counted as 0
instances of the word.

4. I am trying to understand your code, and the statements 'Set re =
CreateObject("vbscript.regexp")' and 'set mc = re.Execute' are confusing to
me. Can you explain them please (or point me to another resource that
explains them)?

Thanks.

- James.



"Ron Rosenfeld" wrote:

On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote:

Thanks Ron. That worked.


Glad to hear it. Thanks for the feedback.


Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.


I would do that work in the StripWord function. That's where we clean up and
can also easily test words. If a null string is returned to the calling
routine, it already ignores it.

So, for example, to eliminate words that are 3 or fewer characters in length:

======================
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
' eliminate words with length of three or less
If Len(StripWord) <= 3 Then StripWord = ""
Set re = Nothing
End Function
=======================

Other modifications as to unacceptable words, would be simple to do here, also.
--ron



  #16   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Mon, 29 Jun 2009 14:22:01 -0700, J741
wrote:

Hi Ron. Thanks for that. However I have a few more questions as follows:

1. How can I get it to include e-mail addresses which contain the symbols
'@' and '.'?


That depends. It is simple to just add the '@' and '.' to the list of valid
characters in stripword. But that would cause problems when those characters
are not part of a valid email address. At present, StripWord strips out what
you originally stated would be invalid characters. However, to include email
addresses, requires a different approach including validating words and email
addresses. Before I get into that, are there any other requirements that you
have yet to specify?

Below is the modified routine. The major modification is in StripWord which
has been modified to recognize many (but not all) valid email addresses; as
well as any words that contain at least 4 characters. The characters in these
words need to be letters, digits or hyphens. (The email address is made a
valid exception to that rule).

There are some cosmetic modifications in the major routine also. One cosmetic
modification I did not include was to have all the output be lower case. But I
have an annotation for where that can be done.


2. When I tested it with some data that included mixed case words, I got
invalid results. Specifically, three cells with "Seven seven SEVEN", "SeVeN
sEvEn", and ""Seven seven SEVEN"" returned a word count of only 2 for the
word 'seven'. Why would this be?


As written, the code SHOULD be case-insensitive. Are you using the appropriate
code? The first I presented had to be changed to make it case insensitive.

When I put those three phrases in A1:A3, and run the code, I get a result of:

Seven 8

Another reason for your problem is that rSrc is not properly specified so that
the routine is not checking the cells where you have those strings.


3. For the previously mentioned invalid results, when a word such as
' was stripped of non-letter characters, it also got counted as 0
instances of the word.


That is because there is no word = "meherecom". In my first response to you, I
wrote "words are defined as strings containing only letters, digits, slash or a
hyphen. This is done in order to remove punctuation. But it will also remove
other substrings that might include other characters. If this will be an issue,
changes can be easily made."

But this is the first you have indicated that your definition of "word" does
not match the one I stated I was using. So unexpected results are ...
"expected" :-)



4. I am trying to understand your code, and the statements 'Set re =
CreateObject("vbscript.regexp")' and 'set mc = re.Execute' are confusing to
me. Can you explain them please (or point me to another resource that
explains them)?


This is part of the VBScript Regular Expression Engine, which is similar to
that specified in the ECMA262 specification.

Here are some bookmarks (most of which should still be good):

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default...02&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/




Thanks.

- James.




================================================
Option Explicit
Sub UniqueWordList()
Dim rSrc As Range, rDest As Range, c As Range
Dim cWordList As Collection
Dim res() As Variant
Dim w() As String
Dim i As Long

Set cWordList = New Collection
Set rSrc = Range("A1:B50")
Set rDest = Range("M1")
rDest.EntireColumn.NumberFormat = "@"
For Each c In rSrc
w = Split(c.Value)
For i = 0 To UBound(w)
w(i) = StripWord(w(i))
If Not w(i) = "" Then
On Error Resume Next
cWordList.Add Item:=w(i), Key:=w(i)
On Error GoTo 0
End If
Next i
Next c

'transfer words to results array
ReDim res(1 To cWordList.Count, 0 To 1)
For i = 1 To cWordList.Count
res(i, 0) = cWordList(i)
Next i

'get counts
For i = LBound(res) To UBound(res)
For Each c In rSrc
res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0))
Next c
Next i

'sort alpha: d=0; sort numeric d=1
'there are various ways of sorting
BubbleSort res, 1

rDest.CurrentRegion.Clear
For i = LBound(res) To UBound(res)
rDest.Offset(i, 0).NumberFormat = "@"
rDest.Offset(i, 0).Value = res(i, 0)
'For just lowercase output, use:
'rDest.Offset(i, 0).Value = LCase(res(i, 0))
rDest.Offset(i, 1).Value = res(i, 1)
Next i
End Sub

Private Function StripWord(s As String) As String
Dim re As Object, mc As Object, m As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
re.Pattern = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?" _
& "^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]" _
& "(?:[a-z0-9-]*[a-z0-9])?|[-\w]{4,}"
If re.test(s) = True Then
Set mc = re.Execute(s)
StripWord = mc(0).Value
End If
Set re = Nothing
End Function

Private Function CountWord(ByVal s As String, sPat) As Long
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.IgnoreCase = True
re.Pattern = "\b" & sPat & "\b"

Set mc = re.Execute(s)
CountWord = mc.Count
End Function

Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension
Dim temp(0, 1) As Variant
Dim i As Integer
Dim NoExchanges As Integer

' Loop until no more "exchanges" are made.
Do
NoExchanges = True

' Loop through each element in the array.
For i = LBound(TempArray) To UBound(TempArray) - 1

' If the element is less than the element
' following it, exchange the two elements.
' change "<" to "" to sort ascending
If TempArray(i, d) < TempArray(i + 1, d) Then
NoExchanges = False
temp(0, 0) = TempArray(i, 0)
temp(0, 1) = TempArray(i, 1)
TempArray(i, 0) = TempArray(i + 1, 0)
TempArray(i, 1) = TempArray(i + 1, 1)
TempArray(i + 1, 0) = temp(0, 0)
TempArray(i + 1, 1) = temp(0, 1)

End If
Next i
Loop While Not (NoExchanges)
End Sub
==========================================
--ron
  #17   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Mon, 29 Jun 2009 15:13:00 -0700, J741
wrote:

O.K. I figured out the 'seven' issue. I resolved it by using an LCase
statement in the 'get counts section of the UniqueWordList() routine as
follows:

res(i, 1) = res(i, 1) + CountWord(LCase(c.Value), res(i, 0))

I also changed the StripWords functions to only return LCased text (to match
the counting).


Neither of those steps should have been necessary. Please see my response to
your previous post.
--ron
  #18   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default How to get a numbered list of unique words in a column?

Hi Ron.

I found a problem with the code you provided. Sometimes I get a "Run-time
error '13': Type Mismatch". If I hit 'debug', it shows me the line:
w = Split(c.Value)

When I add a watch for c, this is what I see:
Value: Error 2029 Type: Range/Range

When I add a watch for c.value, this is what I see:
Value: Error 2029 Type: Variant/Error

I traced this to a cell which contains the following text:
#NAME?

However, after removing the data from that cell and trying again, I now get
"Run-time error '5020': Application-defined or object-defined error".

This error occurs at the following line in the 'CountWord' function:
Set mc = re.Execute(s)


I'm not having much fun with this code, as I keep running into error that I
don't fully understand.


- James.



"Ron Rosenfeld" wrote:

On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote:

Thanks Ron. That worked.


Glad to hear it. Thanks for the feedback.


Now, how can I refine this to ignore words that are smaller than 3 letters
in length? Words like 'and', 'the', not', etc.


I would do that work in the StripWord function. That's where we clean up and
can also easily test words. If a null string is returned to the calling
routine, it already ignores it.

So, for example, to eliminate words that are 3 or fewer characters in length:

======================
Private Function StripWord(s As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
'allow only letters, digits, slashes and hyphens
re.Pattern = "[^-/A-Za-z0-9]"
StripWord = re.Replace(s, "")
' eliminate words with length of three or less
If Len(StripWord) <= 3 Then StripWord = ""
Set re = Nothing
End Function
=======================

Other modifications as to unacceptable words, would be simple to do here, also.
--ron

  #19   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default How to get a numbered list of unique words in a column?

On Tue, 30 Jun 2009 13:33:01 -0700, J741
wrote:

I traced this to a cell which contains the following text:
#NAME?


You really need to better define what we are dealing with.

In your original post, you indicated:

"... contain sentences, phrases, or paragraphs of comments
entered by users."

#NAME? errors are the result of someone trying to enter a FORMULA. I did not
include FORMULA's in the class of "sentences, phrases, or paragraphs of
comments".

Now a problem is that you have not defined what you want to do in the event
your user enters something that evaluates to a formula?

Are formulas legitimate entries for a user, or are they always going to be a
mistake?

If they are legitimate entries, then you probably want to flag those that
result in errors but test the results of legitimate formulas.

If they are NEVER going to be legitimate entries, then you can test the formula
text (and not the formula result) if there is a formula in the cell, regardless
of whether it returns an error value or not.

If your users will NEVER be entering formulas, then each entry that evaluates
to a formula is really a comment that happens to begin with an equal sign.

That being the case, one solution is to test each cell to see if it is a
formula, and, if so, evaluate the formula text. This approach would ignore the
"equal" sign since it is only one character in length (and you want legitimate
words to be more than three characters in length). However, since, under the
assumption that "=" represents a token for the word "equal" or "equals", maybe
we should substitute that word for the token.

Having done this, you may still miss some entries as formulas that don't
evaluate to errors may have the word-defining spaces removed by Excel's formula
parser.

But if formulas are legitimate user-entries, then you need to use a different
approach.

I'm not having much fun with this code, as I keep running into error that I
don't fully understand.


Feel free to try different code in order to have fun.

Or, if you want to work with this code, post back regarding my questions about
user-entered formulas in these cells, and we can devise code to handle the
problem.
--ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
function to retrieve a list of unique characters from a column Pluggie Excel Worksheet Functions 2 June 13th 07 04:17 PM
How can I get a unique list of a column? Yabi New Users to Excel 2 April 11th 07 11:01 AM
create numbered sortable numbered list in excel coloradio Excel Discussion (Misc queries) 2 November 15th 06 07:50 PM
list unique values in a column beechum1 Excel Worksheet Functions 1 March 2nd 06 06:08 AM
Compare multiple column of data and list out common and unique component in adj columns kuansheng Excel Worksheet Functions 15 February 1st 06 11:49 PM


All times are GMT +1. The time now is 01:19 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"