Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hi.
I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#2
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
James
What version of Excel are you using? Also, do you have any idea of the maximum number of words in these cells? Like, no more than 30. Or no more than 300. Or no more than 3000. What columns are these 2 columns, and what row has the first entry in these 2 columns. HTH Otto "J741" wrote in message ... Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#3
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hi Otto.
1. The cells have a _lot_ of words, but I don't think it's any more than a few hundred per cell. 2. The range of cells containing the words is C2:D5000 3. I am using Excel 2003 SP3 - James. "Otto Moehrbach" wrote: James What version of Excel are you using? Also, do you have any idea of the maximum number of words in these cells? Like, no more than 30. Or no more than 300. Or no more than 3000. What columns are these 2 columns, and what row has the first entry in these 2 columns. HTH Otto "J741" wrote in message ... Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#5
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
James
I forgot to mention that I also assumed that the 2 columns that have the long cells in question are columns A & B. Change that in the code to match your actual data layout. Otto "Otto Moehrbach" wrote in message ... James Here's some code that does what you want, I think. I wrote it in several macros for ease of development. I assumed that the sheet that you start with is named "Main". Change that in the code as needed. I also assumed that you have a blank sheet named "Utility". The code first clears the Utility sheet, then builds everything in that sheet. The end product, in the "Utility" sheet is one or more columns consisting of all the words, including all repeat words. Then there is one column with a header of "Unique Words" and that's what is in that column. The last column has a number for each unique word and that number is the number of times that that word appears in all the words. I ran the code and it works for me. The macro that you want to run is "GetList". All the other macros run from this one macro. Place all you see below in a regular module If you need more help in running this, send me an email and I'll send you the file in which I developed this code. My email address is . Remove the "extra" from this address. HTH Otto Option Explicit Dim rColA As Range Dim rColB As Range Dim rTheRng As Range Dim i As Range Dim Dest As Range Dim Str As String Sub GetList() Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp)) Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp)) Set rTheRng = Union(rColA, rColB) Call GetAllWords Call GetUniqueWords End Sub Sub GetAllWords() Dim TheArray() As String Dim c As Long Sheets("Main").Select With Sheets("Utility") .Cells.Clear Set Dest = .Range("A1") For Each i In rTheRng If IsEmpty(i.Value) Then GoTo NextCell Call CleanEntry TheArray = Split(Str, " ") Dest.Resize(1 + UBound(TheArray)) = WorksheetFunction.Transpose(TheArray) Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1) If Dest.Row 60000 Then _ Set Dest = .Cells(1, Dest.Column + 1) NextCell: Next i End With End Sub Sub CleanEntry() Str = i.Value Str = Application.Trim(Str) Str = Replace(Str, ".", "") Str = Replace(Str, ",", "") Str = Replace(Str, "?", "") Str = Replace(Str, "!", "") End Sub Sub GetUniqueWords() Dim LastColumn As Long Dim UW As Range Sheets("Utility").Select Set rTheRng = ActiveSheet.UsedRange LastColumn = rTheRng(rTheRng.Count).Column + 1 Rows("1:1").Insert Shift:=xlDown Range("A1") = "All The Words" Cells(1, LastColumn) = "Unique Words" Cells(1, LastColumn + 1) = "Qty" Set Dest = Cells(2, LastColumn) Set UW = Dest For Each i In rTheRng If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then Dest = i.Value Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value) Set Dest = Dest.Offset(1) Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)) End If Next i End Sub "J741" wrote in message ... Hi Otto. 1. The cells have a _lot_ of words, but I don't think it's any more than a few hundred per cell. 2. The range of cells containing the words is C2:D5000 3. I am using Excel 2003 SP3 - James. "Otto Moehrbach" wrote: James What version of Excel are you using? Also, do you have any idea of the maximum number of words in these cells? Like, no more than 30. Or no more than 300. Or no more than 3000. What columns are these 2 columns, and what row has the first entry in these 2 columns. HTH Otto "J741" wrote in message ... Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#6
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thank you Otto. I figured out how to use this code, and it works just fine.
Thank you for your effort. "Otto Moehrbach" wrote: James I forgot to mention that I also assumed that the 2 columns that have the long cells in question are columns A & B. Change that in the code to match your actual data layout. Otto "Otto Moehrbach" wrote in message ... James Here's some code that does what you want, I think. I wrote it in several macros for ease of development. I assumed that the sheet that you start with is named "Main". Change that in the code as needed. I also assumed that you have a blank sheet named "Utility". The code first clears the Utility sheet, then builds everything in that sheet. The end product, in the "Utility" sheet is one or more columns consisting of all the words, including all repeat words. Then there is one column with a header of "Unique Words" and that's what is in that column. The last column has a number for each unique word and that number is the number of times that that word appears in all the words. I ran the code and it works for me. The macro that you want to run is "GetList". All the other macros run from this one macro. Place all you see below in a regular module If you need more help in running this, send me an email and I'll send you the file in which I developed this code. My email address is . Remove the "extra" from this address. HTH Otto Option Explicit Dim rColA As Range Dim rColB As Range Dim rTheRng As Range Dim i As Range Dim Dest As Range Dim Str As String Sub GetList() Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp)) Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp)) Set rTheRng = Union(rColA, rColB) Call GetAllWords Call GetUniqueWords End Sub Sub GetAllWords() Dim TheArray() As String Dim c As Long Sheets("Main").Select With Sheets("Utility") .Cells.Clear Set Dest = .Range("A1") For Each i In rTheRng If IsEmpty(i.Value) Then GoTo NextCell Call CleanEntry TheArray = Split(Str, " ") Dest.Resize(1 + UBound(TheArray)) = WorksheetFunction.Transpose(TheArray) Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1) If Dest.Row 60000 Then _ Set Dest = .Cells(1, Dest.Column + 1) NextCell: Next i End With End Sub Sub CleanEntry() Str = i.Value Str = Application.Trim(Str) Str = Replace(Str, ".", "") Str = Replace(Str, ",", "") Str = Replace(Str, "?", "") Str = Replace(Str, "!", "") End Sub Sub GetUniqueWords() Dim LastColumn As Long Dim UW As Range Sheets("Utility").Select Set rTheRng = ActiveSheet.UsedRange LastColumn = rTheRng(rTheRng.Count).Column + 1 Rows("1:1").Insert Shift:=xlDown Range("A1") = "All The Words" Cells(1, LastColumn) = "Unique Words" Cells(1, LastColumn + 1) = "Qty" Set Dest = Cells(2, LastColumn) Set UW = Dest For Each i In rTheRng If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then Dest = i.Value Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value) Set Dest = Dest.Offset(1) Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)) End If Next i End Sub "J741" wrote in message ... Hi Otto. 1. The cells have a _lot_ of words, but I don't think it's any more than a few hundred per cell. 2. The range of cells containing the words is C2:D5000 3. I am using Excel 2003 SP3 - James. "Otto Moehrbach" wrote: James What version of Excel are you using? Also, do you have any idea of the maximum number of words in these cells? Like, no more than 30. Or no more than 300. Or no more than 3000. What columns are these 2 columns, and what row has the first entry in these 2 columns. HTH Otto "J741" wrote in message ... Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#7
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
WOW. Thanks Otto.
However, that certainly does not look like instructions for using something that is already in Excel. It looks like program code. So, how do I use it in Excel? I'm not sure what you mean by "Place all you see below in a regular module". - James. "Otto Moehrbach" wrote: James Here's some code that does what you want, I think. I wrote it in several macros for ease of development. I assumed that the sheet that you start with is named "Main". Change that in the code as needed. I also assumed that you have a blank sheet named "Utility". The code first clears the Utility sheet, then builds everything in that sheet. The end product, in the "Utility" sheet is one or more columns consisting of all the words, including all repeat words. Then there is one column with a header of "Unique Words" and that's what is in that column. The last column has a number for each unique word and that number is the number of times that that word appears in all the words. I ran the code and it works for me. The macro that you want to run is "GetList". All the other macros run from this one macro. Place all you see below in a regular module If you need more help in running this, send me an email and I'll send you the file in which I developed this code. My email address is . Remove the "extra" from this address. HTH Otto Option Explicit Dim rColA As Range Dim rColB As Range Dim rTheRng As Range Dim i As Range Dim Dest As Range Dim Str As String Sub GetList() Set rColA = Range("A2", Range("A" & Rows.Count).End(xlUp)) Set rColB = Range("B2", Range("B" & Rows.Count).End(xlUp)) Set rTheRng = Union(rColA, rColB) Call GetAllWords Call GetUniqueWords End Sub Sub GetAllWords() Dim TheArray() As String Dim c As Long Sheets("Main").Select With Sheets("Utility") .Cells.Clear Set Dest = .Range("A1") For Each i In rTheRng If IsEmpty(i.Value) Then GoTo NextCell Call CleanEntry TheArray = Split(Str, " ") Dest.Resize(1 + UBound(TheArray)) = WorksheetFunction.Transpose(TheArray) Set Dest = .Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1) If Dest.Row 60000 Then _ Set Dest = .Cells(1, Dest.Column + 1) NextCell: Next i End With End Sub Sub CleanEntry() Str = i.Value Str = Application.Trim(Str) Str = Replace(Str, ".", "") Str = Replace(Str, ",", "") Str = Replace(Str, "?", "") Str = Replace(Str, "!", "") End Sub Sub GetUniqueWords() Dim LastColumn As Long Dim UW As Range Sheets("Utility").Select Set rTheRng = ActiveSheet.UsedRange LastColumn = rTheRng(rTheRng.Count).Column + 1 Rows("1:1").Insert Shift:=xlDown Range("A1") = "All The Words" Cells(1, LastColumn) = "Unique Words" Cells(1, LastColumn + 1) = "Qty" Set Dest = Cells(2, LastColumn) Set UW = Dest For Each i In rTheRng If UW.Find(What:=i.Value, LookAt:=xlWhole) Is Nothing Then Dest = i.Value Dest.Offset(, 1) = Application.CountIf(rTheRng, i.Value) Set Dest = Dest.Offset(1) Set UW = Range(Cells(2, Dest.Column), Cells(Rows.Count, Dest.Column).End(xlUp).Offset(1)) End If Next i End Sub "J741" wrote in message ... Hi Otto. 1. The cells have a _lot_ of words, but I don't think it's any more than a few hundred per cell. 2. The range of cells containing the words is C2:D5000 3. I am using Excel 2003 SP3 - James. "Otto Moehrbach" wrote: James What version of Excel are you using? Also, do you have any idea of the maximum number of words in these cells? Like, no more than 30. Or no more than 300. Or no more than 3000. What columns are these 2 columns, and what row has the first entry in these 2 columns. HTH Otto "J741" wrote in message ... Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#8
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Anyone know how to do this?
"J741" wrote: Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#9
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I'm working on it but had to stop for a couple of days because of company.
Will get on it Sunday/Monday. Otto "J741" wrote in message ... Anyone know how to do this? "J741" wrote: Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. |
#10
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
On Thu, 18 Jun 2009 08:36:02 -0700, J741
wrote: Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. What I need to have as an end result is a list of all unique words found in those two columns, along with the number of occurrences of each word. I will then use this to further analyze the data in the spreadsheet. Right now, I do this manually and it takes a VERY long time for me to do so (over 200 man-hours). So, I really need to automate this somehow. The problem is, I have no idea how to begin, because the words will not be by themselves in their own cell (so I cant use the auto filter functions, pivot tables, or anything else I can think of), but will be part of groups of words or symbols within multiple cells. I think this will need to be done programmatically with code, scripts, macros, or some other method with which I am not familiar, but I am just not sure. Can anyone help me with this? Can anyone point me in the right direction? Is there anything already built in to excel to do this? - James. Here's a start. You'll need to properly set the range to process (rSrc in the code) and the range where you want the results (rDest in the code) to match your sheets. For example, you might set rsrc = range("A1:B5000") to encompass 5000 lines in two columns. And, of course, you'll need to set rDest = some cell that is outside your data range. I assumed you wanted the results sorted by frequency of the word, with the most common word being first; but this can be easily changed. Note also that I formatted the entire first column of rDest as TEXT. Without this, any numeric strings in the data would be changed to numbers (if we left the format as General). So long strings might get truncated, or displayed in scientific notation; and leading zeros would be stripped off. Also, for this initial example, words are defined as strings containing only letters, digits, slash or a hyphen. This is done in order to remove punctuation. But it will also remove other substrings that might include other characters. If this will be an issue, changes can be easily made. To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens. To use this Macro (Sub), <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. ======================================== Option Explicit Sub UniqueWordList() Dim rSrc As Range, rDest As Range, c As Range Dim cWordList As Collection Dim res() As Variant Dim w() As String Dim i As Long Set cWordList = New Collection Set rSrc = Range("A1:B22") Set rDest = Range("M1") rDest.EntireColumn.NumberFormat = "@" For Each c In rSrc w = Split(c.Value) For i = 0 To UBound(w) w(i) = StripWord(w(i)) If Not w(i) = "" Then On Error Resume Next cWordList.Add Item:=w(i), Key:=w(i) On Error GoTo 0 End If Next i Next c 'transfer words to results array ReDim res(1 To cWordList.Count, 0 To 1) For i = 1 To cWordList.Count res(i, 0) = cWordList(i) Next i 'get counts For i = LBound(res) To UBound(res) For Each c In rSrc res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0)) Next c Next i 'sort alpha: d=0; sort numeric d=1 'there are various ways of sorting BubbleSort res, 1 For i = LBound(res) To UBound(res) rDest.Offset(i, 0).Value = res(i, 0) rDest.Offset(i, 1).Value = res(i, 1) Next i End Sub Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") Set re = Nothing End Function Private Function CountWord(ByVal s As String, sPat) As Long Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.Pattern = "\b" & sPat & "\b" Set mc = re.Execute(s) CountWord = mc.Count End Function Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension Dim temp(0, 1) As Variant Dim i As Integer Dim NoExchanges As Integer ' Loop until no more "exchanges" are made. Do NoExchanges = True ' Loop through each element in the array. For i = LBound(TempArray) To UBound(TempArray) - 1 ' If the element is less than the element ' following it, exchange the two elements. ' change "<" to "" to sort ascending If TempArray(i, d) < TempArray(i + 1, d) Then NoExchanges = False temp(0, 0) = TempArray(i, 0) temp(0, 1) = TempArray(i, 1) TempArray(i, 0) = TempArray(i + 1, 0) TempArray(i, 1) = TempArray(i + 1, 1) TempArray(i + 1, 0) = temp(0, 0) TempArray(i + 1, 1) = temp(0, 1) End If Next i Loop While Not (NoExchanges) End Sub ====================================== --ron |
#11
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
On Mon, 22 Jun 2009 16:41:46 -0400, Ron Rosenfeld
wrote: On Thu, 18 Jun 2009 08:36:02 -0700, J741 wrote: Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. Small change. The "count" function needs to be made case insensitive. So change the code to: ========================= Option Explicit Sub UniqueWordList() Dim rSrc As Range, rDest As Range, c As Range Dim cWordList As Collection Dim res() As Variant Dim w() As String Dim i As Long Set cWordList = New Collection Set rSrc = Range("A1:B22") Set rDest = Range("M1") rDest.EntireColumn.NumberFormat = "@" For Each c In rSrc w = Split(c.Value) For i = 0 To UBound(w) w(i) = StripWord(w(i)) If Not w(i) = "" Then On Error Resume Next cWordList.Add Item:=w(i), Key:=w(i) On Error GoTo 0 End If Next i Next c 'transfer words to results array ReDim res(1 To cWordList.Count, 0 To 1) For i = 1 To cWordList.Count res(i, 0) = cWordList(i) Next i 'get counts For i = LBound(res) To UBound(res) For Each c In rSrc res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0)) Next c Next i 'sort alpha: d=0; sort numeric d=1 'there are various ways of sorting BubbleSort res, 1 For i = LBound(res) To UBound(res) rDest.Offset(i, 0).Value = res(i, 0) rDest.Offset(i, 1).Value = res(i, 1) Next i End Sub Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") Set re = Nothing End Function Private Function CountWord(ByVal s As String, sPat) As Long Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.ignorecase = True re.Pattern = "\b" & sPat & "\b" Set mc = re.Execute(s) CountWord = mc.Count End Function Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension Dim temp(0, 1) As Variant Dim i As Integer Dim NoExchanges As Integer ' Loop until no more "exchanges" are made. Do NoExchanges = True ' Loop through each element in the array. For i = LBound(TempArray) To UBound(TempArray) - 1 ' If the element is less than the element ' following it, exchange the two elements. ' change "<" to "" to sort ascending If TempArray(i, d) < TempArray(i + 1, d) Then NoExchanges = False temp(0, 0) = TempArray(i, 0) temp(0, 1) = TempArray(i, 1) TempArray(i, 0) = TempArray(i + 1, 0) TempArray(i, 1) = TempArray(i + 1, 1) TempArray(i + 1, 0) = temp(0, 0) TempArray(i + 1, 1) = temp(0, 1) End If Next i Loop While Not (NoExchanges) End Sub ================================= --ron |
#12
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thanks Ron. That worked.
Now, how can I refine this to ignore words that are smaller than 3 letters in length? Words like 'and', 'the', not', etc. "Ron Rosenfeld" wrote: On Mon, 22 Jun 2009 16:41:46 -0400, Ron Rosenfeld wrote: On Thu, 18 Jun 2009 08:36:02 -0700, J741 wrote: Hi. I have a spreadsheet with almost 5000 rows with about 15 columns of data, 2 columns of which contain sentences, phrases, or paragraphs of comments entered by users. Small change. The "count" function needs to be made case insensitive. So change the code to: ========================= Option Explicit Sub UniqueWordList() Dim rSrc As Range, rDest As Range, c As Range Dim cWordList As Collection Dim res() As Variant Dim w() As String Dim i As Long Set cWordList = New Collection Set rSrc = Range("A1:B22") Set rDest = Range("M1") rDest.EntireColumn.NumberFormat = "@" For Each c In rSrc w = Split(c.Value) For i = 0 To UBound(w) w(i) = StripWord(w(i)) If Not w(i) = "" Then On Error Resume Next cWordList.Add Item:=w(i), Key:=w(i) On Error GoTo 0 End If Next i Next c 'transfer words to results array ReDim res(1 To cWordList.Count, 0 To 1) For i = 1 To cWordList.Count res(i, 0) = cWordList(i) Next i 'get counts For i = LBound(res) To UBound(res) For Each c In rSrc res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0)) Next c Next i 'sort alpha: d=0; sort numeric d=1 'there are various ways of sorting BubbleSort res, 1 For i = LBound(res) To UBound(res) rDest.Offset(i, 0).Value = res(i, 0) rDest.Offset(i, 1).Value = res(i, 1) Next i End Sub Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") Set re = Nothing End Function Private Function CountWord(ByVal s As String, sPat) As Long Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.ignorecase = True re.Pattern = "\b" & sPat & "\b" Set mc = re.Execute(s) CountWord = mc.Count End Function Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension Dim temp(0, 1) As Variant Dim i As Integer Dim NoExchanges As Integer ' Loop until no more "exchanges" are made. Do NoExchanges = True ' Loop through each element in the array. For i = LBound(TempArray) To UBound(TempArray) - 1 ' If the element is less than the element ' following it, exchange the two elements. ' change "<" to "" to sort ascending If TempArray(i, d) < TempArray(i + 1, d) Then NoExchanges = False temp(0, 0) = TempArray(i, 0) temp(0, 1) = TempArray(i, 1) TempArray(i, 0) = TempArray(i + 1, 0) TempArray(i, 1) = TempArray(i + 1, 1) TempArray(i + 1, 0) = temp(0, 0) TempArray(i + 1, 1) = temp(0, 1) End If Next i Loop While Not (NoExchanges) End Sub ================================= --ron |
#13
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
On Tue, 23 Jun 2009 13:56:01 -0700, J741
wrote: Thanks Ron. That worked. Glad to hear it. Thanks for the feedback. Now, how can I refine this to ignore words that are smaller than 3 letters in length? Words like 'and', 'the', not', etc. I would do that work in the StripWord function. That's where we clean up and can also easily test words. If a null string is returned to the calling routine, it already ignores it. So, for example, to eliminate words that are 3 or fewer characters in length: ====================== Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") ' eliminate words with length of three or less If Len(StripWord) <= 3 Then StripWord = "" Set re = Nothing End Function ======================= Other modifications as to unacceptable words, would be simple to do here, also. --ron |
#14
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hi Ron. Thanks for that. However I have a few more questions as follows:
1. How can I get it to include e-mail addresses which contain the symbols '@' and '.'? 2. When I tested it with some data that included mixed case words, I got invalid results. Specifically, three cells with "Seven seven SEVEN", "SeVeN sEvEn", and ""Seven seven SEVEN"" returned a word count of only 2 for the word 'seven'. Why would this be? 3. For the previously mentioned invalid results, when a word such as ' was stripped of non-letter characters, it also got counted as 0 instances of the word. 4. I am trying to understand your code, and the statements 'Set re = CreateObject("vbscript.regexp")' and 'set mc = re.Execute' are confusing to me. Can you explain them please (or point me to another resource that explains them)? Thanks. - James. "Ron Rosenfeld" wrote: On Tue, 23 Jun 2009 13:56:01 -0700, J741 wrote: Thanks Ron. That worked. Glad to hear it. Thanks for the feedback. Now, how can I refine this to ignore words that are smaller than 3 letters in length? Words like 'and', 'the', not', etc. I would do that work in the StripWord function. That's where we clean up and can also easily test words. If a null string is returned to the calling routine, it already ignores it. So, for example, to eliminate words that are 3 or fewer characters in length: ====================== Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") ' eliminate words with length of three or less If Len(StripWord) <= 3 Then StripWord = "" Set re = Nothing End Function ======================= Other modifications as to unacceptable words, would be simple to do here, also. --ron |
#15
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hi Ron.
I found a problem with the code you provided. Sometimes I get a "Run-time error '13': Type Mismatch". If I hit 'debug', it shows me the line: w = Split(c.Value) When I add a watch for c, this is what I see: Value: Error 2029 Type: Range/Range When I add a watch for c.value, this is what I see: Value: Error 2029 Type: Variant/Error I traced this to a cell which contains the following text: #NAME? However, after removing the data from that cell and trying again, I now get "Run-time error '5020': Application-defined or object-defined error". This error occurs at the following line in the 'CountWord' function: Set mc = re.Execute(s) I'm not having much fun with this code, as I keep running into error that I don't fully understand. - James. "Ron Rosenfeld" wrote: On Tue, 23 Jun 2009 13:56:01 -0700, J741 wrote: Thanks Ron. That worked. Glad to hear it. Thanks for the feedback. Now, how can I refine this to ignore words that are smaller than 3 letters in length? Words like 'and', 'the', not', etc. I would do that work in the StripWord function. That's where we clean up and can also easily test words. If a null string is returned to the calling routine, it already ignores it. So, for example, to eliminate words that are 3 or fewer characters in length: ====================== Private Function StripWord(s As String) As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.Global = True 'allow only letters, digits, slashes and hyphens re.Pattern = "[^-/A-Za-z0-9]" StripWord = re.Replace(s, "") ' eliminate words with length of three or less If Len(StripWord) <= 3 Then StripWord = "" Set re = Nothing End Function ======================= Other modifications as to unacceptable words, would be simple to do here, also. --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
function to retrieve a list of unique characters from a column | Excel Worksheet Functions | |||
How can I get a unique list of a column? | New Users to Excel | |||
create numbered sortable numbered list in excel | Excel Discussion (Misc queries) | |||
list unique values in a column | Excel Worksheet Functions | |||
Compare multiple column of data and list out common and unique component in adj columns | Excel Worksheet Functions |