Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I would like to find all the unique words in a text file and store them
in an Excel 2010 spreadsheet. I am looking for a reasonably efficient method or algorithm to achieve this. I was thinking, how about save all the words that begin with "A" in the A column, all the "B" words get placed in the B column, etc... Also, each word should only be stored ONCE in the spreadsheet, regardless of lowercase, uppercase, mixed, etc. So, if my text file contains "apple", "APPLE", or "aPPle", only one instance of "apple" should appear under the A column. I'd appreciate anything to help me get started. Thank you. Robert |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
Am Thu, 7 May 2015 02:45:29 -0700 schrieb Robert Crandal: Also, each word should only be stored ONCE in the spreadsheet, regardless of lowercase, uppercase, mixed, etc. So, if my text file contains "apple", "APPLE", or "aPPle", only one instance of "apple" should appear under the A column. the following code stores all unique words in lowercase in column A. You can sort and move the words if you want: Public objFSO As New Filesystemobject Sub Test() Dim strPath As String Dim strFN As String Dim objReadFile As Object, myDic As Object Dim strText As String Dim varChr As Variant, varTmp As Variant, varOut As Variant Dim i As Long 'Modify path and file name strPath = "C:\Users\Claus\Desktop\" strFN = "Test.txt" varChr = Array(",", ".", ":", "-", "_", "!", "?", "(", ")", Chr(10)) Set myDic = CreateObject("Scripting.Dictionary") Set objReadFile = objFSO.opentextfile _ (strPath & strFN, forreading, tristatefalse) Do While objReadFile.atendofstream < True strText = strText & objReadFile.readline & " " Loop objReadFile.Close For i = LBound(varChr) To UBound(varChr) strText = Replace(strText, varChr(i), "") Next strText = Left(strText, Len(strText) - 1) varTmp = Split(strText, " ") For i = LBound(varTmp) To UBound(varTmp) varTmp(i) = LCase(varTmp(i)) Next For i = LBound(varTmp) To UBound(varTmp) myDic(varTmp(i)) = varTmp(i) Next varOut = myDic.Items Range("A1").Resize(UBound(varOut) + 1) = Application.Transpose(varOut) End Sub Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
Am Thu, 7 May 2015 12:30:24 +0200 schrieb Claus Busch: the following code stores all unique words in lowercase in column A. You can sort and move the words if you want: to distribute the words in alphabetic order try: Public objFSO As New Filesystemobject Sub Test() Dim strPath As String Dim strFN As String Dim objReadFile As Object, myDic As Object Dim strText As String Dim varChr As Variant, varTmp As Variant, varOut As Variant Dim i As Long, LRow As Long, n As Long Application.ScreenUpdating = False 'Modify path and file name strPath = "C:\Users\Claus\Desktop\" strFN = "Test.txt" varChr = Array(",", ".", ":", "-", "_", "!", "?", "(", ")", Chr(10)) Set myDic = CreateObject("Scripting.Dictionary") Set objReadFile = objFSO.opentextfile _ (strPath & strFN, forreading, tristatefalse) Do While objReadFile.atendofstream < True strText = strText & objReadFile.readline & " " Loop objReadFile.Close For i = LBound(varChr) To UBound(varChr) strText = Replace(strText, varChr(i), " ") Next strText = Application.Trim(strText) varTmp = Split(strText, " ") For i = LBound(varTmp) To UBound(varTmp) varTmp(i) = LCase(varTmp(i)) Next For i = LBound(varTmp) To UBound(varTmp) myDic(varTmp(i)) = varTmp(i) Next varOut = myDic.Items Range("A1") = "Header" Range("A2").Resize(UBound(varOut) + 1) = Application.Transpose(varOut) LRow = Cells(Rows.Count, 1).End(xlUp).Row n = 2 For i = 97 To 122 Range("A1:A" & LRow).AutoFilter field:=1, Criteria1:="=" & Chr(i) & "*" If Application.Subtotal(3, Range("A:A")) 1 Then Range("A2:A" & LRow).Copy Cells(1, n) n = n + 1 End If Next Columns("A").Delete Application.ScreenUpdating = True End Sub Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hallo Robert,
Am Thu, 7 May 2015 12:56:09 +0200 schrieb Claus Busch: For i = LBound(varTmp) To UBound(varTmp) varTmp(i) = LCase(varTmp(i)) Next you don't need to set each word in the array to LCase. You can set the whole string to LCase at once: strText = Application.Trim(strText) strText = LCase(strText) varTmp = Split(strText, " ") Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Claus Busch" wrote:
Set myDic = CreateObject("Scripting.Dictionary") BTW, I never knew about this "Scripting Dictionary". Is this an object that can store an entire library of words? If so, do you think I would be better off storing all of my unique words in the dictionary object, rather than saving them in columns on the spreadsheet? If I am able to store all my unique words in a memory object, I could then iterate through the object and output all the words to a text file. That might even be better for me. |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
Am Thu, 7 May 2015 11:20:35 -0700 schrieb Robert Crandal: "Claus Busch" wrote: Set myDic = CreateObject("Scripting.Dictionary") BTW, I never knew about this "Scripting Dictionary". Is this an object that can store an entire library of words? the Scripting.Dictionary is my favorite to create unique items. With For i = LBound(varTmp) To UBound(varTmp) myDic(varTmp(i)) = varTmp(i) Next varOut = myDic.Items you have all your unique words in varOut. Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#7
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Claus Busch" wrote:
Public objFSO As New Filesystemobject This line gives me an error. Here is the error message: Compile error: User-defined type not defined Am I missing something here? Thanks Claus |
#8
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
Am Thu, 7 May 2015 11:08:48 -0700 schrieb Robert Crandal: Public objFSO As New Filesystemobject This line gives me an error. Here is the error message: you have to activate the reference to "Microsoft Scripting Runtime" Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#9
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Claus Busch" wrote:
Public objFSO As New Filesystemobject This line gives me an error. Here is the error message: Compile error: User-defined type not defined Am I missing something here? Thanks Claus You need to set a project 'Reference' to the Scripting runtime for using this 'early binding' method! This is okay for development but my *preference* is to use 'late binding' for final release... In the General Declarations section of a standard module: Public oFSO As Object In the main startup procedure (ie: "Auto_Open") or an "InitGlobals" procedure called by your main startup procedure... Set oFSO = CreateObject("Scripting.FileSystemObject") ...to make oFSO available to your project during runtime. In the shutdown procedure (ie: "Auto_Close") release the memory... Set oFSO = Nothing -- Garry Free usenet access at http://www.eternal-september.org Classic VB Users Regroup! comp.lang.basic.visual.misc microsoft.public.vb.general.discussion |
#10
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
Am Thu, 7 May 2015 11:08:48 -0700 schrieb Robert Crandal: Public objFSO As New Filesystemobject here is a revised version: Sub Test() Dim strPath As String Dim strFN As String Dim objReadFile As Object, myDic As Object, objFSO As Object Dim strText As String Dim varChr As Variant, varTmp As Variant, varOut As Variant Dim i As Long, LRow As Long, n As Long Application.ScreenUpdating = False 'Modify path and file name strPath = "C:\Users\Claus\Desktop\" strFN = "Test.txt" 'Array with the expected punctuation marks varChr = Array(",", ".", ":", "-", "_", "!", "?", "(", ")", "'", Chr(10)) Set myDic = CreateObject("Scripting.Dictionary") Set objFSO = CreateObject("Scripting.FileSystemObject") 'Opens the text file and read the text into a string Set objReadFile = objFSO.opentextfile(strPath & strFN) Do While objReadFile.atendofstream < True strText = strText & objReadFile.readline & " " Loop objReadFile.Close 'Deletes all punctuation marks For i = LBound(varChr) To UBound(varChr) strText = Replace(strText, varChr(i), " ") Next 'Handles special cases like I'll, it's and so on 'Customize it depending on your text file strText = Replace(Replace(Replace(Replace(Replace(strText, " s ", " is "), " ll ", _ " will "), " d ", " had "), " re ", " are "), " ve ", " have ") 'Deletes superfluous spaces strText = Application.Trim(strText) 'Change all words to lower case strText = LCase(strText) 'Writes the string into an array varTmp = Split(strText, " ") 'Creates unique words For i = LBound(varTmp) To UBound(varTmp) myDic(varTmp(i)) = varTmp(i) Next varOut = myDic.Items 'Writes the unique words to column A with a header for filtering Range("A1") = "Header" Range("A2").Resize(UBound(varOut) + 1) = Application.Transpose(varOut) LRow = Cells(Rows.Count, 1).End(xlUp).Row 'Distrubutes the words in alphabetic order to columns n = 2 For i = 97 To 122 Range("A1:A" & LRow).AutoFilter field:=1, Criteria1:="=" & Chr(i) & "*" If Application.Subtotal(3, Range("A:A")) 1 Then Range("A2:A" & LRow).Copy Cells(1, n) n = n + 1 End If Next Columns("A").Delete Columns("A:Z").AutoFit Application.ScreenUpdating = True End Sub Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
#11
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Claus Busch" wrote:
here is a revised version: Sub Test() ' [code snipped] End Sub Thanks Claus. This is a very good start. It worked nicely for small input files. I will probably need to make some modifications so it takes numeric words into considersation, such as "10", "100", etc... But, it works great so far. |
#12
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Hi Robert,
This is a very good start. It worked nicely for small input files. I will probably need to make some modifications so it takes numeric words into considersation, such as "10", "100", etc... try it with Regular Expressions: Sub Test2() Dim strPath As String Dim strFN As String Dim objReadFile As Object, myDic As Object, objFSO As Object, re As Object Dim strText As String Dim varText() As Variant Dim i As Long, LRow As Long, n As Long Dim ptrn, Match, Matches Application.ScreenUpdating = False 'Modify path and file name strPath = "C:\Users\Claus\Desktop\" strFN = "Test.txt" Set myDic = CreateObject("Scripting.Dictionary") Set objFSO = CreateObject("Scripting.FileSystemObject") Set re = CreateObject("vbscript.regexp") 'Opens the text file and read the text into a string Set objReadFile = objFSO.opentextfile(strPath & strFN) Do While objReadFile.atendofstream < True strText = strText & objReadFile.readline & " " Loop objReadFile.Close strText = Application.Trim(LCase(strText)) ptrn = "\w+" re.Pattern = ptrn re.IgnoreCase = False re.Global = True Set Matches = re.Execute(strText) ReDim Preserve varText(Matches.Count - 1) For Each Match In Matches varText(n) = CStr(Match.Value) n = n + 1 Next Range("A:A").NumberFormat = "@" Range("A1") = "Header" Range("A2").Resize(UBound(varText) + 1) = _ Application.Transpose(varText) LRow = Cells(Rows.Count, 1).End(xlUp).Row 'Distrubutes the words in alphabetic order to columns n = 2 For i = 97 To 122 Range("A1:A" & LRow).AutoFilter Field:=1, Criteria1:="=" & Chr(i) & "*" If Application.Subtotal(3, Range("A:A")) 1 Then Range("A2:A" & LRow).Copy Cells(1, n) n = n + 1 End If Next For i = 48 To 57 Range("A1:A" & LRow).AutoFilter Field:=1, Criteria1:="=" & Chr(i) & "*" If Application.Subtotal(3, Range("A:A")) 1 Then Range("A2:A" & LRow).Copy Cells(1, n) n = n + 1 End If Next Columns("A").Delete Columns("A:AJ").AutoFit Application.ScreenUpdating = True End Sub Regards Claus B. -- Vista Ultimate / Windows7 Office 2007 Ultimate / 2010 Professional |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
XML Data Storage | Excel Discussion (Misc queries) | |||
Persistent Storage for Add-Ins ? | Excel Programming | |||
cell value storage | Excel Programming | |||
Data Storage | Excel Discussion (Misc queries) | |||
Variable Storage | Excel Programming |