Iterate through all words
I have a text file that is read and stored into a string variable.
Here is the basic code:

    Dim s As String
    s = ReadTextFile("C:\data.txt")

I would now like to retrieve each and every word that is stored in the variable "s". Loading all the words into an array will be okay for now.

For my purposes, a word is any string of characters that does not include whitespace characters, such as spaces, tabs, carriage returns, line feeds, etc. I think those are the only whitespace characters that exist, right?

Gary, do you recommend that I use the Split() function for this?
Iterate through all words
If you want to iterate the words only, you'll need to first filter out all possible punctuation characters there might be, so that words are delimited by the space character. A text file may also contain CrLf characters, which you may want to remove as well. Note that doing so will screw you up if you don't replace these with a space character. So then, the approach I suggest is...

  filter out punctuation characters ".,;!:"
  replace CrLf characters with " "
  split the string into a variant using " " as the delimiter

...where you can filter unwanted characters with the following function.

    Function FilterString$(ByVal TextIn$, Optional IncludeChars$, _
                           Optional IncludeLetters As Boolean = True, _
                           Optional IncludeNumbers As Boolean = True)
    ' Filters out all unwanted characters in a string.
    ' Arguments: TextIn          The string being filtered.
    '            IncludeChars    [Optional] Any non alpha-numeric
    '                            characters to keep.
    '            IncludeLetters  [Optional] Keeps any letters.
    '            IncludeNumbers  [Optional] Keeps any numbers.
    '
    ' Returns:   String containing only wanted characters.
    ' Comments:  Works very fast using the Mid$() function over other
    '            methods.

      Const sSource As String = "FilterString()"

      'The basic characters to always keep by default
      Const sLetters As String = "abcdefghijklmnopqrstuvwxyz"
      Const sNumbers As String = "0123456789"

      Dim i&, CharsToKeep$

      CharsToKeep = IncludeChars
      If IncludeLetters Then _
        CharsToKeep = CharsToKeep & sLetters & UCase(sLetters)
      If IncludeNumbers Then CharsToKeep = CharsToKeep & sNumbers

      For i = 1 To Len(TextIn)
        If InStr(CharsToKeep, Mid$(TextIn, i, 1)) Then _
          FilterString = FilterString & Mid$(TextIn, i, 1)
      Next
    End Function 'FilterString()

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
  comp.lang.basic.visual.misc
  microsoft.public.vb.general.discussion
Iterate through all words
For example...

    Dim sText$, vText, n&
    sText = FilterString(ReadTextFile(sFile), ".,;!:")
    vText = Split(Replace(sText, vbCrLf, " "), " ")
    For n = LBound(vText) To UBound(vText)
      Debug.Print vText(n)
    Next 'n

OR you can skip the use of 'sText'...

    vText = Split(Replace(FilterString(ReadTextFile(sFile), ".,;!:"), _
                  vbCrLf, " "), " ")

...but using sText is a bit easier to read/digest!

--
Garry
Iterate through all words
Hi Robert,

On Fri, 15 May 2015 17:37:26 -0700, Robert Crandal wrote:
> I have a text file that is read and stored into a string variable.
> Here is the basic code:
>
>     Dim s As String
>     s = ReadTextFile("C:\data.txt")
>
> I would now like to retrieve each and every word that is stored in the variable "s". Loading all the words into an array will be okay for now.

Store your words in varText:

    Sub Test()
    Dim ptrn As String
    Dim Matches, Match
    Dim re As Object
    Dim varText() As Variant
    Dim n As Long

    Set re = CreateObject("vbscript.regexp")
    ptrn = "\w+"
    re.Pattern = ptrn
    re.IgnoreCase = False
    re.Global = True
    Set Matches = re.Execute(s)
    ReDim Preserve varText(Matches.Count - 1)
    For Each Match In Matches
        varText(n) = Match.Value
        n = n + 1
    Next
    End Sub

Regards
Claus B.
--
Vista Ultimate / Windows7
Office 2007 Ultimate / 2010 Professional
Iterate through all words
"Claus Busch" wrote:
> Store your words in varText:
>
>     Sub Test()
>     Dim ptrn As String
>     Dim Matches, Match
>     Dim re As Object
>     Dim varText() As Variant
>     Dim n As Long
>
>     Set re = CreateObject("vbscript.regexp")
>     ptrn = "\w+"
>     re.Pattern = ptrn
>     re.IgnoreCase = False
>     re.Global = True
>     Set Matches = re.Execute(s)
>     ReDim Preserve varText(Matches.Count - 1)
>     For Each Match In Matches
>         varText(n) = Match.Value
>         n = n + 1
>     Next
>     End Sub

Hi Claus. A few questions...

How do you recommend that I store my words in the varText variable? I typically load an entire text file into a string variable using ReadTextFile().

Also, your code above refers to a variable "s" without initializing any value. What is this variable?

Thanks!
Iterate through all words
Hi Robert,

On Sat, 16 May 2015 02:33:36 -0700, Robert Crandal wrote:
> How do you recommend that I store my words in the varText variable? I typically load an entire text file into a string variable using ReadTextFile().

You already have the string s. You wrote:

    s = ReadTextFile("C:\data.txt")

> Also, your code above refers to a variable "s" without initializing any value. What is this variable?

The code I posted refers to the s you already have.

Regards
Claus B.
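One way to avoid the confusion about where "s" comes from is to pass the text in as an argument. The following is only a sketch of the same regular-expression idea wrapped in a function; the names WordsFromText and DemoWordsFromText are hypothetical and do not appear in the thread. It assumes the VBScript regular expression library is available for late binding, as in the code posted above.

```vba
' Hypothetical helper (not from the thread): returns every \w+ match
' found in sText as a zero-based Variant array.
Function WordsFromText(ByVal sText As String) As Variant
    Dim re As Object, Matches As Object
    Dim vWords() As Variant
    Dim n As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\w+"      ' one or more letters, digits or underscores
    re.Global = True        ' find all matches, not only the first

    Set Matches = re.Execute(sText)
    If Matches.Count = 0 Then
        WordsFromText = Array()   ' empty array (UBound = -1)
        Exit Function
    End If

    ReDim vWords(0 To Matches.Count - 1)
    For n = 0 To Matches.Count - 1
        vWords(n) = Matches(n).Value
    Next n
    WordsFromText = vWords
End Function

Sub DemoWordsFromText()
    Dim v As Variant, i As Long
    ' In real use the argument would be ReadTextFile("C:\data.txt").
    v = WordsFromText("Winter can indeed, AA1 10")
    For i = LBound(v) To UBound(v)
        Debug.Print v(i)    ' prints the five words, one per line
    Next i
End Sub
```

Note that "\w+" treats an apostrophe as a separator, so "letter's" comes back as "letter" and "s". If a word should be any run of non-whitespace characters, as in the original question, the pattern "\S+" is probably closer to that definition, at the cost of keeping punctuation attached to the words.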
Iterate through all words
"Claus Busch" wrote:
> You already have the string s. You wrote:
>
>     s = ReadTextFile("C:\data.txt")

Okay Claus. I tested it again and it works great. Thanks again. You and Gary are awesome!
Iterate through all words
"GS" wrote:
>     sFile = "C:\data.txt"
>     Dim sText$, vText, n&
>     sText = FilterString(ReadTextFile(sFile), ".,;!:")
>     vText = Split(Replace(sText, vbCrLf, " "), " ")
>     For n = LBound(vText) To UBound(vText)
>       Debug.Print vText(n)
>     Next 'n
>     MsgBox vText(0)

The above code causes vText to have 1 element, which is a long string of words from the document minus all spacing. Did I do something wrong here?
Iterate through all words
Hi Robert,

On Sat, 16 May 2015 03:35:23 -0700, Robert Crandal wrote:
> "GS" wrote:
>>     sFile = "C:\data.txt"
>>     Dim sText$, vText, n&
>>     sText = FilterString(ReadTextFile(sFile), ".,;!:")
>>     vText = Split(Replace(sText, vbCrLf, " "), " ")
>>     For n = LBound(vText) To UBound(vText)
>>       Debug.Print vText(n)
>>     Next 'n
>>     MsgBox vText(0)
>
> The above code causes vText to have 1 element, which is a long string of words from the document minus all spacing.

Replace all punctuation marks with spaces:

    Sub TextToArray()
    Dim varChr As Variant, varTmp As Variant
    Dim s As String
    Dim i As Long

    'Array with the expected punctuation marks
    varChr = Array(",", ".", ":", "-", "_", "!", "?", "(", ")", "'", Chr(10))
    s = ReadTextFile("C:\data.txt")
    'Replaces all punctuation marks with spaces
    For i = LBound(varChr) To UBound(varChr)
        s = Replace(s, varChr(i), " ")
    Next
    'Deletes superfluous spaces
    s = Application.Trim(s)
    varTmp = Split(s, " ")
    For i = LBound(varTmp) To UBound(varTmp)
        Debug.Print varTmp(i)
    Next
    End Sub

Regards
Claus B.
Iterate through all words
"GS" wrote:
>     sFile = "C:\data.txt"
>     Dim sText$, vText, n&
>     sText = FilterString(ReadTextFile(sFile), ".,;!:")
>     vText = Split(Replace(sText, vbCrLf, " "), " ")
>     For n = LBound(vText) To UBound(vText)
>       Debug.Print vText(n)
>     Next 'n
>     MsgBox vText(0)
>
> The above code causes vText to have 1 element, which is a long string of words from the document minus all spacing. Did I do something wrong here?

That suggests the punctuation characters at the end of sentences are not followed by at least 1 space, and there are no CrLf characters in the file. Please upload the subject file so I can see what's going on...

--
Garry
Iterate through all words
"GS" wrote:
> That suggests the punctuation characters at the end of sentences are not followed by at least 1 space, and there are no CrLf characters in the file. Please upload the subject file so I can see what's going on...

I pasted the contents of the file below. BTW, the second suggestion by Claus also worked, where his code replaces all punctuation marks with spaces.

------------------[data.txt]---------------------------
Repulsive questions contented him few extensive supported. Of remarkably thoroughly, he appearance in. Supposing tolerably applauded or of be. Suffering unfeeling so objection agreeable allowance me of. Ask within entire season sex common far who family. As be valley warmth assure on. Park girl they rich hour new well way you. Face ye be me been room we sons fond. Lose eyes get fat shew. Winter can indeed, AA1 10, letters oppose way change tended now. So is improve my charmed picture exposed adapted 22 demands. Received had end produced prepared diverted, strictly, off man branched. Known ye money so large decay voice there to. Preserved be mr cordially incommode as an. He doors quick child an point at. Had share vexed front least style off why him. Call park out she wife face zoo mean. Invitation letter's address excellence imprudence understood it continuing to. Ye show done an into. Fifteen winding related may hearted colonel are way studied. The things so remain oh to elinor. Far merits season better tended any age hunted. Preserved be mr cordially incommode as an.
Iterate through all words
"GS" wrote:
>     sFile = "C:\data.txt"
>     Dim sText$, vText, n&
>     sText = FilterString(ReadTextFile(sFile), ".,;!:")
>     vText = Split(Replace(sText, vbCrLf, " "), " ")
>     For n = LBound(vText) To UBound(vText)
>       Debug.Print vText(n)
>     Next 'n
>     MsgBox vText(0)
>
> The above code causes vText to have 1 element, which is a long string of words from the document minus all spacing. Did I do something wrong here?

No, I did! I wasn't thinking about 'IncludeChars': the 2nd arg lists characters to keep, so FilterString kept the punctuation (along with letters and numbers) and removed everything else, spaces included. Here's what we should have done...

    sText = FilterString(ReadTextFile(sFile), " ")

...which removes all characters except letters, numbers, and spaces, so the CrLf chars are removed too. So...

    vText = Split(FilterString(ReadTextFile(sFile), " "), " ")

...returns a UBound(vText) = 174.

--
Garry
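To make the effect of the second argument concrete, here is a small sketch that runs both calls on one sentence taken from the sample file. The function is a condensed copy of the FilterString posted earlier in the thread; the Sub name DemoIncludeChars and the sample constant are made up for illustration, and the default binary string comparison is assumed.

```vba
Function FilterString$(ByVal TextIn$, Optional IncludeChars$, _
                       Optional IncludeLetters As Boolean = True, _
                       Optional IncludeNumbers As Boolean = True)
    ' Condensed copy of the function posted earlier in the thread.
    Const sLetters As String = "abcdefghijklmnopqrstuvwxyz"
    Const sNumbers As String = "0123456789"
    Dim i&, CharsToKeep$

    CharsToKeep = IncludeChars
    If IncludeLetters Then CharsToKeep = CharsToKeep & sLetters & UCase$(sLetters)
    If IncludeNumbers Then CharsToKeep = CharsToKeep & sNumbers

    For i = 1 To Len(TextIn)
        If InStr(CharsToKeep, Mid$(TextIn, i, 1)) > 0 Then
            FilterString = FilterString & Mid$(TextIn, i, 1)
        End If
    Next i
End Function

Sub DemoIncludeChars()   ' hypothetical demo, not from the thread
    Const sSample As String = "Lose eyes get fat shew. Winter can indeed, AA1 10"

    ' Punctuation is KEPT and spaces are dropped, so the words run together:
    ' Loseeyesgetfatshew.Wintercanindeed,AA110
    Debug.Print FilterString(sSample, ".,;!:")

    ' Spaces are kept and punctuation is dropped, so Split can separate the words:
    ' Lose eyes get fat shew Winter can indeed AA1 10
    Debug.Print FilterString(sSample, " ")
End Sub
```

The first call reproduces the "one long string minus all spacing" result reported above; the second leaves a string that Split(..., " ") can break into words.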
Iterate through all words
>     vText = Split(FilterString(ReadTextFile(sFile), " "), " ")

Still not right! Should be...

    vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")

...so the last/first word of each line doesn't get concatenated!

--
Garry
Iterate through all words
"GS" wrote:
> Still not right! Should be...
>
>     vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")

Hi Gary. Using the same "data.txt" file that I uploaded earlier, I then tested the above code like so:

    Sub MyTest()
        Dim sFile As String
        sFile = "C:\data.txt"
        vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")
        MsgBox vText(1)  ' Outputs whitespace string
        MsgBox vText(2)  ' Outputs whitespace string
    End Sub

Shouldn't the vText array contain a list of all "words" that are not whitespaces?
Iterate through all words
> "GS" wrote:
>> Still not right! Should be...
>>
>>     vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")
>
> Hi Gary. Using the same "data.txt" file that I uploaded earlier, I then tested the above code like so:
>
>     Sub MyTest()
>         Dim sFile As String
>         sFile = "C:\data.txt"
>         vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")
>         MsgBox vText(1)  ' Outputs whitespace string
>         MsgBox vText(2)  ' Outputs whitespace string
>     End Sub
>
> Shouldn't the vText array contain a list of all "words" that are not whitespaces?

I'm getting the words in the file. There are a few empty elements, but mostly I see the words in the sample file. The delimiter is the space character, and since your file has multiple blank lines between the groups of words, those will be empty elements!

--
Garry
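If the empty elements are unwanted, one option is to copy only the non-empty entries into a new array after the Split. This is a minimal sketch under that assumption; the names RemoveEmptyWords and DemoRemoveEmptyWords are hypothetical and not part of the thread.

```vba
' Hypothetical helper (not from the thread): copies the non-empty
' elements of a Split() result into a new zero-based array.
Function RemoveEmptyWords(ByVal vWords As Variant) As Variant
    Dim vOut() As Variant
    Dim i As Long, n As Long

    ' Split("") returns an array with UBound = -1; nothing to copy.
    If UBound(vWords) < LBound(vWords) Then
        RemoveEmptyWords = Array()
        Exit Function
    End If

    ' Size the output for the worst case, then shrink it afterwards.
    ReDim vOut(0 To UBound(vWords) - LBound(vWords))
    For i = LBound(vWords) To UBound(vWords)
        If Len(vWords(i)) > 0 Then
            vOut(n) = vWords(i)
            n = n + 1
        End If
    Next i

    If n = 0 Then
        RemoveEmptyWords = Array()
    Else
        ReDim Preserve vOut(0 To n - 1)
        RemoveEmptyWords = vOut
    End If
End Function

Sub DemoRemoveEmptyWords()
    Dim v As Variant
    ' Runs of spaces produce empty elements in the raw Split result.
    v = RemoveEmptyWords(Split("Ye show  done   an into", " "))
    Debug.Print UBound(v) + 1   ' prints 5
End Sub
```

In the thread's code this would wrap the existing call, for example RemoveEmptyWords(Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")). Collapsing repeated spaces before splitting, as Application.Trim does in the Excel-specific code posted above, is an alternative with the same effect.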
ExcelBanter.com