ExcelBanter


Robert Crandal[_3_]

Iterate through all words
 
I have a text file that is read and stored into a string variable.
Here is the basic code:

Dim s as String

s = ReadTextFile("C:\data.txt")

I would now like to retrieve each and every word that is
stored in the variable "s". Loading all the words into an
array will be okay for now.

For my purposes, a word is any string of characters that
does not include whitespace characters, such as spaces,
tabs, carriage returns, line feeds, etc... I think those are
the only whitespace characters that exist, right?

Gary, do you recommend that I use the Split() function for this?




GS[_6_]

Iterate through all words
 
If you want to iterate the words only you'll need to first filter out
all possible punctuation characters there might be so words are
delimited by the space character.

A text file may also contain CrLf characters which you may want to
remove as well. Note that doing so will screw you up if you don't
replace these with a space character.

So then, the approach I suggest is...

filter out punctuation characters ".,;!:"
replace CrLf characters with " "
split the string into a variant using " " as the delimiter

...where you can filter unwanted characters with the following function.


Function FilterString$(ByVal TextIn$, Optional IncludeChars$, _
                       Optional IncludeLetters As Boolean = True, _
                       Optional IncludeNumbers As Boolean = True)
' Filters out all unwanted characters in a string.
' Arguments: TextIn          The string being filtered.
'            IncludeChars    [Optional] Any non alpha-numeric
'                            characters to keep.
'            IncludeLetters  [Optional] Keeps any letters.
'            IncludeNumbers  [Optional] Keeps any numbers.
'
' Returns:   String containing only wanted characters.
' Comments:  Works very fast using the Mid$() function over other
'            methods.

  Const sSource As String = "FilterString()"

  'The basic characters to always keep by default
  Const sLetters As String = "abcdefghijklmnopqrstuvwxyz"
  Const sNumbers As String = "0123456789"

  Dim i&, CharsToKeep$

  CharsToKeep = IncludeChars
  If IncludeLetters Then _
    CharsToKeep = CharsToKeep & sLetters & UCase(sLetters)
  If IncludeNumbers Then CharsToKeep = CharsToKeep & sNumbers

  For i = 1 To Len(TextIn)
    If InStr(CharsToKeep, Mid$(TextIn, i, 1)) Then _
      FilterString = FilterString & Mid$(TextIn, i, 1)
  Next
End Function 'FilterString()

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion



GS[_6_]

Iterate through all words
 
For example...

Dim sText$, vText, n&
sText = FilterString(ReadTextFile(sFile), ".,;!:")
vText = Split(Replace(sText, vbCrLf, " "), " ")

For n = LBound(vText) To UBound(vText)
Debug.Print vText(n)
Next 'n

OR
you can skip the use of 'sText'...

vText = Split(Replace(FilterString(ReadTextFile(sFile), ".,;!:"), _
vbCrLf, " "), " ")

...but using sText is a bit easier to read/digest!

--
Garry




Claus Busch

Iterate through all words
 
Hi Robert,

On Fri, 15 May 2015 17:37:26 -0700, Robert Crandal wrote:

I have a text file that is read and stored into a string variable.
Here is the basic code:

Dim s as String

s = ReadTextFile("C:\data.txt")

I would now like to retrieve each and every word that is
stored in the variable "s". Loading all the words into an
array will be okay for now.


Store your words in varText:

Sub Test()
  Dim ptrn As String
  Dim Matches, Match
  Dim re As Object
  Dim varText() As Variant
  Dim n As Long

  Set re = CreateObject("vbscript.regexp")

  ptrn = "\w+"
  re.Pattern = ptrn
  re.IgnoreCase = False
  re.Global = True
  Set Matches = re.Execute(s)
  ReDim Preserve varText(Matches.Count - 1)
  For Each Match In Matches
    varText(n) = Match.Value
    n = n + 1
  Next
End Sub
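
The same idea can be sketched as a function that takes the text as an argument, so it does not rely on a variable "s" declared elsewhere. This is only a sketch: the name WordsFromText is hypothetical, it assumes Windows with the late-bound VBScript.RegExp object available, and it returns an empty array when nothing matches.

```vba
Option Explicit

' Hypothetical helper (not from the thread): returns every \w+ match in
' sText as a zero-based Variant array, or an empty array if none match.
Function WordsFromText(ByVal sText As String) As Variant
    Dim re As Object, oMatches As Object
    Dim vResult() As Variant
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")   ' late-bound, Windows only
    re.Pattern = "\w+"                         ' letters, digits, underscore
    re.Global = True                           ' find all matches, not just the first
    Set oMatches = re.Execute(sText)

    If oMatches.Count = 0 Then
        WordsFromText = Array()                ' empty array, UBound = -1
        Exit Function
    End If

    ReDim vResult(0 To oMatches.Count - 1)
    For i = 0 To oMatches.Count - 1
        vResult(i) = oMatches.Item(i).Value
    Next i
    WordsFromText = vResult
End Function
```

It could be called as vWords = WordsFromText(ReadTextFile("C:\data.txt")). Note that \w+ treats the apostrophe as a separator, so a word like "letter's" comes back as two matches, "letter" and "s".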


Regards
Claus B.
--
Vista Ultimate / Windows7
Office 2007 Ultimate / 2010 Professional

Robert Crandal[_3_]

Iterate through all words
 
"Claus Busch" wrote:

store your words in varText:

Sub Test()
Dim ptrn As String
Dim Matches, Match
Dim re As Object
Dim varText() As Variant
Dim n As Long

Set re = CreateObject("vbscript.regexp")

ptrn = "\w+"
re.Pattern = ptrn
re.IgnoreCase = False
re.Global = True
Set Matches = re.Execute(s)
ReDim Preserve varText(Matches.Count - 1)
For Each Match In Matches
varText(n) = Match.Value
n = n + 1
Next
End Sub


Hi Claus. A few questions....

How do you recommend that I store my words
in the varText variable? I typically load an entire text
file into a string variable using ReadTextFile().

Also, your code above refers to a variable "s" without
initializing any value. What is this variable?

Thanks!




Claus Busch

Iterate through all words
 
Hi Robert,

On Sat, 16 May 2015 02:33:36 -0700, Robert Crandal wrote:

How do you recommend that I store my words
in the varText variable? I typically load an entire text
file into a string variable using ReadTextFile().


You already have the string s. You wrote:
s = ReadTextFile("C:\data.txt")

Also, your code above refers to a variable "s" without
initializing any value. What is this variable?


The code I posted refers to the s you already have.


Regards
Claus B.

Robert Crandal[_3_]

Iterate through all words
 
"Claus Busch" wrote:

You already have the string s. You wrote:
s = ReadTextFile("C:\data.txt")


Okay Claus. I tested it again and it works great.
Thanks again. You and Gary are awesome!




Robert Crandal[_3_]

Iterate through all words
 
"GS" wrote:

sFile = "C:\data.txt"

Dim sText$, vText, n&
sText = FilterString(ReadTextFile(sFile), ".,;!:")
vText = Split(Replace(sText, vbCrLf, " "), " ")

For n = LBound(vText) To UBound(vText)
Debug.Print vText(n)
Next 'n

MsgBox vText(0)


The above code causes vText to have 1 element, which is
a long string of words from the document minus all spacing.

Did I do something wrong here?




Claus Busch

Iterate through all words
 
Hi Robert,

On Sat, 16 May 2015 03:35:23 -0700, Robert Crandal wrote:

"GS" wrote:

sFile = "C:\data.txt"

Dim sText$, vText, n&
sText = FilterString(ReadTextFile(sFile), ".,;!:")
vText = Split(Replace(sText, vbCrLf, " "), " ")

For n = LBound(vText) To UBound(vText)
Debug.Print vText(n)
Next 'n

MsgBox vText(0)


The above code causes vText to have 1 element, which is
a long string of words from the document minus all spacing.


Replace all punctuation marks with spaces:

Sub TextToArray()
  Dim varChr As Variant, varTmp As Variant
  Dim s As String
  Dim i As Long

  'Array with the expected punctuation marks
  varChr = Array(",", ".", ":", "-", "_", "!", "?", "(", ")", "'", _
                 Chr(10))

  s = ReadTextFile("C:\data.txt")

  'Replaces all punctuation marks with spaces
  For i = LBound(varChr) To UBound(varChr)
    s = Replace(s, varChr(i), " ")
  Next

  'Deletes superfluous spaces
  s = Application.Trim(s)
  varTmp = Split(s, " ")

  For i = LBound(varTmp) To UBound(varTmp)
    Debug.Print varTmp(i)
  Next
End Sub


Regards
Claus B.

GS[_6_]

Iterate through all words
 
"GS" wrote:

sFile = "C:\data.txt"

Dim sText$, vText, n&
sText = FilterString(ReadTextFile(sFile), ".,;!:")
vText = Split(Replace(sText, vbCrLf, " "), " ")

For n = LBound(vText) To UBound(vText)
Debug.Print vText(n)
Next 'n

MsgBox vText(0)


The above code causes vText to have 1 element, which is
a long string of words from the document minus all spacing.

Did I do something wrong here?


That suggests the punctuation characters at the end of sentences are
not followed by at least 1 space, and there are no CrLf characters in
the file. Please upload the subject file so I can see what's going
on...

--
Garry




Robert Crandal[_3_]

Iterate through all words
 
"GS" wrote:

That suggests the punctuation characters at the end of sentences are not
followed by at least 1 space, and there are no CrLf characters in the
file. Please upload the subject file so I can see what's going on...


I pasted the contents of the file below.

BTW, the second suggestion by Claus also worked, where his code
replaces all punctuation marks with spaces.

------------------[data.txt]---------------------------

Repulsive questions contented him few extensive supported. Of remarkably
thoroughly, he appearance in. Supposing tolerably applauded or of be.
Suffering unfeeling so objection agreeable allowance me of. Ask within
entire season sex common far who family. As be valley warmth assure on. Park
girl they rich hour new well way you. Face ye be me been room we sons fond.



Lose eyes get fat shew. Winter can indeed, AA1 10, letters oppose way change
tended now. So is improve my charmed picture exposed adapted 22 demands.
Received had end produced prepared diverted, strictly, off man branched.
Known ye money so large decay voice there to. Preserved be mr cordially
incommode as an. He doors quick child an point at. Had share vexed front
least style off why him.



Call park out she wife face zoo mean. Invitation letter's address excellence
imprudence understood it continuing to. Ye show done an into. Fifteen
winding related may hearted colonel are way studied. The things so remain
oh to elinor. Far merits season better tended any age hunted.



Preserved be mr cordially incommode as an.




GS[_6_]

Iterate through all words
 
"GS" wrote:

sFile = "C:\data.txt"

Dim sText$, vText, n&
sText = FilterString(ReadTextFile(sFile), ".,;!:")
vText = Split(Replace(sText, vbCrLf, " "), " ")

For n = LBound(vText) To UBound(vText)
Debug.Print vText(n)
Next 'n

MsgBox vText(0)


The above code causes vText to have 1 element, which is
a long string of words from the document minus all spacing.

Did I do something wrong here?


No, I did! I wasn't thinking about 'IncludeChars', so FilterString removed
everything except letters, numbers, and the characters passed as the 2nd
arg. Here's what we should have done...

sText = FilterString(ReadTextFile(sFile), " ")

...which removes every non-alphanumeric character except spaces, so the
CrLf chars are removed too.
So...

vText = Split(FilterString(ReadTextFile(sFile), " "), " ")

...returns a UBound(vText) = 174.

--
Garry




GS[_6_]

Iterate through all words
 
vText = Split(FilterString(ReadTextFile(sFile), " "), " ")

Still not right! Should be...

vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")

...so the last/first word of each line doesn't get concatenated!
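
The effect of the keep-list argument and of the CrLf replacement can be seen on a two-line sample string. This is only a sketch: KeepWordChars is a hypothetical, compact stand-in for FilterString (letters and digits always kept, plus whatever is passed in sExtraKeep), and the sample text is made up.

```vba
Option Explicit

' Hypothetical stand-in for FilterString: keeps letters, digits and any
' character listed in sExtraKeep; every other character is dropped.
Function KeepWordChars(ByVal sTextIn As String, ByVal sExtraKeep As String) As String
    Const sLetters As String = "abcdefghijklmnopqrstuvwxyz"
    Const sNumbers As String = "0123456789"
    Dim sKeep As String, sChar As String, i As Long

    sKeep = sExtraKeep & sLetters & UCase$(sLetters) & sNumbers
    For i = 1 To Len(sTextIn)
        sChar = Mid$(sTextIn, i, 1)
        If InStr(1, sKeep, sChar, vbBinaryCompare) > 0 Then
            KeepWordChars = KeepWordChars & sChar
        End If
    Next i
End Function

Sub DemoKeepList()
    Dim sSample As String
    sSample = "first line." & vbCrLf & "second line"

    ' Passing punctuation KEEPS it and drops the spaces and the line break.
    Debug.Print KeepWordChars(sSample, ".,;!:")   ' firstline.secondline

    ' Passing a space alone still drops CrLf, fusing "line" and "second".
    Debug.Print KeepWordChars(sSample, " ")       ' first linesecond line

    ' Replacing CrLf with a space first keeps the two lines apart.
    Debug.Print KeepWordChars(Replace(sSample, vbCrLf, " "), " ")   ' first line second line
End Sub
```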

--
Garry




Robert Crandal[_3_]

Iterate through all words
 
"GS" wrote:

Still not right! Should be...

vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")


Hi Gary. Using the same "data.txt" file that I uploaded earlier, I then
tested the above code like so:

Sub MyTest()
Dim sFile As String
sFile = "C:\data.txt"

vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")
MsgBox vText(1) ' Outputs whitespace string
MsgBox vText(2) ' Outputs whitespace string
End Sub

Shouldn't the vText array contain a list of all "words" that are not
whitespaces?




GS[_6_]

Iterate through all words
 
"GS" wrote:

Still not right! Should be...

vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")


Hi Gary. Using the same "data.txt" file that I uploaded earlier, I then
tested the above code like so:

Sub MyTest()
Dim sFile As String
sFile = "C:\data.txt"

vText = Split(FilterString(Replace(ReadTextFile(sFile), vbCrLf, " "), " "), " ")
MsgBox vText(1) ' Outputs whitespace string
MsgBox vText(2) ' Outputs whitespace string
End Sub

Shouldn't the vText array contain a list of all "words" that are not
whitespaces?


I'm getting the words in the file. There are a few empty elements, but
mostly I see the words in the sample file. The delimiter is the space
character, and since your file has blank lines between the groups of
words, each CrLf becomes a space and the consecutive spaces produce
empty elements!
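
One way to avoid the empty elements is to skip zero-length pieces after the Split. The following is only a sketch: SplitWords is a hypothetical helper, and it treats spaces, tabs, CR and LF as the only whitespace, which matches the definition of a "word" given in the first post (punctuation stays attached to the words).

```vba
Option Explicit

' Hypothetical helper: splits sTextIn on spaces, tabs, CR and LF and
' returns only the non-empty pieces as a zero-based Variant array.
Function SplitWords(ByVal sTextIn As String) As Variant
    Dim vParts() As String, vResult() As Variant
    Dim i As Long, n As Long

    ' Normalise every whitespace character to a single kind of delimiter
    sTextIn = Replace(sTextIn, vbCr, " ")
    sTextIn = Replace(sTextIn, vbLf, " ")
    sTextIn = Replace(sTextIn, vbTab, " ")
    vParts = Split(sTextIn, " ")

    If UBound(vParts) < 0 Then             ' empty input gives an empty array
        SplitWords = Array()
        Exit Function
    End If

    ReDim vResult(0 To UBound(vParts))     ' upper bound on the word count
    For i = LBound(vParts) To UBound(vParts)
        If Len(vParts(i)) > 0 Then         ' skip pieces between adjacent delimiters
            vResult(n) = vParts(i)
            n = n + 1
        End If
    Next i

    If n = 0 Then
        SplitWords = Array()               ' input held only whitespace
    Else
        ReDim Preserve vResult(0 To n - 1) ' trim the unused tail
        SplitWords = vResult
    End If
End Function
```

It could be called as vText = SplitWords(ReadTextFile(sFile)); to strip punctuation as well, the text could be passed through FilterString with " " as the second argument after replacing vbCrLf with a space.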

--
Garry




