  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

I have a huge text file that contains paragraphs
separated by borders. Each border is a line
of asterisks, such as: ****************

What is an efficient way to selectively extract
paragraphs and append them to a new text file?
I want to keep a paragraph only if it does not
contain profanity or other selected words.

For example, given this input:

------BEGIN INPUT-------
Hello. This is a short novel written
by someone.
********************
I dont give a **** who reads this.
********************
Hey, sometimes **** happens, but
you gotta keep going.
********************
The end!
********************
------END INPUT-------

The output should be:

OUTPUT FILE:
Hello. This is a short novel written
by someone.
The end!



  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

You'll need to search each line for the specific profanity words you
want to filter out, along with the lines containing the asterisks. Once
you have a delimited list of profanity words, you can loop through it
using a For..Each construct and test each line with the InStr() function.

You'll also need to load the file into an array and walk it in an inner
loop using a For..Next construct. If you find a word, 'flag' that line's
array element by setting it to a single unlikely character ("~", for
example), then use the Filter() function on the array to strip out the
elements containing the 'flag' character. Once done, you can write the
array out to a file...

Const sProfaneWords$ = "word1,word2,word3" '//and so on

Dim vWord, vData, n&
' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) _
    Or (InStr(vData(n), "*") > 0) Then
      vData(n) = "~": Exit For
  Next 'n
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename

...which uses the following support routines...

Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum)

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()

Sub WriteTextFile(TextOut$, Filename$, _
Optional AppendMode As Boolean = False)
' Reusable procedure that Writes/Overwrites or Appends
' large amounts of data to a Text file in one single step.
' **Does not create a blank line at the end of the file**
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum: Print #iNum, vbCrLf & TextOut;
Else
Open Filename For Output As #iNum: Print #iNum, TextOut;
End If

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFile()
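
One detail of the Filter() step is worth seeing in isolation: with its third argument set to False, Filter() drops every element that contains the match string anywhere, not only the elements that equal it. A minimal sketch, assuming a made-up four-element array (FilterDemo and its data are illustrative and not part of the posted code):

```vba
Sub FilterDemo()
    Dim v As Variant
    ' Hypothetical data: one flagged element and one element that
    ' merely happens to contain the flag character.
    v = Array("keep me", "~", "price ~ 10", "also keep")
    ' Include:=False returns only the elements that do NOT contain "~".
    v = Filter(v, "~", False)
    Debug.Print Join(v, "|")
End Sub
```

Both "~" and "price ~ 10" are removed, so the flag character should be one that cannot occur in the real text.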

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

You might want to loop vData to purge the asterisk lines and any empty
line, as would normally be found at the end of the file...

'Filter junk lines
For n = LBound(vData) To UBound(vData)
  If (vData(n) = "") _
  Or (InStr(vData(n), "*") > 0) Then
    vData(n) = "~": Exit For
  End If
Next 'n

'Filter profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord
vData = Filter(vData, "~", False)

--
Garry



  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

Const sProfaneWords$ = "word1,word2,word3" '//and so on

Dim vWord, vData, n&
' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) _
    Or (InStr(vData(n), "*") > 0) Then
      vData(n) = "~": Exit For
  Next 'n ' ** compile error here ***
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename


Hi Gary. I tested this code, but I got a compile error that
said "Next without For".

I figured it might be missing an "End If" above that Next n line.
I hope I was right, because the code then did produce output,
but this was not exactly the desired output. Your code
was only deleting the single sentences or lines that contained a
profanity word.

For my purposes, a "paragraph" constitutes the ENTIRE
block of words that occurs between a pair of "*" border
lines. (The only exception is the first paragraph, because
there is no "*" border at the top of the input file)

Is it possible there was another bug in the code that I
missed?



  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

"GS" wrote:

Const sProfaneWords$ = "word1,word2,word3" '//and so on

Dim vWord, vData, n&
' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) _
    Or (InStr(vData(n), "*") > 0) Then
      vData(n) = "~": Exit For
  Next 'n ' ** compile error here ***
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename


Hi Gary. I tested this code, but I got a compile error that
said "Next without For".


Sorry, I did forget the 'End If'.

I figured it might be missing an "End If" above that Next n line.
I hope I was right, because the code then did produce output,
but this was not exactly the desired output. Your code
was only deleting the single sentences or lines that contained a
profanity word.

For my purposes, a "paragraph" constitutes the ENTIRE
block of words that occurs between a pair of "*" border
lines. (The only exception is the first paragraph, because
there is no "*" border at the top of the input file)

Is it possible there was another bug in the code that I
missed?


If you use the 2nd suggestion it will remove all empty lines and
asterisk lines before searching for profane words. It's a 2-step
process but is more efficient than my 1st offering since it treats all
3 aspects of 'cleaning' up your files...

'Filter junk lines
For n = LBound(vData) To UBound(vData)
  'I revised the next line to use Len()
  If Len(vData(n)) = 0 Or InStr(vData(n), "*") > 0 Then
    vData(n) = "~": Exit For
  End If
Next 'n

'Filter profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord
vData = Filter(vData, "~", False)


You could post a download link to a sample file for testing if you
want!

--
Garry





  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

If you use the 2nd suggestion it will remove all empty lines and asterisk
lines before searching for profane words. It's a 2-step process but is
more efficient than my 1st offering since it treats all 3 aspects of
'cleaning' up your files...


I tried both versions, but it only removes the sentence that contains
my profanity words. I need to remove the entire paragraph.
Here is my test data:

---------------[BEGIN INPUT "book.txt"]--------------
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
********************
baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh fuk obscurantists eventualities mechanics
vanity crap nonalignment dowering nephew nonconfidence
********************
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes staking nonadjacent offprint mug pawpaw
-------------------[END INPUT]----------------------


The desired output should have been:

--------------------[BEGIN OUTPUT]---------------------
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes staking nonadjacent offprint mug pawpaw
------------------[END OUTPUT]---------------------------


And, here is the code that I tested:

Const sProfaneWords$ = "crap,fuk" '//and so on
Sub ExtractTest()

Dim sFilename As String
Dim sOutfile As String
Dim vWord, vData, n&

sFilename = "book.txt"
sOutfile = "out.txt"

' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

'Filter junk lines
For n = LBound(vData) To UBound(vData)
  If (vData(n) = "") _
  Or (InStr(vData(n), "*") > 0) Then
    vData(n) = "~": Exit For
  End If
Next 'n

'Filter profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sOutfile

End Sub
Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum)

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()
Sub WriteTextFile(TextOut$, Filename$, _
Optional AppendMode As Boolean = False)
' Reusable procedure that Writes/Overwrites or Appends
' large amounts of data to a Text file in one single step.
' **Does not create a blank line at the end of the file**
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum: Print #iNum, vbCrLf & TextOut;
Else
Open Filename For Output As #iNum: Print #iNum, TextOut;
End If

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFile()



  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

This revised code does what your example shows...

Option Explicit

Const sProfaneWords$ = "fuk,****"
Const sFilename$ = ""


Sub CleanupProfanity()
Dim vData, vWord, n&

'This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

'Group into paragraphs
For n = LBound(vData) To UBound(vData)
  'Create paragraph delimiter
  If InStr(vData(n), "*") > 0 Then vData(n) = "<"
Next 'n
'Rebuild lines into paragraphs
vData = Split(Join(vData, vbCrLf), "<")

'Filter paragraphs with profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord

'Rebuild paragraphs back to lines
vData = Split(Join(vData, vbCrLf), vbCrLf)

'Filter out any blank lines
For n = UBound(vData) To LBound(vData) Step -1
If Len(vData(n)) = 0 Then vData(n) = "~"
Next 'n
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename
End Sub
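
Note that in the loops above, Exit For leaves the inner loop at the first hit, so each word flags at most one paragraph. A minimal sketch of a variant that checks every paragraph against every word follows. KeepCleanParagraphs and TrimCrLf are hypothetical names, the caller is assumed to pass the exact border string, and the case-insensitive compare (vbTextCompare) is an assumption that the posted code does not make:

```vba
' Hypothetical helper: returns the paragraphs of sText that contain none of
' the comma-delimited words in sWords. sBorder must be the exact border string.
Function KeepCleanParagraphs(ByVal sText As String, ByVal sBorder As String, _
                             ByVal sWords As String) As String
    Dim vParas As Variant, vWord As Variant
    Dim n As Long, bDirty As Boolean, sPara As String, sOut As String

    vParas = Split(sText, sBorder)
    For n = LBound(vParas) To UBound(vParas)
        sPara = TrimCrLf(vParas(n))
        bDirty = False
        For Each vWord In Split(sWords, ",")
            If Len(vWord) > 0 Then
                ' Assumption: match regardless of letter case
                If InStr(1, sPara, vWord, vbTextCompare) > 0 Then
                    bDirty = True
                    Exit For   ' one hit is enough for THIS paragraph only
                End If
            End If
        Next vWord
        ' Keep non-empty paragraphs that matched no word
        If Len(sPara) > 0 And Not bDirty Then
            If Len(sOut) > 0 Then sOut = sOut & vbCrLf
            sOut = sOut & sPara
        End If
    Next n
    KeepCleanParagraphs = sOut
End Function

' Strips the leading and trailing line breaks left behind by the split.
Function TrimCrLf(ByVal s As String) As String
    Do While Left$(s, 2) = vbCrLf
        s = Mid$(s, 3)
    Loop
    Do While Right$(s, 2) = vbCrLf
        s = Left$(s, Len(s) - 2)
    Loop
    TrimCrLf = s
End Function
```

It could be called as WriteTextFile KeepCleanParagraphs(ReadTextFile(sFilename), String(25, "*"), sProfaneWords), sOutfile, where sOutfile is whatever output path you choose.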

--
Garry



  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file (new task)

Note also that the code removes all empty lines and does not put any
empty lines back into the file.

--
Garry



  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file (new task)

Testing this new data (5 paragraphs), UBound(vData) is 4. vData(3)
contains paragraph 4 which gets parsed because it contains a listed
profane word.

In all cases the paragraph separator lines are removed as per your
sample results!

--
Garry



  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file (new task)

If you're absolutely sure every asterisk line will always be 25
characters long, you can skip the 1st loop entirely...


Sub CleanupProfanity2()
Dim vData, vWord, n&

'Group directly into paragraphs
vData = Split(ReadTextFile(sFilename), String(25, "*"))

'Filter paragraphs with profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord

'Rebuild paragraphs back to lines
'containing a paragraph separator.
' vData = Split(Join(vData, vbCrLf & "<"), vbCrLf)

'containing no paragraph separator.
vData = Split(Join(vData, vbCrLf), vbCrLf) '//no separator

'Filter out any blank lines
For n = UBound(vData) To LBound(vData) Step -1
If Len(vData(n)) = 0 Then vData(n) = "~"
Next 'n
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename
End Sub
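
If the border length cannot be guaranteed, one alternative is to treat any line made up only of asterisks as a border, whatever its length. The sketch below is an assumption layered on the posted approach and not part of it: SplitOnStarLines and IsStarLine are hypothetical names, and it assumes the text never contains a null character, which is used as the temporary delimiter.

```vba
' Hypothetical helper: splits sText into paragraphs at every line that
' consists solely of asterisks, regardless of how many there are.
Function SplitOnStarLines(ByVal sText As String) As Variant
    Dim vLines As Variant, n As Long
    vLines = Split(sText, vbCrLf)
    For n = LBound(vLines) To UBound(vLines)
        ' Swap each border line for a delimiter that should not occur in text
        If IsStarLine(vLines(n)) Then vLines(n) = vbNullChar
    Next n
    SplitOnStarLines = Split(Join(vLines, vbCrLf), vbNullChar)
End Function

' True only for a non-empty line whose every character is "*".
Function IsStarLine(ByVal sLine As String) As Boolean
    sLine = Trim$(sLine)
    If Len(sLine) > 0 Then IsStarLine = (Len(Replace(sLine, "*", "")) = 0)
End Function
```

A line such as "forgoes *staking* nonadjacent" is not treated as a border here, whereas a test like InStr(line, "*") > 0 would match it. Each returned paragraph still carries the line breaks that surrounded its border, so blank lines need filtering afterwards, as the posted code already does.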

--
Garry





  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

Sub CleanupProfanity2()
Dim vData, vWord, n&

'Group directly into paragraphs
vData = Split(ReadTextFile(sFilename), String(25, "*"))

'Filter paragraphs with profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord

'Rebuild paragraphs back to lines
'containing a paragraph separator.
' vData = Split(Join(vData, vbCrLf & "<"), vbCrLf)

'containing no paragraph separator.
vData = Split(Join(vData, vbCrLf), vbCrLf) '//no separator

'Filter out any blank lines
For n = UBound(vData) To LBound(vData) Step -1
If Len(vData(n)) = 0 Then vData(n) = "~"
Next 'n
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename
End Sub


Hi Gary! Sorry about the late reply. I got busy with other
issues over the last month, so I was not able to test this code
until now.

In a nutshell, the above code did NOT work. Here is the data
I used:

[BEGIN INPUT]

playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown

********************

baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh fuk obscurantists eventualities mechanics
vanity crap nonalignment dowering nephew nonconfidence

********************

chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes *staking* nonadjacent offprint mug pawpaw

[END INPUT]

And, here is the code that I used:

Option Explicit

Const sProfaneWords$ = "fuk,****"
Const sFilename$ = "C:\Documents and Settings\user\Desktop\Excel Projects\book.txt"
Sub CleanupProfanity2()
Dim vData, vWord, n&

'Group directly into paragraphs
vData = Split(ReadTextFile(sFilename), String(25, "*"))

'Filter paragraphs with profane words
For Each vWord In Split(sProfaneWords, ",")
  For n = LBound(vData) To UBound(vData)
    If (InStr(vData(n), vWord) > 0) Then
      vData(n) = "~": Exit For
    End If
  Next 'n
Next 'vWord

'containing no paragraph separator.
vData = Split(Join(vData, vbCrLf), vbCrLf) '//no separator

'Filter out any blank lines
For n = UBound(vData) To LBound(vData) Step -1
  If Len(vData(n)) = 0 Then vData(n) = "~"
Next 'n
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), _
  "C:\Documents and Settings\user\Desktop\Excel Projects\out.txt"
End Sub
Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum)

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()
Sub WriteTextFile(TextOut$, Filename$, _
Optional AppendMode As Boolean = False)
' Reusable procedure that Writes/Overwrites or Appends
' large amounts of data to a Text file in one single step.
' **Does not create a blank line at the end of the file**
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum: Print #iNum, vbCrLf & TextOut;
Else
Open Filename For Output As #iNum: Print #iNum, TextOut;
End If

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFile()



I am going to rephrase my problem in a separate post.




  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file (new task)

Hi again Gary! I wanted to approach this a new way.
Suppose I have the following test input, which has 3 paragraphs:

[BEGIN INPUT]
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
********************
baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh fuk obscurantists eventualities mechanics
vanity crap nonalignment dowering nephew nonconfidence
********************
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes *staking* nonadjacent offprint mug pawpaw
[END INPUT]


I just want to read the entire file and store each paragraph in
its own string variable or array element. What is an efficient way to
achieve this? (Don't worry about profanity words.)

By the way, each paragraph will always be separated by a
border of 25 star characters, except for the first and last
paragraph.

Sorry if I wasn't clear earlier. ~Rob





  #13   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file (new task)


I must be missing something because in my tests of my sample code I got
3 separate paragraphs in my array, 0 thru 2. I'm not sure you realize
that no matter how you look at it, each whole paragraph is handled as a
whole paragraph. So...

vData(0) contains paragraph1
vData(1) contains paragraph2
vData(2) contains paragraph3

...as examined in this array resulting from splitting the file by the
asterisk lines...

'Group directly into paragraphs
vData = Split(ReadTextFile(sFilename), String(25, "*"))

Perhaps the file contents you're working with are not structured
*exactly* as posted! It shouldn't matter where empty lines occur
because they get removed before the file gets overwritten...

'Filter out any blank lines
For n = UBound(vData) To LBound(vData) Step -1
If Len(vData(n)) = 0 Then vData(n) = "~"
Next 'n
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename
End Sub

OR
Perhaps you don't understand arrays, and so do not realize what the
code is doing (despite the descriptive comments)!

--
Garry



  #14   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file (Got it!)

"GS" wrote:

I must be missing something because in my tests of my sample code I got 3
separate paragraphs in my array, 0 thru 2. I'm not sure you realize that
no matter how you look at it, each whole paragraph is handled as a whole
paragraph. So...

vData(0) contains paragraph1
vData(1) contains paragraph2
vData(2) contains paragraph3


Guess what? You were right! My data file was NOT structured
*exactly* as I posted. I fixed the problem and now it works.

In reality, my source input file is going to be huge, containing a couple
hundred paragraphs. I will be doing search and replace operations
on each paragraph located in vData(n), so I hope this doesn't run
slowly after storing an entire document in a single variable.

Do you think I would be better off reading each line, one at a time,
and appending it to a string variable until I encounter
the star * border? That way, only one paragraph at a time is loaded
into an array or string or vData.

Thanks again for being such a great help!



  #15   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file (Got it!)

Do you think I would be better off reading each line, one at a time,
and appending it to a string variable until I encounter
the star * border? That way, only one paragraph at a time is loaded
into an array or string or vData.


Reading/working line-by-line is normally much slower than
reading/working in memory. A few hundred paragraphs is a relatively
small amount of data.
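
For comparison, the line-by-line alternative asked about could look roughly like the sketch below. It is shown only to make the trade-off concrete. ReadParagraphsByLine is a hypothetical name, a border is assumed to be any line made up solely of asterisks, and error handling is omitted for brevity:

```vba
' Hypothetical helper: reads sPath one line at a time and returns a
' Collection holding one string per paragraph.
Function ReadParagraphsByLine(ByVal sPath As String) As Collection
    Dim iNum As Integer, sLine As String, sPara As String
    Dim col As New Collection

    iNum = FreeFile
    Open sPath For Input As #iNum
    Do Until EOF(iNum)
        Line Input #iNum, sLine
        If Len(sLine) > 0 And Len(Replace(sLine, "*", "")) = 0 Then
            ' Border line: close the current paragraph, if there is one
            If Len(sPara) > 0 Then col.Add sPara
            sPara = vbNullString
        ElseIf Len(sPara) = 0 Then
            sPara = sLine
        Else
            sPara = sPara & vbCrLf & sLine
        End If
    Loop
    Close #iNum

    ' The last paragraph has no border after it
    If Len(sPara) > 0 Then col.Add sPara
    Set ReadParagraphsByLine = col
End Function
```

This keeps only one paragraph under construction at a time, at the cost of one file read per line.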

You can further split each paragraph into sentences...

Dim vTmp, j&
For n = LBound(vData) To UBound(vData)
vTmp = Split(vData(n), vbCrLf)
Debug.Print "Begin paragraph " & n
For j = LBound(vTmp) To UBound(vTmp)
Debug.Print vTmp(j)
Next 'j
Debug.Print "End paragraph " & n
Next 'n

...and view each sentence in the Immediate Window.

--
Garry





  #16   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum) ' Does this line execute?

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()


I see that the ReadTextFile function is assigned TWO different
values.

First, it is assigned Space$(LOF(iNum))
And then is assigned Input(LOF(iNum), iNum)

I thought a function was supposed to exit once it is assigned
a value? Is it possible to assign a function a value like that?
Twice in succession??




  #17   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

I see that the ReadTextFile function is assigned TWO different
values.

First, it is assigned Space$(LOF(iNum))
And then is assigned Input(LOF(iNum), iNum)

I thought a function was supposed to exit once it is assigned
a value?


I've never heard of such a rule. Who made this one up?

Is it possible to assign a function a value like that?
Twice in succession??


Obviously, yes it is! The 1st assignment sets the buffer size to match
LOF. The 2nd assignment loads the file.

The long coding method would be to dim a string var and use that for
the 2 assignments, then assign it to the function. I just don't see the
point in the extraneous coding!
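
To make the point concrete, here is a minimal sketch with two made-up functions (TwoAssignments and EarlyExit are illustrative names only). Assigning to the function name just stores the return value and execution carries on. Leaving early takes an explicit Exit Function:

```vba
Function TwoAssignments() As String
    TwoAssignments = "first"    ' stores a return value, does not exit
    TwoAssignments = "second"   ' still runs and overwrites the first value
End Function

Function EarlyExit() As String
    EarlyExit = "first"
    Exit Function               ' this is what actually leaves the function
    EarlyExit = "never reached"
End Function
```

Calling TwoAssignments returns "second" and calling EarlyExit returns "first".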

--
Garry



  #18   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

I thought a function was supposed to exit once it is assigned
a value?


I've never heard of such a rule. Who made this one up?


I guess I was thinking too much about a different programming
language, such as C or C++, which uses a "return" statement
to (supposedly) immediately exit from a function and return
one value.

In my mind, I guess I was comparing the C/C++ "return" statement
to Visual Basic's concept of assigning a value to the name of
the function while inside that function. I just thought it was kind of
odd, so I had to ask out of curiosity.

BTW, that ReadTextFile is very useful, especially when used
in conjunction with the Split() function. Awesome stuff! Thanks.


  #19   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

"GS" wrote:

I thought a function was supposed to exit once it is assigned
a value?


I've never heard of such a rule. Who made this one up?


I guess I was thinking too much about a different programming
language, such as C or C++, which uses a "return" statement
to (supposedly) immediately exit from a function and return
one value.

In my mind, I guess I was comparing the C/C++ "return" statement
to Visual Basic's concept of assigning a value to the name of
the function while inside that function. I just thought it was kind of
odd, so I had to ask out of curiosity.


Well, your assessment that it's 'odd' is accurate because I've never
seen an example where a string var is not used (sized/loaded) before
assigning its val to the function. In this case, though, nothing is
being done to the retrieved data that warrants the extra coding.

BTW, that ReadTextFile is very useful, especially when used
in conjunction with the Split() function.


Its companion to write is also very useful, especially when used in
conjunction with the Join() function!

Awesome stuff! Thanks.


You're welcome! I appreciate the feedback...

--
Garry


