View Single Post
  #2   Report Post  
Posted to microsoft.public.excel.programming
GS[_2_] GS[_2_] is offline
external usenet poster
 
Posts: 3,514
Default Extract paragraphs from text file

I have a huge text file that contains paragraphs
separated by dotted borders. Each border
is a line of stars such as: ****************

What is an efficient way to selectively extract
paragraphs and append them to a new text file?
I will only select a paragraph if it does not
contain profanity words or other select words.

For example, given this input:

------BEGIN INPUT-------
Hello. This is a short novel written
by someone.
********************
I dont give a **** who reads this.
********************
Hey, sometimes **** happens, but
you gotta keep going.
********************
The end!
********************
------END INPUT-------

The output should be:

OUTPUT FILE:
Hello. This is a short novel written
by someone.
The end!


You'll need to search each line for the specific profanity words you
want to filter out along with lines containing the asterisks! Once you
have a delimited list of profanity words you can loop through it using
a For..Each construct and the InStr() function.

You'll also need to load the file into an array and inner loop that
using a For..Next construct. If you find words then 'falg' that line's
array element to a single unlikely character (like "~" for example) and
use the Filter() function on the array to strip out elements containing
the 'flag' character. Once done you can write the array out to a
file...

Const sProfaneWords$ = "word1,word2,word3" '//and so on

Dim vWord, vData, n&
' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

For Each vWord In Split(sProfaneWords, ",")
For n = LBound(vData) to UBound(vData)
If (InStr(vData(n), vWord) 0) _
Or (InStr(vData(n), "*") 0) Then
vData(n) = "~": Exit For
Next 'n
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sFilename

...which uses the following support routines...

Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum)

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()

Sub WriteTextFile(TextOut$, Filename$, _
Optional AppendMode As Boolean = False)
' Reusable procedure that Writes/Overwrites or Appends
' large amounts of data to a Text file in one single step.
' **Does not create a blank line at the end of the file**
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum: Print #iNum, vbCrLf & TextOut;
Else
Open Filename For Output As #iNum: Print #iNum, TextOut;
End If

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFile()

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion