View Single Post
  #6   Report Post  
Posted to microsoft.public.excel.programming
Robert Crandal[_3_] Robert Crandal[_3_] is offline
external usenet poster
 
Posts: 161
Default Extract paragraphs from text file

"GS" wrote:

If you use the 2nd suggestion it will remove all empty lines and asterisk
lines before searching for profane words. It's a 2-step process but is
more efficient than my 1st offering since it treats all 3 aspects of
'cleaning' up your files...


I tried both versions, but it only removes the sentence that contains
my profanity words. I need to remove the entire paragraph.
Here is my test data:

---------------[BEGIN INPUT "book.txt"]--------------
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
********************
baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh fuk obscurantists eventualities mechanics
vanity crap nonalignment dowering nephew nonconfidence
********************
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes staking nonadjacent offprint mug pawpaw
-------------------[END INPUT]----------------------


The desired output should have been:

--------------------[BEGIN OUTPUT]---------------------
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes staking nonadjacent offprint mug pawpaw
------------------[END OUTPUT]---------------------------


And, here is the code that I tested:

Const sProfaneWords$ = "crap,fuk" '//and so on
Sub ExtractTest()

Dim sFilename As String
Dim sOutfile As String
Dim vWord, vData, n&

sFilename = "book.txt"
sOutfile = "out.txt"

' This assumes the full path and filename is held in 'sFilename'
vData = Split(ReadTextFile(sFilename), vbCrLf)

'Filter junk lines
For n = LBound(vData) To UBound(vData)
If (vData(n) = "") _
Or (InStr(vData(n), "*") 0) Then
vData(n) = "~": Exit For
End If
Next 'n

'Filter profane words
For Each vWord In Split(sProfaneWords, ",")
For n = LBound(vData) To UBound(vData)
If (InStr(vData(n), vWord) 0) Then
vData(n) = "~": Exit For
End If
Next 'n
Next 'vWord
vData = Filter(vData, "~", False)

WriteTextFile Join(vData, vbCrLf), sOutfile

End Sub
Function ReadTextFile$(Filename$)
' Reads large amounts of data from a text file in one single step.
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile(): Open Filename For Input As #iNum
ReadTextFile = Space$(LOF(iNum))
ReadTextFile = Input(LOF(iNum), iNum)

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFile()
Sub WriteTextFile(TextOut$, Filename$, _
Optional AppendMode As Boolean = False)
' Reusable procedure that Writes/Overwrites or Appends
' large amounts of data to a Text file in one single step.
' **Does not create a blank line at the end of the file**
Dim iNum%
On Error GoTo ErrHandler
iNum = FreeFile()
If AppendMode Then
Open Filename For Append As #iNum: Print #iNum, vbCrLf & TextOut;
Else
Open Filename For Output As #iNum: Print #iNum, TextOut;
End If

ErrHandler:
Close #iNum: If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFile()