ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Find & Replace in Multi-Megabyte Strings (https://www.excelbanter.com/excel-programming/326464-find-replace-multi-megabyte-strings.html)

R Avery[_2_]

Find & Replace in Multi-Megabyte Strings
 
This is a re-post of "Batch Replace function for Large strings"

String processing in VBA is very slow when strings are large (1-500
MB). I have a
function that I've been using for doing batch replace ops (in the above
referenced previous post), but
it chokes on large strings with many replacements to do (like 50),
because it has to do 50 passes of the string to perform the
replacements.

Has anyone written a fast function designed to do the same thing for
large strings but only makes one pass through the data?


keepITcool

Find & Replace in Multi-Megabyte Strings
 


Rex Avery... REGEX Avery should be your name!


Have a look at regex. (regular expressions)

Following test builds a 28million character teststring, it will still
loop thru all the replacements, but regex replace is a "trifle" faster
than vba's.

Following example takes 20 seconds for the 6 replacements.
(but note that each replacement occurs 1 million times!

The example uses late bound code, it's a little bit faster when you
reference "Microsoft VBScript Regular Expression 5.5" and dim the rgx
variable as RegExp.

You'll have to learn some regex patterns though!
http://msdn.microsoft.com/library/de.../en-us/script5
6/html/vspropattern.asp

or try and buy RegExBuddy. (be aware that regexbuddy supports more
advanced flavors than vbscripts regex)

Sub TestRegExReplace()

Dim str$, arrPat, arrRep

Dim rgx As Object
Dim t!, n&
'This is slow but nevermind..
'it builds a string of 28 000 000 chars.

str = "This is the string I have." & vbLf
For n = 1 To 20
str = str & str
Next
Debug.Print Len(str)

arrPat = Array("This", "is", "the", "string", "I", "have")
arrRep = Array("That", "was", "that", "text", "you", "had")

t = Timer
Set rgx = CreateObject("vbscript.regexp")
With rgx
.Global = True
.IgnoreCase = True
For n = LBound(arrPat) To UBound(arrPat)
.Pattern = "\b" & arrPat(n) & "\b"
str = .Replace(str, arrRep(n))
Next
End With

t = Timer - t
MsgBox Left(str, InStr(50, str, vbLf)) & "in " & t & "seconds"

End Sub


HTH... (but i'm fairly sure it does)




--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


R Avery wrote :

This is a re-post of "Batch Replace function for Large strings"

String processing in VBA is very slow when strings are large (1-500
MB). I have a
function that I've been using for doing batch replace ops (in the
above referenced previous post), but
it chokes on large strings with many replacements to do (like 50),
because it has to do 50 passes of the string to perform the
replacements.

Has anyone written a fast function designed to do the same thing for
large strings but only makes one pass through the data?


R Avery[_2_]

Find & Replace in Multi-Megabyte Strings
 
Cool! The RegEx (early bound) in my tests is 4-5x faster than VBA's
replace. However, I think that this might even be able to be improved
more... I think it should be possible to do all of the find/replaces
in a single pass of the string, further improving the speed by a factor
of however many replaces there are to do.

Since doing 6 search and replaces requires calling the Regex replace
function 6 times, it is iterating over the string 6 times, but it
should really only need to do it once.

Perhaps some sort of clever regex could combine all of the
search/replace ops into a single call of .Replace.

Or perhaps a custom procedure would be better... i'll make a post
if/when i find the answer i'm looking for.

Thanks for the idea! And for now, i'll take the 5x improvement ;D



All times are GMT +1. The time now is 11:16 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com