Dave Runyan

Regex Capture problem
 
I am using the VBScript_RegExp_55 library in an Excel-based VBA project to
build a text-file processor that uses rules stored in an Excel spreadsheet.
I have used Regex utilities before, so I understand the concepts of text
capture and re-use in a Regex expression.

But when I try to capture text with (*), for example, and then try to re-use
it with \1, I get either no changes made, or an object-generated error 5018.
I have not been able to isolate when these two results occur.

I believe my program code is basically correct, because I can do search and
replace type stuff just fine, including in the same "run" as the capture
operations that fail. Here are my options settings:

objRgx.IgnoreCase = False ' Set case sensitive.
objRgx.Global = True ' Set replace all.
objRgx.MultiLine = True ' Set multiline mode.

Thanks for any help you can provide.

Ron Rosenfeld

How about giving an example of your program code, input, and desired output?
--ron

Dave Runyan

Here are two regex search and replace strings. The first works, but the
second does not:

Search for = "<TITLE>Page-1</TITLE>"
Replace with = "<TITLE>Staffing Process Model</TITLE>"


This did not do any capture or replace:
Search for = "<CENTER>\n([^\n]*)_DPM Model - (.*)HREF="([^\n]*).html"
Replace with = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"

Here are lines from the HTML. The first three show where I want to pick up the
extended file name, which is "STAFFING_VISION-REF_2007-03-31_DPM_Model_-_":

<CENTER>
<IMG SRC="STAFFING_VISION-REF_2007-03-31_DPM_Model_-_Talent_acquired.jpg"
ALT="" BORDER="0" USEMAP="#image_map">
</CENTER>

Here is where I want to insert the extended filename in front of the "Talent
Need defined.html". The stuff after ".html" will be processed and removed in
later steps:

<AREA SHAPE="CIRCLE" COORDS="103,395,44" HREF="Talent Need defined.html,
1UP: Toplevel">

Here are the variables as sent to the function:
strRgxSearchPattern = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"
strRgxInput = "<CENTER>\n([^\n]*)_DPM Model - (.*)HREF="([^\n]*).html"


Here is the function:
Private Function RgxReplaceText(strRgxSearchPattern, strRgxInput, _
        strRgxOutput) As String
    Dim objRgx As New RegExp
    objRgx.IgnoreCase = False ' Set case sensitive.
    objRgx.Global = True ' Set replace all.
    objRgx.MultiLine = True ' Set multiline mode.
    objRgx.Pattern = strRgxSearchPattern
    RgxReplaceText = objRgx.Replace(strRgxInput, strRgxOutput)
End Function

Ron Rosenfeld

On Wed, 25 Apr 2007 12:52:01 -0700, Dave Runyan wrote:

> Replace with = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"


I didn't test it, but I believe that in the "Replace with" string you need to
use the tokens $1, $2 and $3.
--ron
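
The point about replacement tokens can be illustrated with a minimal sketch. It assumes late binding through CreateObject("VBScript.RegExp"), so no library reference is needed, and the procedure name DemoReplaceTokens and the sample strings are made up for illustration. In this library, \1 is a backreference inside the search pattern, while the replacement string refers to captured groups as $1, $2, and so on.

```vba
Sub DemoReplaceTokens()
    ' Late binding is an assumption made for this sketch; the thread itself
    ' uses early binding (Dim ... As RegExp) with a library reference.
    Dim rgx As Object
    Set rgx = CreateObject("VBScript.RegExp")

    rgx.Global = True
    rgx.Pattern = "(\w+)@(\w+)"

    ' $1 and $2 in the replacement text stand for the captured groups.
    Debug.Print rgx.Replace("user@host", "$2 at $1")   ' prints: host at user

    ' \1 in the replacement text is not a group token in this library,
    ' so do not expect the captured text to appear in the result here.
    Debug.Print rgx.Replace("user@host", "\1")
End Sub
```

Run it from the VBA editor and read the results in the Immediate window.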

Ron Rosenfeld

Unfortunately, I'm still not following your example.

But, so far as your question about capturing text with (*), and then re-using
it, note the following:

Given the UDF:

===================================
Function RESub(str As String, SrchFor As String, ReplWith As String) As String
    ' Needs a reference to Microsoft VBScript Regular Expressions 5.5.
    Dim objRegExp As RegExp

    Set objRegExp = New RegExp
    objRegExp.Pattern = SrchFor
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.MultiLine = True

    ' Replace every match of SrchFor in str with ReplWith.
    RESub = objRegExp.Replace(str, ReplWith)
End Function
================================

With these simple parameters:

str = "Now is the time"
SrchFor = "Now is (.*) time"
ReplWith = "$1"

The function returns "the", as would be expected.


--ron

Dave Runyan

Thanks, Ron. As is so often the case, the problem was the nut holding the
wheel. I "learned" my regex using a freeware utility that had slightly
different syntax, so I had to convert many lines of expressions. Along the way
I somehow forgot that * is a quantifier, not a character placeholder, so I
was trying to capture (*) instead of (.*). As soon as I saw your fragment I
realized what the problem was.
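
A small sketch of the difference between (*) and (.*), assuming late binding via CreateObject("VBScript.RegExp") and a hypothetical procedure name DemoQuantifierError. The error is trapped so that its number and description can be inspected; the number printed should be compared with the 5018 reported in this thread rather than taken on trust.

```vba
Sub DemoQuantifierError()
    Dim rgx As Object
    Dim result As String
    Set rgx = CreateObject("VBScript.RegExp")

    On Error Resume Next
    ' "(*)" places a quantifier where there is nothing for it to repeat.
    rgx.Pattern = "(*)"
    result = rgx.Replace("Now is the time", "$1")
    If Err.Number <> 0 Then
        Debug.Print "Bad pattern, error " & Err.Number & ": " & Err.Description
    End If
    On Error GoTo 0

    ' "(.*)" applies the quantifier to ".", which matches any character
    ' except a newline, so the group is valid.
    rgx.Pattern = "Now is (.*) time"
    Debug.Print rgx.Replace("Now is the time", "$1")   ' prints: the
End Sub
```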

Initially I ALSO had the parameters backwards in the method invocation
(having the search pattern in a property, and the input and replacement
strings in the call, is just hard for me to grok, it seems). When I fixed
that, I then made the * error, and the message I got was the same, 5018, with
no helpful text.
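
The argument order described above can be sketched as follows. The wrapper name ReplaceWithPattern, its parameter names, and the sample strings are hypothetical, and late binding is assumed; the point is only that the pattern goes into the Pattern property, while Replace takes the source text first and the replacement text second.

```vba
Function ReplaceWithPattern(ByVal pattern As String, _
                            ByVal inputText As String, _
                            ByVal replacement As String) As String
    Dim rgx As Object
    Set rgx = CreateObject("VBScript.RegExp")
    rgx.Global = True

    ' What to search for is set as a property...
    rgx.Pattern = pattern
    ' ...and the call takes the source text, then the replacement text.
    ReplaceWithPattern = rgx.Replace(inputText, replacement)
End Function

Sub DemoArgumentOrder()
    ' Swap the two numbers around the hyphen.
    Debug.Print ReplaceWithPattern("(\d+)-(\d+)", "10-20", "$2-$1")   ' prints: 20-10
End Sub
```

Naming the parameters after their role makes it harder to pass the pattern where the input text belongs.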

Okay, thanks again for your help. I will probably be asking something
again sooner than I imagine.

Ron Rosenfeld

Well, I'm glad my efforts helped point you in a useful direction. There are
certainly some aspects of Regular Expressions as used in Perl, for example,
that are not replicated in VB.

Best wishes.
--ron

