Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
Hi Keith
Look at this: Sub AAA() Dim MyString As String Dim Output As String MyString = "BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR" MyString = Mid(MyString, WorksheetFunction.Find("FOO", MyString) + 4) Do Output = Left(MyString, WorksheetFunction.Find(" BAR", MyString)) MyString = Mid(MyString, WorksheetFunction.Find("BAR", MyString) + 4) Debug.Print Output Loop Until MyString = "" End Sub Hopes this helps. .... Per On 15 Jul., 20:12, ker_01 wrote: I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR *... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1..0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
I'd be interested in knowing what text is in the "(etc)" part of your source
string. Can there be other FOO keywords? If so, do want the desired strings between them and any BAR keywords that may follow it? -- Rick (MVP - Excel) "ker_01" wrote in message ... I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
Hi Rick!
In my source string, the 'FOO' string is only in the template once, right before the first child string; I'm hoping that none of the 'child' strings happen to contain it by chance, or else I'll be doing some hand-editing on those records. There are about 20-30 child strings per parent string I need to process; BAR is in the template at the end of each child string, so it occurs once between each child string and at the end of the last child string. Unfortunately, BAR also occurs once at the top of the template, which is what was giving me issue #1 below. Modified Example: [start of file] garbage text BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR desired string...30 BAR more garbage text that does not include FOO or BAR [end of file] Thanks! Keith "Rick Rothstein" wrote: I'd be interested in knowing what text is in the "(etc)" part of your source string. Can there be other FOO keywords? If so, do want the desired strings between them and any BAR keywords that may follow it? -- Rick (MVP - Excel) "ker_01" wrote in message ... I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
Consider this coding approach...
Dim Children() As String, Parent As String, X As Long Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _ "desired string 2 BAR desired string 3 BAR desired string " & _ "4 BAR desired string 5 BAR More Garbage Text" Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR") ' ' Let's see if it worked... ' For X = 0 To UBound(Children) - 1 Debug.Print Children(X) Next Note: This code is dependent on the keyword "BAR" being the last word of the Parent string (I can code around that if it can be the last word, but your sample seems to indicate it won't be). Also, the code is dependent on the keywords BAR and FOO always being in upper case as your example shows (I can code around this also if this will not always be the case). -- Rick (MVP - Excel) "ker_01" wrote in message ... Hi Rick! In my source string, the 'FOO' string is only in the template once, right before the first child string; I'm hoping that none of the 'child' strings happen to contain it by chance, or else I'll be doing some hand-editing on those records. There are about 20-30 child strings per parent string I need to process; BAR is in the template at the end of each child string, so it occurs once between each child string and at the end of the last child string. Unfortunately, BAR also occurs once at the top of the template, which is what was giving me issue #1 below. Modified Example: [start of file] garbage text BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR desired string...30 BAR more garbage text that does not include FOO or BAR [end of file] Thanks! Keith "Rick Rothstein" wrote: I'd be interested in knowing what text is in the "(etc)" part of your source string. Can there be other FOO keywords? If so, do want the desired strings between them and any BAR keywords that may follow it? -- Rick (MVP - Excel) "ker_01" wrote in message ... I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
On Wed, 15 Jul 2009 11:12:02 -0700, ker_01
wrote: I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith A few points: 1. I'd use Microsoft VBScript Regular Expressions 5.5 2. In VBScript, all the multiline option does is affect whether or not ^$ match at line breaks. Your use of [\s\S]+ is appropriate. 3. I would pre-process the string removing everything before the first FOO. While you're at it, you should probably remove extraneous LF's. e.g. Regex ".*?(FOO)|\n" Replacement: "$1" 4. IT would have been helpful had you posted your code. But probably you need to use submatches to tease out what you want. Here is code that seems to work on your sample: ========================== Option Explicit Function FooBar(s As String, Index As Long) As String Dim re As RegExp, mc As MatchCollection Dim sS As String Set re = New RegExp With re .IgnoreCase = True .Global = True .Pattern = ".*?(FOO)|\n" sS = .Replace(s, "$1") .Pattern = "(FOO|BAR)\s*([\s\S]+?)\s*(?=BAR)" If .test(sS) = True Then Set mc = .Execute(sS) FooBar = mc(Index - 1).submatches(1) End If End With End Function ======================= The Index argument represents which of the instances of desired string are extracted. 1=1st, etc. --ron |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
Ron, Rick, and Per- thank you all for your help. I'm going to go back and try
to implement your suggestions now. Ron, also thank you for mentioning the 5.5; I hadn't seen it when I added the 1.0, but I went back and looked, and there it was. :) Keith "Ron Rosenfeld" wrote: On Wed, 15 Jul 2009 11:12:02 -0700, ker_01 wrote: I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith A few points: 1. I'd use Microsoft VBScript Regular Expressions 5.5 2. In VBScript, all the multiline option does is affect whether or not ^$ match at line breaks. Your use of [\s\S]+ is appropriate. 3. I would pre-process the string removing everything before the first FOO. While you're at it, you should probably remove extraneous LF's. e.g. Regex ".*?(FOO)|\n" Replacement: "$1" 4. IT would have been helpful had you posted your code. But probably you need to use submatches to tease out what you want. Here is code that seems to work on your sample: ========================== Option Explicit Function FooBar(s As String, Index As Long) As String Dim re As RegExp, mc As MatchCollection Dim sS As String Set re = New RegExp With re .IgnoreCase = True .Global = True .Pattern = ".*?(FOO)|\n" sS = .Replace(s, "$1") .Pattern = "(FOO|BAR)\s*([\s\S]+?)\s*(?=BAR)" If .test(sS) = True Then Set mc = .Execute(sS) FooBar = mc(Index - 1).submatches(1) End If End With End Function ======================= The Index argument represents which of the instances of desired string are extracted. 1=1st, etc. --ron |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Regex syntax problems (XL2003)
Just for clarification, my code assigns as many child strings as there are
to the Children array... that array's first element is at index 0 (the Split function *always* produces a zero-based array) and the last Child string is contained at index Ubound(Children)-1... there is a last element at index Ubound(Children), but it contains the trailing garbage. While not as efficient code-wise, we can make the code store only the child strings with out the garbage like this... Dim Children() As String, Parent As String, X As Long Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _ "desired string 2 BAR desired string 3 BAR desired string " & _ "4 BAR desired string 5 BAR More Garbage Text" Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR") ReDim Preserve Children(0 To UBound(Children) - 1) Doing it this way allows you to iterate the array like this... ' Let's see if it worked... ' For X = 0 To UBound(Children) Debug.Print Children(X) Next where you would use the actual upper bound as the array will now only contain child strings without the trailing garbage. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... Consider this coding approach... Dim Children() As String, Parent As String, X As Long Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _ "desired string 2 BAR desired string 3 BAR desired string " & _ "4 BAR desired string 5 BAR More Garbage Text" Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR") ' ' Let's see if it worked... ' For X = 0 To UBound(Children) - 1 Debug.Print Children(X) Next Note: This code is dependent on the keyword "BAR" being the last word of the Parent string (I can code around that if it can be the last word, but your sample seems to indicate it won't be). Also, the code is dependent on the keywords BAR and FOO always being in upper case as your example shows (I can code around this also if this will not always be the case). -- Rick (MVP - Excel) "ker_01" wrote in message ... Hi Rick! In my source string, the 'FOO' string is only in the template once, right before the first child string; I'm hoping that none of the 'child' strings happen to contain it by chance, or else I'll be doing some hand-editing on those records. There are about 20-30 child strings per parent string I need to process; BAR is in the template at the end of each child string, so it occurs once between each child string and at the end of the last child string. Unfortunately, BAR also occurs once at the top of the template, which is what was giving me issue #1 below. Modified Example: [start of file] garbage text BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR desired string...30 BAR more garbage text that does not include FOO or BAR [end of file] Thanks! Keith "Rick Rothstein" wrote: I'd be interested in knowing what text is in the "(etc)" part of your source string. Can there be other FOO keywords? If so, do want the desired strings between them and any BAR keywords that may follow it? -- Rick (MVP - Excel) "ker_01" wrote in message ... I have a long string, in the format BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired string 3 BAR desired string 4 BAR (etc). What I'd like to extract is: desired string 1 desired string 2 desired string 3 desired string 4 I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR " but I'm having some problems. (1) The first returned string returns BAR garbage text FOO desired string 1 BAR where I'd prefer to only start with FOO desired string 1 BAR ... or better, desired string 1 I can work around that manually, but I welcome any suggestions on how to fix it. (2) Each search 'consumes' the terminating string "BAR", causing it to skip every other string, giving me: BAR garbage text FOO desired string 1 BAR BAR desired string 3 BAR BAR desired string 5 BAR How do I force it to re-use the terminating string as the start string of the next part of the search? (3) I'm not sure if I need it or not, but I tried using re.multiline = True just to be cautious, and I get a compile error "Method or data member not found". I do have a reference to Microsoft VBScript Regular Expressions 1.0. Do I need a different reference to enable multiline? Again, the strings still seem to be coming across ok (including across multiple lines), I'm just surprised I don't need (and in fact, can't set) the multiline parameter. Thank you, Keith |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Regex syntax request for help | Excel Programming | |||
Syntax problems | Excel Programming | |||
Indirect and Sumif Syntax Problems | Excel Discussion (Misc queries) | |||
used to worked OK with Win98+XL2K but problems with WinXP+XL2003 now! | Excel Programming |