Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 395
Default VBA Regex syntax problems (XL2003)


I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline = True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions 1.0.
Do I need a different reference to enable multiline? Again, the strings still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 703
Default VBA Regex syntax problems (XL2003)

Hi Keith

Look at this:

Sub AAA()
Dim MyString As String
Dim Output As String

MyString = "BAR garbage text FOO desired string 1 BAR desired string 2
BAR desired string 3 BAR desired string 4 BAR"
MyString = Mid(MyString, WorksheetFunction.Find("FOO", MyString) + 4)
Do
Output = Left(MyString, WorksheetFunction.Find(" BAR", MyString))
MyString = Mid(MyString, WorksheetFunction.Find("BAR", MyString) +
4)
Debug.Print Output
Loop Until MyString = ""
End Sub

Hopes this helps.
....
Per


On 15 Jul., 20:12, ker_01 wrote:
I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR *... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline = True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions 1..0.
Do I need a different reference to enable multiline? Again, the strings still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith


  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default VBA Regex syntax problems (XL2003)

I'd be interested in knowing what text is in the "(etc)" part of your source
string. Can there be other FOO keywords? If so, do want the desired strings
between them and any BAR keywords that may follow it?

--
Rick (MVP - Excel)


"ker_01" wrote in message
...

I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to
fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to
skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline =
True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions
1.0.
Do I need a different reference to enable multiline? Again, the strings
still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith


  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 395
Default VBA Regex syntax problems (XL2003)

Hi Rick!

In my source string, the 'FOO' string is only in the template once, right
before the first child string; I'm hoping that none of the 'child' strings
happen to contain it by chance, or else I'll be doing some hand-editing on
those records.

There are about 20-30 child strings per parent string I need to process; BAR
is in the template at the end of each child string, so it occurs once between
each child string and at the end of the last child string. Unfortunately, BAR
also occurs once at the top of the template, which is what was giving me
issue #1 below.

Modified Example:
[start of file]
garbage text BAR garbage text FOO desired string 1 BAR desired string 2 BAR
desired string 3 BAR desired string 4 BAR desired string...30 BAR more
garbage text that does not include FOO or BAR
[end of file]

Thanks!
Keith


"Rick Rothstein" wrote:

I'd be interested in knowing what text is in the "(etc)" part of your source
string. Can there be other FOO keywords? If so, do want the desired strings
between them and any BAR keywords that may follow it?

--
Rick (MVP - Excel)


"ker_01" wrote in message
...

I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to
fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to
skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline =
True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions
1.0.
Do I need a different reference to enable multiline? Again, the strings
still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith



  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default VBA Regex syntax problems (XL2003)

Consider this coding approach...

Dim Children() As String, Parent As String, X As Long
Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _
"desired string 2 BAR desired string 3 BAR desired string " & _
"4 BAR desired string 5 BAR More Garbage Text"
Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR")
'
' Let's see if it worked...
'
For X = 0 To UBound(Children) - 1
Debug.Print Children(X)
Next

Note: This code is dependent on the keyword "BAR" being the last word of the
Parent string (I can code around that if it can be the last word, but your
sample seems to indicate it won't be). Also, the code is dependent on the
keywords BAR and FOO always being in upper case as your example shows (I can
code around this also if this will not always be the case).

--
Rick (MVP - Excel)


"ker_01" wrote in message
...
Hi Rick!

In my source string, the 'FOO' string is only in the template once, right
before the first child string; I'm hoping that none of the 'child' strings
happen to contain it by chance, or else I'll be doing some hand-editing on
those records.

There are about 20-30 child strings per parent string I need to process;
BAR
is in the template at the end of each child string, so it occurs once
between
each child string and at the end of the last child string. Unfortunately,
BAR
also occurs once at the top of the template, which is what was giving me
issue #1 below.

Modified Example:
[start of file]
garbage text BAR garbage text FOO desired string 1 BAR desired string 2
BAR
desired string 3 BAR desired string 4 BAR desired string...30 BAR more
garbage text that does not include FOO or BAR
[end of file]

Thanks!
Keith


"Rick Rothstein" wrote:

I'd be interested in knowing what text is in the "(etc)" part of your
source
string. Can there be other FOO keywords? If so, do want the desired
strings
between them and any BAR keywords that may follow it?

--
Rick (MVP - Excel)


"ker_01" wrote in message
...

I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how
to
fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to
skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string
of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline =
True
just to be cautious, and I get a compile error "Method or data member
not
found". I do have a reference to Microsoft VBScript Regular Expressions
1.0.
Do I need a different reference to enable multiline? Again, the strings
still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline
parameter.

Thank you,
Keith






  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regex syntax problems (XL2003)

On Wed, 15 Jul 2009 11:12:02 -0700, ker_01
wrote:


I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline = True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions 1.0.
Do I need a different reference to enable multiline? Again, the strings still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith


A few points:

1. I'd use Microsoft VBScript Regular Expressions 5.5
2. In VBScript, all the multiline option does is affect whether or not ^$
match at line breaks. Your use of [\s\S]+ is appropriate.

3. I would pre-process the string removing everything before the first FOO.
While you're at it, you should probably remove extraneous LF's.

e.g. Regex ".*?(FOO)|\n"
Replacement: "$1"

4. IT would have been helpful had you posted your code. But probably you need
to use submatches to tease out what you want. Here is code that seems to work
on your sample:

==========================
Option Explicit
Function FooBar(s As String, Index As Long) As String
Dim re As RegExp, mc As MatchCollection
Dim sS As String
Set re = New RegExp
With re
.IgnoreCase = True
.Global = True
.Pattern = ".*?(FOO)|\n"
sS = .Replace(s, "$1")

.Pattern = "(FOO|BAR)\s*([\s\S]+?)\s*(?=BAR)"
If .test(sS) = True Then
Set mc = .Execute(sS)
FooBar = mc(Index - 1).submatches(1)
End If
End With
End Function
=======================

The Index argument represents which of the instances of desired string are
extracted. 1=1st, etc.
--ron
  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 395
Default VBA Regex syntax problems (XL2003)

Ron, Rick, and Per- thank you all for your help. I'm going to go back and try
to implement your suggestions now.

Ron, also thank you for mentioning the 5.5; I hadn't seen it when I added
the 1.0, but I went back and looked, and there it was.

:)
Keith

"Ron Rosenfeld" wrote:

On Wed, 15 Jul 2009 11:12:02 -0700, ker_01
wrote:


I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how to fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline = True
just to be cautious, and I get a compile error "Method or data member not
found". I do have a reference to Microsoft VBScript Regular Expressions 1.0.
Do I need a different reference to enable multiline? Again, the strings still
seem to be coming across ok (including across multiple lines), I'm just
surprised I don't need (and in fact, can't set) the multiline parameter.

Thank you,
Keith


A few points:

1. I'd use Microsoft VBScript Regular Expressions 5.5
2. In VBScript, all the multiline option does is affect whether or not ^$
match at line breaks. Your use of [\s\S]+ is appropriate.

3. I would pre-process the string removing everything before the first FOO.
While you're at it, you should probably remove extraneous LF's.

e.g. Regex ".*?(FOO)|\n"
Replacement: "$1"

4. IT would have been helpful had you posted your code. But probably you need
to use submatches to tease out what you want. Here is code that seems to work
on your sample:

==========================
Option Explicit
Function FooBar(s As String, Index As Long) As String
Dim re As RegExp, mc As MatchCollection
Dim sS As String
Set re = New RegExp
With re
.IgnoreCase = True
.Global = True
.Pattern = ".*?(FOO)|\n"
sS = .Replace(s, "$1")

.Pattern = "(FOO|BAR)\s*([\s\S]+?)\s*(?=BAR)"
If .test(sS) = True Then
Set mc = .Execute(sS)
FooBar = mc(Index - 1).submatches(1)
End If
End With
End Function
=======================

The Index argument represents which of the instances of desired string are
extracted. 1=1st, etc.
--ron

  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default VBA Regex syntax problems (XL2003)

Just for clarification, my code assigns as many child strings as there are
to the Children array... that array's first element is at index 0 (the Split
function *always* produces a zero-based array) and the last Child string is
contained at index Ubound(Children)-1... there is a last element at index
Ubound(Children), but it contains the trailing garbage. While not as
efficient code-wise, we can make the code store only the child strings with
out the garbage like this...

Dim Children() As String, Parent As String, X As Long
Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _
"desired string 2 BAR desired string 3 BAR desired string " & _
"4 BAR desired string 5 BAR More Garbage Text"
Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR")
ReDim Preserve Children(0 To UBound(Children) - 1)

Doing it this way allows you to iterate the array like this...

' Let's see if it worked...
'
For X = 0 To UBound(Children)
Debug.Print Children(X)
Next

where you would use the actual upper bound as the array will now only
contain child strings without the trailing garbage.

--
Rick (MVP - Excel)


"Rick Rothstein" wrote in message
...
Consider this coding approach...

Dim Children() As String, Parent As String, X As Long
Parent = "Garbage text BAR garbage text FOO desired string 1 BAR " & _
"desired string 2 BAR desired string 3 BAR desired string " & _
"4 BAR desired string 5 BAR More Garbage Text"
Children = Split(Split(Replace(Parent, " BAR ", "BAR"), "FOO ")(1), "BAR")
'
' Let's see if it worked...
'
For X = 0 To UBound(Children) - 1
Debug.Print Children(X)
Next

Note: This code is dependent on the keyword "BAR" being the last word of
the Parent string (I can code around that if it can be the last word, but
your sample seems to indicate it won't be). Also, the code is dependent on
the keywords BAR and FOO always being in upper case as your example shows
(I can code around this also if this will not always be the case).

--
Rick (MVP - Excel)


"ker_01" wrote in message
...
Hi Rick!

In my source string, the 'FOO' string is only in the template once, right
before the first child string; I'm hoping that none of the 'child'
strings
happen to contain it by chance, or else I'll be doing some hand-editing
on
those records.

There are about 20-30 child strings per parent string I need to process;
BAR
is in the template at the end of each child string, so it occurs once
between
each child string and at the end of the last child string. Unfortunately,
BAR
also occurs once at the top of the template, which is what was giving me
issue #1 below.

Modified Example:
[start of file]
garbage text BAR garbage text FOO desired string 1 BAR desired string 2
BAR
desired string 3 BAR desired string 4 BAR desired string...30 BAR more
garbage text that does not include FOO or BAR
[end of file]

Thanks!
Keith


"Rick Rothstein" wrote:

I'd be interested in knowing what text is in the "(etc)" part of your
source
string. Can there be other FOO keywords? If so, do want the desired
strings
between them and any BAR keywords that may follow it?

--
Rick (MVP - Excel)


"ker_01" wrote in message
...

I have a long string, in the format

BAR garbage text FOO desired string 1 BAR desired string 2 BAR desired
string 3 BAR desired string 4 BAR (etc).

What I'd like to extract is:
desired string 1
desired string 2
desired string 3
desired string 4

I'm trying variations of re.Pattern = "FOO|BAR [\s\S]*?BAR "
but I'm having some problems.

(1) The first returned string returns
BAR garbage text FOO desired string 1 BAR
where I'd prefer to only start with
FOO desired string 1 BAR ... or better,
desired string 1

I can work around that manually, but I welcome any suggestions on how
to
fix
it.

(2) Each search 'consumes' the terminating string "BAR", causing it to
skip
every other string, giving me:
BAR garbage text FOO desired string 1 BAR
BAR desired string 3 BAR
BAR desired string 5 BAR

How do I force it to re-use the terminating string as the start string
of
the next part of the search?

(3) I'm not sure if I need it or not, but I tried using re.multiline =
True
just to be cautious, and I get a compile error "Method or data member
not
found". I do have a reference to Microsoft VBScript Regular
Expressions
1.0.
Do I need a different reference to enable multiline? Again, the
strings
still
seem to be coming across ok (including across multiple lines), I'm
just
surprised I don't need (and in fact, can't set) the multiline
parameter.

Thank you,
Keith




Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Regex syntax request for help Ker_01 Excel Programming 2 February 18th 08 04:07 PM
Syntax problems penguat Excel Programming 3 September 26th 07 02:12 PM
Indirect and Sumif Syntax Problems [email protected] Excel Discussion (Misc queries) 3 August 4th 06 01:59 AM
used to worked OK with Win98+XL2K but problems with WinXP+XL2003 now! J_J[_2_] Excel Programming 0 February 17th 05 04:57 PM


All times are GMT +1. The time now is 06:53 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"