Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 86
Default Parsing a string

I want to parse a lot of data with rows like the following:

A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER

I can do it in Excel with formulas, but is there a nice piece of generic
code that does this sort of thing?
(or should I parse it based on using spaces as delimiters and then join the
fields that need joining?)

Jim Berglund

  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,565
Default Parsing a string

The phone number, postal code, province and city could probably be picked
our pretty easily. Unless there is some consistency in the address and
names grouping, it would be very difficult to pick those out in one fell
swoop. Consistency would be that the address is always street number and a
two part street name separated by spaces, and the person's name would always
be first middle last separated by spaces. I doubt that those are entered
consitently in that fashion. If the street address always begins with a
number, it could be used a key to separate the name and street address,
which would then make it possible to break all the elements into their
respective groups. But I wouldn't want to tackle it.



"Jim Berglund" wrote in message
...
I want to parse a lot of data with rows like the following:

A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER

I can do it in Excel with formulas, but is there a nice piece of generic
code that does this sort of thing?
(or should I parse it based on using spaces as delimiters and then join
the fields that need joining?)

Jim Berglund



  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default Parsing a string

On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund" wrote:

I want to parse a lot of data with rows like the following:

A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER

I can do it in Excel with formulas, but is there a nice piece of generic
code that does this sort of thing?
(or should I parse it based on using spaces as delimiters and then join the
fields that need joining?)

Jim Berglund


How easy or hard this might be to do depends on format variations.

If you cannot set out firm rules for the data, you won't be able to do it.

Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the end
of the string.

The CITY would be one or several words that precede the PROV and you could use
a list of valid cities.

The ADDRESS precedes the city and starts with a number.

The NAME ends with the number.

If all of your rows can be unambiguously described, either as I have done or
some other method, then you should be able to parse either with formulas, or
with a macro.

Here's a sample macro with a very short list of possibly valid cities that
parses rows that fit the above rules. It should give you some ideas.

================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End Sub
====================================

The list of cities needs to be enclosed in parentheses and be pipe delimited.
They are tested in the order listed, so if you had several cities with similar
names, the order you list them in can be critical.
--ron
  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 86
Default Parsing a string

Ron. Thanks for the effort you put in on this. I would really like to
understand this more.

Is it checking each row to see if it meets the pattern, and if so, doing the
parsing and then building a new, 5-column list?
What is the following line for?
Set myRegExp = CreateObject("vbscript.regexp")

Also, could you please help me understand the following? I've never seen any
code like it and I just don't have a clue how to interpret it...

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
"\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"

I tried it and it ran without any errors. But it didn't do anything.

BTW, For anyone else looking at this, I found the following entry...


Formulas for parsing names.

1. Using an index:

=T(INDEX($M$1:$M$6,MAX(INDEX(COUNTIF(C1,$M$2:$M$6& {". *","
*"})*{1;2;3;4;5},0))+1))
(Set up an index in M1:M5)

2. Using OR
=IF(LEFT(A30,3)="Mrs","Mrs",IF(LEFT(A30,4)="Miss", "Miss",IF(OR(LEFT(A30,2)="Mr",LEFT(A30,2)="Ms",LEF T(A30,2)="Dr"),LEFT(A30,2),"X")))

3. Parsing, using VB


Option Explicit
Function ParseName(str As String, Index As Long) As String
Dim re As Object
Dim mc As Object
Dim sPat As String
Dim sTitle As String

'Index code
' 1 = Salutation
' 3 = First Name

'Pipe-delimited list of possible Titles
sTitle = "Mr|Ms|Miss|Mrs|Dr"

sPat = "^((" & sTitle & ")\.?(\s+))?(\w+)"



Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = sPat

If re.test(str) = True Then
Set mc = re.Execute(str)
ParseName = mc(0).submatches(Index)
End If
End Function

4. Using TRIM
Parsing names in Excel
Having just had to go through a process to clean up a list of names in a
spreadsheet, I thought I'd share the 'magic' formula I came up with.

As you can imagine, there was no pretty way to get the list into Excel,
there were several columns of names and other information, but with no
obvious delimiters to behold. I was amazed when I got the file, how bad it
looked. I can only assume this was a redirected print file from some
ancient mainframe system. It can't be that hard to get a nice comma
delimited file surely!?

The format of the name was either:
First Last
First MI Last
First Middle Last
Obtaining the first name was simple:
=LEFT(A11,(FIND(" ",A11,1)-1))
Getting the last name was a different matter:
=TRIM( IF(ISERR(FIND(" ",A11,(FIND(" ",A11,1)+1))),
MID(A11,FIND(" ",A11,1)+1,(LEN(A11)-FIND(" ",A11,1))),
MID(A11,FIND(" ",A11,(FIND(" ",A11,1)+1))+1,(LEN(A11)-FIND("
",A11,FIND(" ",A11,1))))))

Basically, remove any spaces and if the FIND returns an error, then the name
is just first and last names, otherwise there is a middle name or initial to
deal with. If no middle name/initial then get the last name, which starts 1
char after the space otherwise get the last name after locating the second
space in the string and start from there

Another Method for Parsing Names
Assume you have names of the following types in column C
Mrs Dorothy Hannity
Dr P R Rogers
Dana Delany
Mr Bradley K Pitts
Type the following formulas into the specified cells:
O1=FIND(" ",C1)
Determines the location/existence of the blank following the Salutaton or
First Name
P1=FIND(" ",C1,FIND(" ",C1)+1)
Determines the location/existence of the blank following the First Name or
Middle Initial(MI)
Q1=FIND(" ",R1)
Determines the location/existence of the blank following the Middle Initial
in the next, adjacent cell
R1=IF(ISERROR(P1),RIGHT(C1,LEN(C1)-O1),RIGHT(C1,LEN(C1)-P1))
Defines the Last Name or MI/LN if there is a MI
S1=LEFT(C1,O1-1)
Creates the Salutation column
T1=IF(ISERROR(S1)," ",MID(C1,O1+1,P1-O1))
Creates the First Name column
U1=IF(FIND(" ",R1)=2,LEFT(R1,1),"")
Creates the MI column
V1=IF(U1<"",MID(R1,FIND(" ",R1)+1,99),R1)
Creates the Last Name column
Fill the entries down and then copy the resulting values into another set of
columns.

Jim



"Ron Rosenfeld" wrote in message
...
On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund"
wrote:

I want to parse a lot of data with rows like the following:

A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER

I can do it in Excel with formulas, but is there a nice piece of generic
code that does this sort of thing?
(or should I parse it based on using spaces as delimiters and then join
the
fields that need joining?)

Jim Berglund


How easy or hard this might be to do depends on format variations.

If you cannot set out firm rules for the data, you won't be able to do it.

Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the
end
of the string.

The CITY would be one or several words that precede the PROV and you could
use
a list of valid cities.

The ADDRESS precedes the city and starts with a number.

The NAME ends with the number.

If all of your rows can be unambiguously described, either as I have done
or
some other method, then you should be able to parse either with formulas,
or
with a macro.

Here's a sample macro with a very short list of possibly valid cities that
parses rows that fit the above rules. It should give you some ideas.

================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End Sub
====================================

The list of cities needs to be enclosed in parentheses and be pipe
delimited.
They are tested in the order listed, so if you had several cities with
similar
names, the order you list them in can be critical.
--ron


  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default Parsing a string

On Mon, 24 May 2010 22:27:18 -0600, "Jim Berglund" wrote:

Ron. Thanks for the effort you put in on this. I would really like to
understand this more.

Is it checking each row to see if it meets the pattern, and if so, doing the
parsing and then building a new, 5-column list?


Yes. And if your row does not meet the format, it will do nothing.


What is the following line for?
Set myRegExp = CreateObject("vbscript.regexp")


That's called late binding, and it invokes another program which allows the use
of Regular Expressions in VBA. Regular Expressions are a powerful method of
manipulating strings.


Also, could you please help me understand the following? I've never seen any
code like it and I just don't have a clue how to interpret it...

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
"\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"


That is the pattern derived from the example you gave.

Everything was on one line so:

^ Start at the beginning of the line

(\D+) Capture everything that is not a digit into submatch 0. (In the
definition, the first digit was the beginning of the address). This will be
the name.

\s+ matches the space(s) between the end of the non-digits and the next
group.

(.*) capture everything, into submatch 1, up to the next match. This will
be the address.

\s matches the space between the end of the address and the next group

((CALGARY|MELBOURNE|SYDNEY) Short list of valid city names. This will need
to be extended to match your data. Capture into submatch 2.

\s+ matches the space(s) between the end of the city name and the province.

([A-Z]{2}) capture the next two capital letter abbreviation as the
Province into submatch 3

the rest matches the postcode, and phone number, again, according to the
pattern that was in the example you gave.




I tried it and it ran without any errors. But it didn't do anything.


It ran perfectly here using your line of sample data.

You have to enter and run it properly.

To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code into the window that opens.

To use this Macro (Sub), FIRST *SELECT* the data you wish to parse. Then
<alt-F8 opens the macro dialog box. Select the macro by name, and <RUN.

If you did all that, and the macro did not do anything, then the data you ran
it against did not match, in format, what I had laid out. Or, possibly, your
newsreader wrapped a line and you did not copy it correctly into the VBEditor.
***Both I and JLGWhiz emphasized to you the importance of properly defining
your format.***

--ron


  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 86
Default Parsing a string

Terrific! I think I am starting to understand what's been done. This is
really powerful stuff.

Is it possible to imbed the code in the program or does it have to be called
in as a macro each time?

I'll work it through again...

OK, I tried a number of variations, without success - even just copying data
into another spreadsheet and trying it.

Here is some of the actual data...

AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671
(403) 293-2671
ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396
(403) 249-1396
ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403) 685-9699
(403) 685-9699
ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000
(403) 242-2000
AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055
(403) 240-2055
ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028
(403) 242-1028
ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801
(403) 249-2801
AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690
(403) 256-5690
ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289
(403) 249-8289
AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689
(403) 663-8689
ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070
(403) 685-4070
AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116
(403) 686-1116
ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776
(403) 217-4776
ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702
(403) 686-0702
ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424
(403) 685-1424

Would you please take another try and tell me what needs to change? (I think
the difference may be the comma after CALGARY)

Thanks again, Ron.
Jim

"Ron Rosenfeld" wrote in message
...
On Mon, 24 May 2010 22:27:18 -0600, "Jim Berglund"
wrote:

Ron. Thanks for the effort you put in on this. I would really like to
understand this more.

Is it checking each row to see if it meets the pattern, and if so, doing
the
parsing and then building a new, 5-column list?


Yes. And if your row does not meet the format, it will do nothing.


What is the following line for?
Set myRegExp = CreateObject("vbscript.regexp")


That's called late binding, and it invokes another program which allows
the use
of Regular Expressions in VBA. Regular Expressions are a powerful method
of
manipulating strings.


Also, could you please help me understand the following? I've never seen
any
code like it and I just don't have a clue how to interpret it...

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
"\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"


That is the pattern derived from the example you gave.

Everything was on one line so:

^ Start at the beginning of the line

(\D+) Capture everything that is not a digit into submatch 0. (In the
definition, the first digit was the beginning of the address). This will
be
the name.

\s+ matches the space(s) between the end of the non-digits and the next
group.

(.*) capture everything, into submatch 1, up to the next match. This will
be the address.

\s matches the space between the end of the address and the next group

((CALGARY|MELBOURNE|SYDNEY) Short list of valid city names. This will
need
to be extended to match your data. Capture into submatch 2.

\s+ matches the space(s) between the end of the city name and the
province.

([A-Z]{2}) capture the next two capital letter abbreviation as the
Province into submatch 3

the rest matches the postcode, and phone number, again, according to the
pattern that was in the example you gave.




I tried it and it ran without any errors. But it didn't do anything.


It ran perfectly here using your line of sample data.

You have to enter and run it properly.

To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code into the window that opens.

To use this Macro (Sub), FIRST *SELECT* the data you wish to parse. Then
<alt-F8 opens the macro dialog box. Select the macro by name, and <RUN.

If you did all that, and the macro did not do anything, then the data you
ran
it against did not match, in format, what I had laid out. Or, possibly,
your
newsreader wrapped a line and you did not copy it correctly into the
VBEditor.
***Both I and JLGWhiz emphasized to you the importance of properly
defining
your format.***

--ron


  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default Parsing a string

On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund" wrote:

Terrific! I think I am starting to understand what's been done. This is
really powerful stuff.

Is it possible to imbed the code in the program or does it have to be called
in as a macro each time?

I'll work it through again...

OK, I tried a number of variations, without success - even just copying data
into another spreadsheet and trying it.

Here is some of the actual data...

AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671
(403) 293-2671
ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396
(403) 249-1396
ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403) 685-9699
(403) 685-9699
ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000
(403) 242-2000
AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055
(403) 240-2055
ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028
(403) 242-1028
ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801
(403) 249-2801
AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690
(403) 256-5690
ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289
(403) 249-8289
AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689
(403) 663-8689
ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070
(403) 685-4070
AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116
(403) 686-1116
ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776
(403) 217-4776
ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702
(403) 686-0702
ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424
(403) 685-1424

Would you please take another try and tell me what needs to change? (I think
the difference may be the comma after CALGARY)

Thanks again, Ron.
Jim


There are two possible differences between the pattern in the first example and
what you've posted above.

1. The phone number is repeated and on a separate line. I cannot tell from
your posting if it is in the same cell (same row) or a different row.

2. The comma is a major difference. Is there always a comma? or is it
optional?

Two changes to be made in the regex:

1. Add the comma after the city names, but not within a capturing group.

2. Delete the "$" at the end, since there may be something after the phone
number (a duplicate phone number)

Result:

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& ",\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

If the comma is optional, we indicate that in the regex by placing a "?" after
the comma:

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& ",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

This now works on all the examples you've posted.
--ron
  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default Parsing a string

On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund" wrote:

Is it possible to imbed the code in the program or does it have to be called
in as a macro each time?


I don't know what you mean by this question.

What are you considering to be "in the program"?

Regular expressions come in many flavors, and can be used in a variety of
languages.
--ron
  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 86
Default Parsing a string

The phone number is repeated in each row, and
the comma is always there.

Thanks again,
Jim

"Ron Rosenfeld" wrote in message
...
On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund"
wrote:

Terrific! I think I am starting to understand what's been done. This is
really powerful stuff.

Is it possible to imbed the code in the program or does it have to be
called
in as a macro each time?

I'll work it through again...

OK, I tried a number of variations, without success - even just copying
data
into another spreadsheet and trying it.

Here is some of the actual data...

AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671
(403) 293-2671
ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396
(403) 249-1396
ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403)
685-9699
(403) 685-9699
ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000
(403) 242-2000
AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055
(403) 240-2055
ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028
(403) 242-1028
ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801
(403) 249-2801
AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690
(403) 256-5690
ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289
(403) 249-8289
AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689
(403) 663-8689
ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070
(403) 685-4070
AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116
(403) 686-1116
ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776
(403) 217-4776
ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702
(403) 686-0702
ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424
(403) 685-1424

Would you please take another try and tell me what needs to change? (I
think
the difference may be the comma after CALGARY)

Thanks again, Ron.
Jim


There are two possible differences between the pattern in the first
example and
what you've posted above.

1. The phone number is repeated and on a separate line. I cannot tell
from
your posting if it is in the same cell (same row) or a different row.


The phone number is repeated in each row, and
the comma is always there


2. The comma is a major difference. Is there always a comma? or is it
optional?




Two changes to be made in the regex:

1. Add the comma after the city names, but not within a capturing group.

2. Delete the "$" at the end, since there may be something after the
phone
number (a duplicate phone number)

Result:

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& ",\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

If the comma is optional, we indicate that in the regex by placing a "?"
after
the comma:

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& ",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

This now works on all the examples you've posted.
--ron


  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default Parsing a string

On Tue, 25 May 2010 19:29:04 -0600, "Jim Berglund" wrote:

The phone number is repeated in each row, and
the comma is always there.

Thanks again,
Jim


And what happened when you changed the regex as I suggested? It should work,
unless there's some other difference not accounted for,
--ron


  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 4
Default Parsing a string



MCSDPhil
Hi there,
When I do this kind of thing, where the data is a bit rubbishy, in VB I
just pick at the line until the data comes apart. I did this which is
about the best that I could do with it.

Public Class Form1
Private Const DATA_SURNAME = 1
Private Const DATA_INIT1 = 2
Private Const DATA_INIT2 = 3
Private Const DATA_ADDRESSNAMENUMBER = 4
Private Const DATA_ADDRESS = 5
Private Const DATA_TOWN = 6
Private Const DATA_STATE = 7
Private Const DATA_POSTCODE = 8
Private Const DATA_TELEPHONE1 = 9
Private Const DATA_TELEPHONE2 = 10

Private Sub btnParseData_Click(ByVal sender As System.Object, ByVal
e As System.EventArgs) Handles btnParseData.Click
Dim strFilePath As String
Dim strLine As String
Dim intPos As Integer
Dim strData(10) As String
Dim strTemp1 As String
Dim strTemp2 As String
Dim strTemp As String

'OPEN FILE CODE HERE

'SUBSTITUTE READ LINE CODE HERE
strLine = "ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7
(403) 242-1028 (403) 242-1028"

Do While strLine < ""
intPos = InStr(strLine, ",")

If intPos 0 Then
strTemp1 = Trim(strLine.Substring(0, intPos - 1))
strTemp2 = Trim(strLine.Substring(intPos))
Debug.Print("strTemp1=" & strTemp1 & vbCrLf)
Debug.Print("strTemp2=" & strTemp2 & vbCrLf)
'Get surname
intPos = InStr(strTemp1, " ")
strData(DATA_SURNAME) = strTemp1.Substring(0, (intPos -
1))
Debug.Print("strData(DATA_SURNAME)=" &
strData(DATA_SURNAME))
strTemp1 = Trim(strTemp1.Substring(intPos))
'Get initials
'Find first initial
intPos = InStr(strTemp1, " ")
If intPos = 2 And Not IsNumeric(strTemp1.Substring(0,
1)) Then
'It is the first initial
strData(DATA_INIT1) = strTemp1.Substring(0, 1)
Debug.Print("strData(DATA_INIT1)=" &
strData(DATA_INIT1))
strTemp1 = Trim(strTemp1.Substring(intPos))
'See if there is a second initial
intPos = InStr(strTemp1, " ")
If intPos = 2 And Not
IsNumeric(strTemp1.Substring(0, 1)) Then
'This is the second initial
strData(DATA_INIT2) = strTemp1.Substring(0, 1)
Debug.Print("strData(DATA_INIT2)=" &
strData(DATA_INIT2))
strTemp1 = Trim(strTemp1.Substring(intPos))
End If
End If
intPos = InStr(strTemp1, " ")
strTemp = Trim(strTemp1.Substring(0, intPos))
If IsNumeric(strTemp) Then
'We probably have the house number
strData(DATA_ADDRESSNAMENUMBER) = strTemp
Debug.Print("strData(DATA_ADDRESSNAMENUMBER)=" &
strData(DATA_ADDRESSNAMENUMBER))
strTemp1 = Trim(strTemp1.Substring(intPos))
End If
'Put the rest of strTemp1 in Address
strData(DATA_ADDRESS) = Trim(strTemp1)
Debug.Print("strData(DATA_ADDRESS)=" &
strData(DATA_ADDRESS))
'Sort out the second part of the string
intPos = InStr(strTemp2, "(")
If intPos 0 Then
'Get the State and PostCode
strTemp = Trim(strTemp2.Substring(0, intPos - 1))
strTemp2 = Trim(strTemp2.Substring(intPos - 1))
'Get the state
If InStr(strTemp, " ") = 3 Then
'We should have the first 2 digits as the state
strData(DATA_STATE) = Trim(strTemp.Substring(0,
2))
Debug.Print("strData(DATA_STATE)=" &
strData(DATA_STATE))
strTemp = Trim(strTemp.Substring(2))
End If
'Put it all in the postcode bit
strData(DATA_POSTCODE) = Trim(strTemp)
Debug.Print("strData(DATA_POSTCODE)=" &
strData(DATA_POSTCODE))
'Get the Phone Numbers
intPos = InStr(3, strTemp2, "(")
If intPos 0 Then
strData(DATA_TELEPHONE1) =
Trim(strTemp2.Substring(0, intPos - 1))
Debug.Print("strData(DATA_TELEPHONE1)=" &
strData(DATA_TELEPHONE1))
strData(DATA_TELEPHONE2) =
Trim(strTemp2.Substring(intPos - 1))
Debug.Print("strData(DATA_TELEPHONE2)=" &
strData(DATA_TELEPHONE2))
Else
strData(DATA_TELEPHONE1) = Trim(strTemp2)
Debug.Print("strData(DATA_TELEPHONE1)=" &
strData(DATA_TELEPHONE1))
End If
End If
End If
'READ NEXT LINE CODE HERE i.e. strLine = ReadLine etc.
Loop

'CLOSE FILE CODE HERE
End Sub
End Class

Hope this helps.
Regards, Phil.

*** Sent via Developersdex http://www.developersdex.com ***
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
parsing a string BorisS Excel Programming 0 April 23rd 07 02:14 AM
Parsing a string neverends Excel Programming 2 June 7th 06 05:38 PM
Parsing a string simonc Excel Programming 4 March 27th 06 08:04 AM
Need help parsing a string Daminc[_38_] Excel Programming 4 January 26th 06 11:53 AM
parsing a string Mark[_57_] Excel Programming 4 April 28th 05 04:42 PM


All times are GMT +1. The time now is 05:53 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"