View Single Post
  #4   Report Post  
Posted to microsoft.public.excel.programming
Jim Berglund[_2_] Jim Berglund[_2_] is offline
external usenet poster
 
Posts: 86
Default Parsing a string

Ron. Thanks for the effort you put in on this. I would really like to
understand this more.

Is it checking each row to see if it meets the pattern, and if so, doing the
parsing and then building a new, 5-column list?
What is the following line for?
Set myRegExp = CreateObject("vbscript.regexp")

Also, could you please help me understand the following? I've never seen any
code like it and I just don't have a clue how to interpret it...

myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
"\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"

I tried it and it ran without any errors. But it didn't do anything.

BTW, For anyone else looking at this, I found the following entry...


Formulas for parsing names.

1. Using an index:

=T(INDEX($M$1:$M$6,MAX(INDEX(COUNTIF(C1,$M$2:$M$6& {". *","
*"})*{1;2;3;4;5},0))+1))
(Set up an index in M1:M5)

2. Using OR
=IF(LEFT(A30,3)="Mrs","Mrs",IF(LEFT(A30,4)="Miss", "Miss",IF(OR(LEFT(A30,2)="Mr",LEFT(A30,2)="Ms",LEF T(A30,2)="Dr"),LEFT(A30,2),"X")))

3. Parsing, using VB


Option Explicit
Function ParseName(str As String, Index As Long) As String
Dim re As Object
Dim mc As Object
Dim sPat As String
Dim sTitle As String

'Index code
' 1 = Salutation
' 3 = First Name

'Pipe-delimited list of possible Titles
sTitle = "Mr|Ms|Miss|Mrs|Dr"

sPat = "^((" & sTitle & ")\.?(\s+))?(\w+)"



Set re = CreateObject("vbscript.regexp")
re.ignorecase = True
re.Pattern = sPat

If re.test(str) = True Then
Set mc = re.Execute(str)
ParseName = mc(0).submatches(Index)
End If
End Function

4. Using TRIM
Parsing names in Excel
Having just had to go through a process to clean up a list of names in a
spreadsheet, I thought I'd share the 'magic' formula I came up with.

As you can imagine, there was no pretty way to get the list into Excel,
there were several columns of names and other information, but with no
obvious delimiters to behold. I was amazed when I got the file, how bad it
looked. I can only assume this was a redirected print file from some
ancient mainframe system. It can't be that hard to get a nice comma
delimited file surely!?

The format of the name was either:
First Last
First MI Last
First Middle Last
Obtaining the first name was simple:
=LEFT(A11,(FIND(" ",A11,1)-1))
Getting the last name was a different matter:
=TRIM( IF(ISERR(FIND(" ",A11,(FIND(" ",A11,1)+1))),
MID(A11,FIND(" ",A11,1)+1,(LEN(A11)-FIND(" ",A11,1))),
MID(A11,FIND(" ",A11,(FIND(" ",A11,1)+1))+1,(LEN(A11)-FIND("
",A11,FIND(" ",A11,1))))))

Basically, remove any spaces and if the FIND returns an error, then the name
is just first and last names, otherwise there is a middle name or initial to
deal with. If no middle name/initial then get the last name, which starts 1
char after the space otherwise get the last name after locating the second
space in the string and start from there

Another Method for Parsing Names
Assume you have names of the following types in column C
Mrs Dorothy Hannity
Dr P R Rogers
Dana Delany
Mr Bradley K Pitts
Type the following formulas into the specified cells:
O1=FIND(" ",C1)
Determines the location/existence of the blank following the Salutaton or
First Name
P1=FIND(" ",C1,FIND(" ",C1)+1)
Determines the location/existence of the blank following the First Name or
Middle Initial(MI)
Q1=FIND(" ",R1)
Determines the location/existence of the blank following the Middle Initial
in the next, adjacent cell
R1=IF(ISERROR(P1),RIGHT(C1,LEN(C1)-O1),RIGHT(C1,LEN(C1)-P1))
Defines the Last Name or MI/LN if there is a MI
S1=LEFT(C1,O1-1)
Creates the Salutation column
T1=IF(ISERROR(S1)," ",MID(C1,O1+1,P1-O1))
Creates the First Name column
U1=IF(FIND(" ",R1)=2,LEFT(R1,1),"")
Creates the MI column
V1=IF(U1<"",MID(R1,FIND(" ",R1)+1,99),R1)
Creates the Last Name column
Fill the entries down and then copy the resulting values into another set of
columns.

Jim



"Ron Rosenfeld" wrote in message
...
On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund"
wrote:

I want to parse a lot of data with rows like the following:

A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER

I can do it in Excel with formulas, but is there a nice piece of generic
code that does this sort of thing?
(or should I parse it based on using spaces as delimiters and then join
the
fields that need joining?)

Jim Berglund


How easy or hard this might be to do depends on format variations.

If you cannot set out firm rules for the data, you won't be able to do it.

Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the
end
of the string.

The CITY would be one or several words that precede the PROV and you could
use
a list of valid cities.

The ADDRESS precedes the city and starts with a number.

The NAME ends with the number.

If all of your rows can be unambiguously described, either as I have done
or
some other method, then you should be able to parse either with formulas,
or
with a macro.

Here's a sample macro with a very short list of possibly valid cities that
parses rows that fit the above rules. It should give you some ideas.

================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End Sub
====================================

The list of cities needs to be enclosed in parentheses and be pipe
delimited.
They are tested in the order listed, so if you had several cities with
similar
names, the order you list them in can be critical.
--ron