View Single Post
  #6   Report Post  
Posted to microsoft.public.excel.programming
Carl S. Carl S. is offline
external usenet poster
 
Posts: 11
Default Parsing Address, City ST ZIP

Thanks! I have no idea what early or late binding is, and I may need to
export the module and send it to a few other offices, and would like it to be
just a straight import to work...will binding affect this? There is a lot
more to Excel VBA than I have brushed with. Bottom line...I was hoping for
something like Cell A1 has address, after running macro cells A1, A2, A3, and
A4 will have parsed data. Your answer goes over my head...I think I will
write something using MID, LEFT, FIND, SUBSTITUTE, ISNUMBER, etc. I
understand it much easier. CA will be chunk I start with, ZIP will be chunk
to right, City will be chunk to left, and chunk before city will be address
and I will strip all commas out...don't need them. Appriciate your effort
though.

"Ron Rosenfeld" wrote:

On Wed, 30 Sep 2009 00:24:01 -0700, Carl S.
wrote:

Someone brilliant have code to parse messy Address City State ZIP into 4
separate fields? If address has Apt number it should NOT be parsed separate
but included in address fields. Here are examples we get of data downloaded
daily...the main common theme is CA is consistent:

15629 FAIRFORD AVENUE, NORWALK, CA 90650
5947-5949 SOUTH SAN PEDRO, STREET, LOS ANGELES, CA 90003
Vacant Land, Long Beach, CA
4627-4627 1/2 STRANGE AVENUE, Los Angeles (Area), CA 90022
15522 SYLVAN STREET, (VAN NUYS AREA) LOS ANGELES, CA 91411
4325, 4327 & 4329 Stern Ave., North Hollywood, CA 91423
19200 LAHEY STREET, UNIT 4,, Los Angeles, CA 91326

Example 1 is typical easy parse, majority of data.
Example 2 the word STREET should not have been separated by comma.
Example 3 has no zip code or street numbers.
Example 4 has dash and / in street numbers and (AREA) should be removed.
Example 5 (VAN NUYS AREA) should be removed.
Example 6 has extra comma in address at front.
Example 7 should have UNIT 4 glued to address, and extra commas need removal.

If someone can write something to handle all of this, you are awsome! If
you can get everything except example 3 that is ok....I can strip those out
ahead of time...they either say VACANT LAND or RAW LAND...but I would like to
have them if possible. Thanks in advance!


For the examples you show, this can be done using regular expressions. In the
sample code below, you can either select the range to parse and then run the
macro; or you can use the Functions individually to dynamically extract the
different portions. The latter would be appropriate if you will be editing the
basic list, and want to see changes dynamically.

Notes:

1. I used early binding, so you MUST SET A REFERENCE (see tools/references) to
Microsoft VBScript Regulare Expressions 5.5. If you are going to distribute
this, you might want to use late binding instead.

2. So far as the extraneous commas are concerned, the Address function will
only remove them if they are at the end of the string, or if they are followed
by the word Street or Ave. This may or may not be satisfactory for you.

3. Zip codes are tested for being "proper". They can be either 9 digits, 5
digits, or in the pattern of 00000-0000

4. If your patterns don't conform roughly to what you have posted, then the
regex may fail, and return blanks. If this is a problem, you'll need to post
more examples.

----------------------------------------------------
Option Explicit
Dim re As RegExp, mc As MatchCollection
Sub ParseAddr()
Dim rg As Range, c As Range
Dim s As String

'set up range to parse
'could be done differently
Set rg = Selection
Range(rg.Offset(0, 1), rg.Offset(0, 4)).Clear

For Each c In rg
s = c.Text
c.Offset(0, 1).Value = Addr(s)
c.Offset(0, 2).Value = City(s)
c.Offset(0, 3).Value = State(s)
'format as text so don't drop leading "0's"
c.Offset(0, 4).NumberFormat = "@"
c.Offset(0, 4).Value = Zip(s)
Next c
End Sub
'--------------------------------------
Function Addr(s As String) As String
Dim sTemp As String
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "^.*(?=,\s*[^,]+,[^,]+$)"
If re.Test(s) = True Then
Set mc = re.Execute(s)
sTemp = mc(0)
'strip out some of the commas
re.Pattern = ",(?=$|\s+(Street|Ave))"
Addr = re.Replace(sTemp, "")
End If
Set re = Nothing
End Function
'-----------------------------------------
Function City(s As String) As String
Dim sTemp As String
Set re = New RegExp
re.Pattern = ",\s*([^,]+),[^,]+$"
If re.Test(s) = True Then
Set mc = re.Execute(s)
sTemp = mc(0).SubMatches(0)
'strip out text enclosed by parentheses
re.Pattern = "\s*\([^)]+\)\s*"
City = re.Replace(sTemp, "")
End If
Set re = Nothing
End Function
'--------------------------------------
Function State(s As String) As String
Set re = New RegExp
re.Pattern = _
"([A-Z]{2})(?=(\s+(\d{9}|\d{5}-\d{4}|\d{5})\s*$)|\s*$)"
If re.Test(s) = True Then
Set mc = re.Execute(s)
State = mc(0)
End If
Set re = Nothing
End Function
'----------------------------
Function Zip(s As String) As String
Set re = New RegExp
re.Pattern = "\s+(\d{9}|\d{5}-\d{4}|\d{5})\s*$"
If re.Test(s) = True Then
Set mc = re.Execute(s)
Zip = mc(0)
End If
Set re = Nothing
End Function
==========================================
--ron