ExcelBanter - Parsing names

ExcelBanter (https://www.excelbanter.com/)

- Excel Programming (https://www.excelbanter.com/excel-programming/)

- - Parsing names (https://www.excelbanter.com/excel-programming/444633-parsing-names.html)

Robert Crandal[_2_]

Parsing names

Rick Rothstein

Parsing names

I am working with strings that have the following format:

<last name [title] <first name [middle initial] 0000

(The [title] and [middle] are optional, so they might or
might not occur in a string! The title is usually "Jr." or
"Sr" or "III" or "II". Middle initial is only ONE character)

Here are some examples:

MyVar = "Jones Jim R 0000"
MyVar = "Filbert III Bobby 0000"
MyVar = "Smith Jr James F 0000"
MyVar = "Milton Brad 0000"

Can anyone think of a good strategy for parsing each of
the name tokens into their own variables?? I have 4 variables
named "myLast", "myTitle", "myFirst" and "myMiddle" where
I would like to store each token. What makes this difficult
is that the title and middle initial are optional and may not
exist in a string.

I 'd appreciate any help here.

(BTW, assume that the data will be formatted exactly as
above with just one space between each token. There
also wont be any oddball situations, like an empty string)

Something like this should work...

Dim MyVar As String, Parts() As String, First As String
Dim Middle As String, Last As String, Title As String

MyVar = "Smith Jr James F 0000"

Parts = Split(Replace(MyVar, " 0000", ""))
Select Case UBound(Parts)
Case 1
First = Parts(1)
Middle = ""
Last = Parts(0)
Title = ""
Case 2
If Len(Parts(2)) = 1 Then
First = Parts(1)
Middle = Parts(2)
Last = Parts(0)
Title = ""
Else
First = Parts(2)
Middle = ""
Last = Parts(0)
Title = Parts(1)
End If
Case 3
First = Parts(2)
Middle = Parts(3)
Last = Parts(0)
Title = Parts(1)
End Select

Rick Rothstein (MVP - Excel)

Ron Rosenfeld[_2_]

Parsing names

On Wed, 25 May 2011 23:20:21 -0700, "Robert Crandal" wrote:

I am working with strings that have the following format:

<last name [title] <first name [middle initial] 0000

(The [title] and [middle] are optional, so they might or
might not occur in a string! The title is usually "Jr." or
"Sr" or "III" or "II". Middle initial is only ONE character)

Here are some examples:

myVar = "Jones Jim R 0000"
myVar = "Filbert III Bobby 0000"
myVar = "Smith Jr James F 0000"
myVar = "Milton Brad 0000"

Can anyone think of a good strategy for parsing each of
the name tokens into their own variables?? I have 4 variables
named "myLast", "myTitle", "myFirst" and "myMiddle" where
I would like to store each token. What makes this difficult
is that the title and middle initial are optional and may not
exist in a string.

I'd appreciate any help here.

(BTW, assume that the data will be formatted exactly as
above with just one space between each token. There
also wont be any oddball situations, like an empty string)

Thanks!

You could use regular expressions to parse the word, and put each part into one of your variables. Note in the regex the "pipe-separated list" of allowable titles. You may need to add to this.

=====================
Sub NameParts()
Dim re As Object, mc As Object
Dim s As String
Dim myLast As String
Dim myTitle As String
Dim myFirst As String
Dim myMiddle As String

Set re = CreateObject("vbscript.regexp")
re.Pattern = "^(\w+)\s+(?:(Jr|Sr|III|II|IV)\s+)?(?:(\w\w+)\s+)( ?:([A-Z])\s+)?"
re.Global = True
re.ignorecase = True

s = MyVar1

If re.test(s) = True Then
Set mc = re.Execute(s)
myLast = mc(0).submatches(0)
myTitle = mc(0).submatches(1)
myFirst = mc(0).submatches(2)
myMiddle = mc(0).submatches(3)
End If
End Sub
===============================

Robert Crandal[_2_]

Parsing names

Wow, that works nice. You are worthy of that
MVP title. Thank you!

"Rick Rothstein" wrote in message
...

Dim MyVar As String, Parts() As String, First As String
Dim Middle As String, Last As String, Title As String

MyVar = "Smith Jr James F 0000"

Parts = Split(Replace(MyVar, " 0000", ""))
Select Case UBound(Parts)
Case 1
First = Parts(1)
Middle = ""
Last = Parts(0)
Title = ""
Case 2
If Len(Parts(2)) = 1 Then
First = Parts(1)
Middle = Parts(2)
Last = Parts(0)
Title = ""
Else
First = Parts(2)
Middle = ""
Last = Parts(0)
Title = Parts(1)
End If
Case 3
First = Parts(2)
Middle = Parts(3)
Last = Parts(0)
Title = Parts(1)
End Select

Rick Rothstein (MVP - Excel)

Robert Crandal[_2_]

Parsing names

Thank you Ron! I was wondering if I could do this with regular
expressions as well.

"Ron Rosenfeld" wrote in message
...

You could use regular expressions to parse the word, and put each part
into one of your variables. Note in the regex the "pipe-separated list"
of allowable titles. You may need to add to this.

=====================
Sub NameParts()
Dim re As Object, mc As Object
Dim s As String
Dim myLast As String
Dim myTitle As String
Dim myFirst As String
Dim myMiddle As String

Set re = CreateObject("vbscript.regexp")
re.Pattern =
"^(\w+)\s+(?:(Jr|Sr|III|II|IV)\s+)?(?:(\w\w+)\s+)( ?:([A-Z])\s+)?"
re.Global = True
re.ignorecase = True

s = MyVar1

If re.test(s) = True Then
Set mc = re.Execute(s)
myLast = mc(0).submatches(0)
myTitle = mc(0).submatches(1)
myFirst = mc(0).submatches(2)
myMiddle = mc(0).submatches(3)
End If
End Sub
===============================

Ron Rosenfeld[_2_]

Parsing names

On Thu, 26 May 2011 03:45:37 -0700, "Robert Crandal" wrote:

Thank you Ron! I was wondering if I could do this with regular
expressions as well.

Glad to help.

Thanks for the feedback.

Rick Rothstein

Parsing names

Wow, that works nice. ... Thank you!

You are quite welcome. I just want to point out one difference between my
code and the the RegEx solution Ron posted. I'm not up on my regular
expression syntax, but it looks like the pattern Ron used requires all the
possible titles to be listed before hand. So if you had a name come along
with a title of MD (doctor) or DDS (dentist) or LLD (lawyer) or the like, I
think Ron's code would have to modified each time. My code, on the other
hand, is keyed to the "shape" of the entry and not the specifics. If there
are only two parts to the entry, it is assumed to be a last name followed by
a first name; if there are four parts, it is assumed to be last name
followed by title followed by first name followed by middle initial; and
when there are three parts, the last part is examined to see if it is a
single character or not... if a single character, then the parts are assumed
to be last name followed b first name followed by middle initial; otherwise
it is assumed to be last name followed by title followed by first name. What
the parts are made up of, other than the single character check for a
three-part entry, never figure into my code... as long as you know the
entries will always be names and not just gobbledygook, everything should
work fine. By the way, is there ever the possibility of a single name entry
(like Cher for instance) in your data? If so, you will need to add a "Case
0" block to the "Select Case" block I posted where only the first (or maybe
it would be last?) name is assigned Part(0) and all other name parts would
be assigned the empty string.

Rick Rothstein (MVP - Excel)

Ron Rosenfeld[_2_]

Parsing names

On Thu, 26 May 2011 12:07:54 -0400, "Rick Rothstein" wrote:

Wow, that works nice. ... Thank you!

You are quite welcome. I just want to point out one difference between my
code and the the RegEx solution Ron posted. I'm not up on my regular
expression syntax, but it looks like the pattern Ron used requires all the
possible titles to be listed before hand. So if you had a name come along
with a title of MD (doctor) or DDS (dentist) or LLD (lawyer) or the like, I
think Ron's code would have to modified each time.

That's not the case. You just add to the "title" section whatever titles you wish to handle. For example, if you also want to handle MD, DDS, and LLD, you would just use:

^(\w+)\s+(?:(Jr|Sr|III|II|IV|MD|DDS|LLD)\s+)?(?:(\ w\w+)\s+)(?:([A-Z])\s+)?

The regex does require that you know what titles you wish to use ahead of time, but there's no need to modify it each time.

This may also be of value if the OP needs to check validity of entries.

Rick Rothstein

Parsing names

That's not the case. You just add to the "title" section whatever
titles you wish to handle. For example, if you also want to handle
MD, DDS, and LLD, you would just use:

^(\w+)\s+(?:(Jr|Sr|III|II|IV|MD|DDS|LLD)\s+)?(?:(\ w\w+)\s+)(?:([A-Z])\s+)?

The regex does require that you know what titles you wish to use
ahead of time, but there's no need to modify it each time.

What I meant is that if an **unanticipated** title comes along, your code
would have to be modified to account for it. For example, if out of the blue
a name with an unanticipated title such as Esq. comes along... your code
would not be able to handle it until Esq was added to the RegEx pattern.

This may also be of value if the OP needs to check validity of entries.

Yes, I agree with that, which is why I included this statement in my
posting...

"...as long as you know the entries will always be names
and not just gobbledygook, everything should work fine"

Rick Rothstein (MVP - Excel)

All times are GMT +1. The time now is 07:16 AM.