![]() |
Parsing names
I am working with strings that have the following format:
<last name [title] <first name [middle initial] 0000 (The [title] and [middle] are optional, so they might or might not occur in a string! The title is usually "Jr." or "Sr" or "III" or "II". Middle initial is only ONE character) Here are some examples: myVar = "Jones Jim R 0000" myVar = "Filbert III Bobby 0000" myVar = "Smith Jr James F 0000" myVar = "Milton Brad 0000" Can anyone think of a good strategy for parsing each of the name tokens into their own variables?? I have 4 variables named "myLast", "myTitle", "myFirst" and "myMiddle" where I would like to store each token. What makes this difficult is that the title and middle initial are optional and may not exist in a string. I'd appreciate any help here. (BTW, assume that the data will be formatted exactly as above with just one space between each token. There also wont be any oddball situations, like an empty string) Thanks! |
Parsing names
I am working with strings that have the following format:
<last name [title] <first name [middle initial] 0000 (The [title] and [middle] are optional, so they might or might not occur in a string! The title is usually "Jr." or "Sr" or "III" or "II". Middle initial is only ONE character) Here are some examples: MyVar = "Jones Jim R 0000" MyVar = "Filbert III Bobby 0000" MyVar = "Smith Jr James F 0000" MyVar = "Milton Brad 0000" Can anyone think of a good strategy for parsing each of the name tokens into their own variables?? I have 4 variables named "myLast", "myTitle", "myFirst" and "myMiddle" where I would like to store each token. What makes this difficult is that the title and middle initial are optional and may not exist in a string. I 'd appreciate any help here. (BTW, assume that the data will be formatted exactly as above with just one space between each token. There also wont be any oddball situations, like an empty string) Something like this should work... Dim MyVar As String, Parts() As String, First As String Dim Middle As String, Last As String, Title As String MyVar = "Smith Jr James F 0000" Parts = Split(Replace(MyVar, " 0000", "")) Select Case UBound(Parts) Case 1 First = Parts(1) Middle = "" Last = Parts(0) Title = "" Case 2 If Len(Parts(2)) = 1 Then First = Parts(1) Middle = Parts(2) Last = Parts(0) Title = "" Else First = Parts(2) Middle = "" Last = Parts(0) Title = Parts(1) End If Case 3 First = Parts(2) Middle = Parts(3) Last = Parts(0) Title = Parts(1) End Select Rick Rothstein (MVP - Excel) |
Parsing names
On Wed, 25 May 2011 23:20:21 -0700, "Robert Crandal" wrote:
I am working with strings that have the following format: <last name [title] <first name [middle initial] 0000 (The [title] and [middle] are optional, so they might or might not occur in a string! The title is usually "Jr." or "Sr" or "III" or "II". Middle initial is only ONE character) Here are some examples: myVar = "Jones Jim R 0000" myVar = "Filbert III Bobby 0000" myVar = "Smith Jr James F 0000" myVar = "Milton Brad 0000" Can anyone think of a good strategy for parsing each of the name tokens into their own variables?? I have 4 variables named "myLast", "myTitle", "myFirst" and "myMiddle" where I would like to store each token. What makes this difficult is that the title and middle initial are optional and may not exist in a string. I'd appreciate any help here. (BTW, assume that the data will be formatted exactly as above with just one space between each token. There also wont be any oddball situations, like an empty string) Thanks! You could use regular expressions to parse the word, and put each part into one of your variables. Note in the regex the "pipe-separated list" of allowable titles. You may need to add to this. ===================== Sub NameParts() Dim re As Object, mc As Object Dim s As String Dim myLast As String Dim myTitle As String Dim myFirst As String Dim myMiddle As String Set re = CreateObject("vbscript.regexp") re.Pattern = "^(\w+)\s+(?:(Jr|Sr|III|II|IV)\s+)?(?:(\w\w+)\s+)( ?:([A-Z])\s+)?" re.Global = True re.ignorecase = True s = MyVar1 If re.test(s) = True Then Set mc = re.Execute(s) myLast = mc(0).submatches(0) myTitle = mc(0).submatches(1) myFirst = mc(0).submatches(2) myMiddle = mc(0).submatches(3) End If End Sub =============================== |
Parsing names
Wow, that works nice. You are worthy of that
MVP title. Thank you! "Rick Rothstein" wrote in message ... Dim MyVar As String, Parts() As String, First As String Dim Middle As String, Last As String, Title As String MyVar = "Smith Jr James F 0000" Parts = Split(Replace(MyVar, " 0000", "")) Select Case UBound(Parts) Case 1 First = Parts(1) Middle = "" Last = Parts(0) Title = "" Case 2 If Len(Parts(2)) = 1 Then First = Parts(1) Middle = Parts(2) Last = Parts(0) Title = "" Else First = Parts(2) Middle = "" Last = Parts(0) Title = Parts(1) End If Case 3 First = Parts(2) Middle = Parts(3) Last = Parts(0) Title = Parts(1) End Select Rick Rothstein (MVP - Excel) |
Parsing names
Thank you Ron! I was wondering if I could do this with regular
expressions as well. "Ron Rosenfeld" wrote in message ... You could use regular expressions to parse the word, and put each part into one of your variables. Note in the regex the "pipe-separated list" of allowable titles. You may need to add to this. ===================== Sub NameParts() Dim re As Object, mc As Object Dim s As String Dim myLast As String Dim myTitle As String Dim myFirst As String Dim myMiddle As String Set re = CreateObject("vbscript.regexp") re.Pattern = "^(\w+)\s+(?:(Jr|Sr|III|II|IV)\s+)?(?:(\w\w+)\s+)( ?:([A-Z])\s+)?" re.Global = True re.ignorecase = True s = MyVar1 If re.test(s) = True Then Set mc = re.Execute(s) myLast = mc(0).submatches(0) myTitle = mc(0).submatches(1) myFirst = mc(0).submatches(2) myMiddle = mc(0).submatches(3) End If End Sub =============================== |
Parsing names
On Thu, 26 May 2011 03:45:37 -0700, "Robert Crandal" wrote:
Thank you Ron! I was wondering if I could do this with regular expressions as well. Glad to help. Thanks for the feedback. |
Parsing names
Wow, that works nice. ... Thank you!
You are quite welcome. I just want to point out one difference between my code and the the RegEx solution Ron posted. I'm not up on my regular expression syntax, but it looks like the pattern Ron used requires all the possible titles to be listed before hand. So if you had a name come along with a title of MD (doctor) or DDS (dentist) or LLD (lawyer) or the like, I think Ron's code would have to modified each time. My code, on the other hand, is keyed to the "shape" of the entry and not the specifics. If there are only two parts to the entry, it is assumed to be a last name followed by a first name; if there are four parts, it is assumed to be last name followed by title followed by first name followed by middle initial; and when there are three parts, the last part is examined to see if it is a single character or not... if a single character, then the parts are assumed to be last name followed b first name followed by middle initial; otherwise it is assumed to be last name followed by title followed by first name. What the parts are made up of, other than the single character check for a three-part entry, never figure into my code... as long as you know the entries will always be names and not just gobbledygook, everything should work fine. By the way, is there ever the possibility of a single name entry (like Cher for instance) in your data? If so, you will need to add a "Case 0" block to the "Select Case" block I posted where only the first (or maybe it would be last?) name is assigned Part(0) and all other name parts would be assigned the empty string. Rick Rothstein (MVP - Excel) |
Parsing names
On Thu, 26 May 2011 12:07:54 -0400, "Rick Rothstein" wrote:
Wow, that works nice. ... Thank you! You are quite welcome. I just want to point out one difference between my code and the the RegEx solution Ron posted. I'm not up on my regular expression syntax, but it looks like the pattern Ron used requires all the possible titles to be listed before hand. So if you had a name come along with a title of MD (doctor) or DDS (dentist) or LLD (lawyer) or the like, I think Ron's code would have to modified each time. That's not the case. You just add to the "title" section whatever titles you wish to handle. For example, if you also want to handle MD, DDS, and LLD, you would just use: ^(\w+)\s+(?:(Jr|Sr|III|II|IV|MD|DDS|LLD)\s+)?(?:(\ w\w+)\s+)(?:([A-Z])\s+)? The regex does require that you know what titles you wish to use ahead of time, but there's no need to modify it each time. This may also be of value if the OP needs to check validity of entries. |
Parsing names
That's not the case. You just add to the "title" section whatever
titles you wish to handle. For example, if you also want to handle MD, DDS, and LLD, you would just use: ^(\w+)\s+(?:(Jr|Sr|III|II|IV|MD|DDS|LLD)\s+)?(?:(\ w\w+)\s+)(?:([A-Z])\s+)? The regex does require that you know what titles you wish to use ahead of time, but there's no need to modify it each time. What I meant is that if an **unanticipated** title comes along, your code would have to be modified to account for it. For example, if out of the blue a name with an unanticipated title such as Esq. comes along... your code would not be able to handle it until Esq was added to the RegEx pattern. This may also be of value if the OP needs to check validity of entries. Yes, I agree with that, which is why I included this statement in my posting... "...as long as you know the entries will always be names and not just gobbledygook, everything should work fine" Rick Rothstein (MVP - Excel) |
All times are GMT +1. The time now is 07:16 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com