Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Splitting Names Down Columns
This one's a toughie... be prepared!
I have a spreadsheet with several thousand entries that should have been a database instead. It holds a lot of user information, such as names, numbers, addresses, etc. I need to the entries under the Name column into multiple colums. Specifically, Title1, First1, Middle1, Last1, Suffix1, Title2, First2, Middle2, Last2, Suffix2. Of course, all the entries under the Name column are not standard: Craig and Dolores Craig Jeffrey and Cynthia Beegle Jason Piel Mr Kaske Debra Cork-Wahl and Charles Cork Micro Systems Matthew G. Kovar and Karina Rand-Kovar Terrance and Mary Mathews The most common format is "First1 and First2 Last1/2", but that is not the case with a good chunk of the entries. No matter how you look at it, someone is going to have to go through it manually to make sure there are no mistakes, but I was hoping that there would be some way to script a semi-solution to cut down on some of the work. Any takers? :-) Thanks! Nick |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Splitting Names Down Columns
i think you're right, it will take some manual work, but if the majority is
in one format and the data is separated the same each time (eg, by a comma or by a fixed width) I'd use the 'Text to columns' option from the Data menu and follow the prompts. I'm sure someone will be along shortly with a much more eloquent solution. Tim "Tatakau" wrote in message ... This one's a toughie... be prepared! I have a spreadsheet with several thousand entries that should have been a database instead. It holds a lot of user information, such as names, numbers, addresses, etc. I need to the entries under the Name column into multiple colums. Specifically, Title1, First1, Middle1, Last1, Suffix1, Title2, First2, Middle2, Last2, Suffix2. Of course, all the entries under the Name column are not standard: Craig and Dolores Craig Jeffrey and Cynthia Beegle Jason Piel Mr Kaske Debra Cork-Wahl and Charles Cork Micro Systems Matthew G. Kovar and Karina Rand-Kovar Terrance and Mary Mathews The most common format is "First1 and First2 Last1/2", but that is not the case with a good chunk of the entries. No matter how you look at it, someone is going to have to go through it manually to make sure there are no mistakes, but I was hoping that there would be some way to script a semi-solution to cut down on some of the work. Any takers? :-) Thanks! Nick |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Splitting Names Down Columns
I've written a few data clean up procedures in the past. As you wrote in
your OP, you just have to do the best job and then tackle it manually. Two initial approaches: (a) Write code to analyse and split it according to rules. This is lengthy but feasible. (b) If option (a) is unsuitable because it's the rules overlap so much, but they're consistent then you could go through the spreadsheet inserting a code as to how the name should be cleaned up. e.g. 1,2,1,1,5 etc. in an adjacent cell. This would make coding simpler, the final process more accurate albeit requiring a little manual work - but it's only a few thousand so teh time saved coding might be greater. (You could leave the most common format blank...) Then... for (a) and (b) I would do something like (unchecked): (set Option Compare Text at the top of your module) For each c in myrange.cells myStr = c.value 'Check for suffices 'first one Select Case trim(Left$(myStr ,3)) case "MR": Suffix1 = "Mr" case "MRS": Suffix1 = "Mrs" 'etc. for Dr, Ms, Sir 'should be ok, no names begin with these 'dr is ok cos "drew" would return "dre" End Select 'remove suffix from our string If len(Suffix1)0 then myStr = trim(mid$(myStr,LEN(Suffix1)+1)) 'now check for a second suffix iPos = instr(mystr," MRS ") If iPos 0 then Suffix2 = "Mrs" myStr = Replace(myStr," MRS ", " ",,vbTextCompare) End if iPos = instr(mystr," MS ") If iPos 0 then Suffix2 = "Ms" myStr = Replace(myStr," Ms ", " ",,vbTextCompare) End if 'etc. 'that's the suffices take care of 'split up on the spaces '(you may like to remove doublespaces first mySplit = Split(c.value," ") 'see if second word is 'and' 'and we only have 4 words if mySplit(1) = "AND" and Ubound(mySplit) = 3 then First1 = mySplit(0) First2 = mySplit(2) Last1 = mySplit(3) Last2 = mySplit(3) elseif mySplit(2) = "AND" and Ubound(mySplit) = 4 then First1 = mySplit(0) Last2 = mySplit(1) First2 = mySplit(3) Last1 = mySplit(4) 'obviously if it was a middle name rather than 'a surname you have no way of knowing. 'That's why I would suggest using option (b) end if 'continue ad nauseum. next c HTH, Gareth Tatakau wrote: This one's a toughie... be prepared! I have a spreadsheet with several thousand entries that should have been a database instead. It holds a lot of user information, such as names, numbers, addresses, etc. I need to the entries under the Name column into multiple colums. Specifically, Title1, First1, Middle1, Last1, Suffix1, Title2, First2, Middle2, Last2, Suffix2. Of course, all the entries under the Name column are not standard: Craig and Dolores Craig Jeffrey and Cynthia Beegle Jason Piel Mr Kaske Debra Cork-Wahl and Charles Cork Micro Systems Matthew G. Kovar and Karina Rand-Kovar Terrance and Mary Mathews The most common format is "First1 and First2 Last1/2", but that is not the case with a good chunk of the entries. No matter how you look at it, someone is going to have to go through it manually to make sure there are no mistakes, but I was hoping that there would be some way to script a semi-solution to cut down on some of the work. Any takers? :-) Thanks! Nick |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Splitting Names Down Columns
I used the wizard to split to columns by spaces. After that, I was able to
manually fix about 200 records in five minutes. Thanks Tim! Nick "Tim" wrote: i think you're right, it will take some manual work, but if the majority is in one format and the data is separated the same each time (eg, by a comma or by a fixed width) I'd use the 'Text to columns' option from the Data menu and follow the prompts. I'm sure someone will be along shortly with a much more eloquent solution. Tim "Tatakau" wrote in message ... This one's a toughie... be prepared! I have a spreadsheet with several thousand entries that should have been a database instead. It holds a lot of user information, such as names, numbers, addresses, etc. I need to the entries under the Name column into multiple colums. Specifically, Title1, First1, Middle1, Last1, Suffix1, Title2, First2, Middle2, Last2, Suffix2. Of course, all the entries under the Name column are not standard: Craig and Dolores Craig Jeffrey and Cynthia Beegle Jason Piel Mr Kaske Debra Cork-Wahl and Charles Cork Micro Systems Matthew G. Kovar and Karina Rand-Kovar Terrance and Mary Mathews The most common format is "First1 and First2 Last1/2", but that is not the case with a good chunk of the entries. No matter how you look at it, someone is going to have to go through it manually to make sure there are no mistakes, but I was hoping that there would be some way to script a semi-solution to cut down on some of the work. Any takers? :-) Thanks! Nick |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Splitting Names Down Columns
Gareth, I remember writing delimiting programs in college... no fun at all.
Your solution is of course much more elegant, but I don't think I'm willing to risk any more brain cells (due to banging my head on a desk). :-) Thanks again! Nick "Gareth" wrote: I've written a few data clean up procedures in the past. As you wrote in your OP, you just have to do the best job and then tackle it manually. Two initial approaches: (a) Write code to analyse and split it according to rules. This is lengthy but feasible. (b) If option (a) is unsuitable because it's the rules overlap so much, but they're consistent then you could go through the spreadsheet inserting a code as to how the name should be cleaned up. e.g. 1,2,1,1,5 etc. in an adjacent cell. This would make coding simpler, the final process more accurate albeit requiring a little manual work - but it's only a few thousand so teh time saved coding might be greater. (You could leave the most common format blank...) Then... for (a) and (b) I would do something like (unchecked): (set Option Compare Text at the top of your module) For each c in myrange.cells myStr = c.value 'Check for suffices 'first one Select Case trim(Left$(myStr ,3)) case "MR": Suffix1 = "Mr" case "MRS": Suffix1 = "Mrs" 'etc. for Dr, Ms, Sir 'should be ok, no names begin with these 'dr is ok cos "drew" would return "dre" End Select 'remove suffix from our string If len(Suffix1)0 then myStr = trim(mid$(myStr,LEN(Suffix1)+1)) 'now check for a second suffix iPos = instr(mystr," MRS ") If iPos 0 then Suffix2 = "Mrs" myStr = Replace(myStr," MRS ", " ",,vbTextCompare) End if iPos = instr(mystr," MS ") If iPos 0 then Suffix2 = "Ms" myStr = Replace(myStr," Ms ", " ",,vbTextCompare) End if 'etc. 'that's the suffices take care of 'split up on the spaces '(you may like to remove doublespaces first mySplit = Split(c.value," ") 'see if second word is 'and' 'and we only have 4 words if mySplit(1) = "AND" and Ubound(mySplit) = 3 then First1 = mySplit(0) First2 = mySplit(2) Last1 = mySplit(3) Last2 = mySplit(3) elseif mySplit(2) = "AND" and Ubound(mySplit) = 4 then First1 = mySplit(0) Last2 = mySplit(1) First2 = mySplit(3) Last1 = mySplit(4) 'obviously if it was a middle name rather than 'a surname you have no way of knowing. 'That's why I would suggest using option (b) end if 'continue ad nauseum. next c HTH, Gareth Tatakau wrote: This one's a toughie... be prepared! I have a spreadsheet with several thousand entries that should have been a database instead. It holds a lot of user information, such as names, numbers, addresses, etc. I need to the entries under the Name column into multiple colums. Specifically, Title1, First1, Middle1, Last1, Suffix1, Title2, First2, Middle2, Last2, Suffix2. Of course, all the entries under the Name column are not standard: Craig and Dolores Craig Jeffrey and Cynthia Beegle Jason Piel Mr Kaske Debra Cork-Wahl and Charles Cork Micro Systems Matthew G. Kovar and Karina Rand-Kovar Terrance and Mary Mathews The most common format is "First1 and First2 Last1/2", but that is not the case with a good chunk of the entries. No matter how you look at it, someone is going to have to go through it manually to make sure there are no mistakes, but I was hoping that there would be some way to script a semi-solution to cut down on some of the work. Any takers? :-) Thanks! Nick |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Splitting names | Excel Discussion (Misc queries) | |||
Question for Bob Phillips re Splitting Names from Cells | Excel Discussion (Misc queries) | |||
Splitting names from cells | Excel Discussion (Misc queries) | |||
splitting names & email add. | Excel Programming | |||
Siple but what was the formula for splitting names? | Excel Programming |