Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
I want to parse a lot of data with rows like the following:
A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER I can do it in Excel with formulas, but is there a nice piece of generic code that does this sort of thing? (or should I parse it based on using spaces as delimiters and then join the fields that need joining?) Jim Berglund |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
The phone number, postal code, province and city could probably be picked
our pretty easily. Unless there is some consistency in the address and names grouping, it would be very difficult to pick those out in one fell swoop. Consistency would be that the address is always street number and a two part street name separated by spaces, and the person's name would always be first middle last separated by spaces. I doubt that those are entered consitently in that fashion. If the street address always begins with a number, it could be used a key to separate the name and street address, which would then make it possible to break all the elements into their respective groups. But I wouldn't want to tackle it. "Jim Berglund" wrote in message ... I want to parse a lot of data with rows like the following: A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER I can do it in Excel with formulas, but is there a nice piece of generic code that does this sort of thing? (or should I parse it based on using spaces as delimiters and then join the fields that need joining?) Jim Berglund |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund" wrote:
I want to parse a lot of data with rows like the following: A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER I can do it in Excel with formulas, but is there a nice piece of generic code that does this sort of thing? (or should I parse it based on using spaces as delimiters and then join the fields that need joining?) Jim Berglund How easy or hard this might be to do depends on format variations. If you cannot set out firm rules for the data, you won't be able to do it. Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the end of the string. The CITY would be one or several words that precede the PROV and you could use a list of valid cities. The ADDRESS precedes the city and starts with a number. The NAME ends with the number. If all of your rows can be unambiguously described, either as I have done or some other method, then you should be able to parse either with formulas, or with a macro. Here's a sample macro with a very short list of possibly valid cities that parses rows that fit the above rules. It should give you some ideas. ================================ Option Explicit Sub ParseAddr() Dim myRegExp As Object, myMatches As Object Dim rg As Range, c As Range Dim i As Long Set rg = Selection Set myRegExp = CreateObject("vbscript.regexp") myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" For Each c In rg If myRegExp.test(c.Text) = True Then Set myMatches = myRegExp.Execute(c.Text) For i = 0 To 5 c.Offset(0, i + 1) = myMatches(0).submatches(i) Next i End If Next c End Sub ==================================== The list of cities needs to be enclosed in parentheses and be pipe delimited. They are tested in the order listed, so if you had several cities with similar names, the order you list them in can be critical. --ron |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
Ron. Thanks for the effort you put in on this. I would really like to
understand this more. Is it checking each row to see if it meets the pattern, and if so, doing the parsing and then building a new, 5-column list? What is the following line for? Set myRegExp = CreateObject("vbscript.regexp") Also, could you please help me understand the following? I've never seen any code like it and I just don't have a clue how to interpret it... myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" I tried it and it ran without any errors. But it didn't do anything. BTW, For anyone else looking at this, I found the following entry... Formulas for parsing names. 1. Using an index: =T(INDEX($M$1:$M$6,MAX(INDEX(COUNTIF(C1,$M$2:$M$6& {". *"," *"})*{1;2;3;4;5},0))+1)) (Set up an index in M1:M5) 2. Using OR =IF(LEFT(A30,3)="Mrs","Mrs",IF(LEFT(A30,4)="Miss", "Miss",IF(OR(LEFT(A30,2)="Mr",LEFT(A30,2)="Ms",LEF T(A30,2)="Dr"),LEFT(A30,2),"X"))) 3. Parsing, using VB Option Explicit Function ParseName(str As String, Index As Long) As String Dim re As Object Dim mc As Object Dim sPat As String Dim sTitle As String 'Index code ' 1 = Salutation ' 3 = First Name 'Pipe-delimited list of possible Titles sTitle = "Mr|Ms|Miss|Mrs|Dr" sPat = "^((" & sTitle & ")\.?(\s+))?(\w+)" Set re = CreateObject("vbscript.regexp") re.ignorecase = True re.Pattern = sPat If re.test(str) = True Then Set mc = re.Execute(str) ParseName = mc(0).submatches(Index) End If End Function 4. Using TRIM Parsing names in Excel Having just had to go through a process to clean up a list of names in a spreadsheet, I thought I'd share the 'magic' formula I came up with. As you can imagine, there was no pretty way to get the list into Excel, there were several columns of names and other information, but with no obvious delimiters to behold. I was amazed when I got the file, how bad it looked. I can only assume this was a redirected print file from some ancient mainframe system. It can't be that hard to get a nice comma delimited file surely!? The format of the name was either: First Last First MI Last First Middle Last Obtaining the first name was simple: =LEFT(A11,(FIND(" ",A11,1)-1)) Getting the last name was a different matter: =TRIM( IF(ISERR(FIND(" ",A11,(FIND(" ",A11,1)+1))), MID(A11,FIND(" ",A11,1)+1,(LEN(A11)-FIND(" ",A11,1))), MID(A11,FIND(" ",A11,(FIND(" ",A11,1)+1))+1,(LEN(A11)-FIND(" ",A11,FIND(" ",A11,1)))))) Basically, remove any spaces and if the FIND returns an error, then the name is just first and last names, otherwise there is a middle name or initial to deal with. If no middle name/initial then get the last name, which starts 1 char after the space otherwise get the last name after locating the second space in the string and start from there Another Method for Parsing Names Assume you have names of the following types in column C Mrs Dorothy Hannity Dr P R Rogers Dana Delany Mr Bradley K Pitts Type the following formulas into the specified cells: O1=FIND(" ",C1) Determines the location/existence of the blank following the Salutaton or First Name P1=FIND(" ",C1,FIND(" ",C1)+1) Determines the location/existence of the blank following the First Name or Middle Initial(MI) Q1=FIND(" ",R1) Determines the location/existence of the blank following the Middle Initial in the next, adjacent cell R1=IF(ISERROR(P1),RIGHT(C1,LEN(C1)-O1),RIGHT(C1,LEN(C1)-P1)) Defines the Last Name or MI/LN if there is a MI S1=LEFT(C1,O1-1) Creates the Salutation column T1=IF(ISERROR(S1)," ",MID(C1,O1+1,P1-O1)) Creates the First Name column U1=IF(FIND(" ",R1)=2,LEFT(R1,1),"") Creates the MI column V1=IF(U1<"",MID(R1,FIND(" ",R1)+1,99),R1) Creates the Last Name column Fill the entries down and then copy the resulting values into another set of columns. Jim "Ron Rosenfeld" wrote in message ... On Mon, 24 May 2010 16:58:10 -0600, "Jim Berglund" wrote: I want to parse a lot of data with rows like the following: A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999 I would like columns for NAME ADDRESS CITY PROV POSTALCODE PHONENUMBER I can do it in Excel with formulas, but is there a nice piece of generic code that does this sort of thing? (or should I parse it based on using spaces as delimiters and then join the fields that need joining?) Jim Berglund How easy or hard this might be to do depends on format variations. If you cannot set out firm rules for the data, you won't be able to do it. Probably the PROV, POSTALCODE & PHONENUMBER can be easily parsed from the end of the string. The CITY would be one or several words that precede the PROV and you could use a list of valid cities. The ADDRESS precedes the city and starts with a number. The NAME ends with the number. If all of your rows can be unambiguously described, either as I have done or some other method, then you should be able to parse either with formulas, or with a macro. Here's a sample macro with a very short list of possibly valid cities that parses rows that fit the above rules. It should give you some ideas. ================================ Option Explicit Sub ParseAddr() Dim myRegExp As Object, myMatches As Object Dim rg As Range, c As Range Dim i As Long Set rg = Selection Set myRegExp = CreateObject("vbscript.regexp") myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" For Each c In rg If myRegExp.test(c.Text) = True Then Set myMatches = myRegExp.Execute(c.Text) For i = 0 To 5 c.Offset(0, i + 1) = myMatches(0).submatches(i) Next i End If Next c End Sub ==================================== The list of cities needs to be enclosed in parentheses and be pipe delimited. They are tested in the order listed, so if you had several cities with similar names, the order you list them in can be critical. --ron |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
On Mon, 24 May 2010 22:27:18 -0600, "Jim Berglund" wrote:
Ron. Thanks for the effort you put in on this. I would really like to understand this more. Is it checking each row to see if it meets the pattern, and if so, doing the parsing and then building a new, 5-column list? Yes. And if your row does not meet the format, it will do nothing. What is the following line for? Set myRegExp = CreateObject("vbscript.regexp") That's called late binding, and it invokes another program which allows the use of Regular Expressions in VBA. Regular Expressions are a powerful method of manipulating strings. Also, could you please help me understand the following? I've never seen any code like it and I just don't have a clue how to interpret it... myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" That is the pattern derived from the example you gave. Everything was on one line so: ^ Start at the beginning of the line (\D+) Capture everything that is not a digit into submatch 0. (In the definition, the first digit was the beginning of the address). This will be the name. \s+ matches the space(s) between the end of the non-digits and the next group. (.*) capture everything, into submatch 1, up to the next match. This will be the address. \s matches the space between the end of the address and the next group ((CALGARY|MELBOURNE|SYDNEY) Short list of valid city names. This will need to be extended to match your data. Capture into submatch 2. \s+ matches the space(s) between the end of the city name and the province. ([A-Z]{2}) capture the next two capital letter abbreviation as the Province into submatch 3 the rest matches the postcode, and phone number, again, according to the pattern that was in the example you gave. I tried it and it ran without any errors. But it didn't do anything. It ran perfectly here using your line of sample data. You have to enter and run it properly. To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code into the window that opens. To use this Macro (Sub), FIRST *SELECT* the data you wish to parse. Then <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. If you did all that, and the macro did not do anything, then the data you ran it against did not match, in format, what I had laid out. Or, possibly, your newsreader wrapped a line and you did not copy it correctly into the VBEditor. ***Both I and JLGWhiz emphasized to you the importance of properly defining your format.*** --ron |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
Terrific! I think I am starting to understand what's been done. This is
really powerful stuff. Is it possible to imbed the code in the program or does it have to be called in as a macro each time? I'll work it through again... OK, I tried a number of variations, without success - even just copying data into another spreadsheet and trying it. Here is some of the actual data... AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671 (403) 293-2671 ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396 (403) 249-1396 ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403) 685-9699 (403) 685-9699 ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000 (403) 242-2000 AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055 (403) 240-2055 ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028 (403) 242-1028 ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801 (403) 249-2801 AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690 (403) 256-5690 ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289 (403) 249-8289 AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689 (403) 663-8689 ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070 (403) 685-4070 AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116 (403) 686-1116 ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776 (403) 217-4776 ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702 (403) 686-0702 ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424 (403) 685-1424 Would you please take another try and tell me what needs to change? (I think the difference may be the comma after CALGARY) Thanks again, Ron. Jim "Ron Rosenfeld" wrote in message ... On Mon, 24 May 2010 22:27:18 -0600, "Jim Berglund" wrote: Ron. Thanks for the effort you put in on this. I would really like to understand this more. Is it checking each row to see if it meets the pattern, and if so, doing the parsing and then building a new, 5-column list? Yes. And if your row does not meet the format, it will do nothing. What is the following line for? Set myRegExp = CreateObject("vbscript.regexp") That's called late binding, and it invokes another program which allows the use of Regular Expressions in VBA. Regular Expressions are a powerful method of manipulating strings. Also, could you please help me understand the following? I've never seen any code like it and I just don't have a clue how to interpret it... myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" & "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})$" That is the pattern derived from the example you gave. Everything was on one line so: ^ Start at the beginning of the line (\D+) Capture everything that is not a digit into submatch 0. (In the definition, the first digit was the beginning of the address). This will be the name. \s+ matches the space(s) between the end of the non-digits and the next group. (.*) capture everything, into submatch 1, up to the next match. This will be the address. \s matches the space between the end of the address and the next group ((CALGARY|MELBOURNE|SYDNEY) Short list of valid city names. This will need to be extended to match your data. Capture into submatch 2. \s+ matches the space(s) between the end of the city name and the province. ([A-Z]{2}) capture the next two capital letter abbreviation as the Province into submatch 3 the rest matches the postcode, and phone number, again, according to the pattern that was in the example you gave. I tried it and it ran without any errors. But it didn't do anything. It ran perfectly here using your line of sample data. You have to enter and run it properly. To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code into the window that opens. To use this Macro (Sub), FIRST *SELECT* the data you wish to parse. Then <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. If you did all that, and the macro did not do anything, then the data you ran it against did not match, in format, what I had laid out. Or, possibly, your newsreader wrapped a line and you did not copy it correctly into the VBEditor. ***Both I and JLGWhiz emphasized to you the importance of properly defining your format.*** --ron |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund" wrote:
Terrific! I think I am starting to understand what's been done. This is really powerful stuff. Is it possible to imbed the code in the program or does it have to be called in as a macro each time? I'll work it through again... OK, I tried a number of variations, without success - even just copying data into another spreadsheet and trying it. Here is some of the actual data... AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671 (403) 293-2671 ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396 (403) 249-1396 ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403) 685-9699 (403) 685-9699 ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000 (403) 242-2000 AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055 (403) 240-2055 ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028 (403) 242-1028 ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801 (403) 249-2801 AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690 (403) 256-5690 ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289 (403) 249-8289 AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689 (403) 663-8689 ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070 (403) 685-4070 AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116 (403) 686-1116 ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776 (403) 217-4776 ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702 (403) 686-0702 ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424 (403) 685-1424 Would you please take another try and tell me what needs to change? (I think the difference may be the comma after CALGARY) Thanks again, Ron. Jim There are two possible differences between the pattern in the first example and what you've posted above. 1. The phone number is repeated and on a separate line. I cannot tell from your posting if it is in the same cell (same row) or a different row. 2. The comma is a major difference. Is there always a comma? or is it optional? Two changes to be made in the regex: 1. Add the comma after the city names, but not within a capturing group. 2. Delete the "$" at the end, since there may be something after the phone number (a duplicate phone number) Result: myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & ",\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})" If the comma is optional, we indicate that in the regex by placing a "?" after the comma: myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & ",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})" This now works on all the examples you've posted. --ron |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund" wrote:
Is it possible to imbed the code in the program or does it have to be called in as a macro each time? I don't know what you mean by this question. What are you considering to be "in the program"? Regular expressions come in many flavors, and can be used in a variety of languages. --ron |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
The phone number is repeated in each row, and
the comma is always there. Thanks again, Jim "Ron Rosenfeld" wrote in message ... On Tue, 25 May 2010 17:36:54 -0600, "Jim Berglund" wrote: Terrific! I think I am starting to understand what's been done. This is really powerful stuff. Is it possible to imbed the code in the program or does it have to be called in as a macro each time? I'll work it through again... OK, I tried a number of variations, without success - even just copying data into another spreadsheet and trying it. Here is some of the actual data... AAINEY J 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671 (403) 293-2671 ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396 (403) 249-1396 ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403) 685-9699 (403) 685-9699 ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403) 242-2000 (403) 242-2000 AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055 (403) 240-2055 ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028 (403) 242-1028 ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403) 249-2801 (403) 249-2801 AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690 (403) 256-5690 ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289 (403) 249-8289 AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689 (403) 663-8689 ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070 (403) 685-4070 AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403) 686-1116 (403) 686-1116 ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776 (403) 217-4776 ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702 (403) 686-0702 ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424 (403) 685-1424 Would you please take another try and tell me what needs to change? (I think the difference may be the comma after CALGARY) Thanks again, Ron. Jim There are two possible differences between the pattern in the first example and what you've posted above. 1. The phone number is repeated and on a separate line. I cannot tell from your posting if it is in the same cell (same row) or a different row. The phone number is repeated in each row, and the comma is always there 2. The comma is a major difference. Is there always a comma? or is it optional? Two changes to be made in the regex: 1. Add the comma after the city names, but not within a capturing group. 2. Delete the "$" at the end, since there may be something after the phone number (a duplicate phone number) Result: myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & ",\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})" If the comma is optional, we indicate that in the regex by placing a "?" after the comma: myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" _ & ",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})" This now works on all the examples you've posted. --ron |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
On Tue, 25 May 2010 19:29:04 -0600, "Jim Berglund" wrote:
The phone number is repeated in each row, and the comma is always there. Thanks again, Jim And what happened when you changed the regex as I suggested? It should work, unless there's some other difference not accounted for, --ron |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a string
MCSDPhil Hi there, When I do this kind of thing, where the data is a bit rubbishy, in VB I just pick at the line until the data comes apart. I did this which is about the best that I could do with it. Public Class Form1 Private Const DATA_SURNAME = 1 Private Const DATA_INIT1 = 2 Private Const DATA_INIT2 = 3 Private Const DATA_ADDRESSNAMENUMBER = 4 Private Const DATA_ADDRESS = 5 Private Const DATA_TOWN = 6 Private Const DATA_STATE = 7 Private Const DATA_POSTCODE = 8 Private Const DATA_TELEPHONE1 = 9 Private Const DATA_TELEPHONE2 = 10 Private Sub btnParseData_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnParseData.Click Dim strFilePath As String Dim strLine As String Dim intPos As Integer Dim strData(10) As String Dim strTemp1 As String Dim strTemp2 As String Dim strTemp As String 'OPEN FILE CODE HERE 'SUBSTITUTE READ LINE CODE HERE strLine = "ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028 (403) 242-1028" Do While strLine < "" intPos = InStr(strLine, ",") If intPos 0 Then strTemp1 = Trim(strLine.Substring(0, intPos - 1)) strTemp2 = Trim(strLine.Substring(intPos)) Debug.Print("strTemp1=" & strTemp1 & vbCrLf) Debug.Print("strTemp2=" & strTemp2 & vbCrLf) 'Get surname intPos = InStr(strTemp1, " ") strData(DATA_SURNAME) = strTemp1.Substring(0, (intPos - 1)) Debug.Print("strData(DATA_SURNAME)=" & strData(DATA_SURNAME)) strTemp1 = Trim(strTemp1.Substring(intPos)) 'Get initials 'Find first initial intPos = InStr(strTemp1, " ") If intPos = 2 And Not IsNumeric(strTemp1.Substring(0, 1)) Then 'It is the first initial strData(DATA_INIT1) = strTemp1.Substring(0, 1) Debug.Print("strData(DATA_INIT1)=" & strData(DATA_INIT1)) strTemp1 = Trim(strTemp1.Substring(intPos)) 'See if there is a second initial intPos = InStr(strTemp1, " ") If intPos = 2 And Not IsNumeric(strTemp1.Substring(0, 1)) Then 'This is the second initial strData(DATA_INIT2) = strTemp1.Substring(0, 1) Debug.Print("strData(DATA_INIT2)=" & strData(DATA_INIT2)) strTemp1 = Trim(strTemp1.Substring(intPos)) End If End If intPos = InStr(strTemp1, " ") strTemp = Trim(strTemp1.Substring(0, intPos)) If IsNumeric(strTemp) Then 'We probably have the house number strData(DATA_ADDRESSNAMENUMBER) = strTemp Debug.Print("strData(DATA_ADDRESSNAMENUMBER)=" & strData(DATA_ADDRESSNAMENUMBER)) strTemp1 = Trim(strTemp1.Substring(intPos)) End If 'Put the rest of strTemp1 in Address strData(DATA_ADDRESS) = Trim(strTemp1) Debug.Print("strData(DATA_ADDRESS)=" & strData(DATA_ADDRESS)) 'Sort out the second part of the string intPos = InStr(strTemp2, "(") If intPos 0 Then 'Get the State and PostCode strTemp = Trim(strTemp2.Substring(0, intPos - 1)) strTemp2 = Trim(strTemp2.Substring(intPos - 1)) 'Get the state If InStr(strTemp, " ") = 3 Then 'We should have the first 2 digits as the state strData(DATA_STATE) = Trim(strTemp.Substring(0, 2)) Debug.Print("strData(DATA_STATE)=" & strData(DATA_STATE)) strTemp = Trim(strTemp.Substring(2)) End If 'Put it all in the postcode bit strData(DATA_POSTCODE) = Trim(strTemp) Debug.Print("strData(DATA_POSTCODE)=" & strData(DATA_POSTCODE)) 'Get the Phone Numbers intPos = InStr(3, strTemp2, "(") If intPos 0 Then strData(DATA_TELEPHONE1) = Trim(strTemp2.Substring(0, intPos - 1)) Debug.Print("strData(DATA_TELEPHONE1)=" & strData(DATA_TELEPHONE1)) strData(DATA_TELEPHONE2) = Trim(strTemp2.Substring(intPos - 1)) Debug.Print("strData(DATA_TELEPHONE2)=" & strData(DATA_TELEPHONE2)) Else strData(DATA_TELEPHONE1) = Trim(strTemp2) Debug.Print("strData(DATA_TELEPHONE1)=" & strData(DATA_TELEPHONE1)) End If End If End If 'READ NEXT LINE CODE HERE i.e. strLine = ReadLine etc. Loop 'CLOSE FILE CODE HERE End Sub End Class Hope this helps. Regards, Phil. *** Sent via Developersdex http://www.developersdex.com *** |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
parsing a string | Excel Programming | |||
Parsing a string | Excel Programming | |||
Parsing a string | Excel Programming | |||
Need help parsing a string | Excel Programming | |||
parsing a string | Excel Programming |