![]() |
Transpose rows to columns w/varying numbers of lines per record
I have a text-file address listing with a couple of thousand entries of
varying number of lines--some have 3 lines: Name Address City, State Zip and others have 4: Name Etc (like "c/o" or contact name--no particular pattern to this) Address City, State Zip There are no blank lines between records. I don't have any control over the format of the text file, as it's an extract from another program and I can't change the way the data is extracted. I can't use Excel's transpose feature to get a useful spreadsheet because the records aren't consistent in the number of lines per record. Is there a way to convert this data to a spreadsheet with 4 columns per record (with a blank column for the 3-line records)? |
Transpose rows to columns w/varying numbers of lines per record
You could use a VBA routine, but only if there is some discernible way to
tell when one address ends and the next begins. So...is there a ZIP at the very end of EVERY SINGLE ADDRESS? Is it always a ZIP or is it sometimes a ZIP+4? Any non-US addresses using a nonnumeric postal code? Tell people here how to objectively determine when to end one address and start another, and you'll probably get some valuable help "MG" wrote: I have a text-file address listing with a couple of thousand entries of varying number of lines--some have 3 lines: Name Address City, State Zip and others have 4: Name Etc (like "c/o" or contact name--no particular pattern to this) Address City, State Zip There are no blank lines between records. I don't have any control over the format of the text file, as it's an extract from another program and I can't change the way the data is extracted. I can't use Excel's transpose feature to get a useful spreadsheet because the records aren't consistent in the number of lines per record. Is there a way to convert this data to a spreadsheet with 4 columns per record (with a blank column for the 3-line records)? |
Transpose rows to columns w/varying numbers of lines per recor
Thanks, Duke. Every address ends in either a ZIP or a ZIP+4.
"Duke Carey" wrote: You could use a VBA routine, but only if there is some discernible way to tell when one address ends and the next begins. So...is there a ZIP at the very end of EVERY SINGLE ADDRESS? Is it always a ZIP or is it sometimes a ZIP+4? Any non-US addresses using a nonnumeric postal code? Tell people here how to objectively determine when to end one address and start another, and you'll probably get some valuable help "MG" wrote: I have a text-file address listing with a couple of thousand entries of varying number of lines--some have 3 lines: Name Address City, State Zip and others have 4: Name Etc (like "c/o" or contact name--no particular pattern to this) Address City, State Zip There are no blank lines between records. I don't have any control over the format of the text file, as it's an extract from another program and I can't change the way the data is extracted. I can't use Excel's transpose feature to get a useful spreadsheet because the records aren't consistent in the number of lines per record. Is there a way to convert this data to a spreadsheet with 4 columns per record (with a blank column for the 3-line records)? |
Transpose rows to columns w/varying numbers of lines per recor
If you need help with where to put VBA code, look at this site
http://www.contextures.com/xlvba01.html or http://www.cpearson.com/excel/codemods.htm Try this code: Sub ParseAddresses() Dim rngSrc As Range Dim rngTgt As Range Dim intLineNum As Integer ' change the next line to reflect the first cell of your data Set rngSrc = Range("A1") Set rngTgt = Range("C1") intLineNum = 0 Do While True If Len(rngSrc.Text) = 0 Then Exit Sub If Not IsNumeric(Right(rngSrc.Text, 4)) Then ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) Else rngTgt.Offset(0, 3).Value = rngSrc.Text intLineNum = 0 Set rngSrc = rngSrc.Offset(1, 0) Set rngTgt = rngTgt.Offset(1, 0) ' it's PRESUMABLY a zip code & the final address line End If Loop End Sub |
Transpose rows to columns w/varying numbers of lines per recor
It successfully placed A1 in C1, A2 in D1, and A3 in E1 (the first record is
a 3-line record) but then placed A4 in F1, A5 in G1 and A6 in H1 (the second record is also a 3-line record). It ran into the 256-column limit, apparently, and returned runtime error 1004 for application-defined or object-defined error. It's definitely on the right track, though. The records referred to in the 256 columns do include both 3- and 4-line records, just no new row for each record. Would it be easier to write a piece of code that would insert a blank line after each 3-line record, and then use transpose to get the 4 column list? "Duke Carey" wrote: If you need help with where to put VBA code, look at this site http://www.contextures.com/xlvba01.html or http://www.cpearson.com/excel/codemods.htm Try this code: Sub ParseAddresses() Dim rngSrc As Range Dim rngTgt As Range Dim intLineNum As Integer ' change the next line to reflect the first cell of your data Set rngSrc = Range("A1") Set rngTgt = Range("C1") intLineNum = 0 Do While True If Len(rngSrc.Text) = 0 Then Exit Sub If Not IsNumeric(Right(rngSrc.Text, 4)) Then ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) Else rngTgt.Offset(0, 3).Value = rngSrc.Text intLineNum = 0 Set rngSrc = rngSrc.Offset(1, 0) Set rngTgt = rngTgt.Offset(1, 0) ' it's PRESUMABLY a zip code & the final address line End If Loop End Sub |
Transpose rows to columns w/varying numbers of lines per recor
The code checked the last for characters of each line to see if they were
numeric. If so it assumed it was a Zip code and that line marked the end of the address. What you describe sounds as if it never found a single ZIP, making me wonder if the lines are padded with spaces to the right. Select one of the lines that ends witha ZIP code, press the F2 key, and see if the cursor appears off to the right of the last character. If so, it's padded with spaces,a nd a slight modification to the code should fix it. Change this line If Not IsNumeric(Right(rngSrc.Text, 4)) Then to If Not IsNumeric(Right(Trim(rngSrc.Text), 4)) Then and then re-run it to see what happens "MG" wrote: It successfully placed A1 in C1, A2 in D1, and A3 in E1 (the first record is a 3-line record) but then placed A4 in F1, A5 in G1 and A6 in H1 (the second record is also a 3-line record). It ran into the 256-column limit, apparently, and returned runtime error 1004 for application-defined or object-defined error. It's definitely on the right track, though. The records referred to in the 256 columns do include both 3- and 4-line records, just no new row for each record. Would it be easier to write a piece of code that would insert a blank line after each 3-line record, and then use transpose to get the 4 column list? |
Transpose rows to columns w/varying numbers of lines per recor
Thanks, Duke; much closer. There was much padding, and trim stripped that
out. There are some lines that have numeric characters at the end, but which aren't the city/state/zip. As a result, they are identified as belonging in the last column. Addresses with PO boxes have numeric characters at the end. All PO box addresses in the file are formatted with "P.O. BOX " (with periods, all caps, space after the "X") as the first characters for that line and then a box number, the length of which can vary (ex: P.O. BOX 3, P.O. BOX 34, P.O. BOX 586, etc.). Addresses on highways have numeric characters at the end. All highway addresses in the file are formatted with " HWY " following an address number, the length of which can vary, and preceding a highway number, the length of which can vary (ex: 123 HWY 173, 15435 HWY 10, 56 HWY 4995, etc.). As far as I can see, these are the only variations that need to be dealt with--I didn't see any other kinds of addresses with numbers at the end. Finally, the last line of each address always has a text string of varying length (the city name), then a comma, a space, a two-character alpha state code, another space, and then the zip or zip+4 (ex: PORTLAND, OR 97202 or REDONDO BEACH, CA 90277). Do you see any way that these text strings can be identified to put the C/S/Z line as the last field? Much thanks for your help--I really appreciate it. "Duke Carey" wrote: The code checked the last for characters of each line to see if they were numeric. If so it assumed it was a Zip code and that line marked the end of the address. What you describe sounds as if it never found a single ZIP, making me wonder if the lines are padded with spaces to the right. Select one of the lines that ends witha ZIP code, press the F2 key, and see if the cursor appears off to the right of the last character. If so, it's padded with spaces,a nd a slight modification to the code should fix it. Change this line If Not IsNumeric(Right(rngSrc.Text, 4)) Then to If Not IsNumeric(Right(Trim(rngSrc.Text), 4)) Then and then re-run it to see what happens "MG" wrote: It successfully placed A1 in C1, A2 in D1, and A3 in E1 (the first record is a 3-line record) but then placed A4 in F1, A5 in G1 and A6 in H1 (the second record is also a 3-line record). It ran into the 256-column limit, apparently, and returned runtime error 1004 for application-defined or object-defined error. It's definitely on the right track, though. The records referred to in the 256 columns do include both 3- and 4-line records, just no new row for each record. Would it be easier to write a piece of code that would insert a blank line after each 3-line record, and then use transpose to get the 4 column list? |
Transpose rows to columns w/varying numbers of lines per recor
Well, I've changed it so it WILL NOT treat lines containing P.O. Box or HWY
as the end of the address. Should get you closer Sub ParseAddresses() Dim rngSrc As Range Dim rngTgt As Range Dim intLineNum As Integer Dim strAddress As String ' change the next line to reflect the first cell of your data Set rngSrc = Range("A1") Set rngTgt = Range("C1") intLineNum = 0 Do While True If Len(rngSrc.Text) = 0 Then Exit Sub strAddress = Trim(rngSrc.Text) If IsNumeric(Right(strAddress, 4)) Then If (InStr(1, strAddress, "P.O. Box", vbTextCompare) 1 _ Or InStr(1, strAddress, "HWY", vbTextCompare) 1) Then ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) Else ' it's PRESUMABLY a zip code & the final address line rngTgt.Offset(0, 3).Value = strAddress intLineNum = 0 Set rngSrc = rngSrc.Offset(1, 0) Set rngTgt = rngTgt.Offset(1, 0) End If Else ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) End If Loop End Sub |
Transpose rows to columns w/varying numbers of lines per recor
Thanks, Duke; I noticed that some entries seem to handle the PO box and
highway addresses correctly, and some still put them in the last column. As far as I can tell, if an address has a 2-digit PO box, it places that information in the 2nd column (or 3rd, if there's an "attn" entry in the 2ndposition) and correctly places the city/state/zip in the 4th column. If it has a 3- or more digit box number, it puts that information in the 4th column with the city/state/zip in the 4th column on the next line. For the highway addresses, they all seem to go correctly into the 2nd (or 3rd) position, and none of them show up in the 4th column. It looks like they were all 2-digit highway numbers (there was one that was 12345 HWY 99 #2 that also worked properly). I changed If IsNumeric(Right(strAddress, 4)) Then to If IsNumeric(Right(strAddress, 5)) Then and this resulted in the 3-digit box addresses working properly, but there are some 6-digit box numbers that still get put in the last column. I'm going to futz with it a bit, as I think I might be on the right track. Duke, thanks so much for all of your help. My one term of beginning VB sure didn't get me where I need to be for this task! "Duke Carey" wrote: Well, I've changed it so it WILL NOT treat lines containing P.O. Box or HWY as the end of the address. Should get you closer Sub ParseAddresses() Dim rngSrc As Range Dim rngTgt As Range Dim intLineNum As Integer Dim strAddress As String ' change the next line to reflect the first cell of your data Set rngSrc = Range("A1") Set rngTgt = Range("C1") intLineNum = 0 Do While True If Len(rngSrc.Text) = 0 Then Exit Sub strAddress = Trim(rngSrc.Text) If IsNumeric(Right(strAddress, 4)) Then If (InStr(1, strAddress, "P.O. Box", vbTextCompare) 1 _ Or InStr(1, strAddress, "HWY", vbTextCompare) 1) Then ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) Else ' it's PRESUMABLY a zip code & the final address line rngTgt.Offset(0, 3).Value = strAddress intLineNum = 0 Set rngSrc = rngSrc.Offset(1, 0) Set rngTgt = rngTgt.Offset(1, 0) End If Else ' part of the body of the address rngTgt.Offset(0, intLineNum).Value = rngSrc.Text intLineNum = intLineNum + 1 Set rngSrc = rngSrc.Offset(1, 0) End If Loop End Sub |
All times are GMT +1. The time now is 07:00 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com