Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a Text file into worksheet with VBA
Hi,
I'm trying to take a structured text file (saved from a PDF file) and read it into an Excel worksheet with a macro. The problem is that the structure isn't straightforward. Every section in the file contains ~50 rows, and the delimiters aren't consistent. For example: Section1 Header line Customer: Acme Rockets Address: 22 Middle Street State: AZ Product: Super Rocket Qty: 12 .. . . Section2 Header line Customer: Acme Fireworks Address: 66 B Street State: AB Product: Coyote Killer Qty: 24 .. . . The "Header line" is always the same, and not needed in the Excel file. I want the worksheet to have one row of data for each section. Customer Address State Product Qty Acme Rockets 22 Middle Street AZ Super Rocket 12 Acme Fireworks 66 B Street AB Coyote Killer 24 I did look at the often-linked page: http://www.cpearson.com/excel/imptext.htm But since my delimiters are not consistent, I was torn on how to accomplish this. Also, since they aren't necessarily all on newlines, I'm having trouble coming up with the best way to break them out. I was thinking of using an array of delimiters and then cycling through that as I read each line of the file, but using the approach from the link above, that gets problematic when the field is on a different line. Anyone have any suggestions? The text file from the PDF appears to be the only option - HTML and XML both end up representing the details on the page as images. Same for RTF or DOC files. Thanks Matt |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a Text file into worksheet with VBA
Hard to say much as your 2 example have the same layout.
But if you text file really is a mess, maybe you need to improve its quality from the start, with however you get the data out of pdf. But assuming the samples are what correct, here's one way you could try: - Read the whole text file into a string with 'Open TextFile.txt for input" etc. - Remove all vbCrLF (or the new line markers) with Replace - Split on "Header line" - Split each elememt above on ":". - Take the even index elements and Trim() NickHK wrote in message oups.com... Hi, I'm trying to take a structured text file (saved from a PDF file) and read it into an Excel worksheet with a macro. The problem is that the structure isn't straightforward. Every section in the file contains ~50 rows, and the delimiters aren't consistent. For example: Section1 Header line Customer: Acme Rockets Address: 22 Middle Street State: AZ Product: Super Rocket Qty: 12 . . . Section2 Header line Customer: Acme Fireworks Address: 66 B Street State: AB Product: Coyote Killer Qty: 24 . . . The "Header line" is always the same, and not needed in the Excel file. I want the worksheet to have one row of data for each section. Customer Address State Product Qty Acme Rockets 22 Middle Street AZ Super Rocket 12 Acme Fireworks 66 B Street AB Coyote Killer 24 I did look at the often-linked page: http://www.cpearson.com/excel/imptext.htm But since my delimiters are not consistent, I was torn on how to accomplish this. Also, since they aren't necessarily all on newlines, I'm having trouble coming up with the best way to break them out. I was thinking of using an array of delimiters and then cycling through that as I read each line of the file, but using the approach from the link above, that gets problematic when the field is on a different line. Anyone have any suggestions? The text file from the PDF appears to be the only option - HTML and XML both end up representing the details on the page as images. Same for RTF or DOC files. Thanks Matt |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a Text file into worksheet with VBA
Looks like I can't split on anything other than a single character - is
that true? I did find one character that I can split on, finally - though it's still not 100% correct. Now my issue is that some of the fields are repeated, and my parse is not picking up the duplicates. I'm using instr to find the start of the delimiters and then mid to get the data between the end of the first delimiter and the start of the second delimiter. Anyway to find the 2nd, 3rd, 4th, etc. occurrence in a string? Thanks On Oct 19, 10:59 pm, "NickHK" wrote: Hard to say much as your 2 example have the same layout. But if you text file really is a mess, maybe you need to improve its quality from the start, with however you get the data out of pdf. But assuming the samples are what correct, here's one way you could try: - Read the whole text file into a string with 'Open TextFile.txt for input" etc. - Remove all vbCrLF (or the new line markers) with Replace - Split on "Header line" - Split each elememt above on ":". - Take the even index elements and Trim() NickHK wrote in ooglegroups.com... Hi, I'm trying to take a structured text file (saved from a PDF file) and read it into an Excel worksheet with a macro. The problem is that the structure isn't straightforward. Every section in the file contains ~50 rows, and the delimiters aren't consistent. For example: Section1 Header line Customer: Acme Rockets Address: 22 Middle Street State: AZ Product: Super Rocket Qty: 12 . . . Section2 Header line Customer: Acme Fireworks Address: 66 B Street State: AB Product: Coyote Killer Qty: 24 . . . The "Header line" is always the same, and not needed in the Excel file. I want the worksheet to have one row of data for each section. Customer Address State Product Qty Acme Rockets 22 Middle Street AZ Super Rocket 12 Acme Fireworks 66 B Street AB Coyote Killer 24 I did look at the often-linked page: http://www.cpearson.com/excel/imptext.htm But since my delimiters are not consistent, I was torn on how to accomplish this. Also, since they aren't necessarily all on newlines, I'm having trouble coming up with the best way to break them out. I was thinking of using an array of delimiters and then cycling through that as I read each line of the file, but using the approach from the link above, that gets problematic when the field is on a different line. Anyone have any suggestions? The text file from the PDF appears to be the only option - HTML and XML both end up representing the details on the page as images. Same for RTF or DOC files. Thanks Matt- Hide quoted text -- Show quoted text - |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Parsing a Text file into worksheet with VBA
Did you try it with something like :
Const TestStr As String = "ahefgDELIMahefgDELIMahefgDELIMahefgDELIMahefgDELI MahefgDELIMahefg" You didn't post any code, so hard to tell what you are doing, but the "Header line" and ":" are suitable delimiters. NickHK wrote in message ups.com... Looks like I can't split on anything other than a single character - is that true? I did find one character that I can split on, finally - though it's still not 100% correct. Now my issue is that some of the fields are repeated, and my parse is not picking up the duplicates. I'm using instr to find the start of the delimiters and then mid to get the data between the end of the first delimiter and the start of the second delimiter. Anyway to find the 2nd, 3rd, 4th, etc. occurrence in a string? Thanks On Oct 19, 10:59 pm, "NickHK" wrote: Hard to say much as your 2 example have the same layout. But if you text file really is a mess, maybe you need to improve its quality from the start, with however you get the data out of pdf. But assuming the samples are what correct, here's one way you could try: - Read the whole text file into a string with 'Open TextFile.txt for input" etc. - Remove all vbCrLF (or the new line markers) with Replace - Split on "Header line" - Split each elememt above on ":". - Take the even index elements and Trim() NickHK wrote in ooglegroups.com... Hi, I'm trying to take a structured text file (saved from a PDF file) and read it into an Excel worksheet with a macro. The problem is that the structure isn't straightforward. Every section in the file contains ~50 rows, and the delimiters aren't consistent. For example: Section1 Header line Customer: Acme Rockets Address: 22 Middle Street State: AZ Product: Super Rocket Qty: 12 . . . Section2 Header line Customer: Acme Fireworks Address: 66 B Street State: AB Product: Coyote Killer Qty: 24 . . . The "Header line" is always the same, and not needed in the Excel file. I want the worksheet to have one row of data for each section. Customer Address State Product Qty Acme Rockets 22 Middle Street AZ Super Rocket 12 Acme Fireworks 66 B Street AB Coyote Killer 24 I did look at the often-linked page: http://www.cpearson.com/excel/imptext.htm But since my delimiters are not consistent, I was torn on how to accomplish this. Also, since they aren't necessarily all on newlines, I'm having trouble coming up with the best way to break them out. I was thinking of using an array of delimiters and then cycling through that as I read each line of the file, but using the approach from the link above, that gets problematic when the field is on a different line. Anyone have any suggestions? The text file from the PDF appears to be the only option - HTML and XML both end up representing the details on the page as images. Same for RTF or DOC files. Thanks Matt- Hide quoted text -- Show quoted text - |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Parsing a Text File | Excel Programming | |||
Pause Macro For Start Row While Parsing a Text File | Excel Programming | |||
Issue with parsing text file into worksheet | Excel Programming | |||
VBA Import of text file & Array parsing of that data | Excel Discussion (Misc queries) | |||
Parsing imported text file with macro... help! | Excel Programming |