Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Parsing a Text file into worksheet with VBA

Hi,

I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:

Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
.. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
.. . .

The "Header line" is always the same, and not needed in the Excel file.
I want the worksheet to have one row of data for each section.

Customer Address State Product Qty
Acme Rockets 22 Middle Street AZ Super Rocket 12
Acme Fireworks 66 B Street AB Coyote Killer 24

I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I was torn on how to
accomplish this. Also, since they aren't necessarily all on newlines,
I'm having trouble coming up with the best way to break them out.

I was thinking of using an array of delimiters and then cycling through
that as I read each line of the file, but using the approach from the
link above, that gets problematic when the field is on a different
line. Anyone have any suggestions?

The text file from the PDF appears to be the only option - HTML and XML
both end up representing the details on the page as images. Same for
RTF or DOC files.

Thanks

Matt

  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 4,391
Default Parsing a Text file into worksheet with VBA

Hard to say much as your 2 example have the same layout.
But if you text file really is a mess, maybe you need to improve its quality
from the start, with however you get the data out of pdf.

But assuming the samples are what correct, here's one way you could try:
- Read the whole text file into a string with 'Open TextFile.txt for input"
etc.
- Remove all vbCrLF (or the new line markers) with Replace
- Split on "Header line"
- Split each elememt above on ":".
- Take the even index elements and Trim()

NickHK

wrote in message
oups.com...
Hi,

I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:

Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
. . .

The "Header line" is always the same, and not needed in the Excel file.
I want the worksheet to have one row of data for each section.

Customer Address State Product Qty
Acme Rockets 22 Middle Street AZ Super Rocket 12
Acme Fireworks 66 B Street AB Coyote Killer 24

I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I was torn on how to
accomplish this. Also, since they aren't necessarily all on newlines,
I'm having trouble coming up with the best way to break them out.

I was thinking of using an array of delimiters and then cycling through
that as I read each line of the file, but using the approach from the
link above, that gets problematic when the field is on a different
line. Anyone have any suggestions?

The text file from the PDF appears to be the only option - HTML and XML
both end up representing the details on the page as images. Same for
RTF or DOC files.

Thanks

Matt



  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Parsing a Text file into worksheet with VBA

Looks like I can't split on anything other than a single character - is
that true?

I did find one character that I can split on, finally - though it's
still not 100% correct.

Now my issue is that some of the fields are repeated, and my parse is
not picking up the duplicates.

I'm using instr to find the start of the delimiters and then mid to get
the data between the end of the first delimiter and the start of the
second delimiter. Anyway to find the 2nd, 3rd, 4th, etc. occurrence in
a string?

Thanks



On Oct 19, 10:59 pm, "NickHK" wrote:
Hard to say much as your 2 example have the same layout.
But if you text file really is a mess, maybe you need to improve its quality
from the start, with however you get the data out of pdf.

But assuming the samples are what correct, here's one way you could try:
- Read the whole text file into a string with 'Open TextFile.txt for input"
etc.
- Remove all vbCrLF (or the new line markers) with Replace
- Split on "Header line"
- Split each elememt above on ":".
- Take the even index elements and Trim()

NickHK

wrote in ooglegroups.com...



Hi,


I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:


Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
. . .


The "Header line" is always the same, and not needed in the Excel file.
I want the worksheet to have one row of data for each section.


Customer Address State Product Qty
Acme Rockets 22 Middle Street AZ Super Rocket 12
Acme Fireworks 66 B Street AB Coyote Killer 24


I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I was torn on how to
accomplish this. Also, since they aren't necessarily all on newlines,
I'm having trouble coming up with the best way to break them out.


I was thinking of using an array of delimiters and then cycling through
that as I read each line of the file, but using the approach from the
link above, that gets problematic when the field is on a different
line. Anyone have any suggestions?


The text file from the PDF appears to be the only option - HTML and XML
both end up representing the details on the page as images. Same for
RTF or DOC files.


Thanks


Matt- Hide quoted text -- Show quoted text -


  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 4,391
Default Parsing a Text file into worksheet with VBA

Did you try it with something like :
Const TestStr As String =
"ahefgDELIMahefgDELIMahefgDELIMahefgDELIMahefgDELI MahefgDELIMahefg"

You didn't post any code, so hard to tell what you are doing, but the
"Header line" and ":" are suitable delimiters.

NickHK

wrote in message
ups.com...
Looks like I can't split on anything other than a single character - is
that true?

I did find one character that I can split on, finally - though it's
still not 100% correct.

Now my issue is that some of the fields are repeated, and my parse is
not picking up the duplicates.

I'm using instr to find the start of the delimiters and then mid to get
the data between the end of the first delimiter and the start of the
second delimiter. Anyway to find the 2nd, 3rd, 4th, etc. occurrence in
a string?

Thanks



On Oct 19, 10:59 pm, "NickHK" wrote:
Hard to say much as your 2 example have the same layout.
But if you text file really is a mess, maybe you need to improve its

quality
from the start, with however you get the data out of pdf.

But assuming the samples are what correct, here's one way you could try:
- Read the whole text file into a string with 'Open TextFile.txt for

input"
etc.
- Remove all vbCrLF (or the new line markers) with Replace
- Split on "Header line"
- Split each elememt above on ":".
- Take the even index elements and Trim()

NickHK

wrote in

ooglegroups.com...



Hi,


I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:


Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
. . .


The "Header line" is always the same, and not needed in the Excel

file.
I want the worksheet to have one row of data for each section.


Customer Address State Product Qty
Acme Rockets 22 Middle Street AZ Super Rocket 12
Acme Fireworks 66 B Street AB Coyote Killer 24


I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I was torn on how to
accomplish this. Also, since they aren't necessarily all on newlines,
I'm having trouble coming up with the best way to break them out.


I was thinking of using an array of delimiters and then cycling

through
that as I read each line of the file, but using the approach from the
link above, that gets problematic when the field is on a different
line. Anyone have any suggestions?


The text file from the PDF appears to be the only option - HTML and

XML
both end up representing the details on the page as images. Same for
RTF or DOC files.


Thanks


Matt- Hide quoted text -- Show quoted text -




Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Parsing a Text File [email protected] Excel Programming 8 January 30th 06 04:43 PM
Pause Macro For Start Row While Parsing a Text File Bill Foster[_2_] Excel Programming 2 November 9th 05 08:47 PM
Issue with parsing text file into worksheet primrose Excel Programming 1 May 25th 05 05:09 PM
VBA Import of text file & Array parsing of that data Dennis Excel Discussion (Misc queries) 4 November 28th 04 10:20 PM
Parsing imported text file with macro... help! scrupul0us[_2_] Excel Programming 0 September 7th 04 10:13 PM


All times are GMT +1. The time now is 03:39 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"