Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming,microsoft.public.vb.general.discussion
|
|||
|
|||
![]()
I distributed some workbook apps in 2001 that are currently being used
for budgeting and forecasting work. A main feature of the apps is the ability to pull current-year data from the proprietary accounting software that the businesses use. The users were able to run a general ledger trial balance, save it in plain text format, and my apps would parse the data and import it into the workbooks. Well, now their accounting software has been updated to export the reports into html format, not plain text. So, instead of seeing: ACCOUNT-NO DESCRIPTION BEGINNING BALANCE ---------TOTAL-------- NET CHANGE ENDING BALANCE BUDGET PRIOR-YEAR 010-30120-00 14,606.00- DEBIT .00 8,185.00- 22,791.00- 13,952- 15,549- IND PREFERRED CREDIT 8,185.00 TRX-DATE DR-AMOUNT CR-AMOUNT RUNNING-BALANCE SOURCE REFERENCE 08/01/00 227.00 14,833.00- IJ01 FRIDAY 08/02/00 216.00 15,049.00- IJ02 SATURDAY 08/03/00 233.00 15,282.00- IJ03 SUNDAY It now looks like this: <table <TR<TD010-30110-00 </TD<TD ALIGN=RIGHT 14,606.00-</TD<TD ALIGN=RIGHTDEBIT</TD<TD COLSPAN=2 ALIGN=RIGHT .00 </TD<TD COLSPAN=2 ALIGN=RIGHT 8,185.00-</TD<TD ALIGN=RIGHT 22,791.00-</TD<TD ALIGN=RIGHT 13,952-</TD<TD ALIGN=RIGHT 15,549-</TD</TR <TR<TDIND - FIT PACKAGE </TD<TD COLSPAN=2 ALIGN=RIGHTCREDIT</TD<TD COLSPAN=2 ALIGN=RIGHT 8,185.00</TD<TD COLSPAN=6 </TD</TR <TR<TD COLSPAN=10 </TD</TR <TR<TD ALIGN=RIGHTTRX-DATE</TD<TD ALIGN=CENTERDR-AMOUNT</TD<TD COLSPAN=2 ALIGN=CENTERCR-AMOUNT</TD<TD COLSPAN=2 ALIGN=RIGHT RUNNG-BAL </TD<TD ALIGN=CENTERSOURCE</TD<TD COLSPAN=3 ALIGN=CENTERREFERENCE</TD</TR <TR<TD ALIGN=RIGHT08/01/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 227.00</TD<TD COLSPAN=2 ALIGN=RIGHT 14,833.00-</TD<TD ALIGN=CENTERIJ01 </TD<TD COLSPAN=3 Friday </TD</TR <TR<TD ALIGN=RIGHT08/02/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 216.00</TD<TD COLSPAN=2 ALIGN=RIGHT 15,049.00-</TD<TD ALIGN=CENTERIJ02 </TD<TD COLSPAN=3 Saturday </TD</TR <TR<TD ALIGN=RIGHT08/03/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 233.00</TD<TD COLSPAN=2 ALIGN=RIGHT 15,282.00-</TD<TD ALIGN=CENTERIJ03 </TD<TD COLSPAN=3 Sunday </TD</TR </table Any advice on what is the best way to parse data in HTML format? Does VBA have any built-in support for this purpose? The versions of Excel being used are 2000 and 2002 exclusively. Should I bypass Excel altogether and write a VB6 app that converts the html formatted report into the text format that Excel expects? I'm thinking that might be the best bet, in case future software updates to their accounting software make any html formatting changes. In any case, I would still love to know if there is an easier way to parse the html. It looks like all I really need to do is to be able to read the table cell contents in a logical manner. Thanks Steve |
#2
![]()
Posted to microsoft.public.excel.programming,microsoft.public.vb.general.discussion
|
|||
|
|||
![]()
Just open the html file in Excel using file open. then look at the result.
Most of it should be parsed into separate columns. -- Regards, Tom Ogilvy "Steve" wrote in message ... I distributed some workbook apps in 2001 that are currently being used for budgeting and forecasting work. A main feature of the apps is the ability to pull current-year data from the proprietary accounting software that the businesses use. The users were able to run a general ledger trial balance, save it in plain text format, and my apps would parse the data and import it into the workbooks. Well, now their accounting software has been updated to export the reports into html format, not plain text. So, instead of seeing: ACCOUNT-NO DESCRIPTION BEGINNING BALANCE ---------TOTAL-------- NET CHANGE ENDING BALANCE BUDGET PRIOR-YEAR 010-30120-00 14,606.00- DEBIT .00 8,185.00- 22,791.00- 13,952- 15,549- IND PREFERRED CREDIT 8,185.00 TRX-DATE DR-AMOUNT CR-AMOUNT RUNNING-BALANCE SOURCE REFERENCE 08/01/00 227.00 14,833.00- IJ01 FRIDAY 08/02/00 216.00 15,049.00- IJ02 SATURDAY 08/03/00 233.00 15,282.00- IJ03 SUNDAY It now looks like this: <table <TR<TD010-30110-00 </TD<TD ALIGN=RIGHT 14,606.00-</TD<TD ALIGN=RIGHTDEBIT</TD<TD COLSPAN=2 ALIGN=RIGHT ..00 </TD<TD COLSPAN=2 ALIGN=RIGHT 8,185.00-</TD<TD ALIGN=RIGHT 22,791.00-</TD<TD ALIGN=RIGHT 13,952-</TD<TD ALIGN=RIGHT 15,549-</TD</TR <TR<TDIND - FIT PACKAGE </TD<TD COLSPAN=2 ALIGN=RIGHTCREDIT</TD<TD COLSPAN=2 ALIGN=RIGHT 8,185.00</TD<TD COLSPAN=6 </TD</TR <TR<TD COLSPAN=10 </TD</TR <TR<TD ALIGN=RIGHTTRX-DATE</TD<TD ALIGN=CENTERDR-AMOUNT</TD<TD COLSPAN=2 ALIGN=CENTERCR-AMOUNT</TD<TD COLSPAN=2 ALIGN=RIGHT RUNNG-BAL </TD<TD ALIGN=CENTERSOURCE</TD<TD COLSPAN=3 ALIGN=CENTERREFERENCE</TD</TR <TR<TD ALIGN=RIGHT08/01/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 227.00</TD<TD COLSPAN=2 ALIGN=RIGHT 14,833.00-</TD<TD ALIGN=CENTERIJ01 </TD<TD COLSPAN=3 Friday </TD</TR <TR<TD ALIGN=RIGHT08/02/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 216.00</TD<TD COLSPAN=2 ALIGN=RIGHT 15,049.00-</TD<TD ALIGN=CENTERIJ02 </TD<TD COLSPAN=3 Saturday </TD</TR <TR<TD ALIGN=RIGHT08/03/03</TD<TD ALIGN=RIGHT </TD<TD COLSPAN=2 ALIGN=RIGHT 233.00</TD<TD COLSPAN=2 ALIGN=RIGHT 15,282.00-</TD<TD ALIGN=CENTERIJ03 </TD<TD COLSPAN=3 Sunday </TD</TR </table Any advice on what is the best way to parse data in HTML format? Does VBA have any built-in support for this purpose? The versions of Excel being used are 2000 and 2002 exclusively. Should I bypass Excel altogether and write a VB6 app that converts the html formatted report into the text format that Excel expects? I'm thinking that might be the best bet, in case future software updates to their accounting software make any html formatting changes. In any case, I would still love to know if there is an easier way to parse the html. It looks like all I really need to do is to be able to read the table cell contents in a logical manner. Thanks Steve |
#3
![]()
Posted to microsoft.public.excel.programming,microsoft.public.vb.general.discussion
|
|||
|
|||
![]() "Steve" wrote in message ... I distributed some workbook apps in 2001 that are currently being used for budgeting and forecasting work. A main feature of the apps is the ability to pull current-year data from the proprietary accounting software that the businesses use. The users were able to run a general ledger trial balance, save it in plain text format, and my apps would parse the data and import it into the workbooks. Well, now their accounting software has been updated to export the reports into html format, not plain text. A lot of accounting software these days will export to XML format. See if maybe that is an option. If so you can use MSXML to easily parse the XML. If not, and you are stuck with HTML, the first thing I would do is find the <Table...</Table section and strip it out. Then I would scan through the remainder, and count the number of rows, at the same time I would count the max number of columns. This is easily done by counting the <TR tags for rows, and each time you see a <TR count the number of <TD or <TH tags that show up before the next </TR. Only keep track of the max number of columns. Then redim an array as myArray(0 to numCols - 1, 0 to numRows - 1) Now read throught the data again, and start at row 0, col 0. Each time you hit a new </TD or </TH tag increment the col pointer. Each time you hit a </TR tag increment the row pointer. Read the data between each tag in the corresponding myArray(curCol, curRow). There are probably easier ways to do this. But this is just the method that first came to mind. Andrew Faust |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Stop excel from processing HTML? IE paste raw html. | Excel Discussion (Misc queries) | |||
What is the simplest IPC from Excel VBA to a .NET app? | Excel Discussion (Misc queries) | |||
Generate html Tags from Excel formatted text | Excel Discussion (Misc queries) | |||
Parse row data in Excel | Excel Discussion (Misc queries) | |||
What's the simplest way to copy data from another workbook | Excel Worksheet Functions |