#1
Posted to microsoft.public.excel.programming
Excel 4.0 parser
Hi,
I have an Excel 4.0 file generated by third-party software. MS Office can open it and OpenOffice.org can open it, but external tools (ASP with the MS Jet 4 driver, or the MS Excel (*.xls) driver) fail on it. I have tried many demo versions of ASP tools that are supposed to convert any Excel file to CSV, PDF and so on, but all of them report that the file is not in Excel format.

I have since looked at the BIFF4 file format and managed to write a script that parses the file. It works well, but I am cheating a bit in one place. Let me explain.

The file contains BIFF4 streamed data. A first part holds many headers and descriptions (fonts, sizes, ...). Then comes the cell data, one cell after another:

[BOF][big header][cell-1x1][cell-1x2]....[cell-4x2]...[EOF]

Each cell consists of a header plus data, which I know how to identify and translate. The cheat is that I read and skip the header using a fixed size of 722 bytes: I have several files in this format, and in all of them the header of the cell at row 1, column 1 starts at the 723rd byte. I can't find the part of the header that actually indicates where the cell data begins.

As long as I use the same third-party software to generate the Excel file this will be OK, but I would like a more rigorous way to find the beginning of the first cell.

Thank you for any help on this file format. (Note: I have found some info on the www.wotsit.org file formats website.)

-- JSC
#2
Posted to microsoft.public.excel.programming
You'll find much of the required information in this PDF document from OpenOffice.org: http://sc.openoffice.org/excelfileformat.pdf (1MB).

In essence, you should not rely on a fixed offset, for many reasons. The first is that the internal Excel file format is inside an OLE stream whose compound size is variable, and this offset will become meaningless after many of the actions that can be performed on the Excel file.

In short, you should instead open the OLE stream using the appropriate WIN32 OLE API, and then read the Excel records one by one. Each record is a 2-byte identifier, followed by a 2-byte length of the associated buffer, followed by the buffer itself. Whenever the buffer exceeds 8228 bytes, special continue records are used. You should easily get access to numbers, if you know how to decode them. Accessing strings should be more difficult, since they are shared in a global dictionary, and that brings even more details to worry about.

-- Stephane Rodriguez
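
For illustration, the record-by-record walk described in the reply could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested BIFF4 reader. It treats the input as a bare record stream already loaded into memory, matching the [BOF]...[EOF] layout from the first post. If the data sits inside an OLE container, the stream would have to be extracted first. It also assumes little-endian fields. The identifiers in CELL_RECORD_IDS are recalled from the BIFF3/BIFF4 tables in the OpenOffice.org document linked above and should be verified against it. The names iter_records and first_cell_offset are made up for this example.

```python
import struct
from typing import Iterator, Optional, Tuple

# Assumed BIFF3/BIFF4 cell record identifiers (verify against the
# OpenOffice.org document): BLANK, NUMBER, LABEL, BOOLERR, RK, FORMULA.
CELL_RECORD_IDS = {0x0201, 0x0203, 0x0204, 0x0205, 0x027E, 0x0406}
EOF_ID = 0x000A  # assumed identifier of the EOF record


def iter_records(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (offset, record_id, body) for each record in the stream."""
    pos = 0
    while pos + 4 <= len(data):
        # Each record: 2-byte identifier, 2-byte body length, then the body.
        rec_id, size = struct.unpack_from("<HH", data, pos)
        body = data[pos + 4 : pos + 4 + size]
        if len(body) < size:
            raise ValueError(f"truncated record at offset {pos}")
        yield pos, rec_id, body
        pos += 4 + size


def first_cell_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first cell record, or None if absent."""
    for offset, rec_id, _body in iter_records(data):
        if rec_id in CELL_RECORD_IDS:
            return offset
        if rec_id == EOF_ID:
            break
    return None
```

With something like this, the fixed 722-byte skip would be replaced by the offset returned from first_cell_offset, which follows the length field of every header record instead of assuming their total size.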