#1
Posted to microsoft.public.excel.programming
Excel 4.0 parser

Hi,

I have an Excel 4.0 file, generated by third-party software. MS Office
can open it, OpenOffice.org can open it, but if I use external
tools (ASP + MS Jet 4 driver, or the MS Excel (*.xls) driver), it doesn't
work. I have tried many demo versions of ASP tools that are supposed to
convert any Excel file to CSV, PDF and so on, but all of them said the file
format is not Excel.
Now, I have looked at the BIFF4 file format, and I have managed to write
a script that can parse the Excel file.
It is working well, but in one part I am cheating a bit.
Let me explain. The file contains BIFF4 streamed data. There is a first part
which contains many headers and descriptions (for fonts, sizes, ...).
Then comes the cell data, one cell after the other:
[BOF][big header][cell-1x1][cell-1x2]....[cell-4x2]...[EOF]
Each cell is formatted as header + data, which I know how to identify and
translate.
The part where I am cheating is that I read and skip the header based on a
fixed size: 722 bytes. This is because I have several files in this format
and I have found that the header of the cell in row 1, column 1 starts at
the 723rd byte.
I can't find the part of the header that really indicates the beginning
of the cell data. As long as I use the same third-party software to
generate the Excel file, it will be OK, but I would like a more robust
way to find the beginning of the first cell.

Thank you for any help on this file format.
(Note: I have found some info on the www.wotsit.org Files Format website)

--
JSC
#2


You'll find much of the required information in this PDF document from
OpenOffice.org: http://sc.openoffice.org/excelfileformat.pdf (1MB).

In essence, you should not rely on a fixed offset, for many reasons. The
first is that the internal Excel file format is inside an OLE stream
whose compound size is variable, and this offset will become
meaningless after many of the actions that can be done on the Excel file.

In short, you should instead open the OLE stream using the appropriate
WIN32 OLE API, and then you can read Excel records one by one. Each
record is a 2-byte identifier, followed by a 2-byte length of the
associated buffer, then followed by the buffer itself. Each time the
buffer is over 8228 bytes, special continue records are used. You
should easily get access to numbers, if you know how to decode them.
But accessing strings should be more difficult since they are shared in
a global dictionary. And that unfolds even more details to worry
about...
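
As a rough illustration of reading records one by one, here is a minimal
Python sketch. It assumes the bytes being parsed are already a plain
sequence of BIFF records starting at [BOF], as the first post describes
(if the data sits inside an OLE container, the stream would have to be
extracted first). The function names are made up for this example, and
the record identifiers are the values I recall from the OpenOffice.org
document linked above, so please verify them there before relying on them.

```python
import struct

# Record identifiers: assumptions recalled from the OpenOffice.org
# file format document; check them against that document.
EOF_ID = 0x000A
NUMBER_ID = 0x0203
# BLANK, NUMBER, LABEL, BOOLERR, FORMULA (BIFF3), FORMULA (BIFF4), RK
CELL_IDS = {0x0201, 0x0203, 0x0204, 0x0205, 0x0206, 0x0406, 0x027E}


def iter_records(data):
    """Yield (offset, record_id, payload) for each record in data.

    Each record is a 2-byte identifier, a 2-byte payload length
    (both little-endian), then the payload itself.
    """
    pos = 0
    while pos + 4 <= len(data):
        rec_id, length = struct.unpack_from("<HH", data, pos)
        payload = data[pos + 4 : pos + 4 + length]
        if len(payload) < length:
            raise ValueError(f"truncated record at offset {pos}")
        yield pos, rec_id, payload
        if rec_id == EOF_ID:
            break
        pos += 4 + length


def first_cell_offset(data):
    """Return the byte offset of the first cell record, or None."""
    for offset, rec_id, _payload in iter_records(data):
        if rec_id in CELL_IDS:
            return offset
    return None


def decode_number(payload):
    """Decode a NUMBER payload, assuming the BIFF3/BIFF4 layout:
    row, column, format index (2 bytes each), then an 8-byte double."""
    row, col, xf_index, value = struct.unpack("<HHHd", payload)
    return row, col, xf_index, value
```

With the whole file read into memory as bytes, first_cell_offset(data)
gives a computed replacement for the hard-coded header size, and the
result can be compared against the fixed offset of 722 used so far.
Walking the headers this way keeps working when the number or size of
the font and format records changes.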

--
Stephane Rodriguez
