View Single Post
  #1   Report Post  
Posted to microsoft.public.excel.programming
dermot dermot is offline
external usenet poster
 
Posts: 5
Default HOW TO: embed binary data in Excel - an odyssey

Imagine if you could embed binary data within an Excel
file. You could
* encrypt and store confidential data extremely securely
* store compressed data compactly
* include setup files inside a single Excel file
and "unpack" them when running it the first time (useful
for web downloads), eg a little DLL

I'll illustrate the second of these. I had some data I
needed to include in a workbook to use with my code.
* Size of data as a stand-alone CSV file: 670k
* Size of the same data included in an otherwise empty
XLS file: 2,100k (bloatware!)
* Size of the same data compressed and embedded in an XLS
file: just 230k, including 50k of VBA compression code

So how can you embed data in an Excel file? It was an
odyssey for me because Excel is so inconsistent in the
way it stores things.

My first thought was a bitmap, because they get stored
pixel for pixel in the Excel file on disk. I planned to
create an empty bitmap of the right size inside the
workbook, save the workbook, and then overwrite the
picture part of the file on disk, with my data. The
problem was that no matter what I did, Excel wouldn't
leave the bitmap alone. Yes, it stored it all right, but
it kept changing the bits, or inserting chunks of its own
code at intervals throughout the bitmap. Very frustrating.

I looked at other standard "containers" that could
include binary data, considering only standard objects
included with Excel. I didn't find any.

So then I looked at text containers. Most of them, such
as file properties and textboxes, had severe length
limits. However, the userform Textbox control could take
an indefinite string, it seemed. The only problem was
that certain ASCII characters caused the string to
truncate.

So I "remapped" the ASCII strings. I discovered which
characters caused the problem, and replaced them by two
characters - character 1 (which I added to the list of
prohibited characters so it could be used for this
purpose) and the offending character, shifted 20
characters forward to a less offensive character. So for
example, character 10 would be replaced by characters 1
and then 30. When decoding, every time you hit character
1, you skip it and subtract 20 from the next character.
This remapping increased the length of a string by about
3%, quite a small penalty.

After all this effort, it worked fine - except that it
became very, very slow for long strings over 100k. Grrrr..

I tried just about everything, even worksheet
DocumentProperties, which seem to be new with XP. But
would you believe it? Each character is stored as two
characters, the first being a zero, so that doubles the
original length of the string.

Eventually, after lots of testing, I tried cellnotes.
They can take 32,767 bytes each, and they dislike about
25 characters (different ones from the textbox,
naturally), but are fast. Interestingly, when Excel hits
one of the offending characters, it doesn't truncate the
string. Instead, it allocates two characters to ALL of
the original characters, even those which were not
offensive, doubling the string length. So as long as I
remap my data to avoid those ASCII characters, I avoid
doubling up. This remapping adds about 12% to the length
of the string, but that's acceptable.

The result is that using only VBA, you can store binary
data efficiently inside an Excel workbook.

I should add that I have pure VBA compression code which
does a little better than Zip but is of course a lot
slower - so you can do everything with VBA.

I'll post some sample code soon on my site, for anyone
who is interested.

Dermot Balson
Free VBA code for user interfaces, internet connectivity,
encryption
http://www.webace.com.au/~balson/Ins...l/Default.html
Last updated March 2003