ExcelBanter - View Single Post - Importing Text files w/o losing special characters

T_o_n_y

Nick,

Please see my note to Art. I appeciate all the help and I've definitely
learned a thing or two on this one.

-Tony

"NickHK" wrote:

Tony,
To continue using your current, working solution, why not pre-process these
problem files by replacing the nulls with spaces.
If it is only those 6 bytes you need to change, using something with:
Open Yourfile For Binary as #freefile
Replace the "00"s with "20" '&H20=decimal 32, [SPACE]
Close #freefile

NickHK

"T_o_n_y" wrote in message
...
Nick,

The presence of the null characters is important because I'm importing
data
from thousands of files like these using an Excel VBA routine. The data
within these files is not well-organized by today's standards, but it can
be
extracted using string functions such as MID(,,) as long as it is known
where
particular fields begin and end. A series of FORTRAN programs has been
used
for 30-40 years to manipulate these files, and now the company wants to
modernize.

Up until now, my Excel VBA routine has successfully extracted the data
from
thousands of these files so that it can be placed into Excel format.
Except
for certain files; the ones having this problem. Apparently a FORTRAN
programmer inadvertantly placed the null characters into certain of these
files while developing certain aspects of the programs.

If the null characters were spaces instead of null characters there would
be
little problem since the MID(,,) function would still collect the
following
fields in the string correctly. But the null characters being stripped
away
means that everything past that point gets offset by 6 characters, causing
things like string conversion to doubles, for example, to cause the
program
to crash.

If I can't find a way to keep Excel from stripping the nulls, I suppose
I'll
work out an error trapping routine that tells the user to manually fix
that
file. I just hope it doesn't amount to hundreds of files!

-Tony

"NickHK" wrote:

Tony,
Probably, but what use are 6 x null characters ?
Do they actually mean anything to you ?
Or are they some garbage that your Fortran app outputs ?

NickHK

"T_o_n_y" wrote in message
...
Nick,

What you've written makes sense to me, and that is why I've had
trouble
concluding that it is a unicode file. As I wrote in an earlier post,
I'm
leaning toward it being a text file that has 6 Ascii(0) characters
inserted
into it.

The question remains: is there a way to have Excel import these
characters
rather than completely ignoring them?

-Tony

"NickHK" wrote:

Tony,
The "special characters" I see in your uploaded file are Asc(0). It
is
not a
Unicode file.

As for the "0" for every other cell, that is expected if you have
all
ANSI
text stored in a UNICODE format. The lower byte will always be 0 as
no
values exceed decimal 255 or FF hex.

So do you have a Unicode file or not ?

NickHK

"T_o_n_y" wrote in message
...
Tom,

I'm perplexed at your response because, I'm about as far from
ignoring
your
posts as possible. Indeed, I generally skip directly to your
posts
when
on
this newsgroup since they are more helpful than anyone's,
containing
actual
sample code that can be used.

It's just that the output from the macro you sent me led me to the
conclusion that you were mistaken this time. The output shows "0"
for
every
other cell which is not what I would expect from UNICODE with 2
bytes
per
character. Furthermore, rather than revealing the presence of the
special
characters, your macro also had them stripped away.

Here's what I mean. The file I've uploaded contains the following
in
the
first line,

$$158++yyyyyy++1++8++4.50 etc...

I've substituting + for spaces and y for the special characters
above.
As
you can see there are 2 spaces followed by 6 special characters
followed
by 2
spaces. The output from your macro completely omits the 6 special
characters, if I'm reading it correctly.

As I wrote, I spent "anoter few hours" researching into UNICODE in
order
to
investigate the possibility you raised...but nothing I found
seemed to
confirm it. In addition, Excel has two different UNICODE types
(UTF-8
and
UTF-7) which one can select in the text import wizard. I tried
both
of
them
and neither gave me success in importing the special characters,
as
judged
by
using c pearson's CellView add-in, which allows character by
character
visualization of cell contents.

Thank you again for your help,
-Tony

"Tom Ogilvy" wrote:

Guess it was a waste of time trying to explain it to you. Did
you
bother
to
read it?

--
Regards,
Tom Ogilvy

"T_o_n_y" wrote in message
...
I tried your macro, but unfortunately Excel still did not
import
the
special
characters. Recall that there are 6 special characters
between
the
$$158
and
the 1 8 in the first line of the file:
$$158 1 8 4.50 1.0000 0.8000 3.0010
1.5740

For that section, the output from your macro looked like this:
36 = $
0 =
36 = $
0 =
49 = 1
0 =
53 = 5
0 =
56 = 8
0 =
32 =
0 =
32 =
0 =
32 =
0 =
32 =
0 =
49 = 1
0 =
32 =
0 =
32 =
0 =
56 = 8

In other words, the 6 characters got stripped away again so
that
all
you
see
are the 2 spaces which appear on either side of the 6 special
characters.

The only way I've found for Excel to even recognize that those
characters
exist is to use the "Delimited" option during text import and
specify
"spaces" as the delimiting character with the "Treat
consecutive
delimiters
as one" feature unchecked. Unfortunately, that method of
importing
would
mean a huge rework of my existing code.

I spent another few hours trying to research the UNICODE
possibilty
you
mentioned, but still was unable to come up with anything.

At a loss...
-Tony

"Tom Ogilvy" wrote:

put this in a workbook. Change the path to point to your
file:

Sub ReadStraightTextFile()
Dim strTest As String
Dim bytArray() As Byte
Dim intcount As Integer
Dim col As Long
Open "E:\Data1\W158.DAT" For Input As #1
col = 0
Do While Not EOF(1)
Line Input #1, strTest
col = col + 1
bytArray = strTest
i = 0
For intcount = LBound(bytArray) To UBound(bytArray)
i = i + 1
Cells(i, col) = bytArray(intcount) & " = " &
Chr(bytArray(intcount))
Next

Loop 'Close the file

Close #1
End Sub

Have blank sheet as the activesheet. Run the macro.

It appears to me that the file is UNICODE. unlike an ascii
file
that
has
one byte per character, a unicode file has two bytes per
character.

there are 8 bits to a byte, so an ascii file can have 8^2 =
256
different/unique character codes. In a unicode file, 2
bytes is
16
bits,
so 2^16 = 65536 possible unique characters.

I didn't see any actual characters that couldn't be
represented
by
Ascii,
so
you could read every Odd character .