Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I need to import text files into Excel without losing special characters.
I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
put your string in cell A1. Then in B1 or another cell in the first row put
in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
Are the columns fixed width, as the parameter dataType:=xlFixedWidth suggests ? Are these special characters, ASCII value 255 ? As for uploading a samle of the data, http://savefile.com/ NickHK "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tom,
Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Thank you, Nick, for showing me how to upload the file. Here is the link for
the file: http://www.savefile.com/files/150039 To answer your question, the text file data is not fixed with but I've tried fixed width as well as delimited...nothing works for me...the characters are always stripped away. "NickHK" wrote: Tony, Are the columns fixed width, as the parameter dataType:=xlFixedWidth suggests ? Are these special characters, ASCII value 255 ? As for uploading a samle of the data, http://savefile.com/ NickHK "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Thank you for showing me how to upload the file.
Here is the link: http://www.savefile.com/files/150039 To answer your question, the files are not fixed width, although I have tried that option my attempts to import. No matter what I try, fixed width, delimited, etc... Excel always seems to strip those characters away and all I'm left with are the spaces around the special characters...but as I said, my code needs to count those special characters in order for the string manipulations to properly gather data. Thanks again, Tony "NickHK" wrote: Tony, Are the columns fixed width, as the parameter dataType:=xlFixedWidth suggests ? Are these special characters, ASCII value 255 ? As for uploading a samle of the data, http://savefile.com/ NickHK "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#7
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
put this in a workbook. Change the path to point to your file:
Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#8
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
Looking the first 16 bytes of your file in a Hex editor, you have: 24 24 31 35 38 20 20 00 00 00 00 00 00 20 20 31 The "20"=decimal 32, which is a normal space. The "00" are = vbNullChar and are not printable, I get a "" in Word. Not sure where your "y with 2 dots above them" came from. As such, do they actually mean anything ? Your data seems such a mix of formats, I doubt Excel would be able to make much sense of it. You would be better writing your own parsing routine : <Air Code Open "the file" For Input Do until EndOfFile LineInput ToAVariable Call DecideWhatThisLineMeans 'Process the data Loop Close file NickHK "T_o_n_y" wrote in message ... Thank you, Nick, for showing me how to upload the file. Here is the link for the file: http://www.savefile.com/files/150039 To answer your question, the text file data is not fixed with but I've tried fixed width as well as delimited...nothing works for me...the characters are always stripped away. "NickHK" wrote: Tony, Are the columns fixed width, as the parameter dataType:=xlFixedWidth suggests ? Are these special characters, ASCII value 255 ? As for uploading a samle of the data, http://savefile.com/ NickHK "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#9
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I tried your macro, but unfortunately Excel still did not import the special
characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#10
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#11
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Art,
I've done as you requested, although the file is also available from the link I posted earlier, i.e., http://www.savefile.com/files/150039 I've concluded that the special characters are most likely null characters (i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel strips away null characters when importing into cells. Although it would be best if I could somehow import these characters into an Excel cell or string variable, a work around for me would be to search for null characters in multiple text files and replace them with spaces. If anyone knows how to do this, I'd be interested. I have a program called "Advanced Search and Replace" with which I've tried to do this, but without success...again making me wonder if the characters are, in fact, ascii 000. I appreaciate the continued help. -Tony "Art H" wrote: Tony, Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#12
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
Thanks for sending the file directly to me. I just wanted to make sure the file at http://www.savefile.com/files/150039 did not get translated somehow. I have another request. Please send me a screen shot of seeing "a y with 2 dots above" the y's. Art T_o_n_y wrote: Art, I've done as you requested, although the file is also available from the link I posted earlier, i.e., http://www.savefile.com/files/150039 I've concluded that the special characters are most likely null characters (i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel strips away null characters when importing into cells. Although it would be best if I could somehow import these characters into an Excel cell or string variable, a work around for me would be to search for null characters in multiple text files and replace them with spaces. If anyone knows how to do this, I'd be interested. I have a program called "Advanced Search and Replace" with which I've tried to do this, but without success...again making me wonder if the characters are, in fact, ascii 000. I appreaciate the continued help. -Tony "Art H" wrote: Tony, Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#13
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Guess it was a waste of time trying to explain it to you. Did you bother to
read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#14
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Art,
open excel open a new workbook type a bunch of spaces in the first couple of cells in the first column of the workbook Now do Save as and select Unicode Text (*.txt) as the file type. Now close excel. Open the file in word. It if offers to open it as Unicode, unselect that option and select the option to the left (normal). when it is open in Word, select the Paragraph symbol so you can see the character returns. You will see the y with two dots. (so those are the null characters - the second part/byte of the unicode two byte characters. -- Regards, Tom Ogilvy "Art H" wrote in message ups.com... Tony, Thanks for sending the file directly to me. I just wanted to make sure the file at http://www.savefile.com/files/150039 did not get translated somehow. I have another request. Please send me a screen shot of seeing "a y with 2 dots above" the y's. Art T_o_n_y wrote: Art, I've done as you requested, although the file is also available from the link I posted earlier, i.e., http://www.savefile.com/files/150039 I've concluded that the special characters are most likely null characters (i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel strips away null characters when importing into cells. Although it would be best if I could somehow import these characters into an Excel cell or string variable, a work around for me would be to search for null characters in multiple text files and replace them with spaces. If anyone knows how to do this, I'd be interested. I have a program called "Advanced Search and Replace" with which I've tried to do this, but without success...again making me wonder if the characters are, in fact, ascii 000. I appreaciate the continued help. -Tony "Art H" wrote: Tony, Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#15
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tom,
I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#16
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
One added step. I formatted the cells in Excel as MS Ariel Unicode before
doing the saveas. Don't know if that is necessary or not. -- Regards, Tom Ogilvy "Tom Ogilvy" wrote in message ... Art, open excel open a new workbook type a bunch of spaces in the first couple of cells in the first column of the workbook Now do Save as and select Unicode Text (*.txt) as the file type. Now close excel. Open the file in word. It if offers to open it as Unicode, unselect that option and select the option to the left (normal). when it is open in Word, select the Paragraph symbol so you can see the character returns. You will see the y with two dots. (so those are the null characters - the second part/byte of the unicode two byte characters. -- Regards, Tom Ogilvy "Art H" wrote in message ups.com... Tony, Thanks for sending the file directly to me. I just wanted to make sure the file at http://www.savefile.com/files/150039 did not get translated somehow. I have another request. Please send me a screen shot of seeing "a y with 2 dots above" the y's. Art T_o_n_y wrote: Art, I've done as you requested, although the file is also available from the link I posted earlier, i.e., http://www.savefile.com/files/150039 I've concluded that the special characters are most likely null characters (i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel strips away null characters when importing into cells. Although it would be best if I could somehow import these characters into an Excel cell or string variable, a work around for me would be to search for null characters in multiple text files and replace them with spaces. If anyone knows how to do this, I'd be interested. I have a program called "Advanced Search and Replace" with which I've tried to do this, but without success...again making me wonder if the characters are, in fact, ascii 000. I appreaciate the continued help. -Tony "Art H" wrote: Tony, Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#17
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Art H,
In response to your request, I'm posting two screen shots related to the special characters I'm having trouble inputing into Excel. This one shows the file in Word while in the mode which "shows" special characters: http://www.savefile.com/files/154330 This one shows the file while in the mode which "hides" special characters: http://www.savefile.com/files/154335 I should have mentioned initially that the 6 special characters show as y's with 2 dots in the former mode and rectangles in the later. -Tony "Art H" wrote: Tony, Thanks for sending the file directly to me. I just wanted to make sure the file at http://www.savefile.com/files/150039 did not get translated somehow. I have another request. Please send me a screen shot of seeing "a y with 2 dots above" the y's. Art T_o_n_y wrote: Art, I've done as you requested, although the file is also available from the link I posted earlier, i.e., http://www.savefile.com/files/150039 I've concluded that the special characters are most likely null characters (i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel strips away null characters when importing into cells. Although it would be best if I could somehow import these characters into an Excel cell or string variable, a work around for me would be to search for null characters in multiple text files and replace them with spaces. If anyone knows how to do this, I'd be interested. I have a program called "Advanced Search and Replace" with which I've tried to do this, but without success...again making me wonder if the characters are, in fact, ascii 000. I appreaciate the continued help. -Tony "Art H" wrote: Tony, Please send your file directly to me via an attachment. I think I can help. T_o_n_y wrote: I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#18
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 Wend Close #1 I would upload an example file showing the characters if someone tells me how. I would also tell you what the characters are, again, if someone tells me how. Thanks, Tony |
#19
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tom,
I did what you suggested, typing spaces into Excel cells, saving as UNICODE, and then opening it in Word. I did the same thing with other samples of number and text. Since null characters are bytes full of zeros, it seems to me that one would inevitably see the null characters THROUGHOUT any file that was saved as unicode (since low code values will always have zeros in the front). By contrast, the file that I've uploaded contains the null character symbols in only one location. Unless it is possible for a file to be formatted as both non-unicode and unicode (I'm not even sure what that would mean), I don't see how this could be a unicode file. I admit that my grasp of this is incomplete at best, but it seems most likely to me that what is going on here is that null ascii characters were at one time inserted into these text files and now, for some reason, Excel strips them away when importing. If Excel didn't strip them away, I would have the problem, and it's remarkable to me that you can't even import nulls into a string. I also attempted to insert a null character into an Excel cell using the INSERT--Symbol feature, but was unable to access null character. It seems Excel refuses to have this character. -Tony |
#20
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Nick,
What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 |
#21
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I need to import text files into Excel without losing special characters. I've tried several methods, but each time Excel imports in the file, ignoring those characters. The following is an example line, but what you can't see are the 6 special characters which appear between the $$158 and the 1 8! $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 I know they are there, however since I opened the document using Word, which displays them as a y with 2 dots above them. My Excel VBA code needs to import these characters so that it doesn't get lost when extracting the data using MID(,,,) function. The text file were generated using old FORTRAN programs, and there are thousands of them...my VBA routines need to access these files in order to modernize our system. Examples of what I've tried (all of these ignore the y characters) Workbooks.OpenText Filename:=fname, Origin:=437, _ StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2) With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname, Destination:=Cells(2, Col)) Open FName For Input Access Read As #1 While Not EOF(1) Line Input #1, WholeLine Cells(RowNdx, ColNdx).Value = WholeLine RowNdx = RowNdx + 1 |
#22
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Nick,
The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- |
#23
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- |
#24
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tony,
After investigating the contents of the sample data file, I conclude that the file contains single byte characters--Unicode is not involved. Further, the file does contain nulls as suggested by others. The first few characters of the first line of the data file a $$158<s<s<nl<nl<nl<nl<nl<nl<s<s1<s<s8 where <s represents the space character and <nl represents the null character. Of course, the decimal equivalent of the space character is 32 and the decimal equivalent of the null character is 0. The decimal equivalent of a y with double dots above it (this letter is called a "Latin small letter y with diaeresis" per Character Map) is 255 (hex FF). I don't understand why Word uses this character to display nulls, but I guess, that's not relevant, but did lead to wrong conclusions. I agree with NickHK. Since you need the character positions maintained, modify your macro to replace the null characters with spaces then place the resulting string into your Excel worksheet. Have fun! Art NickHK wrote: Tony, To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- |
#25
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Just for grins put the following Word macro into a blank document and
see what Word does with all characters whose value is from 0 to 255. I used line continuation hopefully to prevent word wrap issues. Sub ASCIICharSet() Selection.WholeStory Selection.TypeText _ Text:= _ "The data presented below represents all single " & _ "byte characters from 0 to 255 where the format " & _ "presented below is the hexadecimal equivalent of " & _ "the character, followed by the decimal equivalent " & _ "followed by the ASCII character. It might be of " & _ "interest to toggle Show All to see how Word's " & _ "display of certain characters changes. " & _ "Also note that certain characters perform Word " & _ "formatting functions (e.g., 0x0C, which is called " & _ "formfeed, causes a page break)." & vbCr & vbCr & vbCr For i = 0 To 255 Selection.TypeText _ Text:= _ "(0x" & Format(Hex(i), "00") & _ ", " & Format(i, "000") & ")" & Chr(i) Selection.MoveEnd Next End Sub Art Art H wrote: Tony, After investigating the contents of the sample data file, I conclude that the file contains single byte characters--Unicode is not involved. Further, the file does contain nulls as suggested by others. The first few characters of the first line of the data file a $$158<s<s<nl<nl<nl<nl<nl<nl<s<s1<s<s8 where <s represents the space character and <nl represents the null character. Of course, the decimal equivalent of the space character is 32 and the decimal equivalent of the null character is 0. The decimal equivalent of a y with double dots above it (this letter is called a "Latin small letter y with diaeresis" per Character Map) is 255 (hex FF). I don't understand why Word uses this character to display nulls, but I guess, that's not relevant, but did lead to wrong conclusions. I agree with NickHK. Since you need the character positions maintained, modify your macro to replace the null characters with spaces then place the resulting string into your Excel worksheet. Have fun! Art NickHK wrote: Tony, To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- |
#26
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I will eat crow on this one Tony. I was incorrect and misinterpreted the
results and I apologize for suggesting an incorrect solution and sticking with it. This code will show you the content of your file: Sub Test1() Dim b As Byte Dim rw As Long Open "E:\Data1\W158.DAT" For Binary Access Read As #1 Do While Not EOF(1) Get #1, , b rw = rw + 1 Cells(rw, 1) = b Loop Close #1 End Sub My problem was forgetting that Visual Basic/VBA internally converts strings to Unicode for processing. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, I did what you suggested, typing spaces into Excel cells, saving as UNICODE, and then opening it in Word. I did the same thing with other samples of number and text. Since null characters are bytes full of zeros, it seems to me that one would inevitably see the null characters THROUGHOUT any file that was saved as unicode (since low code values will always have zeros in the front). By contrast, the file that I've uploaded contains the null character symbols in only one location. Unless it is possible for a file to be formatted as both non-unicode and unicode (I'm not even sure what that would mean), I don't see how this could be a unicode file. I admit that my grasp of this is incomplete at best, but it seems most likely to me that what is going on here is that null ascii characters were at one time inserted into these text files and now, for some reason, Excel strips them away when importing. If Excel didn't strip them away, I would have the problem, and it's remarkable to me that you can't even import nulls into a string. I also attempted to insert a null character into an Excel cell using the INSERT--Symbol feature, but was unable to access null character. It seems Excel refuses to have this character. -Tony |
#27
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Tom,
Thank you very much. I was able to use your routine to import all the characters of the file, including the null characters. Once again, I appeciate that you post real code that works to solve real problems. Thanks, -Tony "Tom Ogilvy" wrote: I will eat crow on this one Tony. I was incorrect and misinterpreted the results and I apologize for suggesting an incorrect solution and sticking with it. This code will show you the content of your file: Sub Test1() Dim b As Byte Dim rw As Long Open "E:\Data1\W158.DAT" For Binary Access Read As #1 Do While Not EOF(1) Get #1, , b rw = rw + 1 Cells(rw, 1) = b Loop Close #1 End Sub My problem was forgetting that Visual Basic/VBA internally converts strings to Unicode for processing. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, I did what you suggested, typing spaces into Excel cells, saving as UNICODE, and then opening it in Word. I did the same thing with other samples of number and text. Since null characters are bytes full of zeros, it seems to me that one would inevitably see the null characters THROUGHOUT any file that was saved as unicode (since low code values will always have zeros in the front). By contrast, the file that I've uploaded contains the null character symbols in only one location. Unless it is possible for a file to be formatted as both non-unicode and unicode (I'm not even sure what that would mean), I don't see how this could be a unicode file. I admit that my grasp of this is incomplete at best, but it seems most likely to me that what is going on here is that null ascii characters were at one time inserted into these text files and now, for some reason, Excel strips them away when importing. If Excel didn't strip them away, I would have the problem, and it's remarkable to me that you can't even import nulls into a string. I also attempted to insert a null character into an Excel cell using the INSERT--Symbol feature, but was unable to access null character. It seems Excel refuses to have this character. -Tony |
#28
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Art,
Although I agree with you and Nick that preprocessing the files to replace null characters would solve the problem, Tom Ogilvy has now provided a solution that will work nicely with my current code. Thank you for all the help. "Art H" wrote: Tony, After investigating the contents of the sample data file, I conclude that the file contains single byte characters--Unicode is not involved. Further, the file does contain nulls as suggested by others. The first few characters of the first line of the data file a $$158<s<s<nl<nl<nl<nl<nl<nl<s<s1<s<s8 where <s represents the space character and <nl represents the null character. Of course, the decimal equivalent of the space character is 32 and the decimal equivalent of the null character is 0. The decimal equivalent of a y with double dots above it (this letter is called a "Latin small letter y with diaeresis" per Character Map) is 255 (hex FF). I don't understand why Word uses this character to display nulls, but I guess, that's not relevant, but did lead to wrong conclusions. I agree with NickHK. Since you need the character positions maintained, modify your macro to replace the null characters with spaces then place the resulting string into your Excel worksheet. Have fun! Art NickHK wrote: Tony, To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file |
#29
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Nick,
Please see my note to Art. I appeciate all the help and I've definitely learned a thing or two on this one. -Tony "NickHK" wrote: Tony, To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . |
#30
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
The Unicode and "Latin small letter y with diaeresis" were red herring in
this. Seems that The OP has his solution now anyway. NickHK "Art H" wrote in message oups.com... Tony, After investigating the contents of the sample data file, I conclude that the file contains single byte characters--Unicode is not involved. Further, the file does contain nulls as suggested by others. The first few characters of the first line of the data file a $$158<s<s<nl<nl<nl<nl<nl<nl<s<s1<s<s8 where <s represents the space character and <nl represents the null character. Of course, the decimal equivalent of the space character is 32 and the decimal equivalent of the null character is 0. The decimal equivalent of a y with double dots above it (this letter is called a "Latin small letter y with diaeresis" per Character Map) is 255 (hex FF). I don't understand why Word uses this character to display nulls, but I guess, that's not relevant, but did lead to wrong conclusions. I agree with NickHK. Since you need the character positions maintained, modify your macro to replace the null characters with spaces then place the resulting string into your Excel worksheet. Have fun! Art NickHK wrote: Tony, To continue using your current, working solution, why not pre-process these problem files by replacing the nulls with spaces. If it is only those 6 bytes you need to change, using something with: Open Yourfile For Binary as #freefile Replace the "00"s with "20" '&H20=decimal 32, [SPACE] Close #freefile NickHK "T_o_n_y" wrote in message ... Nick, The presence of the null characters is important because I'm importing data from thousands of files like these using an Excel VBA routine. The data within these files is not well-organized by today's standards, but it can be extracted using string functions such as MID(,,) as long as it is known where particular fields begin and end. A series of FORTRAN programs has been used for 30-40 years to manipulate these files, and now the company wants to modernize. Up until now, my Excel VBA routine has successfully extracted the data from thousands of these files so that it can be placed into Excel format. Except for certain files; the ones having this problem. Apparently a FORTRAN programmer inadvertantly placed the null characters into certain of these files while developing certain aspects of the programs. If the null characters were spaces instead of null characters there would be little problem since the MID(,,) function would still collect the following fields in the string correctly. But the null characters being stripped away means that everything past that point gets offset by 6 characters, causing things like string conversion to doubles, for example, to cause the program to crash. If I can't find a way to keep Excel from stripping the nulls, I suppose I'll work out an error trapping routine that tells the user to manually fix that file. I just hope it doesn't amount to hundreds of files! -Tony "NickHK" wrote: Tony, Probably, but what use are 6 x null characters ? Do they actually mean anything to you ? Or are they some garbage that your Fortran app outputs ? NickHK "T_o_n_y" wrote in message ... Nick, What you've written makes sense to me, and that is why I've had trouble concluding that it is a unicode file. As I wrote in an earlier post, I'm leaning toward it being a text file that has 6 Ascii(0) characters inserted into it. The question remains: is there a way to have Excel import these characters rather than completely ignoring them? -Tony "NickHK" wrote: Tony, The "special characters" I see in your uploaded file are Asc(0). It is not a Unicode file. As for the "0" for every other cell, that is expected if you have all ANSI text stored in a UNICODE format. The lower byte will always be 0 as no values exceed decimal 255 or FF hex. So do you have a Unicode file or not ? NickHK "T_o_n_y" wrote in message ... Tom, I'm perplexed at your response because, I'm about as far from ignoring your posts as possible. Indeed, I generally skip directly to your posts when on this newsgroup since they are more helpful than anyone's, containing actual sample code that can be used. It's just that the output from the macro you sent me led me to the conclusion that you were mistaken this time. The output shows "0" for every other cell which is not what I would expect from UNICODE with 2 bytes per character. Furthermore, rather than revealing the presence of the special characters, your macro also had them stripped away. Here's what I mean. The file I've uploaded contains the following in the first line, $$158++yyyyyy++1++8++4.50 etc... I've substituting + for spaces and y for the special characters above. As you can see there are 2 spaces followed by 6 special characters followed by 2 spaces. The output from your macro completely omits the 6 special characters, if I'm reading it correctly. As I wrote, I spent "anoter few hours" researching into UNICODE in order to investigate the possibility you raised...but nothing I found seemed to confirm it. In addition, Excel has two different UNICODE types (UTF-8 and UTF-7) which one can select in the text import wizard. I tried both of them and neither gave me success in importing the special characters, as judged by using c pearson's CellView add-in, which allows character by character visualization of cell contents. Thank you again for your help, -Tony "Tom Ogilvy" wrote: Guess it was a waste of time trying to explain it to you. Did you bother to read it? -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... I tried your macro, but unfortunately Excel still did not import the special characters. Recall that there are 6 special characters between the $$158 and the 1 8 in the first line of the file: $$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740 For that section, the output from your macro looked like this: 36 = $ 0 = 36 = $ 0 = 49 = 1 0 = 53 = 5 0 = 56 = 8 0 = 32 = 0 = 32 = 0 = 32 = 0 = 32 = 0 = 49 = 1 0 = 32 = 0 = 32 = 0 = 56 = 8 In other words, the 6 characters got stripped away again so that all you see are the 2 spaces which appear on either side of the 6 special characters. The only way I've found for Excel to even recognize that those characters exist is to use the "Delimited" option during text import and specify "spaces" as the delimiting character with the "Treat consecutive delimiters as one" feature unchecked. Unfortunately, that method of importing would mean a huge rework of my existing code. I spent another few hours trying to research the UNICODE possibilty you mentioned, but still was unable to come up with anything. At a loss... -Tony "Tom Ogilvy" wrote: put this in a workbook. Change the path to point to your file: Sub ReadStraightTextFile() Dim strTest As String Dim bytArray() As Byte Dim intcount As Integer Dim col As Long Open "E:\Data1\W158.DAT" For Input As #1 col = 0 Do While Not EOF(1) Line Input #1, strTest col = col + 1 bytArray = strTest i = 0 For intcount = LBound(bytArray) To UBound(bytArray) i = i + 1 Cells(i, col) = bytArray(intcount) & " = " & Chr(bytArray(intcount)) Next Loop 'Close the file Close #1 End Sub Have blank sheet as the activesheet. Run the macro. It appears to me that the file is UNICODE. unlike an ascii file that has one byte per character, a unicode file has two bytes per character. there are 8 bits to a byte, so an ascii file can have 8^2 = 256 different/unique character codes. In a unicode file, 2 bytes is 16 bits, so 2^16 = 65536 possible unique characters. I didn't see any actual characters that couldn't be represented by Ascii, so you could read every Odd character . It appears that opening it in Excel automatically converts it to Ascii, so you haven't lost any information, but if you want to edit it and write it back out, you would need to save it as Unicode Text. I know that is an option in at least xl2000 and I assume later. -- Regards, Tom Ogilvy "T_o_n_y" wrote in message ... Tom, Thank you for your reply. I followed your procedure but only got four "32" s in that blank section; that is, there are only 4 spaces there. This confirms what I've suspected, namely, that Excel is simply not importing those characters. I've also tried using C. Pearson's Cell View Add-in with the same result (http://www.cpearson.com/excel/CellView.htm). As you point out, the characters also get stripped when I cut and paste into this forum. Therefore, I've emailed you separately the file I referred to as an attachment (it's a text document called W158.DAT) sent from myother_acct. If I knew how to post it to this forum, I would. I appreaciate your help...this is a frustrating problem for me. Is there a way to import the text file character by character? -Tony "Tom Ogilvy" wrote: put your string in cell A1. Then in B1 or another cell in the first row put in this formula =CODE(MID($A$1,ROW(),1)) Assume the above formula is in B1 in C1: =CHAR(B1) now select B1:C1 and drag fill down until the formula starts returning #Value errors. The only thing between the characters in your post are ascii code 32 which is a space. Possibly they didn't get carried forward in the email. -- |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Extract all text without spaces and special characters | Excel Worksheet Functions | |||
Excel Save As... text with special characters | Excel Discussion (Misc queries) | |||
REMOVE SPECIAL CHARACTERS FROM TEXT CELLS | Excel Worksheet Functions | |||
Importing text files with mathematical characters | Excel Discussion (Misc queries) | |||
Convert text with special characters | Excel Programming |