![]() |
Reading file saved from Excel as Unicode text
I inherited some Excel code that does this:
Set fso = CreateObject("Scripting.FileSystemObject") strTempFile = fso.GetSpecialFolder(2) & "\" & fso.GetTempName & ".txt" '2 = TemporaryFolder. ActiveWorkbook.SaveAs(Filename:=strTempFile, FileFormat:=xlText) nSourceFile = FreeFile Open strTempFile For Input As #nSourceFile sText = Input$(LOF(1), 1) Close Then, sText is uploaded to a server. That works fine for English, German, and French. But then we added Japanese. Excel's xlText (tab delimited) file format changes all Japanese characters into question marks. So, I changed that line to save as Unicode tab delimited instead: ActiveWorkbook.SaveAs(Filename:=strTempFile, FileFormat:=xlUnicodeText) The file looks great in Notepad; Japanese characters are preserved. But, then I get Error 62 on this line: sText = Input$(LOF(1), 1) So, I changed that section to this: Open strTempFile For Binary As #nSourceFile sText = InputB$(LOF(1), 1) With that, I don't get errors in VBA, but the backend server chokes on extra characters that I guess must be invisible in Notepad. So I tried this: Set oTextStream = fso.OpenTextFile(strTempFile ) sText = oTextStream.ReadAll The result is garbled; where the Japanese characters should be, instead there are random characters like this: K0Q0f0$B!&(Bc0_0$B!&%?(BL0$B!&%f(B0$B!&!V( B0$B%-(B0$B!&%#(B0$B%F(B0$B%M(B0g0 Any suggestions on how I can preserve the Japanese characters without getting extra characters that are invisible in Notepad? Thanks, Greg |
Reading file saved from Excel as Unicode text
"Greg Lovern" wrote: <snip The result is garbled; where the Japanese characters should be, instead there are random characters like this: K0Q0f0・c0_0・タL0・ユ0・「0*0・ィ0テ0 ネ0g0 That result isn't garbled - it looks exactly like a unicode string being displayed in ascii, because it is one. UTF-16 would be a good educated guess. VBA strings are not unicode, they're only ascii (well, 8-bit anyway). So if you're handling UTF-16 strings then the VBA native string functions will be useless to you (this may or may not apply to Excel 2007 strings, I've no idea). You need some unicode aware string tools, I'm not aware of any for vba but that doesn't mean they don't exist. It's also a very complex area and I wouldn't recommend anyone trying to write their own tools to handle it. Put very approximately; UTF-16 works on words (pairs of bytes) to represent a character. For example (byte values shown in hex) a space rendered as 20 in ascii and would be the byte pair 0020 in UTF-16*. A line break might be 000D which vba strings would interpret as two characters 00 (which would display as a block or question mark as it has no ascii character to display) and 0D the carriage return. Using 0D as the end of line marker (as is correct) would leave the vba string handler with an extra byte 00 at the end of the string. Culling this may be enough to get the string through but it's not substitute for proper localisation functions. There's more to it than that but it's enough to understand the scope of the problem. Notepad knows about UTF-16 and a number of other text formats and renders the text correctly, which is why it is fine there. Not sure how much that actually helps but at least you know what's happening... |
Reading file saved from Excel as Unicode text
On Apr 24, 3:31 pm, James Snell
wrote: You need some unicode aware string tools, I'm not aware of any for vba I was under the impression that FileSystemObject is Unicode-aware; is that not correct? proper localisation functions. Are these available in a library, or do you mean rolling your own? Greg |
Reading file saved from Excel as Unicode text
Sorry for the delay - I'm not always around to respond to posts.
FileSystemObject is unicode aware in that you can open a file with a unicode name, the content that is returned by a stream is language neutral (it's just a bytestream) which would need to be handled downstream. By the same token you could oly open a unicode filename if you were able to represent that unicode name in VBA. I've not looked for existing libraries, on further inspection there are some unicode functions available in vba such as ascw() & chrw(). It may be easier to just open the file up as a hidden sheet / workbook. The biggest issue is that you'd have to know what flavour of unicode the server can speak (if any). If that's only ANSI (in which case you'd have to rethink the use of japanese) if it's UTF-7 / UTF-8 then a script like (http://www.planet-source-code.com/vb...44569&lngWId=1) might help. Do you have the info about the server? Once you know what that handles then you might be able to web-search for a conversion script... "Greg Lovern" wrote: On Apr 24, 3:31 pm, James Snell wrote: You need some unicode aware string tools, I'm not aware of any for vba I was under the impression that FileSystemObject is Unicode-aware; is that not correct? proper localisation functions. Are these available in a library, or do you mean rolling your own? Greg |
Reading file saved from Excel as Unicode text
Thanks, we've been banging on this for a few days now and I've posted
a new question that reflects some little progress we've made and more detail on the problem: http://groups.google.com/group/micro...f2fb1df?hl=en# It turns out the server requires Shift-JIS, not Unicode. But we're still trying to go through Unicode, then convert the Unicode to Shift- JIS, because Excel won't save as Shift-JIS. Details in the thread referenced above. I hope you look at that post and comment. Thanks, Greg On Apr 29, 5:10 am, James Snell wrote: Sorry for the delay - I'm not always around to respond to posts. FileSystemObject is unicode aware in that you can open a file with a unicode name, the content that is returned by a stream is language neutral (it's just a bytestream) which would need to be handled downstream. By the same token you could oly open a unicode filename if you were able to represent that unicode name in VBA. I've not looked for existing libraries, on further inspection there are some unicode functions available in vba such as ascw() & chrw(). It may be easier to just open the file up as a hidden sheet / workbook. The biggest issue is that you'd have to know what flavour of unicode the server can speak (if any). If that's only ANSI (in which case you'd have to rethink the use of japanese) if it's UTF-7 / UTF-8 then a script like (http://www.planet-source-code.com/vb...sp?txtCodeId=4...) might help. Do you have the info about the server? Once you know what that handles then you might be able to web-search for a conversion script... "Greg Lovern" wrote: On Apr 24, 3:31 pm, James Snell wrote: You need some unicode aware string tools, I'm not aware of any for vba I was under the impression that FileSystemObject is Unicode-aware; is that not correct? proper localisation functions. Are these available in a library, or do you mean rolling your own? Greg |
All times are GMT +1. The time now is 03:16 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com