Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.misc
|
|||
|
|||
How can I make Excel save Unicode CSV data correctly?
Excel 2007's Unicode for CSV (comma separated values) support seems sketchy
at best: - Reads ANSI CSV OK. - Reads Unicode (UTF-16, UCS-2) CSV OK. - Regularly garbles UTF-8 CSV when reading (depends on the included characters). - Regularly garbles Unicode big-endian CSV when reading (depends on the included characters). - Writes ANSI CSV OK. - Can't write any form of non-ASCII Unicode CSV data OK. When saving into CSV format it saves in ANSI CSV and breaks any non-ANSI characters by converting them into ? (question mark) characters. To replicate: 1. Open notepad and type in the following line of text: Degrees C („ƒ),foo You can type the special „ƒ symbol by pressing-and-holding the Alt key while typing 2103 on the numeric keypad, then release the Alt. 2. File / Save as... into a file named "sample-unicode.csv" and select the "Unicode" encoding. 3. Open the "sample-unicode.csv" file in Excel. The data is still displayed correctly. 4. Press F12 to get the "Save as..." window, save into a file named "sample-excel.csv" in CSV format. Excel incorrectly writes the following data, omitting any kind of UTF-8/Unicode/Unicode-BE BOM: 0000: 44 65 67 72 65 65 73 20 43 20 28 3F 29 2C 66 6F Degrees C (?),fo 0010: 6F 0D 0A o.. Correct Unicode UTF-8 output should look like the following: 0000: EF BB BF 44 65 67 72 65 65 73 20 43 20 28 E2 84 ...Degrees C (.. 0010: 83 29 2C 66 6F 6F 0D 0A .),foo.. (Note the EF BB BF byte order mark) Correct Unicode UTF-16/UCS-2 output should look like the following: 0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00 ..D.e.g.r.e.e.s. 0010: 20 00 43 00 20 00 28 00 03 21 29 00 2C 00 66 00 .C. .(..!).,.f. 0020: 6F 00 6F 00 0D 00 0A 00 o.o..... (Note the FF FE byte order mark) Interestingly, saving from Excel to "Unicode text" format does save into UTF-16/UCS-2 correctly: 0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00 0010: 20 00 43 00 20 00 28 00 03 21 29 00 09 00 66 00 0020: 6F 00 6F 00 0D 00 0A 00 But it converts the comma field separators (\x002c) into tabs (\x0009). How can I make Excel behave correctly and save into some kind of Unicode CSV format? I'm using Microsoft Office Excel 2007 (12.0.6300.5000) SP1 MSO (12.0.6213.1000). |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do I save a data as non-unicode text in am Excell sheet? | Excel Discussion (Misc queries) | |||
hyperlinks in excel2002 won't save correctly | Links and Linking in Excel | |||
How do I make colors appear correctly when linking excel to pub? | Excel Discussion (Misc queries) | |||
Save as .xml not displayed correctly | Excel Discussion (Misc queries) | |||
how i make the excel send data or save it to another bad file | Excel Worksheet Functions |