Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 2
Default How can I make Excel save Unicode CSV data correctly?

Excel 2007's Unicode for CSV (comma separated values) support seems sketchy
at best:
- Reads ANSI CSV OK.
- Reads Unicode (UTF-16, UCS-2) CSV OK.
- Regularly garbles UTF-8 CSV when reading (depends on the included
characters).
- Regularly garbles Unicode big-endian CSV when reading (depends on the
included characters).
- Writes ANSI CSV OK.
- Can't write any form of non-ASCII Unicode CSV data OK.

When saving into CSV format it saves in ANSI CSV and breaks any non-ANSI
characters by converting them into ? (question mark) characters.

To replicate:
1. Open notepad and type in the following line of text:
Degrees C („ƒ),foo
You can type the special „ƒ symbol by pressing-and-holding the Alt key while
typing 2103 on the numeric keypad, then release the Alt.
2. File / Save as... into a file named "sample-unicode.csv" and select the
"Unicode" encoding.
3. Open the "sample-unicode.csv" file in Excel. The data is still displayed
correctly.
4. Press F12 to get the "Save as..." window, save into a file named
"sample-excel.csv" in CSV format.

Excel incorrectly writes the following data, omitting any kind of
UTF-8/Unicode/Unicode-BE BOM:
0000: 44 65 67 72 65 65 73 20 43 20 28 3F 29 2C 66 6F Degrees C (?),fo
0010: 6F 0D 0A o..

Correct Unicode UTF-8 output should look like the following:
0000: EF BB BF 44 65 67 72 65 65 73 20 43 20 28 E2 84 ...Degrees C (..
0010: 83 29 2C 66 6F 6F 0D 0A .),foo..
(Note the EF BB BF byte order mark)

Correct Unicode UTF-16/UCS-2 output should look like the following:
0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00 ..D.e.g.r.e.e.s.
0010: 20 00 43 00 20 00 28 00 03 21 29 00 2C 00 66 00 .C. .(..!).,.f.
0020: 6F 00 6F 00 0D 00 0A 00 o.o.....
(Note the FF FE byte order mark)

Interestingly, saving from Excel to "Unicode text" format does save into
UTF-16/UCS-2 correctly:
0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00
0010: 20 00 43 00 20 00 28 00 03 21 29 00 09 00 66 00
0020: 6F 00 6F 00 0D 00 0A 00

But it converts the comma field separators (\x002c) into tabs (\x0009).

How can I make Excel behave correctly and save into some kind of Unicode CSV
format?

I'm using Microsoft Office Excel 2007 (12.0.6300.5000) SP1 MSO
(12.0.6213.1000).

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
How do I save a data as non-unicode text in am Excell sheet? aina Excel Discussion (Misc queries) 0 March 17th 08 10:31 AM
hyperlinks in excel2002 won't save correctly Alastair Links and Linking in Excel 4 September 25th 07 12:32 PM
How do I make colors appear correctly when linking excel to pub? kris Excel Discussion (Misc queries) 3 September 19th 06 07:52 PM
Save as .xml not displayed correctly Anthony Excel Discussion (Misc queries) 1 September 21st 05 12:01 PM
how i make the excel send data or save it to another bad file [email protected] Excel Worksheet Functions 1 August 31st 05 02:02 PM


All times are GMT +1. The time now is 08:46 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"