Posted to microsoft.public.excel.misc
Sigh, I had forgotten about this old thread and did not see Jerry's posting before today. I stumbled across it when I rediscovered the problem recently and searched for any relevant threads. Imagine my surprise! (A "senior moment".) But Google Groups does not permit me to simply post a reply to the original thread :-(.

On Aug 5 2008, 11:51 am, Jerry W. Lewis wrote:

   there were millions of floating point numbers that simply could not
   exist as a constant value in Excel. You could generate them as a
   formula result, but if you pasted that result as a value, then you
   lost some of the trailing bits

Jerry seems to be suggesting that Excel converts the value to a displayable constant (first 15 significant digits), then copies or pastes that constant. That was my first hypothesis, too, as noted in my original posting. And that rule does seem to apply when pasting into other applications such as Notepad and MS Word.

But as I noted in my original posting, I found a counter-example that disproves that theory for copy-and-paste-special-value (CAPSV) in Excel. Specifically, I showed that 0.999999999999972 - 2^-53 CAPSVs exactly. That is, the internal representation of the original and the pasted value is &h3FEFFFFF,FFFFFF03 [1]. The cell does display 0.999999999999972 when formatted to 15 significant digits. But its internal representation is different from the internal representation of 0.999999999999972, which is &h3FEFFFFF,FFFFFF04 [3]. The difference is exactly 2^-53, the smallest one-bit incremental difference [2].

And FYI, this behavior (sometimes, but not always, altering the internal representation) also happens when storing into the Value and Value2 properties of Excel range objects in VBA. For example:

   Dim x As Double, y As Double
   y = 201
   x = 100 * 2.01
   Debug.Print "y: "; dbl2bin(y); " "; dbl2dec(y)
   Debug.Print "x: "; dbl2bin(x); " "; dbl2dec(x)
   Range("a1") = x
   Debug.Print "A1: "; dbl2bin(Range("a1")); " "; dbl2dec(Range("a1"))

DBL2BIN and DBL2DEC are my own UDFs that show the exact internal representation in binary and decimal.
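For readers without the DBL2BIN and DBL2DEC UDFs, the same inspection can be done outside VBA. Below is a minimal Python sketch of equivalent helpers; the names mirror the UDFs above, but the implementations are my own reconstruction, assuming IEEE 754 binary64 doubles:

```python
import struct
from decimal import Decimal

def dbl2bin(x):
    # Hex dump of the 64-bit IEEE 754 pattern, sign/exponent first,
    # split at 32 bits to mimic the &hXXXXXXXX,XXXXXXXX notation.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    hx = f"{bits:016X}"
    return f"&h{hx[:8]},{hx[8:]}"

def dbl2dec(x):
    # Decimal(float) converts exactly, so this is the precise stored value.
    return str(Decimal(x))

print(dbl2bin(201.0))       # &h40692000,00000000
print(dbl2bin(100 * 2.01))  # &h40691FFF,FFFFFFFF (one bit below 201)
print(dbl2dec(100 * 2.01))  # 200.999999999999971578290569595992565155029296875
```

Note that plain Python reproduces x = 100 * 2.01 as the one-bit-low value, matching the VBA result, since both round the exact product to the nearest double.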
The Debug.Print statements show that A1 is "y", not "x"; to wit:

   y:  &h40692000,00000000  201
   x:  &h40691FFF,FFFFFFFF  200.999999999999,971578290569595992565155029296875
   A1: &h40692000,00000000  201

I conclude that this behavior is driven by an Excel heuristic that applies to the PasteSpecial xlPasteValues method and the Value and Value2 properties -- and perhaps other similar operations that store into Excel objects. But that heuristic does not apply to UDF results. For example, the following UDF returns exactly "x":

   Function capsv()
   Dim x As Double, y As Double
   y = 201
   x = 100 * 2.01
   Debug.Print "y: "; dbl2bin(y); " "; dbl2dec(y)
   Debug.Print "x: "; dbl2bin(x); " "; dbl2dec(x)
   capsv = x
   End Function

Based on many examples, my best explanation of the heuristic is: if the original value rounded to its 15-sig-digit representation (the "modified value") is represented exactly within 15 sig digits, convert the original value to the modified value.

I think that explanation is "close". But, sigh, I found counter-examples that contradict it.

First, to demonstrate the heuristic, consider example #1 (based on a random number):

   y = 0.200653076171871
   x = y + 2^-55

   y:  &h3FC9AEFF,FFFFFF70  0.200653076171871,003197111349436454474925994873046875
   x:  &h3FC9AEFF,FFFFFF71  0.200653076171871,0309526869650653679855167865753173828125
   A1: &h3FC9AEFF,FFFFFF71  0.200653076171871,0309526869650653679855167865753173828125

Note that A1 is "x", not "y". That fits the rule because 0.200653076171871 cannot be represented exactly within 15 sig digits, as shown.

Second, consider example #2:

   y = 0.200653076171875
   x = y + 2^-55

   y:  &h3FC9AF00,00000000  0.200653076171875
   x:  &h3FC9AF00,00000001  0.200653076171875,0277555756156289135105907917022705078125
   A1: &h3FC9AF00,00000000  0.200653076171875

A1 is "y", not "x". Again, that fits the rule because 0.200653076171875 is represented exactly within 15 sig digits. My explanation of the heuristic seems to work.
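The conjectured rule is easy to state as a predicate. Here is a short Python sketch of it (my own paraphrase of the conjecture, not Excel's actual code), where "represented exactly within 15 sig digits" means the value's 15-significant-digit display converts back to a decimal that equals the stored double exactly:

```python
from decimal import Decimal

def exact_in_15(x):
    # True if x's 15-significant-digit display is exactly the stored value.
    return Decimal(f"{x:.15g}") == Decimal(x)

# Example #2: the displayed value is exact, so the conjecture predicts
# that storing x into a cell snaps it to y.
y = 0.200653076171875
x = y + 2**-55
assert x != y                       # distinct doubles...
assert f"{x:.15g}" == f"{y:.15g}"   # ...with the same 15-digit display
assert exact_in_15(y)               # display of y is exact -> snap to y

# Example #1: the displayed value is NOT exact, so x is predicted to survive.
assert not exact_in_15(0.200653076171871)
```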
But now consider example #3:

   y = 200.653076171875
   x = y + 2^-45

   y:  &h406914E6,00000000  200.653076171875
   x:  &h406914E6,00000001  200.653076171875,028421709430404007434844970703125
   A1: &h406914E6,00000001  200.653076171875,028421709430404007434844970703125

A1 is "x". I expected "y" because 200.653076171875 is represented exactly within 15 sig digits.

Arguably, perhaps the heuristic applies only when the magnitude of the difference between the original and modified values is sufficiently small -- something less than 2^-45. But consider counter-example #4:

   y = 200.640625
   x = y + 2^-45

   y:  &h40691480,00000000  200.640625
   x:  &h40691480,00000001  200.640625000000,028421709430404007434844970703125
   A1: &h40691480,00000000  200.640625

A1 is "y", following the heuristic, despite the fact that the magnitude of the difference is the same as in example #3, and the constants "y" in examples #3 and #4 are similar in magnitude. (In fact, the example #4 "y" was derived from example #3.)

So once again, I am out of theories. Based on other examples, I suspect the heuristic might behave differently under certain conditions, making it that much harder to intuit. But if anyone can demonstrate a viable heuristic, I'm all ears. (Well, eyes. ;-)

Endnotes:

[1] The form &h3FEFFFFF,FFFFFF03 is my stylized representation of the 64-bit floating-point value in "big endian" order (aka "network order"), which is not the storage order in Pentium-compatible systems. The 1-bit sign and 11-bit biased exponent are in the first 3 hex characters ("3FE"), and the 52-bit mantissa follows ("F...F03"). The comma separates the upper and lower 32 bits.

[2] The operative phrase is "one-bit" incremental difference. For example, the two-bit increment 2^-54 + 2^-55 is smaller than 2^-53, but both =0.5 + 2^-53 and =0.5 + (2^-54 + 2^-55) result in the same internal representation. That is, both differ from 0.5 by 2^-53, thanks to the 80-bit FP representation and the default FP rounding algorithm implemented in Pentium-compatible FPUs.
[3] 0.999999999999972 - 2^-53 is stored exactly as 0.999999999999971,91135747698353952728211879730224609375. (The comma demarcates 15 significant digits to the left.) It displays as 0.999999999999972; but 0.999999999999972 is stored exactly as 0.999999999999972,022379779446055181324481964111328125.

----- original posting -----

On Aug 5 2008, 11:51 am, Jerry W. Lewis wrote:

   This issue is more common than you might imagine. Prior to 2007,
   there were millions of floating point numbers that simply could not
   exist as a constant value in Excel. You could generate them as a
   formula result, but if you pasted that result as a value, then you
   lost some of the trailing bits. Excel 2007 allows any valid floating
   point number to exist as a constant, but that means that the display
   routines had never been tested for millions of values, some of which
   did not initially display properly -- values just below 2^16-1
   displayed as 100000 and values just below 2^16 displayed as 100001,
   e.g. http://groups.google.com/group/micro...rashesgpfs/msg...

   Jerry

"joeu2004" wrote:

   Usually when I copy-and-paste-special-value a numeric value, the
   target cell gets exactly the same binary value that the original
   cell had. But I stumbled upon one case (I'm sure there are others)
   where that is not true. That is, the pasted value has a different
   binary representation. Under what conditions does this happen?

   Consider the following two examples. (The binary value is on the
   right. I use Excel 2003.)

   1. Unusual(?) case.

      A1: 2.00666666666667
      A2: =100*ROUND(A1,2)       // &h40691FFF,FFFFFFFF
      A3: 201                    // &h40692000,00000000
      A4: copy-and-PSV from A2   // &h40692000,00000000

   A2 displays the same character representation as A3, even in
   Scientific format with 14 DP. But the binary values differ by 2^-45.
   As shown, the pasted binary value of A4 is the same as A3, not A2.

   2. Usual(?) case.
      B1: 0.999999999999972      // &h3FEFFFFF,FFFFFF04
      B2: =B1 - 2^-53            // &h3FEFFFFF,FFFFFF03
      B3: copy-and-PSV from B2   // &h3FEFFFFF,FFFFFF03

   Again, B1 and B2 display the same character representation in
   Scientific format with 14 DP. But the binary values differ by 2^-53.
   Contrary to Example 1, the binary value of B3 is the same as B2, not
   B1.

   I say "contrary" because, to explain Example 1, I hypothesized that
   when a value is copied to the Clipboard, Excel converts the binary
   value to its most precise character representation, then converts
   from the character representation to binary when we PSV. Example 2
   seems to contradict that theory. Every other theory I've
   hypothesized does not seem to explain when PSV will and will not
   change the binary value.

   Example 2 differs by a much smaller amount than Example 1. The
   difference in Example 2 is even a slightly smaller percentage
   (about 1.11E-16) of the binary value associated with the character
   representation than the difference in Example 1 (about 1.414E-16).
   Both examples differ in the least-significant bit. And the LSB
   differs in the same "direction", namely from odd to even, which may
   be significant when considering the (default) rules for rounding
   from 80-bit to 64-bit floating-point in the Intel FPU.
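For what it's worth, both examples in the original posting depend only on IEEE 754 double arithmetic, so they can be reproduced outside Excel. A hedged Python sketch (note that Python's round() is not Excel's ROUND in general, but for this input both yield the double closest to 2.01):

```python
import struct

def bits(x):
    # The 64-bit IEEE 754 pattern as an integer.
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Example 1: =100*ROUND(A1,2) vs. the typed constant 201.
a2 = 100 * round(2.00666666666667, 2)
a3 = 201.0
assert bits(a3) - bits(a2) == 1      # differ by one bit (2^-45)
assert f"{a2:.14e}" == f"{a3:.14e}"  # yet display identically at 14 DP

# Example 2: one bit (2^-53) below 0.999999999999972.
b1 = 0.999999999999972
b2 = b1 - 2**-53
assert bits(b1) - bits(b2) == 1
assert f"{b1:.15g}" == f"{b2:.15g}" == "0.999999999999972"
```

Both pairs differ in the last bit only, which is why they are indistinguishable at 15 significant digits of display.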
Thread: Why PasteSpecialValue doesnt always dupe binary value exactly? (Excel Discussion (Misc queries))