Home |
Search |
Today's Posts |
#41
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Lynn McGuire" wrote:
On 4/2/2012 7:33 PM, joeu2004 wrote: So arguably, it is more reliable to call _control87 (or a DLL routine) from each entry point that allow to be called from VBA. Done. No better calcs. What does that mean exactly? That is, what is chptst each time? At the very least, it should be consistent. Is it? "Lynn McGuire" wrote: The Math Coprocessor status is 0x4020 and control is 0x137f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f I call the following code from each entry point. What does that output represent from the VBA side? That is, are you calling 3 DLL routines (or the same routine 3 times) in a single "execution" (e.g. a single click of CommandButton1_Click)? Or are you doing 3 "executions" (e.g. clicking CommandButton1_Click 3 times)? ----- Finally, as interesting as all this might be academically, what use is this to you practically ultimately? F'getabout these experiments, interesting as they might be? What is the final application? Is your ultimate purpose indeed to ferret the FP behavior of VBA and/or Watcom, fine. Or you doing computations that demand accuracy to 53 bits (mantissa), for example some scientific application? Otherwise, do you understand that none of this should matter to you? (Other than as a curiosity.) Most properly-designed algorithms with FP computations, e.g. business algorithms, should be immune to the details of the precision and rounding in the least-significant bit, unless you are dealing with trillions (US) of dollars, for example. Do you understand that? Do you understand that such "properly-designed" algorithms should explicitly round most numerical computations that involve non-integer values? Is there any reason why that is not a good solution for your purposes? |
#42
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 03/04/2012 23:50, Lynn McGuire wrote:
On 4/2/2012 7:33 PM, joeu2004 wrote: "Lynn McGuire" wrote: TestDLL.dll is created using my testdll.c code and the Open Watcom C compiler and linker. Alternatively, as I explained previously, you can call a DLL routine directly from the VBA procedure CommandButton1_Click. The DLL routine would call _control87(_PC53 + _RCNEAR, _MCW_PC + _MCW_RC). That would obviate the need to call _control87 from each DLL routine that you might call from CommandButton1_Click. But you would need to call that DLL routine from every VBA procedure that you initiate, either with an Excel "button" or by calling a VBA function from an Excel formula. So arguably, it is more reliable to call _control87 (or a DLL routine) from each entry point that allow to be called from VBA. Done. No better calcs. OK. There is something odd here. The link map for your DLL suggests that the code includes the FPU emulator emu387.lib code as well (which AFAIR was obsoleted more than a decade ago). It is entirely possible that the 387 emulator cuts enough corners that it cannot compute the FPU test sum right even when asked to do so. Incidentally your compiler has optimised *at compile time* the original division that you think you are testing to a multiplication by the reciprocal. It is a technique called strength reduction. That is (X/Y)*Y - X was compiled as X * (1/Y) * Y - X where the value (1/Y) was computed by the compiler and not by the FPU! (almost any modern optimising compiler will do this because at runtime an FP division is significantly more expensive than a multiply) MS Visual C goes further an spots that the result is identically zero. Optimiser on it compiles to FLDZ. What you are seeing here could well be a defect in how the *compiler* optimises expressions since it will quite likely use a 64bit mantissa here. I now think that is the root cause of your "problem". The Math Coprocessor status is 0x4020 and control is 0x137f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f OK. The math coprocessor is now in the right state. With this change does your fluid mechanics code now give the right expected answers? I call the following code from each entry point. unsigned checkMathCoprocessorStatus () { unsigned old87Status = 0; unsigned old87Control = 0; unsigned new87result = 0; old87Status = _status87 (); old87Control = _control87 (0, 0); //if (old87Status != 0) { char msg [4096]; sprintf (msg, "\nThe Math Coprocessor status is 0x%x and control is 0x%x\n\n", old87Status, old87Control); writeLineToScreen (msg); // new87result = _control87 (_PC_64 + _RC_NEAR, _MCW_PC + _MCW_RC); // new87result = _control87 (_PC_53, _MCW_PC); // new87result = _control87 (_PC_64, _MCW_PC); // new87result = _control87 (_RC_NEAR, _MCW_RC); // _asm { FINIT }; //new87result = _control87 (0x27f, 0xffff); //new87result = _control87 (0, 0); new87result = _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC); sprintf (msg, "\nThe Math Coprocessor status is is now 0x%x\n\n", new87result); writeLineToScreen (msg); } return old87Status; } eg void checkMathCoprocessorStatus (unsigned &old87Status, unsigned &old87Control) { old87Status = _status87 (); old87Control = _control87 (0, 0); } unsigned setMathCoprocessorStatus (unsigned value, unsigned mask) { unsigned old87Control = _control87 (value, mask); return old87Control } To be able to help further strip out every last thing that isn't absolutely essential from the code and make all routines individually callable from VBA. Also disable all compiler optimisations and tell it to assume an FPU is fitted (this alone may fix your problem once the right rounding mode is enabled and that seems to have worked OK). Also make the value "Y" used in the (X/Y)*Y-X expression parameters to the calling routine - that way the compiler cannot pre compute (1/Y) -- Regards, Martin Brown |
#43
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Martin Brown" wrote:
On 03/04/2012 01:33, joeu2004 wrote: There is a difference(!) when calling InitTestDLL from XL2003 VBA v. XL2010 VBA. And I misinterpreted the results from XL2003 VBA. [....] This is also the behaviour in XL2007 so it looks like XL2010 is the point where MS went back to using full 64 bit mantissas (80 bit reals). I don't believe so. Normally, XL2003 VBA does compute using _PC_64, just as XL2007 and XL2010 VBA do. However, I did not uncover a defect in XL2003 VBA that is reproducible nearly 100% of the time. Inexplicably, after we import a module into XL2003 VBA (SP3), the FPU control word is set to _PC_53. And apparently it remains in that state until we change it by calling _controlfp ourselves. That does not happen in XL2007 and XL2010. I say "nearly 100% of the time" because I believe there were a couple times out of 20 or 30 when that did not happen; that is, when the precision mode continued to be _PC_64. Be that as it may, that has nothing to do with Lynn's problems. You seem to be doing a great job of disassembling his DLL and ferreting out the root cause. So I will leave you to it. |
#44
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 04/04/2012 09:29, joeu2004 wrote:
"Martin Brown" wrote: On 03/04/2012 01:33, joeu2004 wrote: There is a difference(!) when calling InitTestDLL from XL2003 VBA v. XL2010 VBA. And I misinterpreted the results from XL2003 VBA. [....] This is also the behaviour in XL2007 so it looks like XL2010 is the point where MS went back to using full 64 bit mantissas (80 bit reals). I don't believe so. Normally, XL2003 VBA does compute using _PC_64, just as XL2007 and XL2010 VBA do. However, I did not uncover a defect in XL2003 VBA that is reproducible nearly 100% of the time. Inexplicably, after we import a module into XL2003 VBA (SP3), the FPU control word is set to _PC_53. And apparently it remains in that state until we change it by calling _controlfp ourselves. That does not happen in XL2007 and XL2010. That is odd though. For me here on XL2007 it does. I am using XL2007 SP3 here and I see exactly consistent behaviour with control word set to _PC_53 on entry just like with XL2003. It gets the "right" answer the first time through but fails on subsequent goes when the control word has been reset to _PC_64. I think by his application. One way to tell would be to set _PC_24 and see if you get handed back the FPU in that parlous state the second time around. I say "nearly 100% of the time" because I believe there were a couple times out of 20 or 30 when that did not happen; that is, when the precision mode continued to be _PC_64. Be that as it may, that has nothing to do with Lynn's problems. You seem to be doing a great job of disassembling his DLL and ferreting out the root cause. So I will leave you to it. I am curious now as to what is actually going on. The alleged "fault" should have no bearing on the DLLs computational accuracy since _PC_64 maintains additional guard digits beyond what normal _PC_53 REAL*8 computations can manage. The final results may differ slightly but a 64 bit mantissa should give better results! -- Regards, Martin Brown |
#45
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Martin Brown" wrote:
Inexplicably, after we import a module into XL2003 VBA (SP3), the FPU control word is set to _PC_53. And apparently it remains in that state until we change it by calling _controlfp ourselves. That does not happen in XL2007 and XL2010. That is odd though. For me here on XL2007 it does. I am using XL2007 SP3 here and I see exactly consistent behaviour with control word set to _PC_53 on entry just like with XL2003. It gets the "right" answer the first time through but fails on subsequent goes when the control word has been reset to _PC_64. I think by his application. (FYI, I am using XL2007 SP2. I will have to look into updating to SP3.) Yes, "by his application". Please note that the behavior that I described is __independent__ of Lynn's DLL. I agree that something hinky is happening with Lynn's DLL, which not only includes Watcom C modules, but also Watcom F77 modules. Instead, I am using the following VBA module to infer the state of the FPU. Sub testit2() Const factor As Double = 2 ^ -24 + 2 ^ -53 + 2 ^ -55 Dim z As Double, x As Double z = 1 + factor + factor x = 1 + factor: x = x + factor MsgBox "z-1 = " & Format(z - 1, "0.0000000000E+00") & _ vbNewLine & "x-1 = " & Format(x - 1, "0.0000000000E+00") End Sub The expected output is: z-1 = 1.1920928977E-07 x-1 = 1.1920928999E-07 x-1 always emulates the behavior of _PC_53. z-1 demonstrates the state of the FPU. 1.1...77E-7 is indicative of _PC_64 + _RC_NEAR; 1.1...99E-7 is indicative of _PC_53 + _RC_NEAR. Note that 1.1...77E-7 is exactly 2^-23 + 2^-52, whereas as 1.1...99E-7 is exactly 2^-23 + 2^-51. ----- "Martin Brown" wrote: I am curious now as to what is actually going on. The alleged "fault" should have no bearing on the DLLs computational accuracy since _PC_64 maintains additional guard digits beyond what normal _PC_53 REAL*8 computations can manage. That is incorrect. As demonstrated by "testit2" above, when the FPU state is _PC_53, each FP operation (e.g. addition, multiplication, divide, negate) is calculated using the full 64-bit mantissa of the 80-bit register, but then it is rounded to the 53-bit mantissa according to the FPU state -- _RC_NEAR unless we change it with a call to _control87. It is the _PC_53 rounding of each FP operation that makes the difference. With _PC_64, rounding is done only after all FP operations, when the FP register is stored into 64-bit memory (type Double). Note: That assumes that the compiler relies on the FP registers for intermediate results, as VBA does. If it stores each operation into 64-bit memory, as I believe Excel does, in effect it is emulating _PC_53 mode despite the _PC_64 state. But what is happening in Lynn's DLL is a total mystery. As you seem to have discovered, the DLL code itself is doing unexpected things. If you are correct that it uses an FPU __emulator__, that might explain some the inconsistencies that I discovered, e.g. differences between the result of "testit2" and Lynn's compilation of InitTestDLL. However, finding a reference to emu387.lib in the DLL link map does not mean that that code is being executed. I don't know much about the Windows DLL architecture, and nothing about Watcom compilations. But the emu387.lib reference might be invoked somehow automagically only if computer lacks real FPU support. Otherwise, the emu387.lib might simply be dormant code. Nonetheless, even if emu387.lib is not the explanation, I am convinced that Lynn's DLL does unexpected things -- inexplicative things for the time being. We should not draw over-generalizations about "VBA" behavior based on how VBA and Lynn's DLL interact. ----- FYI, I wrote: And apparently it remains in that state until we change it by calling _controlfp ourselves. ..... Or until we open another workbook in the same Excel instance. ..... Or until we delete a module when at least two workbooks are open in the same Excel instance -- usually(!). This is clearly defective behavior of XL2003, not intentional and expected behavior, IMHO. |
#46
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
PS.... "Martin Brown" wrote:
The final results may differ slightly but a 64 bit mantissa should give better results! I forgot to address this point. I'm really pressed for time.... The result of (x/y)*y-x with the specific chptst constants will be exactly zero when evaluated __consistently__ in _PC_53 and _PC_64 modes. That is a coincidence. There are values of x and y for which the result is not zero in either mode, not always at the same time. The operative word is "consistently". And I am only talking about the one-line expression (x/y)*y-x. Lynn's implementation is not consistent, assuming no "subexpression elimination" optimization by the compiler. He computes x/y and stores it (call it z). Effectively, that is the same as _PC_53 mode. Then he computes z*y-x. When the latter is calculated in _PC_53 mode, the result is zero because that is the same as (x/y)*y-z in _PC_53 mode. When the latter is calculated in _PC_64 mode, the result is non-zero because we have mixed modes: _PC_53 rounding for z, but _PC_64 for the rest of the computation. In contrast, if Lynn had written (x/y)*y-z in one expression, which is the correct way to test the FDIV defect, it would result in zero in _PC_64 mode. Again, all of this is by coincidence; that is, it is specific to the constants in the chptst code. |
#47
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 4/3/2012 2:42 AM, joeu2004 wrote:
I wrote: There is a difference(!) when calling InitTestDLL from XL2003 VBA v. XL2010 VBA. [....] When InitTestDLL is called from XL2003 VBA, the last MsgBox displayed by checkMathCoprocessorStatus says "was 0x127f" and "is now 0x127f" the first time. But subsequent times, it says "was 0x137f" and "is now 0x137f". [....] But when InitTestDLL is called from XL2010 VBA, we see "was 0x137f" and "is now 0x137f" for the first call and for all subsequent calls. Okay, this is all just a little too weird even for me. The difference in XL2010 VBA that I saw happens when I click on the testdll.xls file icon, which opens with XL2010 on my system. But when I open XL2010 and open testdll.xls from the Recent Workbooks list, it behaves the same as XL2003 VBA above. Moreover, despite displaying "was 0x137f" and "is now 0x137f", indicative of _PC_64 if Martin is correct, the chptst test returns zero the first time, indicative of _PC_53. But that is inconsistent with the results of the following test, which I put in the beginning of CommandButton1_Click: Const factor As Double = 2 ^ -24 + 2 ^ -53 + 2 ^ -55 Const z As Double = 1 + factor + factor Dim x As Double x = 1 + factor: x = x + factor MsgBox Format(z - 1, "0.0000000000E+00") & vbNewLine & _ Format(x - 1, "0.0000000000E+00") The result of x-1 is about 1.1920928999E-7, which emulates _PC_53 precision. The result of z-1 is about 1.1920928977E-7, which is indicative of _PC_64 precision. I cannot explain these inconsistencies. On the other hand, when I compile your testdll.c using VC++, I get consistent results with XL2003 and XL2010 VBA, no matter how I start testdll.xls (click on the file and open in XL2010, or start XL2003 and XL2010 and open testdll.xls within). With the recompiled testdll.c, chptst is always about 2.85E-10, indicative of _PC_64 precision. And checkMathCoprocessorStatus always displays "was 0x0" and "is now 0x0", indicative of _PC64 + _RC_NEAR (based on VC++ values). Notes: 1. When I recompiled testdll.c, I changed sprintf(...,old87Status,old87Control,new87result) to sprintf(...,old87Status,old87Control & mask,new87result & mask), where mask is _MCW_PC + _MCW_RC. In other words, checkMathCoprocessorStatus only displays the "PC" and "RC" states.) 2. Also, I commented out the call to capsmn in InitTestDLL, since I did not know how to link to it. (I am new to VC++.) Any conclusion on my part would be just wild speculation. The call to capsmn is my other test dll, testdll1.dll. It is written in a mixture of F77, C and C++. I am beginning to think that I have a computational problem in my big DLL that Excel is not causing but exacerbating. I need to check this out. I am also moving my business on Saturday so I will be going offline for a few days. Thanks, Lynn |
#48
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 4/4/2012 2:55 AM, Martin Brown wrote:
On 03/04/2012 23:50, Lynn McGuire wrote: On 4/2/2012 7:33 PM, joeu2004 wrote: "Lynn McGuire" wrote: TestDLL.dll is created using my testdll.c code and the Open Watcom C compiler and linker. Alternatively, as I explained previously, you can call a DLL routine directly from the VBA procedure CommandButton1_Click. The DLL routine would call _control87(_PC53 + _RCNEAR, _MCW_PC + _MCW_RC). That would obviate the need to call _control87 from each DLL routine that you might call from CommandButton1_Click. But you would need to call that DLL routine from every VBA procedure that you initiate, either with an Excel "button" or by calling a VBA function from an Excel formula. So arguably, it is more reliable to call _control87 (or a DLL routine) from each entry point that allow to be called from VBA. Done. No better calcs. OK. There is something odd here. The link map for your DLL suggests that the code includes the FPU emulator emu387.lib code as well (which AFAIR was obsoleted more than a decade ago). It is entirely possible that the 387 emulator cuts enough corners that it cannot compute the FPU test sum right even when asked to do so. Incidentally your compiler has optimised *at compile time* the original division that you think you are testing to a multiplication by the reciprocal. It is a technique called strength reduction. That is (X/Y)*Y - X was compiled as X * (1/Y) * Y - X where the value (1/Y) was computed by the compiler and not by the FPU! (almost any modern optimising compiler will do this because at runtime an FP division is significantly more expensive than a multiply) MS Visual C goes further an spots that the result is identically zero. Optimiser on it compiles to FLDZ. What you are seeing here could well be a defect in how the *compiler* optimises expressions since it will quite likely use a 64bit mantissa here. I now think that is the root cause of your "problem". The Math Coprocessor status is 0x4020 and control is 0x137f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f The Math Coprocessor status is 0x4000 and control is 0x127f The Math Coprocessor status is is now 0x127f OK. The math coprocessor is now in the right state. With this change does your fluid mechanics code now give the right expected answers? I call the following code from each entry point. unsigned checkMathCoprocessorStatus () { unsigned old87Status = 0; unsigned old87Control = 0; unsigned new87result = 0; old87Status = _status87 (); old87Control = _control87 (0, 0); //if (old87Status != 0) { char msg [4096]; sprintf (msg, "\nThe Math Coprocessor status is 0x%x and control is 0x%x\n\n", old87Status, old87Control); writeLineToScreen (msg); // new87result = _control87 (_PC_64 + _RC_NEAR, _MCW_PC + _MCW_RC); // new87result = _control87 (_PC_53, _MCW_PC); // new87result = _control87 (_PC_64, _MCW_PC); // new87result = _control87 (_RC_NEAR, _MCW_RC); // _asm { FINIT }; //new87result = _control87 (0x27f, 0xffff); //new87result = _control87 (0, 0); new87result = _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC); sprintf (msg, "\nThe Math Coprocessor status is is now 0x%x\n\n", new87result); writeLineToScreen (msg); } return old87Status; } eg void checkMathCoprocessorStatus (unsigned &old87Status, unsigned &old87Control) { old87Status = _status87 (); old87Control = _control87 (0, 0); } unsigned setMathCoprocessorStatus (unsigned value, unsigned mask) { unsigned old87Control = _control87 (value, mask); return old87Control } To be able to help further strip out every last thing that isn't absolutely essential from the code and make all routines individually callable from VBA. Also disable all compiler optimisations and tell it to assume an FPU is fitted (this alone may fix your problem once the right rounding mode is enabled and that seems to have worked OK). Also make the value "Y" used in the (X/Y)*Y-X expression parameters to the calling routine - that way the compiler cannot pre compute (1/Y) Very impressive analysis ! Wow, I did not know that Open Watcom was reorganizing my floating point code for me even in debug mode. That sucks. However, we started using the Watcom compilers back in 1994 or so because the floating point speed was so outstanding. All the games were compiled with it for the same reason. No, I still have slightly bad results from my big DLL. The problem that I am really having is that the results are varying depending on the last results. Not good and obvious to the user even if the varying is only by 1.0e-5 (10 ppm). I am beginning to think that I have a computational problem in my big DLL that Excel is not causing but exacerbating. I need to check this out. I am also moving my business on Saturday so I will be going offline for a few days. Thanks, Lynn |
#49
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 04/04/2012 16:35, joeu2004 wrote:
PS.... "Martin Brown" wrote: The final results may differ slightly but a 64 bit mantissa should give better results! I forgot to address this point. I'm really pressed for time.... The result of (x/y)*y-x with the specific chptst constants will be exactly zero when evaluated __consistently__ in _PC_53 and _PC_64 modes. That is a coincidence. There are values of x and y for which the result is not zero in either mode, not always at the same time. It also works out exactly in all three precision modes 24, 53, 64 bits mantissa provided that round to nearest is enabled. The operative word is "consistently". And I am only talking about the one-line expression (x/y)*y-x. Lynn's implementation is not consistent, assuming no "subexpression elimination" optimization by the compiler. Indeed. The store back to memory of x/y *and* the compilers precomputation at compile time of (1/y) to full precision makes the test meaningless. It is a shame that no modern compilers implement TEMPREAL or REAL*10 for those of us that do know what we are doing. ISTR Mickeysoft withdrew that feature at around Visual C++ v6 for "better numerical consistency" - when what they meant was to avoid support calls. Using 10 byte reals has performance implications for alignment. He computes x/y and stores it (call it z). Effectively, that is the same as _PC_53 mode. Then he computes z*y-x. When the latter is calculated in _PC_53 mode, the result is zero because that is the same as (x/y)*y-z in _PC_53 mode. When the latter is calculated in _PC_64 mode, the result is non-zero because we have mixed modes: _PC_53 rounding for z, but _PC_64 for the rest of the computation. In contrast, if Lynn had written (x/y)*y-z in one expression, which is the correct way to test the FDIV defect, it would result in zero in _PC_64 mode. Trouble is that a lot of compilers do know about identities "y/y" and "x-x" these days although they seldom eliminate in debug mode. I suspect that your suggested code rearrangement above will "fix" the "apparent" rounding problem in Watcom too. Though I have a hunch it will compile as FLD1 ie precomputing the numeric constant "y/y". If it allows embedded assembly then the following code in VC will do the test explicitly if inserted before "return chptst" _asm{ // divtwo = top / bottom; fld qword ptr [top] fdiv qword ptr [bottom] // chptst = (divtwo * bottom) - top; fmul qword ptr [bottom] fsub qword ptr [top] fstp qword ptr [chptst] } This gives total control of the instructions used and leaves all intermediate results on the stack. Again, all of this is by coincidence; that is, it is specific to the constants in the chptst code. And as we have both pointed out several times the "error" here is insignificant and not indicative of any problem with the FPU. -- Regards, Martin Brown |
#50
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Martin Brown" wrote:
If it allows embedded assembly then the following code in VC will do the test explicitly if inserted before "return chptst" Let's not get side-tracked by methods to make the one example behave consistently and as expected. AFAIK, that is not the issue that Lynn is dealing with. (Although he is the final arbiter on that point.) IIRC, the issue is: Lynn discovered that (some) floating-point computations (initially not the chptst example) did not return exactly the same result when called from VBA and when linked into a program. He probably expected an exact zero, which made the difference "obvious" to him. I suspect Lynn interpreted that as a defect, and in Googling for an explanation, he stumbled across the decades-old and long-since fixed FDIV defect. When Lynn implemented the FDIV example (incorrectly), he discovered that it did indeed return zero in his program, but not exactly zero (albeit infinitesimally close) when called from VBA. Incorrectly thinking that this explained the original disparity that he encountered, Lynn became fixated on this one example. But my point is: any example would do. So there is no point in taking draconian steps to make the one example -- the chptst computation -- work. Moreover, although it has been interesting to explore the behavoir of the FPU control word with respect to VBA v. Watcom and VC++, I believe that is not the right solution for Lynn. Instead of setting the FPCW precision mode, Lynn should recognize that such floating-point inconsistencies are common place. For example, it is not unusual to experience them when doing "the same" calculation in Excel and in VBA. Again, the solution is to explicitly round the results of calculation when you expect results to be consistent to a specific precision. Alternatively, design algorithms to tolerate infinitesimal differences in floating-point calculations. For example, instead of test A=B, test ABS(A-B)<e, where "e" is some difference that reflects your tolerance for error, e.g. 0.005. |
#51
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 4/5/2012 9:48 AM, joeu2004 wrote:
"Martin Brown" wrote: If it allows embedded assembly then the following code in VC will do the test explicitly if inserted before "return chptst" Let's not get side-tracked by methods to make the one example behave consistently and as expected. AFAIK, that is not the issue that Lynn is dealing with. (Although he is the final arbiter on that point.) IIRC, the issue is: Lynn discovered that (some) floating-point computations (initially not the chptst example) did not return exactly the same result when called from VBA and when linked into a program. He probably expected an exact zero, which made the difference "obvious" to him. I suspect Lynn interpreted that as a defect, and in Googling for an explanation, he stumbled across the decades-old and long-since fixed FDIV defect. When Lynn implemented the FDIV example (incorrectly), he discovered that it did indeed return zero in his program, but not exactly zero (albeit infinitesimally close) when called from VBA. Incorrectly thinking that this explained the original disparity that he encountered, Lynn became fixated on this one example. But my point is: any example would do. So there is no point in taking draconian steps to make the one example -- the chptst computation -- work. Moreover, although it has been interesting to explore the behavoir of the FPU control word with respect to VBA v. Watcom and VC++, I believe that is not the right solution for Lynn. Instead of setting the FPCW precision mode, Lynn should recognize that such floating-point inconsistencies are common place. For example, it is not unusual to experience them when doing "the same" calculation in Excel and in VBA. Again, the solution is to explicitly round the results of calculation when you expect results to be consistent to a specific precision. Alternatively, design algorithms to tolerate infinitesimal differences in floating-point calculations. For example, instead of test A=B, test ABS(A-B)<e, where "e" is some difference that reflects your tolerance for error, e.g. 0.005. Nope, I have had the FDIV check in our software since 1995. We had several customers all of sudden reporting bad results with their new PCs. Here is the comment that I put in the code for that subroutine: C 03/27/95 Lynn McGuire add pentium math coprocessor test That test used to work just fine. I suspect that the Watcom floating point code optimization has gotten better since 1995. Since I don't have a bad FPU to check this code with anymore, I probably need to pull it out of our code. Lynn |
#52
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Lynn McGuire" wrote:
Nope, I have had the FDIV check in our software since 1995. Okay. My bad! In that case, Martin's assembly-language implementation is just the thing for you. Your original test design was incorrect; it worked only by coincidence. |
#53
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
On 05/04/2012 21:06, joeu2004 wrote:
"Lynn McGuire" wrote: Nope, I have had the FDIV check in our software since 1995. Okay. My bad! In that case, Martin's assembly-language implementation is just the thing for you. Your original test design was incorrect; it worked only by coincidence. Strictly it worked OK in a standard Fortran REAL*8 aka 53bit mantissa environment provided that the compiler doesn't get too clever. The store to an 8 byte real is harmless there. However, everything breaks if the arithmetic 64bit mantissa but only 53 bits stored the result ends up different in the least significant bit. This should not matter at all unless your algorithm is extremely sensitive to roundoff - you still have almost 16 good decimal digits. All decent commercial compilers are smart enough to optimise divide by a constant to multiple by a reciprocal these days. So the test is not actually testing the divide instruction at all any more. -- Regards, Martin Brown |
#54
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
"Martin Brown" wrote:
provided that the compiler doesn't get too clever __Of_course__, all of my comments assume that compile-time optimizations are avoided. That goes without saying, if our intent is to test the FDIV defect or the effect of difference precision modes (_PC_64, _PC_53, _PC_24). (Although it was useful to Lynn that you discovered that his compiler is optimizing the code.) "Martin Brown" wrote: On 05/04/2012 21:06, joeu2004 wrote [in response to Lynn]: Your original test design was incorrect; it worked only by coincidence. Strictly it worked OK in a standard Fortran REAL*8 aka 53bit mantissa environment [...]. The store to an 8 byte real is harmless there. However, everything breaks if the arithmetic 64bit mantissa but only 53 bits stored the result ends up different in the least significant bit. When I said that Lynn's original chptst implementation was incorrect, I was referring to the rounding of the __intermediate__ result (only the division) to 64-bit representation (53-bit mantissa) For the FDIV defect, if x is 4195835 and y is 3145727, then (x/y)*y-x is exactly zero whether all calculation is done __consistently__ using 64-bit or using 80-bit representation (the latter with a 64-bit mantissas). Of course, the final result is stored with 64-bit representation. (And again, __of_course__ I am only talking about the case when compile-time optimizations are avoided.) But instead of a single expression, Lynn[*] implemented this as div = x/y, then chptst = div*y-x. That worked (i.e. chptst is exactly zero) for him in the past only because apparently he was using 64-bit representation __consistently__. It failed to work (i.e. chptst is not exactly zero) when Excel VBA used 80-bit representation, but only because Lynn is not using 80-bit representation __consistently__. Indeed, it is the store into div, with only 53-bit precision, that causes the problem. It is not "harmless" in this particular case. That is what I meant when I said it worked (in the past) only by coincidence. The "coincidence" was that he was using __consistent__ precision in the past, namely 64-bit representation (53-bit precision). "Martin Brown" wrote: This should not matter at all unless your algorithm is extremely sensitive to roundoff - you still have almost 16 good decimal digits. It is not clear to me what your point is. But in a previous response, I alluded to the fact that _in_general__, we cannot expect (x/y)*y-x to be exactly zero, even when we use __consistent__ precision for the calculation. (And again, __of_course__ I am only talking about the case when compile-time optimizations are avoided.) Here are some random examples, all using x and y with the same absolute and relative magnitudes as Nicely's example for the FDIV defect. By that, I mean: x and y are 7-digit numbers; and 1 < INT(x/y) < 2. 80bit 64bit mixMode x y =0 =0 <0 4195835 3145727 (FDIV bug) =0 =0 =0 1300666 1233646 =0 <0 <0 1695563 1538366 =0 <0 =0 none <0 =0 <0 1923832 1204810 <0 =0 =0 none <0 <0 <0 1867447 1462980 <0 <0 =0 none -----[*] I don't know if Lynn invented the div/chptst implementation or he copied it from some reference. When I asked him for a reference, he pointed me only to the wiki FDIV bug page, http://en.wikipedia.org/wiki/Pentium_FDIV_bug. I do not see Lynn's div/chptst implementation on that page or on any referenced page. On the contrary, I do find Nicely's original pentbug.c file, which contains a __consistent__ implementation using 64-bit representation. Excerpted: double lfNum1=4195835.0, lfNum2=4195835.0, lfDenom1=3145727.0, lfDenom2=3145727.0, lfQuot, lfProd, lfZero; /* The duplicate variables are used in an effort to foil compiler optimization and compile-time evaluation of numeric literals. */ lfQuot=lfNum1/lfDenom1; lfProd=lfDenom2*lfQuot; lfZero=lfNum2 - lfProd; Aside #1.... Note that Nicely went out of his way to avoid compile-time optimizations. Or so he hoped ;-). If Lynn had followed Nicely's implementation correctly, he wouldn't have had any problem with the FDIV constants despite the change in precision mode between Watcom languages and Excel VBA and despite the Watcom compile-time optimizations. Aside #2.... IMHO, Nicely was wrong to use 64-bit representation to demonstrate the FDIV bug. Luckily, it worked, but only because the FDIV error was so large, about 8E-5. He might have overlooked more subtle errors due to rounding the intermediate 80-bit results to 64-bit representation. |
#55
Posted to microsoft.public.excel,microsoft.public.excel.programming
|
|||
|
|||
Excel and the Math Coprocessor for DLLs
Errata.... I wrote:
If Lynn had followed Nicely's implementation correctly, he wouldn't have had any problem with the FDIV constants despite the change in precision mode between Watcom languages and Excel VBA But perhaps Nicely's implementation or something nearly identical was not available when Lynn wrote his(?) algorithm. The pentbug.c file that I find today has only a 2011 copyright. But it presumably was written in or before 2003, since that is the date attributed to the pentbug.zip file on Nicely's website. If the algorithm had been made public at some earlier date (e.g. 1995), I would think that Nicely would know that the earlier date(s) of "publication" should be included in the copyright. PS.... I wrote: IMHO, Nicely was wrong to use 64-bit representation to demonstrate the FDIV bug. Luckily, it worked, but only because the FDIV error was so large, about 8E-5. He might have overlooked more subtle errors due to rounding the intermediate 80-bit results to 64-bit representation. I meant to add.... I also think that it was "wrong" to use (x/y)*y-x to demonstrate the FDIV bug in the first place. Well, I do agree that it had "marketing appeal" insofar as it might have made the defect more relevant to some people. But my point is.... IMHO, that is not the correct way to test for the presence of the defect. Actually, it is Tim Coe who discovered the error with 4195835.0/3145727.0 per se. It represents the largest known error among the many defective FDIV results. And it does appear that Nicely or Coe did look at the result of the division using 80-bit arithmetic, at least eventually. Nicely's published results of the division are (the comma shows 15 significant digits to the left): 1.33382044913624,10025 (Correct value) 1.33373906890203,75894 (Flawed Pentium) Note that the representation of the "correct value" is more precise than the 64-bit representation can be. The 64-bit representation is exactly 1.33382044913624,109305771980871213600039482116699 21875, somewhat greater than 1.333...,10025. The next closest 64-bit representation is exactly 1.33382044913624,087101311488368082791566848754882 8125, which is too low. Based on the information available today, I would implement the FDIV bug test as follows. #include <math.h double ident(double x) { return x; } // thwart compiler optimizations int fdivTest() // 1 = okay; 0 = error { const double num = 4195835; const double denom = 3145727; const double quot = 1.33382044913624; return (fabs(ident(num)/ident(denom) - quot) < 5e-15); } It does not matter whether that is evaluated consistently in 64-bit (53-bit mantissa) or 80-bit (64-bit mantissa) representation or a mix. FYI, we cannot expect the division result to have the same 64-bit binary representation as the constant 1.33382044913624. In fact, the two closest representations of the constant are 1.33382044913623,998283469518355559557676315307617 1875 and 1.33382044913624,020487930010858690366148948669433 59375. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Excel is messing with the math coprocessor | Excel Programming | |||
VSTO updating excel dlls | Excel Programming | |||
How are excel function names in dlls/xlas resolved ? | Excel Programming | |||
Referencing .Net DLLs | Excel Programming | |||
DLLs | Excel Programming |