09-28-2007 04:07 PM
Here's what I think the problem is...
The Intel floating point chip supports three different "precision control" modes. These modes control whether floating point calculations are done in single, double, or extended precision. (Single = 32-bit, Double = 64-bit, Extended = 80-bit)
LabVIEW sets the precision control to extended precision. The main reasons we do this are because LabVIEW presents an extended precision floating point format to users, and we also use this format internally for certain things (such as converting numbers to text for display).
A few years ago, Microsoft changed their math libraries to occasionally reset the precision control to double precision. I think they do this when they are first loaded or called, but I don't claim to fully understand how their code works. I'll conjecture that they reduce the precision of the chip to gain consistency regardless of how their compiled code is optimized. I should also note that Microsoft's languages, such as Visual C++ and the .NET languages, do not support an extended precision floating point format.
So consider this scenario...
Our code to convert floating point numbers (including single- and double-precision numbers) uses an extended format internally. I think what you're seeing is that we get a slightly different answer when converting the number to a string in a reduced-precision mode. Given the reduced precision, this isn't a surprising result mathematically, but it sure is annoying if you were expecting a particular answer.
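The effect is easy to mimic in software. Python's decimal module has an explicit per-context precision setting, so (purely as an analogy — this is not the x87 mechanism itself) you can watch the string form of the same computation change when the precision is reduced:

```python
from decimal import Decimal, getcontext

# Analogy for extended precision: carry more digits through the computation
getcontext().prec = 28
extended = Decimal(1) / Decimal(3)

# Analogy for reduced (double) precision: same computation, fewer digits
getcontext().prec = 15
reduced = Decimal(1) / Decimal(3)

print(extended)  # 0.3333333333333333333333333333
print(reduced)   # 0.333333333333333
```

Same expression, same code path; only the ambient precision setting changed, and the converted strings no longer match.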
I first started hearing about these scenarios in the LabVIEW 7 timeframe. In LabVIEW 8.0 (or maybe 7.1.1), I fixed LabVIEW to check and reset the floating point state when we return from "foreign" code.
09-28-2007 04:52 PM
Brian wrote
"
A few years ago, Microsoft changed their math libraries to occasionally reset the precision control to double precision.
"
To which I reply "Out! Out! you demons of stupidity!" (Dogbert)
Thanks for the explanation and fix Brian!
Ben
09-29-2007 12:06 PM
10-02-2007 02:22 PM - edited 10-02-2007 02:22 PM
10-03-2007 08:13 AM
@Travis M. wrote:
Out of curiosity, when you closed the COM-using program to run the example program, were you sure that the COM program was unloaded from the LabVIEW process?
This persisted after closing LV and reopening it, so I think it should have been unloaded.
To describe the components in a bit more detail:
1. .NET executable running separately.
2. LabVIEW program calling 3 and 4 (running in the IDE. I don't remember seeing it in the RTE, but this was during the development phase, so I do most of the running in the IDE).
3. Several .NET classes, some of which do all kinds of work with the .NET decimal type.
4. A COM object which does something and is called only relatively rarely.
It is quite likely that when I closed LV, the .NET executable was still running, but I also seem to remember checking this out in the evening after I got home and it most likely was not running then. Of course, it is possible that I did not restart LV in the evening.
If I understand what you're suggesting correctly, it's basically something like this:
I guess the main questions now would be:
10-03-2007 09:07 AM
First of all, to answer Travis' question about whether this is a global setting...
The floating point state is managed per thread. I.e., when the processor switches between threads, it swaps out all the processor registers, including the floating point state. So having a separate process running should not affect things. That would be wacky.
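Python's decimal contexts happen to work the same way and make a convenient stand-in for per-thread floating point state: each thread gets its own context, so changing the precision in one thread leaves other threads untouched (again an analogy, not the actual FPU register swap):

```python
import threading
from decimal import getcontext

seen = {}

def worker():
    # A fresh thread starts from the default context (precision 28),
    # not from whatever the spawning thread set.
    seen["worker"] = getcontext().prec

getcontext().prec = 10   # change only this thread's "precision control"
t = threading.Thread(target=worker)
t.start()
t.join()
seen["main"] = getcontext().prec

print(seen)  # {'worker': 28, 'main': 10}
```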
As an aside, correctly copying the processor state when switching between processes and threads is not something that Windows 3.1 did well. I have a check in LV's random number generator that ensures that floating point is still working. (It's an algorithm that should never produce a negative number, so I check for that.) In Win 3.1, I'd hear about someone running into this about every month.
I also know of some advanced real-time OSes that try to be so clever about doing a minimal context switch that they've screwed it up. There's an off chance that there's some context-switch bug in Windows that could make this happen, but I have no reason to believe that this is happening.
tst wrote:
- Why is it relatively rare? What causes it to happen? I ran the program quite a few times and didn't notice it most of the times.
- Why is it intermittent if LV 7.0 isn't setting the precision back to EXT?
- Why did increasing the priority of the VI solve this?
- Is this guaranteed not to affect numeric calculations? I would guess yes, but would like to make sure.
4. No. It might affect numeric calculations, which is why we don't like it when external libraries switch the floating point state and leave it that way. The "nice" thing for a library to do is save and restore the floating point state around changes to it. Since the Microsoft libraries don't play nice, we try to save and restore for them (in later versions of LV).
2. LV 7.0 does reset the floating point state in certain cases. (Telling you exactly when it does, and how it changed in 7.1.1/8.x, would be a major research project for someone who wants to look through our old source code.)
1,2. It basically depends on the code path you go through between the time the foreign code is called and the time we reset the floating point state for that thread again.
3. Increasing the priority of the VI puts it into its own thread that probably no other VI (or the UI thread) is using. Thus, the screwed up FP state is isolated to that one thread. I would avoid doing mathematical computations in that thread.
I would especially avoid calling such components in the UI thread. I believe .NET objects are inherently thread-safe and can be called from any thread. In the case of COM objects, you're at the mercy of the COM object's developer: he or she specifies the threading model you have to use. COM objects that use the single-threaded model have to be run in the UI thread.
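The "save and restore around the foreign call" pattern from answer 4 can be sketched in Python, once more using decimal precision as a stand-in for the x87 control word (the library function here is hypothetical):

```python
from decimal import getcontext, localcontext

def misbehaving_foreign_call():
    # Hypothetical stand-in for a library that clobbers the caller's
    # floating point state and never restores it.
    getcontext().prec = 15

# Unprotected call: the damage leaks into our state
misbehaving_foreign_call()
leaked = getcontext().prec      # 15 -- our precision was silently reduced

getcontext().prec = 28          # repair it manually

# Defensive wrapper: snapshot the state, call, then restore on exit
with localcontext():
    misbehaving_foreign_call()
protected = getcontext().prec   # 28 -- the caller's state survived

print(leaked, protected)
```

This is the same idea as checking and resetting the floating point state when returning from foreign code: you can't stop the library from misbehaving, but you can keep the damage from escaping.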
10-04-2007 12:14 PM
10-05-2007 09:09 AM
Yes. The potential is definitely there. In practice, it hasn't reared its head as much as I imagined.
OK, this seems to explain it. It still does not explain why this would persist in the evening or immediately after a restart of LV, but it's possible I didn't remember it correctly. In any case, I understand that my feeling that this could potentially be a very serious bug was correct.
I understand that my solution would be (if possible) to switch all the VIs handling COM objects (and .NET objects?) to another execution system? Can we get someone from the .NET integration team to answer whether the interaction with .NET is protected?
When I addressed the bug in 7.1.1/8.0, I tried to find every place we call foreign code, including DLLs, .NET, and COM objects. This also includes places we call such things internally. There's an IVI component, for example, that's used by the IVI Logical Name front panel control and that changes the floating point state. I had to protect LabVIEW from that, too.
By the way, the reason I say "7.1.1/8.0" is that I'm pretty sure I made the change to both source code branches at about the same time. 7.1.1 released before 8.0, but both source code branches were alive.
Does this need to be done only in LV 7 or is this still possible in later versions even with the changes you made? If it is, then this is a very important piece of information for anyone working with COM or .NET.
It's definitely possible. In 8.0, we added a lot more ways that we call DLLs to deal with the project.
Knowing that such calls could sneak in, I added code to LabVIEW in 7.1.1/8.0 to warn of this. If you see a "kernel.cpp" error, it might be from me. In our debug versions around here, it says something like, "Floating point state is bad. Tell Brian."
Longer term, I'd like to see us immunize our code from whatever Microsoft decides to do. We have some ideas, but this is pretty hard to do.
10-05-2007 02:54 PM
Longer term, I'd like to see us immunize our code from whatever Microsoft decides to do. We have some ideas, but this is pretty hard to do.
10-08-2007 02:48 PM
@Brian Powell wrote:
In practice, it hasn't reared its head as much as I imagined.
Or so you think...
Seriously though, it's possible that people ran into this and didn't notice it. I'm still not sure what the impact of this on my program is, but I only noticed it because that specific program would always round down and would convert to a string with a precision of 32 (no, I didn't write that bit) and so I had an indicator which alternated between 102 and 103 for no good reason. If the difference was only in the nth digit, I probably wouldn't mind, but both DBL and EXT should have had 103, right?
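The 102-vs-103 flip is exactly what happens when a value sitting just below an integer boundary lands on the boundary after a precision reduction, and the program then rounds down. A small sketch, using a round-trip through 32-bit floats to stand in for the precision loss (the specific value is illustrative, not taken from the program above):

```python
import math
import struct

x = 102.9999999999          # a 64-bit value just below 103

# Round-trip through a 32-bit float: the nearest float32 to x is exactly 103.0
x32 = struct.unpack("f", struct.pack("f", x))[0]

print(math.floor(x))        # 102 -- full precision stays below the boundary
print(math.floor(x32))      # 103 -- reduced precision rounded up onto it
```

The "round down" step is what makes the one-ulp difference visible as a whole-unit jump in the indicator.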
To be honest, this bug really scares me now. At first I thought it might be some freak conditions causing it, but from what you say, I understand that to be safe, I have to move all COM/.NET/DLL calls to other execution systems. That's prohibitive.
P.S. Yes, Scott, I know I can also just switch OSes.