09-28-2007 04:07 PM
Here's what I think the problem is...
The Intel floating point chip supports three different "precision control" modes. These modes control whether floating point calculations are done in single, double, or extended precision. (Single = 32-bit, Double = 64-bit, Extended = 80-bit)
LabVIEW sets the precision control to extended precision. The main reasons we do this are because LabVIEW presents an extended precision floating point format to users, and we also use this format internally for certain things (such as converting numbers to text for display).
A few years ago, Microsoft changed their math libraries to occasionally reset the precision control to double precision. I think they do this when they are first loaded or called, but I don't claim to fully understand how their code works. I'll conjecture that they reduce the precision of the chip to gain consistency regardless of how their compiled code is optimized. I should also note that Microsoft's languages, such as Visual C++ and the .NET languages, do not support an extended precision floating point format.
So consider this scenario...
Our code to convert floating point numbers (including single- and double-precision numbers) uses an extended format internally. I think what you're seeing is that we get a slightly different answer when converting the number to a string in a reduced-precision mode. Given the reduced precision, this isn't a surprising result mathematically, but it sure is annoying if you were expecting a particular answer.
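The effect is easy to mimic in software. Python's decimal module has an explicit per-context precision setting, so (purely as an analogy — this is not the x87 mechanism itself) you can watch the string form of the same computation change when the precision is reduced:

```python
from decimal import Decimal, getcontext

# Analogy for extended precision: carry more digits through the computation
getcontext().prec = 28
extended = Decimal(1) / Decimal(3)

# Analogy for reduced (double) precision: same computation, fewer digits
getcontext().prec = 15
reduced = Decimal(1) / Decimal(3)

print(extended)  # 0.3333333333333333333333333333
print(reduced)   # 0.333333333333333
```

Same expression, same code path; only the ambient precision setting changed, and the converted strings no longer match.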
I first started hearing about these scenarios in the LabVIEW 7 timeframe. In LabVIEW 8.0 (or maybe 7.1.1), I fixed LabVIEW to check and reset the floating point state when we return from "foreign" code.
09-28-2007 04:52 PM
Brian wrote
"
A few years ago, Microsoft changed their math libraries to occasionally reset the precision control to double precision.
"
To which I reply "Out! Out! you demons of stupidity!" (Dogbert)
Thanks for the explanation and fix Brian!
Ben
09-29-2007 12:06 PM
10-02-2007 02:22 PM - edited 10-02-2007 02:22 PM
10-03-2007 08:13 AM
@Travis M. wrote:
Out of curiosity, when you closed the COM-using program to run the example program, were you sure that the COM program was unloaded from the LabVIEW process?
This persisted after closing LV and reopening it, so I think it should have been unloaded.
To describe the components in a bit more detail:
1. .NET executable running separately.
2. LabVIEW program calling 3 and 4 (running in the IDE. I don't remember seeing it in the RTE, but this was during the development phase, so I do most of the running in the IDE).
3. Several .NET classes, some of which do all kinds of work with the .NET decimal type.
4. A COM object which does something and is called only relatively rarely.
It is quite likely that when I closed LV, the .NET executable was still running, but I also seem to remember checking this out in the evening after I got home and it most likely was not running then. Of course, it is possible that I did not restart LV in the evening.
If I understand what you're suggesting correctly, it's basically something like this:
I guess the main questions now would be:
10-03-2007 09:07 AM
First of all, to answer Travis' question about whether this is a global setting...
The floating point state is managed per thread. I.e., when the processor switches between threads, it swaps out all the processor registers, including the floating point state. So having a separate process running should not affect things. That would be wacky.
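Python's decimal contexts happen to work the same way and make a convenient stand-in for per-thread floating point state: each thread gets its own context, so changing the precision in one thread leaves other threads untouched (again an analogy, not the actual FPU register swap):

```python
import threading
from decimal import getcontext

seen = {}

def worker():
    # A fresh thread starts from the default context (precision 28),
    # not from whatever the spawning thread set.
    seen["worker"] = getcontext().prec

getcontext().prec = 10   # change only this thread's "precision control"
t = threading.Thread(target=worker)
t.start()
t.join()
seen["main"] = getcontext().prec

print(seen)  # {'worker': 28, 'main': 10}
```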
As an aside, correctly copying the processor state when switching between processes and threads is not something that Windows 3.1 did well. I have a check in LV's random number generator that ensures that floating point is still working. (It's an algorithm that should never produce a negative number, so I check for that.) In Win 3.1, I'd hear about someone running into this about every month.
I also know of some advanced real-time OSes that try to be so clever about doing a minimal context switch that they've screwed it up. There's an off chance that there's some context-switch bug in Windows that could make this happen, but I have no reason to believe that this is happening.
tst wrote:
- Why is it relatively rare? What causes it to happen? I ran the program quite a few times and didn't notice it most of the times.
- Why is it intermittent if LV 7.0 isn't setting the precision back to EXT?
- Why did increasing the priority of the VI solve this?
- Is this guaranteed not to affect numeric calculations? I would guess yes, but would like to make sure.
4. No. It might affect numeric calculations, which is why we don't like it when external libraries switch the floating point state and leave it that way. The "nice" thing for a library to do is save and restore the floating point state around changes to it. Since the Microsoft libraries don't play nice, we try to save and restore for them (in later versions of LV).
2. LV 7.0 does reset the floating point state in certain cases. (Telling you exactly when it does, and how it changed in 7.1.1/8.x, would be a major research project for someone who wants to look through our old source code.)
1,2. It basically depends on the code path you go through between the time the foreign code is called and the time we reset the floating point state for that thread again.
3. Increasing the priority of the VI puts it into its own thread that probably no other VI (or the UI thread) is using. Thus, the screwed up FP state is isolated to that one thread. I would avoid doing mathematical computations in that thread.
I would especially avoid calling such components in the UI thread. I believe .NET objects are inherently thread-safe and can be called from any thread. In the case of COM objects, you're at the mercy of the COM object's developer: he or she specifies the threading model you have to use. COM objects that use the single-threaded model have to be run in the UI thread.
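The "save and restore around the foreign call" pattern from answer 4 can be sketched in Python, once more using decimal precision as a stand-in for the x87 control word (the library function here is hypothetical):

```python
from decimal import getcontext, localcontext

def misbehaving_foreign_call():
    # Hypothetical stand-in for a library that clobbers the caller's
    # floating point state and never restores it.
    getcontext().prec = 15

# Unprotected call: the damage leaks into our state
misbehaving_foreign_call()
leaked = getcontext().prec      # 15 -- our precision was silently reduced

getcontext().prec = 28          # repair it manually

# Defensive wrapper: snapshot the state, call, then restore on exit
with localcontext():
    misbehaving_foreign_call()
protected = getcontext().prec   # 28 -- the caller's state survived

print(leaked, protected)
```

This is the same idea as checking and resetting the floating point state when returning from foreign code: you can't stop the library from misbehaving, but you can keep the damage from escaping.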
10-04-2007 12:14 PM
10-05-2007 09:09 AM
Yes. The potential is definitely there. In practice, it hasn't reared its head as much as I imagined.
OK, this seems to explain it. It still does not explain why this would persist in the evening or immediately after a restart of LV, but it's possible I didn't remember it correctly. In any case, I understand that my feeling that this could potentially be a very serious bug was correct.
I understand that my solution would be (if possible) to switch all the VIs handling COM objects (and .NET objects?) to another execution system? Can we get someone from the .NET integration team to answer whether the interaction with .NET is protected?
When I addressed the bug in 7.1.1/8.0, I tried to find every place we call foreign code, including DLLs, .NET, and COM objects. This also includes places we call such things internally. There's an IVI component, for example, that's used by the IVI Logical Name front panel control and that changes the floating point state. I had to protect LabVIEW from that, too.
By the way, the reason I say "7.1.1/8.0" is that I'm pretty sure I made the change to both source code branches at about the same time. 7.1.1 released before 8.0, but both source code branches were alive.
Does this need to be done only in LV 7 or is this still possible in later versions even with the changes you made? If it is, then this is a very important piece of information for anyone working with COM or .NET.
It's definitely possible. In 8.0, we added a lot more ways that we call DLLs to deal with the project.
Knowing that such calls could sneak in, I added code to LabVIEW in 7.1.1/8.0 to warn of this. If you see a "kernel.cpp" error, it might be from me. In our debug versions around here, it says something like, "Floating point state is bad. Tell Brian."
Longer term, I'd like to see us immunize our code from whatever Microsoft decides to do. We have some ideas, but this is pretty hard to do.
10-05-2007 02:54 PM
Longer term, I'd like to see us immunize our code from whatever Microsoft decides to do. We have some ideas, but this is pretty hard to do.
10-08-2007 02:48 PM
@Brian Powell wrote:
In practice, it hasn't reared its head as much as I imagined.
Or so you think...
Seriously though, it's possible that people ran into this and didn't notice it. I'm still not sure what the impact of this on my program is, but I only noticed it because that specific program would always round down and would convert to a string with a precision of 32 (no, I didn't write that bit) and so I had an indicator which alternated between 102 and 103 for no good reason. If the difference was only in the nth digit, I probably wouldn't mind, but both DBL and EXT should have had 103, right?
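The 102-vs-103 flip is exactly what happens when a value sitting just below an integer boundary lands on the boundary after a precision reduction, and the program then rounds down. A small sketch, using a round-trip through 32-bit floats to stand in for the precision loss (the specific value is illustrative, not taken from the program above):

```python
import math
import struct

x = 102.9999999999          # a 64-bit value just below 103

# Round-trip through a 32-bit float: the nearest float32 to x is exactly 103.0
x32 = struct.unpack("f", struct.pack("f", x))[0]

print(math.floor(x))        # 102 -- full precision stays below the boundary
print(math.floor(x32))      # 103 -- reduced precision rounded up onto it
```

The "round down" step is what makes the one-ulp difference visible as a whole-unit jump in the indicator.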
To be honest, this bug really scares me now. At first I thought it might be some freak conditions causing it, but from what you say, I understand that to be safe, I have to move all COM/.NET/DLL calls to other execution systems. That's prohibitive.
P.S. Yes, Scott, I know I can also just switch OSes.