08-19-2009 02:44 PM
I hate to push the point, but it is a bit more than "not officially supported". It actually doesn't work or do anything except be a zero-sample read. A value of -1 is useless and indeterminate in its behavior. I believe that, treated as an error input, the system will just return the configured number of samples. It is somewhat irrelevant since I don't use it.
Second, we have gotten off the case I am interested in: I am using hardware timing. I want hardware timing. However, in the real world, hardware failures etc. cause this timing to not be achievable. I would like to use hardware timing and, when that fails, empty the buffer so I can get back to hardware timing. I do not believe that NI-DAQmx Base can do this. That is why this is a challenge! Anyone out there who can post code?
As a test harness I have my read and the standard read compared in the attached zip file.
Run the analog buffer test VI.
If you use the "standard read", it will return 2500 scans each iteration, as it should. Now use the boolean control to introduce a 1.3-second delay as some sort of OS distraction (or other hardware timeout). It will still return 2500 pts, but they become more and more stale, and finally the 20-second buffer will overflow and you will get the infamous error 42 RLP failure. It may take a few minutes for this to happen. This is the buffer overflow, not the ultimate answer to life, the universe, and everything.
For the above case, if I turn on the delay, leave it for only 20 seconds or so, and then turn it off, the system will run fast and return the stale data. This makes synchronization with other devices impossible.
Now, on my slow G4 (it used to be top of the line), if I use my "read until empty" version, the system will return 2550 pts each iteration. If I turn on the 1.3-second delay, it will then return about 3400 pts and keep the buffer empty. I will never get a buffer overflow; thus, not only am I ending up with the latest data I want instead of stale data, I have a system that doesn't fail. If I turn off the 1.3-second delay, the system will go back to happily using hardware timing and returning 2550 pts or so.
What I would like is the above behavior *except* that, in the case without the delay, the system returns exactly 2500 points. With the delay, I want the 3250 points, but that second number can be approximate, since in real life the delay is never exact. The 1.3-second figure is just an example; in real life the delay can be a few seconds, but I think it will never be more than 10 seconds (which is why I put in a 20-second buffer!).
I feel like I haven't explained this well; hopefully the attached code will show it. It is designed to work with a 32-differential-channel card (6330), but I think you can change the channel spec array for other hardware and it will still show this error.
I am fairly specific about the need here, which is for a reliable system that makes periodic measurements in the face of other equipment problems.
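To make the requirement concrete, here is a rough simulation of the dynamics described above (this is not NI driver code; all names and numbers are illustrative, loosely following the 2500 samples/iteration, 20-second-buffer example):

```python
def simulate(drain, stall_iters, iters,
             rate=2500, read=2500, cap=50000, stall_extra=3250):
    """Return (final_backlog, overflowed) after `iters` one-second loops.

    During the first `stall_iters` iterations the OS is distracted for an
    extra 1.3 s, so `stall_extra` additional samples pile up each loop.
    """
    backlog = 0  # samples sitting in the DMA buffer
    for i in range(iters):
        backlog += rate + (stall_extra if i < stall_iters else 0)
        if backlog > cap:
            return backlog, True   # buffer overflow: the RLP failure case
        if drain:
            backlog %= read        # "read until empty": whole blocks out
        else:
            backlog -= read        # standard read: one block per loop
    return backlog, False
```

With `drain=False` and a sustained stall, the backlog grows by 3250 samples per loop and overflows the 50,000-sample buffer in under 20 iterations; with `drain=True` the backlog never exceeds one read block, matching the behavior sth reports.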
08-19-2009 03:25 PM
08-19-2009 03:30 PM
09-03-2009 06:24 PM
Hi sth,
I'm looking at the timeout code for the different HW families and realized a 6330 isn't an NI card. We have a 6030E, but that doesn't quite match your description of 32 channels. To make sure we find a solution for your HW, what card are you using?
Looking at E and M Series, I can see why you're getting that timeout (especially in the E Series case) as well as the data. I'm talking to the developers about that. My initial thoughts are along the lines of your solution: do an extra read with a timeout of 0, requesting a large number of samples. If there are extra samples in the buffer, read them and toss them; if not, just time out and move on. I think you could still do this, though you'd always get that error. I also think there may be a better way, but it depends on the HW. With M Series it looks like it is possible to implement a "number of samples available" property; the other HW lines will take some more research.
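The drain-and-toss idea above can be sketched as a small loop (hypothetical names throughout; `read_block` stands in for DAQmx Base Read, and the fake below only mimics its timeout behavior):

```python
def drain_and_toss(read_block, block=1000):
    """Keep issuing zero-timeout reads and discard the extra samples;
    stop at the first timeout, the expected error mentioned above."""
    tossed = 0
    while True:
        try:
            tossed += len(read_block(block, timeout=0.0))
        except TimeoutError:
            return tossed

def make_fake_read(samples_in_buffer):
    """Hypothetical stand-in for the driver read: return n samples if
    available, otherwise raise TimeoutError immediately (zero timeout)."""
    buf = list(range(samples_in_buffer))
    def read_block(n, timeout):
        if len(buf) < n:
            raise TimeoutError("fewer than n samples available")
        out = buf[:n]
        del buf[:n]
        return out
    return read_block
```

For example, a 2500-sample backlog drains two full 1000-sample blocks before the leftover 500 triggers the timeout that ends the loop.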
Thanks,
Andrew S.
MIO DAQ Product Support Engineer
09-03-2009 08:19 PM
I'm looking at the timeout code for the different HW families and realized a 6330 isn't an NI card. We have a 6030E, but that doesn't quite match your description of 32 channels.
09-04-2009 05:18 PM
Hi Scott,
Your "DAQmx Base Read Until Empty" VI is nearly correct given the limitations of the driver; the only change you need to make is to the timeout value for the second and subsequent iterations of the while loop. Instead of 0 or near zero, you should use a larger value that is still less than the length of time it would take the requested number of samples to arrive.
I think you have idealized the processes involved in general DAQ programming in this case, so let me lay out a thought experiment to make my case. To simplify comparison between trials, I'll choose a simple AI task that each trial will use. This task is part of a larger system that has been written to read data once every second. The task has one channel, is sampled at 1 kHz, and has no external triggering. We thus expect that in one second, 1,000 samples will be available. We will be reading 1,000 samples each time, so each read will contain one second's worth of data.
Experiment: The Perfect Situation
The OS executes every function call and every processor instruction in zero time.
Variation 1: Start the task and then immediately read.
Since the OS responds instantaneously, it is always able to service the task exactly on time. After the task is started, it enters DAQmx Base Read and begins to wait for 1,000 samples to be available in memory. Once they arrive, the read call returns the 1,000 samples. As a result, the buffer never has excess samples left over. If we set a timeout of 1.000 second, we will never get a timeout error, since everything happens exactly on time every time. If we set a timeout of anything less than 1.000 second, the task will return a timeout error, because the DAQ hardware is not returning samples faster than 1,000/sec.
Variation 2: Start the task and then wait three seconds for the first read.
The buffer is now holding three seconds' worth of data, but will never accumulate more left-over samples. On the first call to DAQmx Base Read, the buffer will return 1,000 samples (the first second of data). Each call will return the next group of 1,000 every second, but that group of data will lag reality by three seconds, potentially causing problems in the larger application. If we used your "DAQmx Base Read Until Empty" VI here in place of DAQmx Base Read, it would work as you intended: it would read all of the data from the buffer and return it to you. The OS would instantaneously query how many samples were in the buffer, see that more than 1,000 were available, and pull them into LabVIEW memory, all in zero time. It would continue to do this until fewer than 1,000 samples remained in the buffer. At that point, with a zero timeout, the DAQ hardware would not be able to fill the buffer back up to at least 1,000 samples, and the read call would report a timeout error since 1,000 samples weren't available. This causes your VI to exit and return the data, and you have successfully cleared the data backlog. Subsequent reads will return the most recently acquired second's worth of data.
I think you may see where I'm going with this 'instantaneous OS' -- a zero timeout is impossible in the real world since even a no-op instruction consumes time on a processor. Querying the samples available and comparing them with how many are requested takes a non-zero amount of time. Moreover, copying the data from the DMA buffer to LabVIEW takes a non-zero amount of time.
I'll summarize variation two's perfect behavior in a small table. These parameters apply to the situation when DAQmx Base Read Until Empty is called, at the moment when the OS first enters the call.
| time (sec) | timeout | samples in DMA buffer | data returned? | error? |
|------------|---------|-----------------------|----------------|--------|
| 3.000000   | 1.000   | 3,000                 | yes            | no     |
| 3.000000   | 0.000   | 2,000                 | yes            | no     |
| 3.000000   | 0.000   | 1,000                 | yes            | no     |
| 3.000000   | 0.000   | 0                     | no             | yes    |
Here's the distinction I'm trying to make: when the buffer has fewer samples than requested, DAQmx Base Read and the OS must wait for the DAQ hardware to make samples become available. When the buffer has more samples than requested, the OS doesn't need to wait on the DAQ hardware and can retrieve the data immediately, and thus timeout is irrelevant if it's 1, 10, or 100 seconds: the only time taken by the call to get the data is the time used to query, compare, and copy.
Your timeout in the DAQmx Base Read Until Empty VI needs to be greater than this time. Once the timeout is greater than this, DAQmx Base Read will retrieve the data as quickly as the OS can shuttle bytes in RAM; the DAQ board does not affect the length of the call.
On the other hand, there is an upper bound for the timeout, as you know. Our goal is to empty the buffer, and that means eventually calling DAQmx Base Read when there are fewer samples than we expect (0-999). This means the OS has caught up to the DAQ board to the point where it has less than a one-second lag. Once we hit this situation, the OS will poll the DMA buffer for samples available. If the timeout is too large, say two seconds, then the DAQ board will have two seconds to add more samples to the DMA buffer; even if there were 0 left-over samples, the DAQ board will have added another 1,000 before two seconds have passed, and the read will succeed. With this timeout value, your Read Until Empty VI will never encounter an error, and it will only stop once LabVIEW has consumed all of your RAM 😉
So the key is to pick a timeout value such that the OS can have its overhead but the DAQ board cannot keep pace once the OS catches up to it. The window of values that work begins at the maximum amount of time it takes the OS to query, compare, and copy; the window ends at the amount of time it takes the DAQ board to fill the buffer with the number of samples you want.
The value that you choose determines how many samples will be left behind when you finally error out on a timeout. If you choose a half-second timeout, then the read will succeed if there are 501-999 samples in the buffer, leaving 1-499 samples in the buffer on exit. Upon the next read, the timeout error will happen, since at most 501-999 samples will be in the buffer at the end of the polling.
The shorter the timeout, the fresher your data. Here's another chart comparing a timeout of 500 ms versus 100 ms, beginning at the call to read in which the OS catches up to the DAQ hardware:
| Samples avail when...                 | 500 ms timeout | 100 ms timeout |
|---------------------------------------|----------------|----------------|
| ...entering read                      | 501..999       | 901..999       |
| ...completing read                    | 1..499         | 1..99          |
| ...entering read again and timing out | 501..999       | 101..199       |
With a 500 ms timeout, it is possible for 999 samples to be left in the buffer after the read times out. With a 100 ms timeout, at most 199 samples can be left behind. Since the OS can't act instantaneously, the timeout must be greater than zero. Indeed, even if it could, the DAQ board would have pushed more samples into the buffer by the time the OS completed the copy to LabVIEW memory. There will always be left-over samples on a non-RT system 🙂
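Joe's window arithmetic can be reproduced numerically. This sketch uses his simplification that the board adds `timeout * rate` samples while the read polls (hypothetical function, not driver code); it recovers the 500 ms and 100 ms ranges in the chart above, and also shows the runaway case where a too-large timeout never errors:

```python
def read_outcome(entering, timeout, rate=1000, request=1000):
    """Return (succeeded, samples_left_in_buffer) for one read call,
    assuming the board adds timeout*rate samples during the wait."""
    added = int(timeout * rate)
    if entering + added > request:
        return True, entering + added - request   # read got its 1,000
    return False, entering + added                # timed out; backlog kept

# Entering-sample counts for which a 500 ms read succeeds:
succeeds = [n for n in range(1000) if read_outcome(n, 0.5)[0]]
# per the model, these span 501..999, matching the chart
```

With `timeout=2.0`, `read_outcome(0, 2.0)` still succeeds, which is Joe's never-erroring, RAM-consuming scenario.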
Joe Friedchicken
NI Configuration Based Software
Get with your fellow OS users: [ Linux ] [ macOS ]
Principal Software Engineer :: Configuration Based Software
Senior Software Engineer :: Multifunction Instruments Applications Group (until May 2018)
Software Engineer :: Measurements RLP Group (until Mar 2014)
Applications Engineer :: High Speed Product Group (until Sep 2008)
09-04-2009 05:21 PM
I have a little more to add. It looks like I ran into the maximum post length 🙂
Obviously, this problem wouldn't exist if there were a way to directly ask DAQmx Base how many samples there were in the buffer before reading, and then just request that amount. If you poke around the read VIs, there is a mechanism we use to do just that.
Drill down to "ESeries -- AI DMA Read Data DMA 2D.vi" and you'll find an invoke node for "DMA Read u16". There's an unwired output called "Samples Left in Buffer". This output indicates the total number of samples in the buffer for all channels. To determine how many samples are available for each channel, you would need to divide by the number of channels in your scan list. It's possible to bring this indicator up to the umbrella DAQmx Base Read VI for E and M Series, but our other hardware (namely USB) doesn't communicate so directly.
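The divide-by-channel-count step is simple but easy to get backwards, so here it is spelled out (a hypothetical helper, since DAQmx Base itself does not expose this value):

```python
def samples_per_channel(samples_left_in_buffer, num_channels):
    """Convert the total 'Samples Left in Buffer' count to a per-channel
    count for an interleaved scan list: every scan contributes one
    sample per channel, and a partial scan rounds down."""
    if num_channels <= 0:
        raise ValueError("need at least one channel")
    return samples_left_in_buffer // num_channels
```

For a 32-channel scan list, 3,200 total samples left in the buffer means 100 complete scans are available per channel.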
All in all, I would say there's a feature request for Base in this mess. My preference is to tell DAQmx Base to read and return the entire buffer by passing '-1' as the number of samples to read. I'm not sure if this is something we can do, since USB doesn't play so nicely, but it's worth tracking at the very least.
What are your thoughts?
Joe Friedchicken
09-10-2009 11:33 AM
Joe,
Thanks. It took me a while to chew through all that information. Having written some DAQ drivers for the old PDP-11 series, it is not that I have idealized the situation to zero processing time. It may be a difference in how we account for that processing time that has led to my misunderstanding.
I am used to a driver not polling the hardware. Not ever. This is a wasteful, inefficient, and bad way to write a driver. It may be the easy way that NI has used for the NI-DAQmx Base driver, but it is a fundamental flaw that shows up here. The driver should pass the request to the PCI(e) card with a DMA address, a timeout, and a count. All that polling of the DMA engine should go away, notifying the user only when the interrupt occurs telling you that the count items have been retrieved.
Thus the driver never sees any data past the count number of points. All this race condition with the driver waiting a full timeout to tell you that the buffer has overflowed while claiming there aren't enough samples goes away. Polling is bad because of race conditions, inefficiencies, and just general kludginess!!!
That being said, there is not much chance that a more efficient driver will appear in the near future.
Where we are disagreeing on behavior can be traced to the "time to transfer data" being counted as part of that timeout. I figured that timeout was passed to the DMA engine itself. So there is now a problem where the timeout can occur even if the samples were available but there wasn't time to transfer the data to the user buffer. Thus my request with a 0-second timeout should have two cases. I am using your 1000-sample blocks as an example.
This didn't work, since I got a timeout with 0 seconds and more than 1000 items in the buffer. So I added a 1 ms timeout as a minimal time. On a modern GHz machine this should allow enough time for the transfer of 10,000 samples even with a huge amount of overhead. Now, in the two cases
# of samples in buffer < 1000, read 1000 with 2 second timeout
# of samples in buffer > 1000, read all the samples and return immediately to try to get back on track.
the problem is that everything I add to handle the second case adds overhead in the first case, unless I know beforehand which case I am dealing with. I am not sure how to phrase a feature request for this without getting it kicked out immediately because of the problem with your USB-based devices, which don't have the peek-ahead ability (or for which it would be very hard to do).
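The two-case policy above can be sketched as follows. Both `samples_available` and `read_block` are hypothetical stand-ins: the first is exactly the peek that DAQmx Base lacks, the second plays the role of DAQmx Base Read:

```python
def adaptive_read(samples_available, read_block, request=2500):
    """Pick the read strategy from the backlog size.

    read_block(n, timeout) returns up to n samples; samples_available()
    returns the per-channel count currently buffered."""
    avail = samples_available()
    if avail < request:
        # case 1: on schedule -- one hardware-timed block, generous timeout
        return read_block(request, timeout=2.0)
    # case 2: backlog -- grab everything available to get back on track
    return read_block(avail, timeout=0.001)
```

The single `samples_available()` call is the only overhead case 2 imposes on case 1, which is the crux of the complaint: without a real peek, that overhead becomes a full extra read-and-timeout cycle.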
I can drill down to the E Series DMA calls and get that number. It may be possible to ask for 0 samples with a 0 timeout and then get that number through the LV application server. I have found that it is hard to maintain a modification to NI-DAQmx Base through system upgrades. This is both good and bad. :-)
09-10-2009 03:23 PM
Joe Friedchicken
09-10-2009 04:26 PM
Joe F. wrote:
Your critique of the DMA architecture for E Series under DAQmx Base is well founded. The driver uses a software loop to poll the DMA chip and ask how many samples it has pushed to host memory. However, perhaps unlike PDP-11 boards, the E Series boards do not have a concept of timeout in their hardware.
Just to correct a misstatement in my original post: trying to remember back to that earlier incarnation of OS programming, I believe the timeout was set as a system timer in the driver. You would launch a DMA transfer, set the timer, and then relinquish the CPU waiting for either event. If the DMA completed, you cancelled the timer request; if the timer fired, you cancelled the DMA transfer. The PDP-11 series was probably discontinued around 198? and before NI had analog boards. I think the boards I used were mainly from Data Translation (they are still around), but these were very rudimentary boards.
I believe that OS X kernel programming uses a "WorkLoop" concept that is similar. Nowadays you can use the nanosecond clocks for timing (not that you really get nanosecond precision on these), but there are mechanisms to do a similar operation.
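The launch-DMA-then-wait-for-either-event pattern described above can be sketched in user space with a `threading.Event` standing in for the kernel work loop (hypothetical names; a real driver would do this with interrupts and a kernel timer, not threads):

```python
import threading

def wait_dma_or_timeout(start_dma, timeout_s):
    """start_dma(done) launches the transfer and returns a cancel
    callable; the transfer sets `done` on completion. Returns True if
    the DMA finished first, False if the timer won and the transfer
    was cancelled -- exactly one of the two events 'wins'."""
    done = threading.Event()
    cancel = start_dma(done)
    if done.wait(timeout_s):
        return True        # DMA completed; the timer is implicitly dropped
    cancel()               # timer fired first; cancel the pending transfer
    return False
```

The point of the pattern is that the CPU sleeps between the two events instead of polling the DMA engine, which is the efficiency argument made earlier in the thread.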