
High speed data transfer internally, between loops. Nothing seems fast enough.

Solved!

There is a part of a LabVIEW application that runs a control algorithm. It needs to run as fast as possible, but we have settled on 8000 Hz. It runs on an sbRIO-9609.

 

Code function explanation:

Reads an FPGA DMA FIFO, calibrates the raw values, sorts the data, pushes it through an algorithm that gives the next current output, and then pushes that back down to the FPGA. This works fine as long as I use a timed sequence and select a specific CPU for the task. If I do not do this, it does not run at 8000 Hz.

 

The problem:

The data it produces and some status info (timing, overruns, etc.) need to be sent out of the loop to be processed by the rest of the application. I need the data from every iteration of the loop, so it is important that the transfer is lossless. How do I get the data out without slowing the loop down?

I want the main loop to run as fast as possible, but I consume the output data in larger chunks, so the reader does not need to run fast. The loop runs at 8000 Hz, and a reader running at 10 Hz would be ideal. Buffering is not a problem; I am not running out of memory.

 

Solution attempt 1:

Queues and Flush Queue. I had the fast loop write to a queue, and a loop in parallel that flushes this queue. The problem was that every time I flushed the queue, it blocked the writes from the fast loop for about 1 ms. Missing data is not an option, so the enqueue cannot have a timeout of 0.
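As a rough text-code model of that blocking (Python stand-ins only; LabVIEW's actual queue internals may differ), the point is that the flush and the enqueue contend for the same queue, so a flush makes the writer wait for the whole copy rather than for a single element:

```python
# Conceptual model only: both enqueue and flush go through one shared lock,
# so a flush that copies N elements stalls the 8 kHz writer for the whole copy.
import threading

class FlushableQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def enqueue(self, item):
        # The control loop calls this every 125 us; if a flush currently
        # holds the lock, this call stalls until the flush finishes.
        with self._lock:
            self._items.append(item)

    def flush(self):
        # The slow reader calls this at ~10 Hz; it holds the lock while it
        # takes everything, which is the blocking described above.
        with self._lock:
            items, self._items = self._items, []
        return items

q = FlushableQueue()
writer = threading.Thread(target=lambda: [q.enqueue(i) for i in range(8000)])
writer.start()
print(len(q.flush()), "samples drained while the writer may have been stalled")
writer.join()
print(len(q.flush()), "samples drained after the writer finished")
```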

 

Solution attempt 2:

Reading and writing at the same speed.

I used a queue and an RT FIFO. I set up two loops with timed sequences and fixed CPUs: one for the control loop and one for the reader loop. The reader loop does not flush the queue; it reads one element at a time (see the sketch below). This works, but now I use two CPUs just on the control loop, and I need two control loops. I could have one CPU for reading both and two for the control loops, but that would leave only one CPU for the rest of the application. (I know it does “lock” the CPUs up, but it sure does use them.) The other problem is that it seems like a waste: I do not need the data sample by sample, since I store it in big chunks at fixed intervals.
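A rough Python model of why this per-element approach costs a full CPU (stand-in names only, not the actual RT FIFO API): since each read hands over only one sample, the reader has to iterate just as fast as the writer.

```python
# Sketch of attempt 2: the reader takes one sample per iteration, so it must
# itself run at the full 8 kHz, which is why it needs its own dedicated CPU.
import queue
import threading

rt_fifo = queue.Queue()               # stands in for a single-element-type RT FIFO

def control_loop():
    for i in range(8000):             # one second of the 8 kHz control loop
        rt_fifo.put(i)                # one sample handed off per 125 us iteration
    rt_fifo.put(None)                 # sentinel: tell the reader this demo is done

def reader_loop():
    count = 0
    while True:
        sample = rt_fifo.get()        # must keep pace sample by sample
        if sample is None:
            break
        count += 1                    # real code would log/store the sample here
    print("read", count, "samples one at a time")

threading.Thread(target=control_loop).start()
reader_loop()
```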

 

Is there any other, better way to write fast and read more slowly that does not stall the writing process?

If there is anything I can add to help you understand the problem, feel free to ask.

 

 

Message 1 of 17

Hi John,

 


@John_Isodore wrote:

Code function explanation:

Reads an FPGA DMA FIFO, calibrates the raw values, sorts the data, pushes it through an algorithm that gives the next current output, and then pushes that back down to the FPGA. This works fine as long as I use a timed sequence and select a specific CPU for the task. If I do not do this, it does not run at 8000 Hz.


Having a small RIO system run a loop at 8kHz with FPGA data transfer in both directions is quite heavy work…

How many samples do you transfer with each iteration of your 8kHz loop?

 


@John_Isodore wrote:

Is there any other, better way to write fast and read more slowly that does not stall the writing process?


Did you try to run the control algorithm entirely on your FPGA? This one should be able to consistently run at 8kHz…

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 2 of 17

Hi,

 

Thanks for your interest.

 

 

 


@GerdW wrote:
How many samples do you transfer with each iteration of your 8kHz loop?

I transfer 16 x 32-bit FXP values up from the FPGA. Again, this takes some time, around 22 µs, but that is fast enough.

The data transfer down is 3 x 32-bit, and I have not decided between front panel controls and DMA. The front panel was easy to set up and seems to work fine, but then I need an additional flag and extra logic to make sure it works. This might not be ideal, but it works. The big showstopper is the data transfer internally on the RT side.

 

@GerdW wrote:

Did you try to run the control algorithm entirely on your FPGA? This one should be able to consistently run at 8kHz

 

The algorithm is a Simulink node; this is how the client wanted it. I also recommended having it on the FPGA, but it required too many resources, so it will stay a Simulink node. It also takes about 22 µs to process, so together the big pieces take around 44 µs of the total 125 µs I have, which is plenty. If only the queue flush were not blocking, everything would be good.

Message 3 of 17

Hi John,

 


@John_Isodore wrote:

The algorithm is a Simulink node; this is how the client wanted it. I also recommended having it on the FPGA, but it required too many resources, so it will stay a Simulink node. It also takes about 22 µs to process, so together the big pieces take around 44 µs of the total 125 µs I have, which is plenty. If only the queue flush were not blocking, everything would be good.


Unfortunately you forgot to attach some (simplified) code…

Which kind of queues do you use? "Default" queues (like on Windows target) or RTQueues?

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 4 of 17
Solution
Accepted by John_Isodore

The next thing I would try is buffering in an array in the control loop and then an RT FIFO.

 

I think it will need to be an RT FIFO since the concept here is that it can't block your RT loop - definitely what you need. But as you say, you want to step down the rate of the reader.

 

If you have a pre-allocated array of, say, 100 elements, you can write into it in the control loop, and when you reach element 100, write the whole array to the RT FIFO. The reader can then read 100 elements at a time at 80 Hz, which relaxes the timing requirements on that side.

 

This should be consistently fast, but experimenting with the size of the pre-allocation lets you trade it off against the jump in loop execution time on the iterations where you have to write to the RT FIFO.
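A minimal sketch of that pattern (Python stand-ins, since LabVIEW is graphical; the names and the 100-element chunk size are assumptions to experiment with): fill a pre-allocated buffer inside the control loop and hand the whole buffer to the FIFO as one element, so the reader only services 8000 / CHUNK handoffs per second.

```python
import queue
import threading

CHUNK = 100                           # pre-allocation size to tune
fifo = queue.Queue(maxsize=128)       # stands in for the RT FIFO (bounded, fixed size)

def control_loop():
    buf = [0.0] * CHUNK               # pre-allocate once, before the loop starts
    idx = 0
    for i in range(8000):             # one second at 8 kHz
        buf[idx] = float(i)           # calibrated sample from the FPGA DMA goes here
        idx += 1
        if idx == CHUNK:
            fifo.put_nowait(list(buf))  # hand off a full chunk; raises only if the
            idx = 0                     # reader falls maxsize chunks behind
    fifo.put_nowait(None)             # sentinel for this demo

def reader_loop():
    total = 0
    while True:
        chunk = fifo.get()            # 80 handoffs/s instead of 8000 samples/s
        if chunk is None:
            break
        total += len(chunk)           # store/display in large blocks here
    print("received", total, "samples in chunks of", CHUNK)

threading.Thread(target=control_loop).start()
reader_loop()
```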

 

As an aside, you may find that FPGA front panel items are actually more consistent than the DMA transfer mechanism, since DMA has an internal buffer that probably involves locking for its management, whereas FPGA front panel items are simpler, consistent memory reads. It also means that if there is a glitch on the RT side, there is no data left sitting in a buffer putting you behind. There may be some other reason for it to behave like this, though.

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com
Message 5 of 17

Some code, or at least some pictures of the code, would be nice to better understand what you are doing. Have you tried configuring the queue to carry an array of the elements you want to process, building the array inside your fast loop, and enqueueing the data as soon as you reach the desired number of elements?

Lucian
CLA
Message 6 of 17

@LucianM wrote:

Some code, or at least some pictures of the code, would be nice to better understand what you are doing. Have you tried configuring the queue to carry an array of the elements you want to process, building the array inside your fast loop, and enqueueing the data as soon as you reach the desired number of elements?


This is also what I was going to suggest. I don't recall ever having a reason to use Flush Queue. If I need to send multiple elements, I make my queue's data type an array, collect that array and enqueue it, then dequeue and process it.

Message 7 of 17

 


@GerdW wrote:

Unfortunately you forgot to attach some (simplified) code…

Which kind of queues do you use? "Default" queues (like on Windows target) or RTQueues?

 


I have now added some dummy code, but as I stated in the original post (solution attempt 2), I have tried both. You cannot flush an RT FIFO, and there is no other way to read all of its elements at once, so in solution attempt 1 I tried a normal queue. To be more specific, I have only used the functions on the "Queue Operations" palette, which is what I refer to as the normal queue. The RT queue I refer to is the RT FIFO, located under "Real-Time" > "RT FIFO". If there is any additional way of transferring data losslessly and efficiently, please let me know.

 

In the dummy code, solution attempt 2 works, but it is very resource heavy, and since I use the data in chunks, I would like to pull it in chunks.

 

EDIT: Added a VI saved for LV 16 too; I use LV 20 for this project.

 

Message 8 of 17

Hi James,

 

Thanks for the response. I have been thinking about something like this, but was not sure if it was clever enough. I will definitely try it next.

I have been reluctant to add more logic and handling to the control loop because I was afraid that it might slow it down.

 

Another option along similar lines that I have been thinking about is having two normal queues and writing, say, 4k samples to one, then 4k to the other, and then back to the first queue. With this I could flush queue 1 while the control loop is writing to queue 2, and vice versa (sketched below). What do you think of an approach like this? (Writing to a normal queue has never been a problem; it is just the flushing that blocks.)
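A rough Python model of that ping-pong idea (hypothetical stand-ins, not LabVIEW queue functions): the writer alternates buffers every 4k samples, so draining one buffer never contends with the writes going into the other.

```python
import queue
import threading

SWAP = 4000
ready = queue.Queue()                 # carries each full buffer to the reader

def control_loop():
    buffers = [[], []]                # "queue 1" and "queue 2"
    active = 0                        # which buffer is currently being written
    count = 0
    for i in range(16000):            # two seconds at 8 kHz
        buffers[active].append(i)
        count += 1
        if count == SWAP:
            ready.put(buffers[active])   # hand the full buffer over to be "flushed"
            buffers[active] = []         # fresh buffer for the next pass through this slot
            active ^= 1                  # continue writing into the other buffer
            count = 0
    ready.put(None)                   # sentinel for this demo

def reader_loop():
    while True:
        chunk = ready.get()           # drain one 4k-sample buffer at a time
        if chunk is None:
            break
        print("flushed", len(chunk), "samples")

threading.Thread(target=control_loop).start()
reader_loop()
```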

Message 9 of 17

To the OP: I think you're missing one of the things many are saying. If you want your RT consumer to read many samples at a time, then the queue or RT FIFO data type should be an array of samples, so that reading one queue element gives you one array of samples.

 

As others have said, the producer part of the code would first accumulate single samples until a fixed-size array is full, then write that whole array to the queue / RT FIFO as a single queue element.

 

 

-Kevin P

Message 10 of 17