SGL to I32 Typecast

Intaris · ‎03-19-2015

I'm going to have a hard time solving this problem...... Darn

nathand · ‎03-19-2015

Wish I had a solution for you, but all I can offer is sympathy. I worked on an application where we use a sbRIO to drive a high-speed printing process. The sbRIO receives image data over TCP as a string, Type Casts that data to an array of 32-bit integers, and writes it to a DMA FIFO. As far as I can tell, the sbRIO makes three copies of the data without modifying any of the bits: one copy as a string when it is received over TCP, a second copy as an array of numbers due to the type cast, and a third copy into the DMA buffer. Making all these copies creates a bottleneck at high print speeds.

JLewis · ‎03-19-2015

I don't have an answer either, but would like to pull the string on how you're using the I32 data. Normally when you have a need for this type of casting, that means there's a gap in our data type support somewhere. Is there a feature request for SGL support that we can extract from this?

BTW, another gotcha on C style casting is that floating-point registers can come into play with platform-specific ramifications. Don't ask how I know that...

Intaris · ‎03-20-2015

Right, here is my situation. Maybe there's a better solution to my problem, I hope so.

Our RT-Loop runs at 20 kHz. This gives us a hard limit of 50 microseconds per loop, let's say 45 microseconds so that we have some breathing room. Of that 45 microseconds, approximately 20 is taken up with FPGA communications (DMA transfers in both directions). These currently cannot be offloaded to a different loop because we run some control loops in the RT loop, hence the FPGA communications need to be synchronous. We have a max of 25 microseconds to play with.

Currently we have >10 different modules running on the RT doing their thing. DIO, PI Controllers, motor control, slew rate controlled outputs, that kind of thing. Each module takes in the region of 1 microsecond to run. At the moment all of these share a single SGL array, essentially as a CVT. All active parameters are stored in this SGL array. We have a single "God Enum" which tells us what is located where and we index according to the values of this Enum (i.e. Input 1 is Index 0, Input 2 Index 1 and so on). We have more than 500 Parameters. This works but I was hoping to change the following:
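For readers without the LabVIEW code in front of them, the shared-array arrangement can be sketched in Python. This is only an illustration: the parameter names, the helper functions and the four-entry enum are hypothetical stand-ins for the real 500+ entry "God Enum".

```python
from array import array
from enum import IntEnum


class Param(IntEnum):
    """Hypothetical stand-in for the 'God Enum': value == index into the array."""
    INPUT_1 = 0
    INPUT_2 = 1
    AO_SLEW_RATE = 2
    DIO_MASK = 3


# One shared array of single-precision floats (type code "f"), one slot per parameter.
cvt = array("f", [0.0] * len(Param))


def write_param(param: Param, value: float) -> None:
    """Every module writes through the same enum-indexed array."""
    cvt[param] = value


def read_param(param: Param) -> float:
    """Read the raw SGL value stored for a parameter."""
    return cvt[param]


def read_as_int(param: Param) -> int:
    """Modules that need an integer must coerce on every read."""
    return int(cvt[param])


write_param(Param.AO_SLEW_RATE, 2.5)
write_param(Param.DIO_MASK, 5.0)
print(read_param(Param.AO_SLEW_RATE), read_as_int(Param.DIO_MASK))
```

The coupling is visible here: inserting or reordering an enum entry shifts indices for every module that touches the array.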

Encapsulate functionality of each individual module so that changes to one module (which results in changes to the "God Enum") no longer require touching ALL RT Loop VIs. In addition, make the functionality (such as bit-packed U32s) internal with proper exposed accessors so that the implementation and the interface are decoupled. Simple stuff, all good. First modules are implemented and the ability to make changes locally without global ramifications is nice. So I now have an "Analog Out" object which encapsulates calibration, slew rate and so on. Modules which write to the AO need no longer know ANYTHING about the implementation, programming instead using the provided interface. All of the functions of all of the modules are non-Dynamic dispatch and it took some trial and error to find out how to achieve this without completely borking the performance since a single DD-call overhead is in the region of 0.7 microseconds. Huge in this context. Most VIs are inlined. Each module stores its own data in its proper data format. We thus no longer need to continually read a SGL from the SGL array (using the God Enum) and coerce to whatever type we require. Theoretically this should save time on the RT as a host of conversions to and from SGL are avoided.

My current problems arose when I started implementing channels into and out of these modules. When sending data from the host PC, we send an index value (which comprises the Module Number (8-bit) plus other values). In total we have 64 bits: the upper 32 bits are identifiers and the lower 32 bits the actual data payload. This is a nice scalable approach with the ability to increase up to over 200 individual modules with each having possibly over 200 parameters. In order to maintain scalability, I need a common interface to this communication path for all modules. I had implemented this with a U64 which is sent to the input handler of the module (based on the module number being the uppermost 8 bits of the U64) and the module then knows what to do with it. Problems occur when converting the 32-bit payload to SGL or to I32 as they kind of need to share the same data path.
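The 64-bit message described above can be sketched as follows. This is a rough illustration in Python, not the actual LabVIEW implementation: the post only states that the module number is the uppermost 8 bits and the payload the lower 32 bits, so treating the remaining 24 identifier bits as one field, and all function names, are assumptions. The `struct` round trip plays the role of LabVIEW's Type Cast, reinterpreting the same 32 bits as SGL or I32 without changing them.

```python
import struct

MODULE_SHIFT = 56           # module number sits in the top 8 bits of the U64
PAYLOAD_MASK = 0xFFFFFFFF   # lower 32 bits carry the data payload


def pack_message(module: int, other_ids: int, payload_bits: int) -> int:
    """Build the U64: 8-bit module, 24 bits of other identifiers (assumed), 32-bit payload."""
    if not (0 <= module < 256 and 0 <= other_ids < (1 << 24)):
        raise ValueError("identifier field out of range")
    return (module << MODULE_SHIFT) | (other_ids << 32) | (payload_bits & PAYLOAD_MASK)


def module_of(word: int) -> int:
    """Extract the module number used to route the message to its input handler."""
    return word >> MODULE_SHIFT


def payload_of(word: int) -> int:
    """Extract the raw 32-bit payload, still untyped."""
    return word & PAYLOAD_MASK


def sgl_to_bits(value: float) -> int:
    """Reinterpret a single-precision float as its raw 32 bits."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_sgl(bits: int) -> float:
    """Reinterpret raw 32 bits as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def bits_to_i32(bits: int) -> int:
    """Reinterpret raw 32 bits as a signed 32-bit integer."""
    return struct.unpack(">i", struct.pack(">I", bits))[0]


word = pack_message(module=3, other_ids=0x000102, payload_bits=sgl_to_bits(1.5))
print(module_of(word), bits_to_sgl(payload_of(word)))
```

Only the receiving module knows whether the payload bits are meant as SGL or I32, which is why both types end up sharing one data path.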

On the other end I have a similar problem. We have functionality to register to certain parameter changes on the RT (status feedback). Changing the internal data back to a common interface to send them to the host has the same problem of interconversion as the input path. Ideally casting a SGL as an I32 and passing it to the host via TCP should be easy but the 0.4 microseconds for EACH type cast is hurting. We can register for up to 50 parameters for feedback, 50 x 0.4 microseconds is 20 microseconds which will kill our RT loop. We don't pass all values each loop, but we do need to make comparisons to see if the value has changed. This comparison was also implemented on a common datatype in order to help scalability. The idea of making the comparison for each and every parameter in the owning object (and thus avoiding the conversion for this part) would mean a huge number of extra VIs.
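The timing budget behind that concern can be written out explicitly. The figures below are the ones quoted in this post (20 kHz loop, 45 microseconds usable, about 20 microseconds of FPGA communication, 0.4 microseconds per Type Cast, up to 50 subscriptions); the variable names are only illustrative, and integer nanoseconds are used to keep the arithmetic exact.

```python
NS_PER_S = 1_000_000_000

LOOP_RATE_HZ = 20_000
period_ns = NS_PER_S // LOOP_RATE_HZ      # hard limit per iteration: 50 us

usable_ns = 45_000                        # self-imposed limit with breathing room
fpga_ns = 20_000                          # DMA transfers in both directions (approximate)
remaining_ns = usable_ns - fpga_ns        # left for all modules and feedback handling

TYPECAST_NS = 400                         # measured cost of one Type Cast
MAX_SUBSCRIPTIONS = 50                    # parameters that can be registered for feedback
worst_case_cast_ns = TYPECAST_NS * MAX_SUBSCRIPTIONS

share_percent = worst_case_cast_ns * 100 // remaining_ns

print(f"period: {period_ns} ns, remaining: {remaining_ns} ns")
print(f"worst-case type casts: {worst_case_cast_ns} ns ({share_percent}% of remaining budget)")
```

In the worst case the type casts alone would use 20 of the 25 available microseconds, before any of the modules (roughly 1 microsecond each) have run.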

What kind of common data path should I be using if this "SGL and I32 as a common type" does not work for me? I can't simply convert the first 32 bits of the data to I32 (Indexing) and send the second 32 bits as a String because an RT FIFO (used for communication outside of the RT Loop for TCP comms to take place) doesn't allow this. Perhaps a cluster of I32 for the Index plus a DVR of a (essentially fixed-length) String? The need for a workaround of this type makes me rather depressed to be honest. I just hope my boss is willing to put up with another round of exploratory coding before the bloody thing can work.

And yeah, I'm really angry at myself because I missed this aspect during earlier benchmark testing (I only wrote a few parameters within 1 million RT loops for testing so the cost of the conversion was lost in the noise) and I should KNOW that typecast is expensive on the RT platform. It's too late in the day for things like this to be popping up, mea culpa. Only the return route (which needs to run EVERY iteration) raised its ugly head sufficiently to get me worried. Benchmarking sucks. Or at least I suck at benchmarking.

tst · ‎03-20-2015

@Intaris wrote:

My current problems arose when I started implementing channels into and out of these modules. When sending data from the host PC, ...

On the other end I have a similar problem. We have functionality to register to certain parameter changes on the RT (status feedback). Changing the internal data back to a common interface to send them to the host...

Perhaps a cluster of I32 for the Index plus a DVR of a (essentially fixed-length) String? The need for a workaround of this type makes me rather depressed to be honest.

If I'm understanding these correctly, your problem is in sending information into and out of the loop. My immediate thought (before I actually read your last paragraph, which seems to go in a similar direction) was that you should offload the conversion part to a lower priority loop (hopefully that's doable) and then try to send data using "pointers".

Essentially, this would mean that the comm loop gets the data from the host, breaks it down and shoves the data into a DVR or whatever and then sends that index + (flattened?) DVR combo to the processing loop. Inside each case you can now destroy the DVR. The reason I suggested possibly flattening it is that this could allow the DVR to be strictly typed.

I have no idea what the performance of all this would be like and I don't remember doing something exactly like this before. I don't know what it takes to allocate a DVR. I don't know what it takes to flatten and unflatten it. It's possible that you would be better off with a queue instead of a DVR, where you can create the queue once, and even create space in RAM for a few elements.

___________________
Try to take over the world!

Intaris · ‎03-20-2015

I thought of exactly that option for a while, but I would like to avoid creation of a string inside my time critical loop if at all possible.

I've just implemented a "workaround" which may turn out to be better anyway. It's a lot of code for getting around what SHOULD be an easy option but hey, that's LabVIEW.

I'm going to benchmark now and see if the functionality is even given......

tst · ‎03-20-2015

@Intaris wrote:

I thought of exactly that option for a while, but I would like to avoid creation of a string inside my time critical loop if at all possible.

I don't see why you would need a string. You have 32 bits for your index data and 32 bits for the reference. The flattening of the reference can be to a U32 (I should probably have used the term casting instead of flattening, sorry).

If your other workaround (which is what, I'm curious?) doesn't pan out, hopefully this is still a backup option.


Intaris · ‎03-20-2015

@tst wrote:

I don't see why you would need a string. You have 32 bits for your index data and 32 bits for the reference. The flattening of the reference can be to a U32 (I should probably have used the term casting instead of flattening, sorry).

I thought of a string since everything gets sent back to the host as a string anyway. It's an inevitable conversion. The "Type Cast" is also actually the function causing the problems and is exactly what I'm trying to remove from my code. It is reassuring though that others (whose opinions I value) seem to think along similar lines to my original approach. My sanity (which is often questioned) may just be holding on by a hair's breadth.....

My solution was to create a data container object. It contains the I32 Parameter index (with module number, parameter number and so on which is universal), a SGL and another I32 as data payload. It also contains a U8 "mode" datatype (to let the code know if the SGL or I32 contains the actual data) and a Boolean to indicate whether the data has changed since it was sent back last time. The Boolean is for functionality which was previously maintained externally, but it now makes more sense to have it in here since the comparisons to determine a change can then be made strictly-typed.

I maintain 50 of these objects in an array (we can subscribe to a max of 50 values) and pass it to the routines for getting the return values from each module. The module then does its own processing (as it's encapsulated) and passes the data in to the data container (at the moment only "Write I32" and "Write SGL"). The I32 for the Index is ALWAYS written to. I perform a comparison on writing to see if the value has actually changed, so that this can be done SGL to SGL or I32 to I32 which is probably more efficient. Each object therefore tracks its own "changed" status. When I pass this on via RT-FIFO I set the "Changed" flag to false. I currently pass a static cluster of data via FIFO because AFAIK, ~~classes~~ objects (Don't tell AQ) are not supported on RT-FIFOs.
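The container described in the last two paragraphs might look roughly like this if written in Python. It is a sketch under assumptions, not the actual LabVIEW class: the mode constants, method names and the `take_for_fifo` helper are hypothetical, and treating a switch between I32 and SGL as a change is a guess rather than something stated in the post.

```python
from dataclasses import dataclass

MODE_I32 = 0  # hypothetical encoding of the U8 "mode" field
MODE_SGL = 1


@dataclass
class DataContainer:
    index: int = 0          # I32 parameter index (module number, parameter number, ...)
    i32_value: int = 0      # payload when mode == MODE_I32
    sgl_value: float = 0.0  # payload when mode == MODE_SGL (Python floats are doubles)
    mode: int = MODE_I32
    changed: bool = False

    def write_i32(self, index: int, value: int) -> None:
        """Store an I32; the comparison is I32 to I32, no type cast involved."""
        self.index = index  # the index is always written
        if self.mode != MODE_I32 or value != self.i32_value:
            self.changed = True  # assumption: a mode switch also counts as a change
        self.mode = MODE_I32
        self.i32_value = value

    def write_sgl(self, index: int, value: float) -> None:
        """Store a SGL; the comparison is SGL to SGL, no type cast involved."""
        self.index = index
        if self.mode != MODE_SGL or value != self.sgl_value:
            self.changed = True
        self.mode = MODE_SGL
        self.sgl_value = value

    def take_for_fifo(self) -> tuple:
        """Return the plain 'static cluster' to push on the FIFO and clear the flag."""
        cluster = (self.index, self.mode, self.i32_value, self.sgl_value)
        self.changed = False
        return cluster


# One container per possible subscription (max 50 in the post).
subscriptions = [DataContainer() for _ in range(50)]
subscriptions[0].write_sgl(index=7, value=1.5)
print(subscriptions[0].changed, subscriptions[0].take_for_fifo())
```

Because each write method only ever compares like with like, the change detection never needs to reinterpret bits.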

Then on the receiver side (still RT) I read the FIFO, write the data to a dummy object of the data container class and then call the "Get Send String" method which spits out the correct string (only 64 bits) as it knows itself which data is important (Index and I32 or Index and SGL). This string is then sent via TCP as per usual. The host doesn't need to know anything has changed.
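A "Get Send String" style method could be sketched like this in Python. The function name mirrors the method mentioned above, but the signature, the mode constants and the big-endian byte order are assumptions made for the illustration; the real method lives in the LabVIEW class.

```python
import struct

MODE_I32 = 0  # hypothetical encoding of the U8 "mode" field
MODE_SGL = 1


def get_send_string(index: int, mode: int, i32_value: int, sgl_value: float) -> bytes:
    """Return the 64-bit wire string: I32 index followed by the active 32-bit payload.

    The mode selects which stored value is serialised, so no reinterpreting
    cast between SGL and I32 is needed. Big-endian order is assumed here.
    """
    if mode == MODE_SGL:
        return struct.pack(">if", index, sgl_value)
    return struct.pack(">ii", index, i32_value)


# The same index sent once with a SGL payload and once with an I32 payload.
print(get_send_string(1, MODE_SGL, 0, 1.0).hex())
print(get_send_string(1, MODE_I32, -1, 0.0).hex())
```

Either way the result is 8 bytes, so the host sees the same 64-bit format as before.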

Seems to be working and performance is certainly better than before. So as astonishing as it might seem, this approach appeals to me. A further improvement would be to send a DVR of a cluster instead of the cluster itself as the cluster has now grown from the original I32 + SGL = 64 bits to U8 + I32 + I32 + SGL = 104 bits.

It's nearly weekend and I'm happy to at least have a candidate for a better solution.

The bottom line is that ALL this work is still significantly faster than actually using a type cast. I won't claim Type Cast should be free (in performance terms) but I will say we SHOULD have a function to do a blind re-assignment of type which would actually have prevented this problem entirely.

Intaris · ‎03-20-2015

So basically I've made a "pseudo Dynamic Dispatch" call where the different cases are statically defined as opposed to doing actual Dynamic Dispatch.

The advantage is the ability to inline. Since I can't send an object over an RT-FIFO anyway, real Dynamic Dispatch would only have introduced a different problem if I wanted to actually send abstract datatypes between the time critical loop and the TCP sender loop. Inlineable Dynamic Dispatch calls are a different matter for another day.

tst · ‎03-21-2015

Not that it really matters, but on second thought (which is about as short as the first one was, as you'll see), in my proposed solution you don't actually need to send the reference separately, at least if it's a queue. The queue refs can be fixed and you can select the correct queue inside the case handling the specific message using any mechanism you like. Again, probably not relevant now that you have a solution, but might be useful in the future.

