
CPU update from 4770k to 13900K but not much increase in speed in a data processing VI

Solution
Accepted by obarriel

@obarriel wrote:

Yes my original question was why the performance improvement from 4770k to 13900K was so limited. And that still remains. But anyway, I also appreciate very much any tips to make the code more efficient. Trying to learn from there.


Yes, it is always wrong to just throw more expensive hardware at inefficient code, and I think your real issue was always just to improve the code. 😄 Since most of the time is spent in the PINV, you don't really have much slack to tighten elsewhere, but here's a cleaned-up version of your non-MASM code to give you some ideas.

 

altenbach_0-1694189959858.png
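(In case the screenshot doesn't render: the structure under discussion is essentially a least-squares fit solved through a pseudoinverse. Here is a rough NumPy analogue, not LabVIEW and not the actual VI; the shapes and names are invented, just to show that the PINV step is where the time goes and that everything around it is comparatively cheap.)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1_000_000, 3))   # hypothetical design matrix built from the sine pattern
y = rng.standard_normal(1_000_000)        # hypothetical measured samples, already in DBL

# The pseudoinverse is the expensive step; the matrix-vector multiply afterwards is cheap.
coeffs = np.linalg.pinv(A) @ y

# An often leaner alternative: let a least-squares solver work directly on A and y
# instead of forming the pseudoinverse explicitly.
coeffs_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```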

 

 

Note that this is a near "literal" cleanup. If you would tell us the details of the overall algorithm, maybe it could be done very differently and faster.

Message 31 of 39

Thank you very much again.

 

I have been doing more trials. As you said, the best is to set everything to double at the beginning. But for the last two steps I get an important speed-up (with MASM) if I switch everything to SGL and do the pseudoinverse and vector-matrix calculation in single precision.

 

obarriel_0-1694190654793.png
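(A minimal sketch of the SGL-vs-DBL effect described above, using NumPy as a stand-in for the MASM calls; the shapes are invented. Single precision halves the bytes the pseudoinverse has to move, which is usually where the speed-up comes from.)

```python
import numpy as np
from time import perf_counter

rng = np.random.default_rng(1)
A_dbl = rng.standard_normal((200_000, 4))     # hypothetical size
A_sgl = A_dbl.astype(np.float32)              # convert to SGL once, up front

for name, A in (("DBL", A_dbl), ("SGL", A_sgl)):
    t0 = perf_counter()
    np.linalg.pinv(A)                         # same pseudoinverse, half the data in SGL
    print(f"pinv in {name}: {perf_counter() - t0:.3f} s")
```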

 

Message 32 of 39

@obarriel wrote:

Yes my original question was why the performance improvement from 4770k to 13900K was so limited. And that still remains. But anyway, I also appreciate very much any tips to make the code more efficient. Trying to learn from there.


@Yamaeda's post makes it very plausible that memory is a bottleneck.

 

That's where you'll get the most benefit: preventing copies (including DBL<->SGL conversions); see the sketch below.

 

There are 'rule of thumb' lists for expected performance gains (I'm making these numbers up completely, but lists like this exist):

0 to 10X for syntax tweaking

0 to 100X for memory tweaking

0 to 10000X for algorithm changes

 

Of course, any/all of these might be optimal at any point in time.
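(The "memory tweaking" row is the one most relevant to this thread. A minimal Python sketch of the principle: growing an array by repeated appends copies the data over and over, while preallocating and replacing elements in place does not. In LabVIEW terms, Build Array in a loop versus Initialize Array plus Replace Array Subset.)

```python
import numpy as np

n = 100_000

def grow_by_appending():
    """'Build Array in a loop' style: every append reallocates and copies."""
    out = np.empty(0)
    for i in range(n):
        out = np.append(out, float(i))   # O(n^2) bytes copied in total
    return out

def preallocate_and_fill():
    """Preallocate once, then overwrite elements in place: no reallocations."""
    out = np.empty(n)
    for i in range(n):
        out[i] = float(i)
    return out
```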

Message 33 of 39

@obarriel wrote:

Thank you very much again.

 

I have been doing more trials. As you said, the best is to set everything to double at the beginning. But for the last two steps I get an important speed-up (with MASM) if I switch everything to SGL and do the pseudoinverse and vector-matrix calculation in single precision.

 

obarriel_0-1694190654793.png

 


If the code before this didn't change, you're better off converting to SGL before concatenating those arrays of DBLs.

 

Building (concatenating) an array will always require a copy (well, maybe not if an element is empty). Building an array of DBLs and then converting to SGL will be slower than converting the 2 arrays to SGL and then concatenating.
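(A minimal NumPy sketch of that ordering argument, with invented array sizes: converting the two DBL arrays to SGL first means the concatenation only has to copy half as many bytes.)

```python
import numpy as np

a_dbl = np.zeros(1_000_000)   # two hypothetical DBL source arrays
b_dbl = np.zeros(1_000_000)

# Slower: concatenate in DBL (~16 MB copied), then convert the result to SGL (another ~8 MB pass).
sgl_slow = np.concatenate([a_dbl, b_dbl]).astype(np.float32)

# Leaner: convert each piece to SGL first (~8 MB total), then concatenate the smaller arrays (~8 MB).
sgl_fast = np.concatenate([a_dbl.astype(np.float32), b_dbl.astype(np.float32)])
```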

 

You might consider showing us the whole picture...

Message 34 of 39

Yes, since your data is only 16-bit, SGL is sufficient. I think all MASM linear algebra works fine with SGL (see example; the stock Linear Algebra functions don't, though), so that's what you should be using. The main problem was the constant bouncing between representations as well as the array resizings.

 

And yes, if you can do SGL, twice as much data can fit in the CPU cache, for example.
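(The cache point in numbers, a trivial sketch with a made-up element count: the same number of samples occupies half the bytes in SGL, so twice as much of the working set stays in cache.)

```python
import numpy as np

n = 1_000_000
print(np.zeros(n, dtype=np.float64).nbytes // 1024, "KiB as DBL")   # ~7812 KiB
print(np.zeros(n, dtype=np.float32).nbytes // 1024, "KiB as SGL")   # ~3906 KiB
```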

Message 35 of 39

I share here the version from CA, modified to use the MASM toolkit.

 

Maybe as a summary I include this comparison:

obarriel_0-1694192325431.png

 

Message 36 of 39

If you're really going for 'as fast as possible', save a copy of this VI and strip what you don't need:

wiebeCARYA_0-1694192282506.png

 

That tone VI in there can probably be reduced. That code is, well, "special"...

 

It does output (create) arrays that you don't use. It's not the most time-consuming part, but it's wasteful.

Message 37 of 39

Ok, I know it's kinda bad form to quote myself, but I'm genuinely curious to get a response from the OP about something I brought up earlier today:

 


@Kevin_Price wrote:

I notice in step 1 that you're generating a sine wave pattern that might contain a lot of redundant information.  For example, if the "Main array 16bit" is sized at ~1M, and your detected frequency is maybe 1/100 of the sample rate, then you'd be generating 10k cycles worth of sine wave. 

 

If so, then what exactly do you learn by doing all that downstream processing on 10k cycles worth of this generated sine that you wouldn't learn by processing a much smaller # of cycles, perhaps as small as 1?

 

If the 'cycles' input to the sine pattern generator is often >>1, this seems like a prime candidate to consider for speeding things up dramatically.  It doesn't speak to the CPU differences, but perhaps it could make the point moot?


The size of that sine wave array seems to drive almost all the downstream processing.  So I went back and dug up the sample code you posted and that array contains 160k sine wave cycles!   Zoomed in, it isn't very clean looking with a little less than 4 samples per cycle, so *maybe* (?) you learn a bit more from 10 cycles than 1.  But I don't understand what you gain from the remaining 99.9%+.

 

If you could cut your data size down by a factor of 1000, THAT would be a much bigger win than the other tweaks we're discussing.

 

So am I missing something?  Is there a reason you can't do that?
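(To make the suggestion concrete, a hedged sketch with invented numbers rather than the OP's actual parameters: generate only a handful of cycles of the detected frequency instead of a pattern spanning the whole ~1M-sample record, and run the downstream fit on that much smaller array.)

```python
import numpy as np

fs = 1_000_000          # hypothetical sample rate
f_detected = 250_000.0  # hypothetical detected frequency (~4 samples per cycle, as in the thread)

n_cycles = 10                                       # instead of ~160k cycles
n_samples = int(round(n_cycles * fs / f_detected))  # just enough samples for those cycles
t = np.arange(n_samples) / fs
short_sine = np.sin(2 * np.pi * f_detected * t)
print(short_sine.size, "samples instead of ~1,000,000")
```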

 

 

-Kevin P

Message 38 of 39

The Mean VI is slow; replace it with native functions for a modest speed-up. I went from 82 ms to 72 ms.

 

mcduff_0-1694209558050.png

EDIT: The speed-up may be smaller after running a few times. The first run was slow.
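(A rough Python analogue of the Mean VI point, not LabVIEW: a general-purpose mean routine carries extra per-call overhead, while a plain sum-and-divide with native primitives makes one lean pass over the data. Names and sizes are illustrative only.)

```python
import numpy as np
from statistics import mean        # stand-in for a general-purpose "Mean VI"

data = np.random.default_rng(2).standard_normal(1_000_000)

m_generic = mean(data.tolist())    # generic routine: converts to a list, iterates in Python
m_native = data.sum() / data.size  # "native" primitives: one vectorized pass
assert np.isclose(m_generic, m_native)
```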

 

Message 39 of 39