LabVIEW GPU Computing

Thanks for the reply.  I had not read the "resource independent" section.  My comment applied to my specific application, where I did not see the need for a computing context.  I can now see the other side of the story, where a computing context makes sense.

Message 11 of 52

Whenever I try to run any of the examples, all I ever get is error code 12 from the "Call Library Function Node" in LV_BlackScholes.vi.

Suggestions??

Message 12 of 52

This sounds like an issue with the installation of CUDA. When the Black-Scholes DLL loads, the CUDA runtime DLL is automatically loaded in the background. If the runtime fails to load, you'll get a library error on the LabVIEW side.

I've found you can run into this issue in at least a few ways: installing the CUDA driver after the CUDA toolkit, swapping out a video card that requires a different driver from the one currently installed, installing a more recent CUDA toolkit than the driver supports, etc. Sometimes it is difficult to figure out which of these (or a combination of them) is the culprit.

The safest solution is to (re)download the versions of the driver and toolkit that you want to use. In addition, download the SDK, which can be used to test the installation of the CUDA runtime components. Uninstall the driver and toolkit, which will require a reboot (sorry about that - I know it's annoying). Then install the driver first, the toolkit second, and the SDK third. Once everything is installed, go to the SDK's examples directory and run some of the pre-built examples (found in 'bin\win32\release'). In particular, run deviceQuery.exe and bandwidthTest.exe. These list the CUDA devices in the system and perform basic I/O tests using the primary CUDA device (id = 0).
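
If the prebuilt SDK binaries run but the LabVIEW examples still fail, a minimal sanity check along these lines can help narrow things down. This is an illustrative sketch using the standard CUDA runtime API (not one of the NI examples); it exercises the same runtime DLL that the Black-Scholes library depends on:

    // Minimal CUDA runtime sanity check (illustrative sketch, not an NI example).
    // If this builds and runs, the driver and runtime DLL are installed and matched.
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s (compute capability %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }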

Message 13 of 52

I have two unrelated questions:

  1. How would I include the CUDA palette in my LV palette?
  2. Is there an algorithm that would allow LV to calculate the optimal block/thread sizes based on the inputs that will be acted on by the kernel?

Thanks.

Message 14 of 52
  1. I used the palette editor to add the CUDA palettes to my developer install of LabVIEW, so it may be a hack rather than the proper method. I'd take a look at the help on customizing the work environment (i.e. look for palettes>>customizing under the Index).

    UPDATE:  See this thread on installation: http://decibel.ni.com/content/message/8219#8219

  2. I don't have an automated tool for refining the resource usage, but I've designed most of my benchmarks so I can do it manually: I expose the block and threads/block parameters at the API level (see the sketch after this list). You might be surprised how many choices result in good performance. My original benchmarks showed that many parameter pairs got close to peak (i.e. within 20% or so) for the data sizes I was testing. This is very promising, since a user doesn't have to find the 'sweet spot' to get a reasonable speed-up.
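
To make the block/thread relationship concrete, here is a minimal sketch of the launch-size arithmetic using a generic element-wise kernel (a placeholder, not one of the NI example kernels): the grid size is derived from the data size, and a few candidate block sizes are swept for benchmarking.

    // Illustrative sketch: derive the grid size from the data size and sweep a
    // few block sizes. The kernel is a generic placeholder, not an NI example.
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                  // guard: the last block may be partially full
            data[i] *= factor;
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *d_data;
        cudaMalloc(&d_data, n * sizeof(float));

        // Candidate threads/block values; as noted above, many of them land
        // reasonably close to peak for typical data sizes.
        int blockSizes[] = {64, 128, 256, 512};
        for (int k = 0; k < 4; ++k) {
            int threads = blockSizes[k];
            int blocks  = (n + threads - 1) / threads;  // round up to cover all n elements
            scale<<<blocks, threads>>>(d_data, 2.0f, n);
            cudaDeviceSynchronize();                    // finish before trying the next configuration
            printf("threads/block=%d -> %d blocks\n", threads, blocks);
        }

        cudaFree(d_data);
        return 0;
    }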
Message 15 of 52

Thanks for the response.  Do you recommend a particular method for operating on multiple devices at once?  For example, will "GetMaxFlops" (I can't remember the exact function) automatically select another device if the first choice is being used?

Message 16 of 52

I'm not familiar with that function, so I can't address your question directly. The key to using multiple devices from LabVIEW is compute contexts. How they are 'deployed' can vary dramatically based on the use case. I have only toyed with static parallelism, where data is distributed equally between two GPU kernels invoked in parallel on the same diagram. That is not the only way.

If you are familiar with queues, another straightforward approach is to create a compute context for each device in the system and grab a context from the queue each time a computation element requires work. If the compute element always uses the same resources, then the devices can be preconfigured when the contexts are created. Otherwise, resource allocation will have to be done each time the computation runs in a context.

It may be possible to test a compute context to see if it is 'busy', but the queue approach avoids this naturally because a context isn't re-queued until its work is finished.
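
For readers more comfortable with the underlying CUDA calls than with the LabVIEW wrappers, here is a rough sketch of the static-parallelism idea in plain CUDA runtime code. The device ids, sizes, and kernel are placeholders, and the compute-context VIs discussed above are not shown:

    // Illustrative sketch of static parallelism: split the data in half and
    // process each half on its own GPU. Kernel launches are asynchronous with
    // respect to the host, so both devices work concurrently.
    #include <cuda_runtime.h>

    __global__ void process(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] += 1.0f;
    }

    int main(void)
    {
        const int n = 1 << 20;
        const int half = n / 2;
        float *d_buf[2];

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);                        // select the device for the calls below
            cudaMalloc(&d_buf[dev], half * sizeof(float));
            int threads = 256;
            int blocks  = (half + threads - 1) / threads;
            process<<<blocks, threads>>>(d_buf[dev], half);
        }

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();                   // wait for this device's work to finish
            cudaFree(d_buf[dev]);
        }
        return 0;
    }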

Message 17 of 52

Thanks for the suggestion. All the CUDA examples in the CUDA SDK work fine.

I didn't read your limitations section in my haste to give it a try, so I'm guessing my problem originates there. I am running Vista 64-bit, so my GPU drivers are probably the problem. Does that seem plausible to you?

If so, what are the chances we'll be seeing a 64-bit compatible version anytime soon?

Anyway, thanks for the help, and I apologize for being a beggar and a chooser.

Message 18 of 52

Yes. That would have been my next question. We only support 32-bit at this point. I have not tried to run 32-bit LabVIEW on a 64-bit platform using the examples I built. My guess is that they will fail to load and link to the installed driver.

Message 19 of 52

Question about the memcpy commands (e.g. Copy 2D Array to CUDA Memory (SGL).vi)...

Are these asynchronous copies?

Thanks.
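
(For reference, and without speaking to what this particular VI calls underneath, the CUDA runtime API distinguishes blocking and asynchronous copies roughly as sketched below: cudaMemcpy blocks the host until the transfer completes, while cudaMemcpyAsync needs pinned host memory and a stream to actually overlap with other work.)

    // General CUDA runtime behavior (not a statement about the LabVIEW VI):
    // cudaMemcpy blocks the host; cudaMemcpyAsync returns immediately and needs
    // page-locked host memory plus a stream to overlap with kernels or copies.
    #include <cuda_runtime.h>
    #include <stdlib.h>

    void copy_examples(float *d_dst, int n)   // d_dst assumed allocated with cudaMalloc
    {
        size_t bytes = n * sizeof(float);

        // Blocking copy from ordinary (pageable) host memory.
        float *h_pageable = (float *)malloc(bytes);
        cudaMemcpy(d_dst, h_pageable, bytes, cudaMemcpyHostToDevice);
        free(h_pageable);

        // Asynchronous copy: pinned host memory and a stream are required.
        float *h_pinned;
        cudaStream_t stream;
        cudaMallocHost(&h_pinned, bytes);
        cudaStreamCreate(&stream);
        cudaMemcpyAsync(d_dst, h_pinned, bytes, cudaMemcpyHostToDevice, stream);
        cudaStreamSynchronize(stream);   // wait only when the data is actually needed
        cudaStreamDestroy(stream);
        cudaFreeHost(h_pinned);
    }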

Message 20 of 52