LabVIEW GPU Computing

Thanks for the reply.  I had not read the "resource independent" section.  My comment applied to my specific application, where I did not see the need for a computing context.  I can now see the other side of the story, where a computing context makes sense.

Message 11 of 52

Whenever I try to run any of the examples, all I ever get is error code 12 from the "Call Library Function Node" in LV_BlackScholes.vi.

Suggestions??

Message 12 of 52

This sounds like an issue with the installation of CUDA. When the Black-Scholes DLL loads, the CUDA runtime DLL is automatically loaded in the background. If the runtime fails to load, you'll get a library error on the LabVIEW side.

I've found you can run into this issue in at least a few ways: installing the CUDA driver after the CUDA toolkit, swapping out a video card that requires a different driver from the one currently installed, installing a more recent CUDA toolkit than the driver supports, etc. Sometimes it is difficult to figure out which of these (or a combination of them) is the culprit.

The safest solution is to (re)download the versions of the driver and toolkit that you want to use. In addition, download the SDK, which can be used to test the installation of the CUDA runtime components. Uninstall the driver and toolkit, which will require a reboot (sorry about that - I know it's annoying). Then install the driver first, the toolkit second, and the SDK third. Once everything is installed, go to the SDK's examples directory and run some of the pre-built examples (found in 'bin\win32\release'). In particular, run deviceQuery.exe and bandwidthTest.exe. These list the CUDA devices in the system and perform basic I/O tests using the primary CUDA device (id = 0).
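
If the prebuilt SDK binaries run but the LabVIEW examples still fail, a minimal sanity check along these lines can help narrow things down. This is an illustrative sketch using the standard CUDA runtime API (not one of the NI examples); it exercises the same runtime DLL that the Black-Scholes library depends on:

    // Minimal CUDA runtime sanity check (illustrative sketch, not an NI example).
    // If this builds and runs, the driver and runtime DLL are installed and matched.
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s (compute capability %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }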

Message 13 of 52

I have two unrelated questions:

  1. How would I include the CUDA palette in my LV palette?
  2. Is there an algorithm that would allow LV to calculate the optimal block/thread sizes based on the inputs that will be acted on by the kernel?

Thanks.

Message 14 of 52
  1. I used the palette editor to add the CUDA palettes to my developer install of LabVIEW, so it may be a hack rather than the proper method. I'd take a look at the help on customizing the work environment (i.e. look for palettes>>customizing under the Index).

    UPDATE:  See this thread on installation: http://decibel.ni.com/content/message/8219#8219

  2. I don't have an automated tool for refining the resource usage, but I've designed most of my benchmarks so I can do it manually: I expose the block and threads/block parameters at the API level (see the sketch after this list). You might be surprised how many choices result in good performance. My original benchmarks showed that many parameter pairs got close to peak (i.e. within 20% or so) for the data sizes I was testing. This is very promising, since a user doesn't have to find the 'sweet spot' to get a reasonable speed-up.
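
To make the block/thread relationship concrete, here is a minimal sketch of the launch-size arithmetic using a generic element-wise kernel (a placeholder, not one of the NI example kernels): the grid size is derived from the data size, and a few candidate block sizes are swept for benchmarking.

    // Illustrative sketch: derive the grid size from the data size and sweep a
    // few block sizes. The kernel is a generic placeholder, not an NI example.
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                  // guard: the last block may be partially full
            data[i] *= factor;
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *d_data;
        cudaMalloc(&d_data, n * sizeof(float));

        // Candidate threads/block values; as noted above, many of them land
        // reasonably close to peak for typical data sizes.
        int blockSizes[] = {64, 128, 256, 512};
        for (int k = 0; k < 4; ++k) {
            int threads = blockSizes[k];
            int blocks  = (n + threads - 1) / threads;  // round up to cover all n elements
            scale<<<blocks, threads>>>(d_data, 2.0f, n);
            cudaDeviceSynchronize();                    // finish before trying the next configuration
            printf("threads/block=%d -> %d blocks\n", threads, blocks);
        }

        cudaFree(d_data);
        return 0;
    }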
Message 15 of 52

Thanks for the response.  Do you recommend a particular method for operating on multiple devices at once?  For example, will "GetMaxFlops" (I can't remember the exact function) automatically select another device if the first choice is being used?

Message 16 of 52

I'm not familiar with that function, so I can't address your question directly. The key to using multiple devices from LabVIEW is compute contexts. How they are 'deployed' can vary dramatically based on the use case. I have only toyed with static parallelism, where data is distributed equally between two GPU kernels invoked in parallel on the same diagram. That is not the only way.

If you are familiar with queues, another straightforward approach is to create a compute context for each device in the system and grab a context from the queue each time a computation element requires work. If the compute element always uses the same resources, then the devices can be preconfigured when the contexts are created. Otherwise, resource allocation will have to be done each time the computation runs in a context.

It may be possible to test a compute context to see if it is 'busy', but the queue approach avoids this naturally because a context isn't re-queued until its work is finished.
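
For readers more comfortable with the underlying CUDA calls than with the LabVIEW wrappers, here is a rough sketch of the static-parallelism idea in plain CUDA runtime code. The device ids, sizes, and kernel are placeholders, and the compute-context VIs discussed above are not shown:

    // Illustrative sketch of static parallelism: split the data in half and
    // process each half on its own GPU. Kernel launches are asynchronous with
    // respect to the host, so both devices work concurrently.
    #include <cuda_runtime.h>

    __global__ void process(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] += 1.0f;
    }

    int main(void)
    {
        const int n = 1 << 20;
        const int half = n / 2;
        float *d_buf[2];

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);                        // select the device for the calls below
            cudaMalloc(&d_buf[dev], half * sizeof(float));
            int threads = 256;
            int blocks  = (half + threads - 1) / threads;
            process<<<blocks, threads>>>(d_buf[dev], half);
        }

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();                   // wait for this device's work to finish
            cudaFree(d_buf[dev]);
        }
        return 0;
    }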

Message 17 of 52

Thanks for the suggestion. All the CUDA examples in the CUDA SDK work fine.

I didn't read your limitations section in my haste to give it a try, so I'm guessing my problem originates there. I am running Vista 64-bit, so my GPU drivers are probably the problem. Does that seem plausible to you?

If so, what are the chances we'll be seeing a 64-bit compatible version anytime soon?

Anyway, thanks for the help, and I apologize for being a beggar and a chooser.

Message 18 of 52

Yes. That would have been my next question. We only support 32-bit at this point. I have not tried to run 32-bit LabVIEW on a 64-bit platform using the examples I built. My guess is that they will fail to load and link to the installed driver.

Message 19 of 52

Question about the memcpy commands (e.g. Copy 2D Array to CUDA Memory (SGL).vi)...

Are these asynchronous copies?

Thanks.
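
(For reference, and without speaking to what this particular VI calls underneath, the CUDA runtime API distinguishes blocking and asynchronous copies roughly as sketched below: cudaMemcpy blocks the host until the transfer completes, while cudaMemcpyAsync needs pinned host memory and a stream to actually overlap with other work.)

    // General CUDA runtime behavior (not a statement about the LabVIEW VI):
    // cudaMemcpy blocks the host; cudaMemcpyAsync returns immediately and needs
    // page-locked host memory plus a stream to overlap with kernels or copies.
    #include <cuda_runtime.h>
    #include <stdlib.h>

    void copy_examples(float *d_dst, int n)   // d_dst assumed allocated with cudaMalloc
    {
        size_t bytes = n * sizeof(float);

        // Blocking copy from ordinary (pageable) host memory.
        float *h_pageable = (float *)malloc(bytes);
        cudaMemcpy(d_dst, h_pageable, bytes, cudaMemcpyHostToDevice);
        free(h_pageable);

        // Asynchronous copy: pinned host memory and a stream are required.
        float *h_pinned;
        cudaStream_t stream;
        cudaMallocHost(&h_pinned, bytes);
        cudaStreamCreate(&stream);
        cudaMemcpyAsync(d_dst, h_pinned, bytes, cudaMemcpyHostToDevice, stream);
        cudaStreamSynchronize(stream);   // wait only when the data is actually needed
        cudaStreamDestroy(stream);
        cudaFreeHost(h_pinned);
    }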

Message 20 of 52