LabVIEW GPU Computing unleashes the computing power of NVIDIA GPUs via the CUDA interface from within a LabVIEW application. Code that calls the GPU for computation is integrated into the native parallel execution system of LabVIEW as if it were any other multi-threaded external library function call.
LabVIEW GPU Computing includes:
Put together, these items allow LabVIEW users to:
Prototypes based on this architecture have proven very valuable in a number of scientific applications. Here are a few examples:
At NIWeek 2008, a real-time solution to this controller was shown on stage at a loop time of under 2ms using two octal-core Dell T7400 workstations and a host laptop. At SC08, the same controller was redesigned to run on a single box - an octal-core Dell T7400 workstation fitted with an NVIDIA Tesla C1060. That system achieved a loop time of just under 7ms. Although this redesign seems backwards (and it is!), it proved that LabVIEW is capable of taking advantage of different processing resources and deploying solutions to multiple targets with only subtle modifications.
A recent demo at NIWeek 2009 showed the flexibility of GPU computing in LabVIEW by using the Black-Scholes model to compute fair market value for 1000 stocks over 500 future times. A single application was developed to solve the PDE (Partial Differential Equation) using a mixture of CPU cores and GPUs. It was deployed to multiple nodes for simultaneous computations, each focusing on different stock exchange data. Each compute node consisted of an NI-8353 Quad-core 1U rack-mount controller attached to one of the two ports on an NVIDIA S1070 (4 Tesla Processors). This solution employed 4 CPU cores and 2 NVIDIA GPUs working in parallel with each other to solve the PDE. To show the control LabVIEW yields over parallelism, the master node dynamically changed the data distribution between resources, the number of CPU cores and the number of GPUs used to compute the Black-Scholes model. This was done per node and on the fly as data was being processed.
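As a rough illustration of the data distribution idea, here is a minimal C++ sketch that splits a batch of stocks across resources in proportion to a relative weight per resource. This is not the demo's code: the function name `split_work` and the weights are hypothetical, and the demo itself was built as a LabVIEW diagram.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `total` work items across resources in proportion to integer
// `weights` (hypothetical relative throughput of each CPU core or GPU).
// Leftover items from rounding down are handed out one at a time,
// starting with the first resource, so the counts always sum to `total`.
std::vector<std::size_t> split_work(std::size_t total,
                                    const std::vector<unsigned>& weights) {
    assert(!weights.empty());
    const std::size_t sum =
        std::accumulate(weights.begin(), weights.end(), std::size_t{0});
    assert(sum > 0);

    std::vector<std::size_t> counts(weights.size());
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        counts[i] = total * weights[i] / sum;  // integer division rounds down
        assigned += counts[i];
    }
    for (std::size_t i = 0; assigned < total; i = (i + 1) % counts.size()) {
        ++counts[i];
        ++assigned;
    }
    return counts;
}
```

For example, `split_work(1000, {1, 1, 1, 1, 4, 4})` models 4 CPU cores and 2 GPUs where each GPU is assumed to be four times as fast as a core. Changing the weights (or the length of the list) between batches corresponds to rebalancing on the fly.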
The installer is available from this page. You'll also need appropriate versions of NI LabVIEW and NVIDIA CUDA before opening any VIs or running examples.
An overview document is installed with the source code in the doc directory. To learn how to use this feature, refer to the documentation that is available on this page. Additional reference material will be added as user feedback warrants.
The installer creates an examples directory containing a few custom implementations and several LVCUDA and LVCUBLAS wrappers. Because users will want to call their own GPU functions from LabVIEW, most of the focus is on building your function for execution from LabVIEW.
For an interactive demo that runs immediately, open the project BlackScholes.lvproj in the examples directory and run European Call Option.vi from the vi folder. All examples that include solution files were created with Visual Studio 2005 and compiled using VC8 and the NVCC compiler that installs with CUDA v2.2.
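If you want an independent number to compare against when running the example, the textbook closed-form Black-Scholes price of a European call is easy to compute on the CPU. The sketch below is that standard formula, not the code inside the example VI or its DLL, and the parameter names are my own; whether the example's inputs map one-to-one onto them is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cumulative distribution function.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Closed-form Black-Scholes price of a European call option.
//   S     : current price of the underlying
//   K     : strike price
//   r     : risk-free interest rate (annual, continuously compounded)
//   sigma : volatility (annual)
//   T     : time to expiry in years
double bs_call(double S, double K, double r, double sigma, double T) {
    assert(S > 0 && K > 0 && sigma > 0 && T > 0);
    const double sqrt_t = std::sqrt(T);
    const double d1 =
        (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt_t);
    const double d2 = d1 - sigma * sqrt_t;
    return S * norm_cdf(d1) - K * std::exp(-r * T) * norm_cdf(d2);
}
```

With S = 100, K = 100, r = 0.05, sigma = 0.2 and T = 1, this gives a price of about 10.45.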
All technical issues should be posted on the GPU Community. Support depends significantly on the adoption of the feature.
While the execution framework is independent of both LabVIEW and CUDA, the API at the G and C levels may change based on the respective development platforms.
Very nice. I will test it^^
I've found that examples I've created in the past using CUDA v1.x were not able to run using CUDA v2.x without recompiling my DLL. In effect, this used the v2.x import libraries that tie into the CUDA runtime system. You can install this module on your system and try to run the Black-Scholes example (European Call Option.vi) without any work on your part. Either way, I'd appreciate it if you posted the results here (or on the general discussion thread for GPU computing).
Does the framework support running in emulation mode for developing on computers without a CUDA-capable GPU?
I've never tried it. All of my development systems have NVIDIA hardware compatible with CUDA. In theory, it should work. If you try it, please post the results.
That all sounds exciting. Unfortunately I don't have any GPU on my laptop, and I am still in Austin, flying back tonight. In the meantime, the "discuss" link for this topic on http://decibel.ni.com/content/groups/ni-labs seems to be broken. Maybe somebody could fix it....
On my GPU computer at home I have LabVIEW 8.6 and LabVIEW 2009 32bit and 64bit installed at the same time. How does the GPU installer know where to install? Can I choose? (I know 64bit is out, but how is the decision made for the two 32bit versions? Will both be upgraded?).
I'll look into the discussion link issue. As for the installer, it differs somewhat from the typical LabVIEW approach. All files are installed in an LVCUDA directory paralleling the (default) CUDA directory for the Toolkit (i.e. c:\CUDA).
You can copy the LVCUDA folder to new locations (e.g. under a specific LabVIEW location) to support multiple LV versions. The VIs are designed to look for the support DLLs in a relative location.
The VIs were compiled with LV8.6 so you'll see the dirty asterisk on load in LV 2009. If you try to move the folder, please post your results as I have not investigated all the permutations.
I did not limit the installation to 32-bit. In some cases, it is possible to invoke 32-bit apps within the 64-bit OS. However, I'm not sure the NVIDIA drivers that support CUDA would be accessible in that way.
What is the difference between writing my own CUDA DLL and calling it from LabVIEW versus using this library? Obviously there are some conveniences to using the LV library. But besides the obvious, what advantage is there?
This is explained in the document LVCUDA - Why Do I Need A Compute Context. In this reference, there is one type of GPU computing (Resource Independent) that can use the CUDA DLL without using the LV library.
Note that using this library also requires recompiling your DLL using our NICompute context layer. Together, this allows you to call a CUDA-based DLL function from a LabVIEW diagram where (a) the correct GPU device is called and (b) the (cached) parameters on that device are valid.
Thanks for the reply. I did not read the "resource independent" section. This applies to my specific application and I did not see the need for a computing context. I can now see the other side of the story where a computing context makes sense.
Whenever I try to run any of the examples all I ever get is an error code 12 "Call Library Function Node in LV_BlackScholes.vi"
Suggestions??
This sounds like an issue with the installation of CUDA. When the Black-Scholes DLL loads, the CUDA runtime DLL is automatically loaded in the background. If the runtime fails to load, you'll get a library error on the LabVIEW side.
I've found you can run into this issue in at least a few ways: installing the CUDA driver after the CUDA toolset, swapping out a video card that requires a different driver from the one currently installed, installing a more recent CUDA toolkit than the driver supports, etc. Sometimes it is difficult to figure out which of these (or a combination of them) is the culprit.
The safest solution is to (re)download the versions of the driver and toolkit that you want to use. In addition, download the SDK, which can be used to test the installation of the CUDA runtime components. Uninstall the driver and toolkit, which will require a reboot (sorry about that - I know it's annoying). Then, install the driver first, the toolkit second and the SDK third. Once everything is installed, go to the examples directory for the SDK and run some of the pre-built examples (found in 'bin\win32\release'). In particular, run deviceQuery.exe and bandwidthTest.exe. These list the CUDA devices you have in the system and perform some basic I/O tests using the primary CUDA device (id = 0).
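The "toolkit newer than the driver supports" case can also be checked from code. The CUDA runtime API reports both versions as integers (via cudaDriverGetVersion and cudaRuntimeGetVersion) encoded as 1000 * major + 10 * minor, with 0 for the driver version when no driver is installed. Here is a minimal host-only sketch of the comparison; the helper names are hypothetical, and the integers are passed in so the logic can be tried without a GPU.

```cpp
#include <cassert>

struct CudaVersion {
    int major;
    int minor;
};

// Decode CUDA's integer version encoding: 1000 * major + 10 * minor
// (for example, 2020 means version 2.2).
CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when the installed driver is at least as new as the runtime the
// DLL was built against. A driver version of 0 means no driver was found.
bool driver_supports_runtime(int driver_version, int runtime_version) {
    return driver_version != 0 && driver_version >= runtime_version;
}
```

If this check fails for the values reported on your machine, the runtime DLL is unlikely to load, which would surface on the LabVIEW side as the library error described above.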
I have two unrelated questions:
Thanks.
Thanks for the response. Do you recommend a particular method for operating on multiple devices at once? For example, will "GetMaxFlops" (I can't remember the exact function) automatically select another device if the first choice is being used?
I'm not familiar with that function so I can't address your question directly. The key to multiple devices from LabVIEW is the use of compute contexts. How they are 'deployed' can vary dramatically based on the use case. I have only toyed with static parallelism where data is distributed equally between two GPU kernels invoked in parallel on the same diagram. This is not the only way.
If you are familiar with queues, another straightforward approach is to create compute contexts for each device in the system and grab a context from the queue each time a computation element requires work. If the compute element always uses the same resources, then the devices can be preconfigured when the contexts are created. Otherwise, resource allocation will have to be done each time the computation runs in a context.
It may be possible to test a compute context to see if it is 'busy', but the queue approach avoids this naturally because it wouldn't re-queue the resource until it's finished.
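For readers who think in text code rather than diagrams, the queue approach can be sketched in C++ roughly as follows. This is only an analogy under stated assumptions: `ComputeContext` here is a hypothetical stand-in holding a device id, not the NICompute context type, and `ContextPool` is my own name. In LabVIEW the same role would be played by a queue of context references.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical stand-in for a compute context bound to one GPU device.
struct ComputeContext {
    int device_id;
};

// A thread-safe queue of idle contexts, one per device.
class ContextPool {
public:
    explicit ContextPool(int device_count) {
        assert(device_count > 0);
        for (int id = 0; id < device_count; ++id) {
            idle_.push(ComputeContext{id});
        }
    }

    // Block until some context is idle, then hand it to the caller.
    ComputeContext acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !idle_.empty(); });
        ComputeContext ctx = idle_.front();
        idle_.pop();
        return ctx;
    }

    // Re-queue a context only after its computation has finished.
    void release(ComputeContext ctx) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            idle_.push(ctx);
        }
        ready_.notify_one();
    }

    // Number of contexts currently waiting in the queue.
    std::size_t idle_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<ComputeContext> idle_;
};
```

A worker would call `acquire()`, run its GPU computation on the returned device, and then call `release()`. Because a context is out of the queue for the whole time it is in use, no separate 'busy' test is needed.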
Thanks for the suggestion. All the CUDA examples in the CUDA SDK work fine.
I didn't read your limitations section in my haste to give it a try so I'm guessing my problem originates there. I am running Vista 64-bit. Therefore my GPU drivers are probably the problem. Does that seem plausible to you?
If so, what are the chances we'll be seeing a 64bit compatible version anytime soon?
Anyways, thanks for the help and I apologize for being a beggar and a chooser.
Yes. That would have been my next question. We only support 32-bit at this point. I have not tried to run 32-bit LabVIEW on a 64-bit platform using the examples I built. My guess is that they will fail to load and link to the installed driver.
Question about memcpy (Copy 2D Array to CUDA Memory (SGL).vi) commands....
Are these asynchronous copies?
Thanks.
Thank you! I have waited sooo looong for such a wrapper...
But I can't use it, because I only have CUDA 1.1.
Is it possible to preserve backward compatibility with CUDA 1? Or something specific from CUDA 2.2 was used inside?
Andrey