Simplest possible LVCUBLAS program failing?

Mike Renfro · 08-13-2010

I'm just getting started with LVCUDA after doing a small amount of work with GPUmat for MATLAB. For some reason, I can't manage to get LVCUBLAS to work properly, even for something as simple as copying a 1D array to and from the GPU. I have a minimal example working with regular CUDA, but not CUBLAS. CUDA 2.3, Windows XP SP3 32-bit, LabVIEW 9.0f3, Quadro FX370M. Where can I look next to fix this? Getting an internal error in CUBLAS isn't too specific.

Working CUDA block diagram and front panel:

Non-working CUBLAS block diagram and front panel:

MathGuy · 08-13-2010

If I recall, I've noticed the same issue w/ CUBLAS data transfers under v2.3 in my latest benchmarks. To work around it, I used LVCUDA allocations instead. The CUBLAS functions still work with inputs and outputs allocated by the CUDA runtime.

The error you're getting is an internal error from the CUBLAS library. My guess is that there is some incompatibility between apps built against early 2.x versions of CUDA and newer runtimes. The current NILabs module was built against v2.1. I'm planning on releasing binaries with support for more recent CUDA runtime engines.

Mike Renfro · 08-17-2010

I've cut out all but the SGEMV call now, but it still fails:

Tried with CUDA 2.1 (failed just getting a context, and I think the Black-Scholes example failed as well), 2.2, and 2.3 (both ran Black-Scholes, but not my code). Haven't tried 3.1, since that would require a new set of video card drivers.

MathGuy · 08-17-2010

I found a bug in the SGEMV wrapper code. Under examples\lvcublas_sgemv, open LV_cublasSgemv.cpp. On lines 131 and 134, the x and y data references are being retrieved from _A rather than from _x and _y.

This explains the internal error code returned by the CUBLAS library. The references are valid but not for the SGEMV call. Let me know if this works for you so I can post a fix.

Mike Renfro · 08-17-2010

I expect that'd be a problem, if not the only one. I think I've successfully rebuilt lvcublas_sgemv.dll with Visual Studio 2008, but I admit to not having much experience with Visual Studio (I did set the Solution Configuration to Release, and then right-clicked the lvcublas_sgemv entry in Solution Explorer, and hit Rebuild). C:\LVCUDA\examples\lvcublas_sgemv\bin\lvcublas_sgemv.dll has an updated modification date, but I still get an error 15 on running the VI. If you have time to build a known-good dll, I can try that, too. Otherwise, I'm still missing something.

MathGuy · 08-18-2010

OK, I've tracked down multiple issues. While your code would not have worked as is, most of the problems are in the module itself.

So that everyone has access to these fixes, I've posted a service pack in the introduction thread. You can find it here:

http://decibel.ni.com/content/message/18844#18844

To research your problem, I had to save your VI in LV8.6, so the VI you're getting back is in that version. You'll find two primary changes:

I modified your execution sequence by wiring the first vector retrieval (aka the test to see if the download worked) in between the send and SGEMV operations. If you look at the input dependencies, the retrieval is free to happen any time after the send VI is complete - even after the SGEMV operation. That's LabVIEW's concurrent execution in action!
I changed the boolean constant wired to SGEMV to be TRUE rather than FALSE. This is because LabVIEW stores array data in row-major order while CUBLAS expects column-major storage. This can be really confusing and I don't have a fail-safe way of protecting users from running into this. I plan to start a thread on this issue to raise awareness and get user input.
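
To illustrate the storage-order point: a row-major m x n buffer, read as column-major, is the n x m transpose, so asking for the transposed product recovers the intended result. Below is a minimal CPU-only sketch in C. The names gemv_colmajor and matvec_rowmajor are hypothetical helpers written for this illustration, not LVCUBLAS or CUBLAS functions, and the reference loop only mimics the y = op(A) * x part of SGEMV (no alpha/beta scaling).

```c
#include <assert.h>
#include <stddef.h>

/* Reference y = op(A) * x for a column-major matrix A (rows x cols,
   leading dimension lda). trans != 0 selects op(A) = A^T.
   Hypothetical helper for illustration; plain CPU code, not a CUBLAS call. */
void gemv_colmajor(int trans, int rows, int cols,
                   const float *A, int lda,
                   const float *x, float *y)
{
    assert(rows > 0 && cols > 0 && lda >= rows);
    int ylen = trans ? cols : rows;
    for (int i = 0; i < ylen; ++i)
        y[i] = 0.0f;
    for (int j = 0; j < cols; ++j) {
        for (int i = 0; i < rows; ++i) {
            float a = A[(size_t)j * lda + i];   /* element (i, j) */
            if (trans)
                y[j] += a * x[i];
            else
                y[i] += a * x[j];
        }
    }
}

/* y = M * x for a row-major M with m rows and n columns, the layout
   of a LabVIEW 2D array. Read as column-major, the same buffer is the
   n x m matrix M^T, so the transposed product (M^T)^T * x = M * x
   gives the intended result. */
void matvec_rowmajor(int m, int n, const float *M,
                     const float *x, float *y)
{
    gemv_colmajor(1, n, m, M, n, x, y);
}
```

For the 2 x 3 row-major array {1, 2, 3, 4, 5, 6} and x = {1, 1, 1}, matvec_rowmajor(2, 3, M, x, y) fills y with {6, 15}. Calling gemv_colmajor with trans = 0 on the same buffer computes the product with the transposed matrix instead.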

I've applied SP1 to a clean installation of LVCUDA and run the attached VI successfully. If you update your existing example to call SGEMV w/ the correct transpose input, you should get back the correct results.

Mike Renfro · 08-19-2010

Hasn't fixed it for me yet. LVCUBLASError:13 (cublas.h) shows up after SGEMV runs. This happens both with LabVIEW 8.6 and 2009, and with a fresh install of CUDA 2.3, LVCUDA SP1 (the md5sum for lvcublas_sgemv.dll is 9eb89051b7f12ab8f67d29927b0256a7), and your edited VI. I might be able to try this on a different system with a better GPU, but I'd have to see what version of LabVIEW is available there. I'm completely puzzled now.

MathGuy · 08-19-2010

I have been unable to reproduce the problem on my workstation. My original tests of SP1 were done using CUDA v2.1, but I changed to v2.3 just in case it was a runtime incompatibility. I even rebuilt the example under 2.3 to see if building against different runtime versions could be it. No luck.

I was running your example on some old Gen1 GPU hardware - a Tesla D870. I plan to add a Gen2 GPU to that system and will rerun the test just in case. Sorry the fix didn't get rid of all the errors in your sample code.

koche · 11-18-2010

I suppose there is still a problem with Get Cublas Matrix. In the attached example, vectors and matrices are transported to the GPU and read back. All functions work, except Get Cublas Matrix, which produces the error code -12.

// Calls that request context for execution will return this error if they are being executed without one.

#define kNICompute_ContextUndefinedError -12

Get Cublas Matrix doesn't have a ComputeContext input.

More generally, the library still lacks functions for actually manipulating data.

Is there a way to use other functions of CUDA and CUBLAS without the need for a C compiler? Is it possible to use CUDA.dll, CUBLAS.dll and CUFFT.dll directly?

Otherwise the benefit of these functions is limited.

Thanks for help.

MathGuy · 12-02-2010

This error is tied to a limitation built into the LVCUDA v1.x implementation coupled with a special fix applied internally to support matrices from CUBLAS.

I'll give a more detailed description of the issue below, but the gist of it is that you can avoid the problem by using data management functions on the LVCUDA palette exclusively. According to the CUBLAS documentation, the get/set functions for vectors and matrices use the CUDA allocation routines internally, so you should not experience any difference in execution or performance by using only LVCUDA data routines.

The Details

When this module was architected, we had two fundamental usability issues we wanted to address:

Protection for the user when attempting to run a GPU kernel with data that was allocated on another device.
The ability to support more than one context simultaneously (i.e. in pre-Fermi days, this only made sense when targeting multiple GPUs in parallel).

To handle the protection issue, the NICompute framework allowed for the creation of a context with different management permissions (read, write, close). These permissions were tied to the library in which the context was defined (i.e. here's where the setting of the gEnvironment variable plays a role at DLL init).

In LVCUDA v1.x, the context is created with only read permissions. While this is the safest mode, it has some undesirable side effects when data references are allocated to that context from a different library (i.e. from LVCUBLAS not LVCUDA).

For CUBLAS, I was unable to get the get/set functions for matrices to work properly in early releases of CUBLAS. So, I allocate data for these functions from the LVCUBLAS library using functions defined in the LVCUDA library. This creates a mismatch between the library that created the data reference and the functions that want to manipulate/use the data.

This is responsible for the error you're receiving. For the earlier test case in this thread, changing the data management to use only LVCUDA VIs resolved the issue. Perhaps it will in your case.
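
As a rough picture of how such a mismatch can surface as error -12, here is a toy model in C. It is only a sketch under stated assumptions, not the NICompute implementation: LibraryId, DataRef, and resolve_context are hypothetical names invented for this illustration, and the rule that a library can resolve a context only for references it created itself is an assumption. The only item taken from the thread is the error constant.

```c
#include <assert.h>
#include <stddef.h>

/* Error value quoted earlier in this thread. */
#define kNICompute_ContextUndefinedError -12

/* Hypothetical library identifiers; the framework's real types
   are not shown in this thread. */
typedef enum { LIB_LVCUDA = 1, LIB_LVCUBLAS = 2 } LibraryId;

/* Hypothetical data reference that remembers which library created it. */
typedef struct {
    LibraryId owner;   /* library that allocated the buffer */
    size_t    count;   /* number of elements */
} DataRef;

/* Toy rule (an assumption): a library can find the context for a data
   reference only if it created that reference itself. Otherwise the
   call behaves as if it were executed without a context. */
int resolve_context(LibraryId caller, const DataRef *ref)
{
    assert(ref != NULL);
    return (ref->owner == caller) ? 0 : kNICompute_ContextUndefinedError;
}
```

In this model, a reference created by one library and handed to a function in the other returns -12, while keeping allocation and use in the same library returns 0, which matches the workaround of using only LVCUDA data management VIs.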

Darren
