Passing Matrix Data between LabVIEW & GPU

MathGuy · 08-18-2010

When I began integrating GPU computing into LabVIEW, one of the first problems I encountered was a difference in how 2D data is stored on the two platforms. Most people familiar with the BLAS interface know that its functions assume matrix arguments are stored in column-major order (i.e., each column's data is contiguous). In LabVIEW, as in C's built-in multidimensional arrays, 2D data is stored in row-major order. If the elements of a 2D array in LabVIEW are sent to the GPU in the same order they are stored on the host, they will represent a transposed version of the 2D array in GPU memory.

In creating the LVCUDA module, I had at least two options for handling these data transfers. The first was to transpose the data in the underlying interface code so that users were unaware of the difference in data storage between the host and the GPU. While ignorance is bliss, this carries a potentially large performance penalty: the data would be transposed on every call, and users would have no way to avoid it.

My second option is what the released module uses. Instead of automatically transposing 2D data, I've exposed a transpose input on memory function wrappers that meet the following two conditions:

The wrapper is designed to manage data used by BLAS functions, and
The wrapper can avoid the transposition work and temporary storage required on the host.

The BLAS function interfaces support a transpose input for matrix arguments so that these inputs do not have to be physically transposed. As a result, I chose to add the transpose input to Set cuBLAS Matrix (SGL).vi. Because this data is likely to be consumed by a BLAS function later on, the transposition can be achieved by setting the transpose option in that BLAS call.

Other functions could be fitted with this option but are not currently:

Get cuBLAS Matrix (SGL).vi
Copy 2D Array to CUDA Memory.vi
Copy CUDA Memory to 2D Array.vi

When LVCUDA was being prepared for release, each of these seemed to fail one of the two conditions. At that time, we were not aware that memory allocated by the CUDA memory functions could also be used for CUBLAS function arguments. So, for consistency, this input should be added to Copy 2D Array to CUDA Memory.vi.

That leaves the retrieval versions of these memory functions. If they are retrofitted with a transpose input, I'm compelled to add a transposed output to reflect the state of the data. Having thought about it further, this output might be beneficial for the send functions as well, so that it can be passed downstream to GPU functions that receive the data as arguments.

If you've found this data interaction confusing, I hope this information helps clarify the situation. If you have an opinion or a use case you'd like to offer up concerning this issue, consider this thread a welcome mat.

Darren

gaisi · 01-04-2011

Hi, MathGuy:

I tried to send two arrays into my function, add them, and store the result in the second array. However, the values in the second array do not change!

I tried to copy the example and have worked on it for days, but I'm making no progress.

I hope you can help me. My code is here:

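// Kernel: adds _var1 to _var2 element by element, storing the result in _var2.
// The original loop body, "_var2+=_var2;", operated on the pointer itself
// rather than on the elements it points to, so the array never changed.
// The grid-stride loop lets any launch configuration cover every element.
GLOBAL void plusit(const float * _var1, float * _var2, int _Size_of_Matrix)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < _Size_of_Matrix;
         i += blockDim.x * gridDim.x)
    {
        _var2[i] += _var1[i];
    }
}

// Host wrapper: launches 256-thread blocks instead of a single thread.
// The block count is capped at 65535, the x-dimension grid limit on older
// GPUs; the grid-stride loop above handles any remaining elements.
HOST void plus2(const float * _var1, float * _var2, int _Size_of_Matrix)
{
    if (_Size_of_Matrix <= 0)
        return;

    const int threads = 256;
    int blocks = (_Size_of_Matrix + threads - 1) / threads;
    if (blocks > 65535)
        blocks = 65535;

    plusit<<<blocks, threads>>>(_var1, _var2, _Size_of_Matrix);
}

///////////////////////////////////////////////////////////////////////////////

// Resolves the LVCUDA data references to device pointers, then adds them.
// Defined before tryplus so the call below has a declaration in scope.
NICOMPUTE_FUNCTION_3( _compute, tLVCUDAData * _var1,
                                tLVCUDAData * _var2,
                                int _Size_of_Matrix)
{
    int status = kNICompute_NoError;
    float * var1;
    float * var2;

    var1 = (float*)NIComputeGetDataReferenceUserData(gEnvironment, _var1, &status);
    if (status != kNICompute_NoError)
        return status;

    var2 = (float*)NIComputeGetDataReferenceUserData(gEnvironment, _var2, &status);
    if (status != kNICompute_NoError)
        return status;

    plus2(var1, var2, _Size_of_Matrix);
    return 0;
}

// Entry point called from LabVIEW. _Size_of_Matrix is the element count;
// configure that Call Library Function Node parameter as a signed 32-bit integer.
int tryplus( tLVCUDARef * _var1, tLVCUDARef * _var2, int _Size_of_Matrix)
{
    return _compute(GetCUDAData(_var1),
                    GetCUDAData(_var2),
                    _Size_of_Matrix,
                    GetCUDAContext(_var2));
}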
