Tagged Questions


This tag refers to NVIDIA’s parallel computing architecture (CUDA) that enables dramatic increases in computing performance by harnessing the power of the GPU (graphics processing unit). The CUDA architecture enables application development using several languages and associated APIs, including: ...


0 answers, 6 views

Porting an OpenMP program to CUDA C: correct grid_size/block_size and reduction

I want to convert an OpenMP program to CUDA C. I have tried to find my way using the web and the SDK, but the material is beyond my level. My C program loops over n = 2^30 indices and adds up the weight of each index. ...

asked 13 mins ago

Nicolas Essis-Breton
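A minimal sketch of the reduction this question is after, assuming the OpenMP loop is a plain sum over an array of weights (the weight computation is elided in the question, so the input vector here is a stand-in). It shows the usual ceiling-division grid size and emulates on the CPU the shared-memory tree reduction a CUDA block typically performs, followed by a host-side sum of the per-block partials. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of blocks so that grid * block_size >= n.
std::size_t grid_size_for(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;  // ceiling division
}

// CPU emulation of a per-block shared-memory tree reduction.
// block_size must be a power of two.
std::vector<std::int64_t> block_partial_sums(const std::vector<std::int64_t>& w,
                                             std::size_t block_size) {
    assert(block_size > 0 && (block_size & (block_size - 1)) == 0);
    const std::size_t n = w.size();
    const std::size_t grid = grid_size_for(n, block_size);
    std::vector<std::int64_t> partials(grid);
    std::vector<std::int64_t> sdata(block_size);  // stands in for __shared__ memory
    for (std::size_t b = 0; b < grid; ++b) {       // one iteration per block
        for (std::size_t tid = 0; tid < block_size; ++tid) {
            const std::size_t i = b * block_size + tid;
            sdata[tid] = (i < n) ? w[i] : 0;  // threads past n contribute 0
        }
        // Mirrors: for (s = blockDim.x / 2; s > 0; s >>= 1) {
        //            if (tid < s) sdata[tid] += sdata[tid + s]; __syncthreads(); }
        for (std::size_t s = block_size / 2; s > 0; s /= 2)
            for (std::size_t tid = 0; tid < s; ++tid)
                sdata[tid] += sdata[tid + s];
        partials[b] = sdata[0];  // thread 0 writes the block's partial sum
    }
    return partials;
}

// Final step, done on the host (or by a second kernel launch).
std::int64_t sum_partials(const std::vector<std::int64_t>& partials) {
    std::int64_t total = 0;
    for (std::int64_t p : partials) total += p;
    return total;
}
```

With n = 2^30 and 256 threads per block this gives 4,194,304 blocks. Devices of compute capability below 3.0 (such as Fermi cards) limit the grid's x-dimension to 65,535 blocks, so a grid-stride loop, where each thread first accumulates several elements before the tree step, is a common way to stay within that limit.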

2 answers, 21 views

How to quantify the processing tradeoffs of CUDA devices for C kernels?

I recently upgraded from a GTX480 to a GTX680 in the hope that the tripled number of cores would manifest as significant performance gains in my CUDA code. To my horror, I've discovered that my memory ...

linux cuda

asked 1 hour ago

Gearoid Murphy
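Assuming the kernel in question is memory-bound (the question is cut off before saying so), one way to quantify the tradeoff is to measure effective bandwidth with the formula from NVIDIA's CUDA C Best Practices Guide and compare it with each card's theoretical peak from its specifications. If a kernel is already close to peak bandwidth, additional cores will not speed it up. The helper names below are hypothetical.

```cpp
#include <cassert>

// Effective bandwidth in GB/s: ((bytes read + bytes written) / 1e9) / seconds.
double effective_bandwidth_gbps(double bytes_read, double bytes_written,
                                double seconds) {
    assert(seconds > 0.0);
    return (bytes_read + bytes_written) / 1e9 / seconds;
}

// Fraction of a device's theoretical peak bandwidth that a kernel achieves.
// peak_gbps should come from the card's published memory specifications.
double fraction_of_peak(double effective_gbps, double peak_gbps) {
    assert(peak_gbps > 0.0);
    return effective_gbps / peak_gbps;
}
```

For example, a kernel that reads and writes 2 GB each in 0.25 s achieves 16 GB/s, which is a quarter of a hypothetical 64 GB/s peak.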

0 answers, 10 views

Building gpuocelot fails due to Boost linkage errors on OS X Snow Leopard

I am using the latest trunk version of gpuocelot on a Mac running Snow Leopard 10.6.8 with gcc 4.5.3 and boost @1.49.0_0+universal (active), installed via MacPorts. When I run scons, I get ...

boost cuda x86 osx-snow-leopard amd

asked 15 hours ago

nyiotis

1 answer, 43 views

How to gather rows from a matrix by indices list using CUDA Thrust

This seems like a simple problem, but I can't figure out an elegant way to do it with CUDA Thrust. I have a two-dimensional N x M matrix and a vector of desired row indices of size L that is a ...

cuda thrust

asked yesterday

Leo
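A host-side sketch of the index arithmetic behind a row gather, assuming the N x M matrix is stored row-major in one flat array. Each flat output index k maps to input row rows[k / M] and column k % M. In Thrust, the same functor (marked `__host__ __device__` and holding a raw device pointer to the index list rather than a `std::vector`) could be wrapped in `thrust::make_transform_iterator` over a `thrust::counting_iterator` and passed as the map to `thrust::gather`. The names `RowGatherIndex` and `gather_rows` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps a flat index in the L x M output to the flat index of the source
// element in the row-major N x M input: output row r copies input row rows[r].
struct RowGatherIndex {
    const std::vector<int>* rows;  // desired row indices, length L
    std::size_t M;                 // number of columns
    std::size_t operator()(std::size_t k) const {
        return static_cast<std::size_t>((*rows)[k / M]) * M + k % M;
    }
};

// Host equivalent of gather(map_first, map_last, input, output)
// where map[k] = RowGatherIndex{...}(k).
std::vector<float> gather_rows(const std::vector<float>& matrix, std::size_t M,
                               const std::vector<int>& rows) {
    assert(M > 0 && matrix.size() % M == 0);
    const RowGatherIndex map{&rows, M};
    std::vector<float> out(rows.size() * M);
    for (std::size_t k = 0; k < out.size(); ++k)
        out[k] = matrix[map(k)];
    return out;
}
```

Because the map is computed on the fly from a counting sequence, no L x M index array has to be materialized.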

2 answers, 33 views

Linking to cutil in the GPU Computing SDK

I've been trying to link to the functions in cutil.h of the GPU Computing SDK released by NVIDIA. At the moment, I am simply trying to compile this simple piece of code: #include <iostream> ...

gcc library compilation cuda nvcc

asked yesterday

sj755

0 answers, 29 views

In CUDA, how do I translate screen-space coordinates to world-space coordinates in a kernel function?

I'm trying to add ray casting to a real 3D scene. In ray casting, we need the direction of each ray in order to cast it. The first point on the ray is the start point of ...

opengl cuda coordinate-transformation raycasting

asked yesterday

TonyLic

0 answers, 37 views

Ray-casting using CUDA [closed]

I want to learn ray-casting volume rendering with CUDA, but I don't know where to start. I have some basic knowledge of OpenGL and 3D graphics. Which books should I learn from? And more importantly, where ...

graphics 3d cuda gpu raycasting

asked yesterday

江南烟雨

1 answer, 62 views

What happens when all threads of a warp read the same global memory address?

I want to know what happens when all threads of a warp read the same 32-bit address in global memory. How many memory requests are issued? Is there any serialization? The GPU is a Fermi card, the ...

cuda gpu gpgpu gpu-programming

asked 2 days ago

Fan Zhang

2 answers, 68 views

Parallel reduction in CUDA

The following code sums every 32 elements in an array into the first element of each 32-element group: int i = threadIdx.x; int warpid = i&31; if(warpid < 16){ s_buf[i] += ...

cuda

asked 2 days ago

small_potato
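A CPU emulation of the halving pattern the snippet appears to use, assuming the truncated line continues as `s_buf[i] += s_buf[i + 16]` and the following steps use offsets 8, 4, 2 and 1. Note that `i & 31` is the lane index within a 32-thread group; the warp index would be `i >> 5`. On the GPU, this warp-synchronous style traditionally relied on `s_buf` being declared `volatile`; on Volta and later, where lanes are not guaranteed to run in lockstep, `__syncwarp()` between steps (or `__shfl_down_sync`) is the documented approach. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// After the loop, the first element of each 32-element group holds the
// sum of that group. Each step is completed for every lane before the next
// begins, matching what synchronization between steps guarantees on the GPU.
void group32_reduce(std::vector<int>& s_buf) {
    assert(s_buf.size() % 32 == 0);
    for (int s = 16; s > 0; s /= 2) {
        for (std::size_t i = 0; i < s_buf.size(); ++i) {
            const int lane = static_cast<int>(i & 31);  // "warpid" in the question
            // lane + s < 32, so the read stays inside the same group, and the
            // element read (lane >= s) is not written during this step.
            if (lane < s) s_buf[i] += s_buf[i + s];
        }
    }
}
```

Five steps (16, 8, 4, 2, 1) suffice because each one halves the number of partial sums in a group, from 32 down to 1.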

2 answers, 49 views

CUDA code produces incorrect results in release mode

My CUDA code produces correct results in Debug mode. However, in Release mode the same code produces garbage results. Could synchronization between threads behave differently between debug and ...

cuda

asked 2 days ago

small_potato

-1 votes, 1 answer, 41 views

Haar wavelet transform in CUDA

I have tried to implement the Haar wavelet transform in CUDA for a 1D array. Algorithm: I have 8 indices in the input array. With this condition if(x_index>=o_width/2 || y_index>=o_height/2) I ...

cuda haar-wavelet

asked 2 days ago

asd
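A minimal host-side sketch of one level of a 1D Haar transform, assuming the averaging convention a = (x0 + x1) / 2, d = (x0 - x1) / 2 (the question does not say which normalization it uses; the orthonormal variant divides by sqrt(2) instead). Output index i < n/2 reads the input pair (2i, 2i + 1), which is why a kernel would typically let only the first half of the threads do work. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Haar level: approximation coefficients go to the first half of the
// output, detail coefficients to the second half. Each iteration of the loop
// corresponds to one GPU thread handling one input pair.
std::vector<float> haar_level(const std::vector<float>& x) {
    assert(x.size() % 2 == 0);
    const std::size_t half = x.size() / 2;
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < half; ++i) {
        out[i] = (x[2 * i] + x[2 * i + 1]) / 2.0f;         // average
        out[half + i] = (x[2 * i] - x[2 * i + 1]) / 2.0f;  // difference
    }
    return out;
}
```

For an 8-element input, further levels repeat the same step on the first 4, then the first 2 approximation coefficients.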

2 answers, 38 views

CUDA thread execution order

In CUDA, when parallel threads execute the same code, is there any order to their execution? For example, say I have 4 threads for a 1D array of 4 elements. All four threads perform ...

cuda

asked 2 days ago

asd

1 answer, 24 views

Understanding CUDA cublas<t>gbmv

I recently wanted to use a simple CUDA matrix-vector multiplication and found a suitable function in the cuBLAS library: cublas<t>gbmv. Here is the official documentation, but it is actually very ...

cuda

asked 2 days ago

Ixanezis
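The confusing part of gbmv is usually the band storage. A host-side reference sketch, assuming the standard column-major BLAS band layout that the cuBLAS documentation also describes: with kl sub-diagonals and ku super-diagonals, element A(i, j) (0-based) is stored at AB[(ku + i - j) + j * lda], with lda >= kl + ku + 1. Only the non-transposed case is covered. `cublasSgbmv` takes the same m, n, kl, ku, A and lda parameters; the function `gbmv_ref` itself is hypothetical.

```cpp
#include <cassert>
#include <vector>

// y = alpha * A * x + beta * y for an m x n band matrix A in band storage.
void gbmv_ref(int m, int n, int kl, int ku, float alpha,
              const std::vector<float>& AB, int lda,
              const std::vector<float>& x, float beta, std::vector<float>& y) {
    assert(lda >= kl + ku + 1);
    for (int i = 0; i < m; ++i)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];  // BLAS: y not read when beta == 0
    for (int j = 0; j < n; ++j) {
        // Rows of column j that lie inside the band.
        const int i_lo = (j - ku > 0) ? j - ku : 0;
        const int i_hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = i_lo; i <= i_hi; ++i)
            y[i] += alpha * AB[(ku + i - j) + j * lda] * x[j];
    }
}
```

For a 3 x 3 tridiagonal matrix (kl = ku = 1, lda = 3), band row 0 holds the super-diagonal, row 1 the diagonal and row 2 the sub-diagonal; the unused corner slots are never read.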

3 answers, 100 views

CUDA speedup for simple calculations

I have the following code in cuda_computation.cu: #include <iostream> #include <stdio.h> #include <cuda.h> #include <assert.h> void checkCUDAError(const char *msg); ...

c cuda

asked May 22 at 22:07

baol

1 answer, 37 views

What can cause this cuda stack trace and what is wrong with this call to cudaMemcpy?

My program, written in C++, uses GLUT and CUDA to draw a small animation. It hangs after a while, and I see the following trace in the debugger when I interrupt it a few seconds after it ...

cuda

asked May 22 at 18:21

Kirill



2,148 questions tagged cuda

Related Tags

c++ × 366
c × 253
gpu × 252
gpgpu × 197
nvidia × 183
cuda-kernel × 152
opencl × 128
thrust × 104
parallel-processing × 95
gpu-programming × 72
nvcc × 72
memory × 69
visual-studio-2010 × 49
multithreading × 47
opengl × 47
visual-studio-2008 × 44
pycuda × 40
visual-studio × 39
cudamalloc × 39
arrays × 37
opencv × 36
optimization × 35
python × 34
matlab × 30
debugging × 30