How to quantify the processing tradeoffs of CUDA devices for C kernels?

I recently upgraded from a GTX480 to a GTX680 in the hope that the tripled number of cores would manifest as significant performance gains in my CUDA code. To my horror, I've discovered that my memory-intensive CUDA kernels run 30% to 50% slower on the GTX680.

I realize that this is not strictly a programming question, but it does directly affect the performance of CUDA kernels on different devices. Can anyone provide some insight into the specifications of CUDA devices and how they can be used to predict the performance of CUDA C kernels?

asked 1 hour ago

Gearoid Murphy

For maximum performance you really need to tune your code for different GPU configurations. – Paul R 56 mins ago

From what Wikipedia tells me, the memory bandwidth of the 680 is not much higher than that of the 480. So if you're memory-bound, you're not going to see much speedup. I can't explain why you see a slowdown, though. – Oli Charlesworth 56 mins ago
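
One way to check the memory-bound point on each card is to measure the effective bandwidth of a plain copy kernel and compare that figure between the two devices. The following is only a minimal sketch, not a rigorous benchmark: the kernel `copy_kernel`, the helper `effective_bandwidth_gbs`, the buffer size, the launch configuration and the repetition count are all assumptions chosen for illustration, and error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical streaming kernel: one global read and one global write per element.
// A grid-stride loop keeps the grid small enough for any compute capability.
__global__ void copy_kernel(const float* in, float* out, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = in[i];
    }
}

// Hypothetical helper: effective bandwidth in GB/s (10^9 bytes per second)
// from the number of bytes moved and the elapsed time in milliseconds.
double effective_bandwidth_gbs(double bytes_moved, double elapsed_ms) {
    return bytes_moved / (elapsed_ms * 1.0e-3) / 1.0e9;
}

int main() {
    const size_t n = (size_t)1 << 24;        // 16M floats per buffer (arbitrary)
    const size_t bytes = n * sizeof(float);
    const int threads = 256;                 // assumed launch configuration
    const int blocks = 1024;
    const int reps = 20;                     // timed repetitions (arbitrary)

    float* in = nullptr;
    float* out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess ||
        cudaMalloc(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup costs are not timed.
    copy_kernel<<<blocks, threads>>>(in, out, n);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) {
        copy_kernel<<<blocks, threads>>>(in, out, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Each repetition reads `bytes` and writes `bytes`.
    double moved = 2.0 * (double)bytes * reps;
    printf("effective bandwidth: %.1f GB/s\n", effective_bandwidth_gbs(moved, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

If the number printed on the GTX680 is close to the one on the GTX480, a kernel limited by global memory traffic should not be expected to gain much from the extra cores. A slowdown would then have to come from something else, such as the launch configuration or the access pattern.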
