CUDA: Calling a __device__ function from a kernel
Accepted answer
CUDA actually inlines all functions by default (although Fermi and newer architectures do also support a proper ABI with function pointers and real function calls). So your example code gets compiled to something like this:
__global__ void Kernel(int *ptr)
{
    if (threadIdx.x < 2)
        if (ptr[threadIdx.x] == threadIdx.x)
            ptr[threadIdx.x]++;
}
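The original code from the question isn't reproduced here, but it presumably looked something like the sketch below, with the inner check factored into a __device__ helper (the name DoWork is hypothetical). Since nvcc inlines the call by default, the generated kernel is equivalent to the version above:

__device__ void DoWork(int *ptr)   // hypothetical helper, not from the original post
{
    if (ptr[threadIdx.x] == threadIdx.x)
        ptr[threadIdx.x]++;
}

__global__ void Kernel(int *ptr)
{
    if (threadIdx.x < 2)
        DoWork(ptr);               // inlined by the compiler by default
}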
Execution happens in parallel, just like normal code. If you engineer a memory race into a function, there is no serialization mechanism that can save you.
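For instance, if many threads increment the same location, their read-modify-write sequences interleave and updates get lost; wrapping the update in a __device__ or __global__ function changes nothing. This sketch (hypothetical, not from the original question) shows the hazard and the atomic alternative:

__global__ void RacyCount(int *counter)
{
    // Every thread does a separate load/add/store on the same word,
    // so increments from different threads can overwrite each other.
    (*counter)++;
}

__global__ void SafeCount(int *counter)
{
    // atomicAdd serializes the read-modify-write in hardware.
    atomicAdd(counter, 1);
}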
Source: stackoverflow.com