gpu - Linking with 3rd party CUDA libraries slows down cudaMalloc

It is no secret that on CUDA 4.x the first call to cudaMalloc can be ridiculously slow (which has been reported several times), seemingly because of a bug in the CUDA drivers.

Recently, I noticed some weird behaviour: the running time of cudaMalloc directly depends on how many 3rd-party CUDA libraries are linked into the program (note that I do not use these libraries, I only link the program with them).

I ran some tests using the following program:

    #include <cuda_runtime.h>

    int main() {
        cudaSetDevice(0);
        unsigned int *ptr = 0;
        cudaMalloc((void **)&ptr, 2000000 * sizeof(unsigned int));
        cudaFree(ptr);
        return 1;
    }

The results are as follows:

  • Linked with: -lcudart -lnpp -lcufft -lcublas -lcusparse -lcurand; running time: 5.852449

  • Linked with: -lcudart -lnpp -lcufft -lcublas; running time: 1.425120

  • Linked with: -lcudart -lnpp -lcufft; running time: 0.905424

  • Linked with: -lcudart; running time: 0.394558

According to gdb, the time is indeed spent inside cudaMalloc, so it is not caused by some library initialization routine.

I wonder if anybody has a plausible explanation for this?

In your example, the cudaMalloc call initiates lazy context establishment on the GPU. When runtime API libraries are included, their binary payloads have to be inspected, and the GPU ELF symbols and objects they contain have to be merged into the context. The more libraries there are, the longer you can expect this process to take. Further, if there is an architecture mismatch in any of the cubins and you have a backwards compatible GPU, it can also trigger driver recompilation of the device code for the target GPU. In a very extreme case, I have seen an old application linked with an old version of CUBLAS take tens of seconds to load and initialise when run on a Fermi GPU.

You can explicitly force lazy context establishment by issuing a cudaFree call, like this:

    #include <cuda_runtime.h>

    int main() {
        cudaSetDevice(0);
        cudaFree(0); // context establishment happens here
        unsigned int *ptr = 0;
        cudaMalloc((void **)&ptr, 2000000 * sizeof(unsigned int));
        cudaFree(ptr);
        return 1;
    }

If you profile or instrument this version with timers, you should find that the first cudaFree call consumes most of the runtime and the cudaMalloc call becomes almost free.

