performance - Why does CUDA code run so much faster in NVIDIA Visual Profiler? -


A piece of code that takes well over 1 minute on the command line is done in a matter of seconds in NVIDIA Visual Profiler (running the same .exe). So the natural question is: why? Is there something wrong with the command line, or does Visual Profiler do something different and not really execute the same work as on the command line?

I'm using cuBLAS, Thrust and cuRAND.

Incidentally, there has been a noticeable slowdown in compiled code on my machine recently; even old code that used to run quickly is affected, hence I'm getting suspicious.

Update:

  • I have checked that the calculated output on the command line and in Visual Profiler is identical, i.e. all the required code has been run in both cases.
  • GPU Shark indicated that the performance state was unchanged at P0 when I switched from the command line to Visual Profiler.
  • However, GPU usage was reported at 0.0% when run in Visual Profiler, whereas it went as high as 98% when run off the command line.
  • Moreover, far less memory is used in Visual Profiler. When run off the command line, Task Manager indicates usage of 650-700 MB of memory (it spikes at the first cudaFree(0) call). In Visual Profiler that figure goes down to ~100 MB.

This is an old question, but I've just finished chasing the same issue (though the cause may not be the same).

Namely: my app achieved between 900 and 1100 frames (synchronous launches) per second when running under NVVP, but only around 100-120 when running outside of the profiler.

The cause appears to have been a status message I was printing to the console via cout. I had intended it to happen only once every 100-200 frames. Instead, it was printing a status message every frame, and console IO became the bottleneck.

By only printing the status message every 100 frames (though the optimal number here will depend on your application), the frame rate jumped up to match what I was seeing in NVVP. Of course, this could be handled in a separate CPU thread if even that sort of overhead is unacceptable in your circumstances.
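A minimal sketch of the throttling described above, in case it is useful. The names `should_report` and `run_frames` are made up for illustration, the interval is an assumed tuning knob, and the CUDA work is reduced to a comment, so treat it as the shape of the fix rather than my actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical helper: true on frames where a status line should be printed.
// `interval` is an assumed tuning knob (something like 100-200 frames).
inline bool should_report(std::size_t frame, std::size_t interval) {
    assert(interval > 0 && "interval must be positive");
    return frame % interval == 0;
}

// Stand-in for the per-frame loop. The real kernel launches and the
// synchronisation would go where the comment is; they are omitted here.
// Returns the number of status lines written to `out`.
inline std::size_t run_frames(std::size_t frame_count, std::size_t interval,
                              std::ostream& out = std::cout) {
    std::size_t lines = 0;
    for (std::size_t frame = 0; frame < frame_count; ++frame) {
        // ... launch kernels and synchronise for this frame ...
        if (should_report(frame, interval)) {
            // '\n' instead of std::endl, so no explicit flush per message.
            out << "frame " << frame << '\n';
            ++lines;
        }
    }
    return lines;
}
```

Calling `run_frames(n, 1)` gives the print-every-frame behaviour that caused the slowdown, while a larger interval such as `run_frames(n, 100)` only touches the console on every 100th frame.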

NVVP has to redirect stdout to its own internal buffer in order to capture the application's output (which it shows in its Console tab). It appears that NVVP's mechanism for buffering or processing that output has much less overhead than allowing the operating system to handle it directly. It looks like NVVP is either buffering everything and displaying it in a separate thread, or saving up a bunch of output until some threshold is reached, at which point it appends that buffer to its own Console tab.
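I don't know how NVVP actually handles the captured output, so the following only illustrates the "save up until a threshold" idea on the application side. `BatchedLog` is a hypothetical class, and the threshold value is an assumption you would tune yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sink that collects status lines in memory and hands them to
// the underlying stream in one write once enough text has accumulated.
// The stream passed in must outlive this object.
class BatchedLog {
public:
    explicit BatchedLog(std::size_t threshold_bytes, std::ostream& out = std::cout)
        : out_(out), threshold_(threshold_bytes) {
        assert(threshold_bytes > 0 && "threshold must be positive");
    }

    // Write whatever is still pending when the log goes out of scope.
    ~BatchedLog() { flush(); }

    // Queue one line; the stream is only touched when the pending text
    // reaches the threshold.
    void line(const std::string& text) {
        pending_ += text;
        pending_ += '\n';
        if (pending_.size() >= threshold_) flush();
    }

    // Emit all pending text as a single write, then flush the stream.
    void flush() {
        if (pending_.empty()) return;
        out_.write(pending_.data(), static_cast<std::streamsize>(pending_.size()));
        out_.flush();
        pending_.clear();
        ++writes_;
    }

    // Number of times the underlying stream has actually been written to.
    std::size_t writes() const { return writes_; }

private:
    std::ostream& out_;
    std::size_t threshold_;
    std::string pending_;
    std::size_t writes_ = 0;
};
```

The trade-off is that messages show up late (or in bursts), which is fine for a progress counter but not for anything you need to see immediately before a crash.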

So, my advice would be to disable any console IO, and see if or how that affects things.

(It didn't help that VS2012 refused to profile my CUDA app. It would have been nice to see that 80% of the execution time was being spent on console IO.)

Hope this helps!

