cuda - What is the standard approach (if any) for designing out-of-core / external-memory algorithms?


I am looking for rules of thumb for designing algorithms where the data is accessed slowly because of limitations of disk speed, PCI speed (GPGPU), or some other bottleneck.

Also, how does one manage GPGPU programs where the memory of the application exceeds GPGPU memory?

In general, GPU memory should not be an arbitrary limitation on the size of the data an algorithm handles. GPU memory can be considered a "cache" of the data the GPU is currently operating on, and many GPU algorithms are designed to operate on more data than can fit in that "cache". This is accomplished by moving data to and from the GPU while computation is going on, and the GPU has specific concurrent execution and copy/compute overlap mechanisms to enable this.
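Whether copies can actually overlap with kernels depends on the device. As a minimal sketch (the helper name `copyEngineCount` is made up for illustration), the CUDA runtime reports the number of asynchronous copy engines through `cudaGetDeviceProperties`:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of asynchronous copy engines on
// `device`, or -1 if the device properties could not be queried.
//   0          : copies cannot overlap with kernel execution
//   1          : a copy in one direction can overlap with a kernel
//   2 or more  : host-to-device and device-to-host copies can both
//                overlap with a kernel at the same time
int copyEngineCount(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.asyncEngineCount;
}
```

A value of 0 would suggest that streaming data through the GPU will not hide transfer time on that device, although chunking still works functionally.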

This usually implies that independent work can be completed on sections of the data, which is typically a good indicator that a parallelizable application can be accelerated. Conceptually, this is similar to large-scale MPI applications (such as High Performance Linpack), which break the work into pieces and then send the pieces to various machines (MPI ranks) for computation.

If the amount of work to be done on the data is small compared to the cost of transferring the data, then the data transfer speed will still become the bottleneck, unless it is addressed directly via changes to the storage system.
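A rough way to reason about this, assuming ideal copy/compute overlap and many chunks, is that the slower of the two stages sets the pace. The symbols and the numbers below are illustrative assumptions, not measurements:

```latex
% S = chunk size, B = effective transfer bandwidth, N = number of chunks
T_{\text{copy}} = \frac{S}{B}, \qquad
T_{\text{total}} \approx N \cdot \max\left(T_{\text{copy}},\, T_{\text{compute}}\right)

% Example with assumed values S = 1\,\text{GB},\ B = 10\,\text{GB/s},\ T_{\text{compute}} = 0.02\,\text{s}:
T_{\text{copy}} = \frac{1\,\text{GB}}{10\,\text{GB/s}} = 0.1\,\text{s}
\quad\Rightarrow\quad
\max(0.1\,\text{s},\ 0.02\,\text{s}) = 0.1\,\text{s per chunk}
```

Under these assumed numbers the GPU would be busy for only about 20% of each chunk's time slot, so a faster kernel would not help; only more work per byte or a faster data path would.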

The basic approach to handling out-of-core algorithms, or algorithms where the data set is too large to fit in GPU memory all at once, is to determine a version of the algorithm that can work on separable data, and then craft a "pipelined" algorithm that works on the data in chunks. An example tutorial that covers this programming technique is here.
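The chunked, pipelined approach might be sketched as follows. This is not taken from the tutorial mentioned above; it is a minimal illustration that assumes the work is an independent per-element operation. The names `scaleKernel`, `processInChunks`, and `kStreams` are hypothetical. Two streams each own one pair of device buffers, so the copy for one chunk can overlap with the kernel for another, and device memory use stays at `2 * kStreams * chunkElems` floats regardless of the total data size.

```cuda
#include <algorithm>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-element computation.
__global__ void scaleKernel(const float* in, float* out, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

// Processes `total` floats in chunks of `chunkElems`. For real overlap,
// hIn and hOut should be pinned (cudaMallocHost); pageable memory still
// gives correct results but the copies will not be fully asynchronous.
cudaError_t processInChunks(const float* hIn, float* hOut, size_t total,
                            size_t chunkElems, float factor) {
    if (chunkElems == 0) return cudaErrorInvalidValue;
    const int kStreams = 2;
    cudaStream_t streams[kStreams] = {};
    float* dIn[kStreams] = {};
    float* dOut[kStreams] = {};
    cudaError_t err = cudaSuccess;

    // One stream and one input/output buffer pair per pipeline slot.
    for (int s = 0; s < kStreams && err == cudaSuccess; ++s) {
        err = cudaStreamCreate(&streams[s]);
        if (err == cudaSuccess) err = cudaMalloc(&dIn[s], chunkElems * sizeof(float));
        if (err == cudaSuccess) err = cudaMalloc(&dOut[s], chunkElems * sizeof(float));
    }

    // Chunk c goes to slot c % kStreams. Operations within a stream run in
    // order, so a buffer is not overwritten until its previous chunk is done.
    for (size_t off = 0, c = 0; off < total && err == cudaSuccess;
         off += chunkElems, ++c) {
        const int s = static_cast<int>(c % kStreams);
        const size_t n = std::min(chunkElems, total - off);
        const size_t bytes = n * sizeof(float);

        err = cudaMemcpyAsync(dIn[s], hIn + off, bytes,
                              cudaMemcpyHostToDevice, streams[s]);
        if (err != cudaSuccess) break;

        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        scaleKernel<<<blocks, threads, 0, streams[s]>>>(dIn[s], dOut[s], n, factor);
        err = cudaGetLastError();
        if (err != cudaSuccess) break;

        err = cudaMemcpyAsync(hOut + off, dOut[s], bytes,
                              cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all queued work, then release resources.
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = syncErr;
    for (int s = 0; s < kStreams; ++s) {
        cudaFree(dIn[s]);
        cudaFree(dOut[s]);
        if (streams[s]) cudaStreamDestroy(streams[s]);
    }
    return err;
}
```

The chunk size is a tuning choice: larger chunks amortize launch and copy overhead, while smaller chunks use less device memory and start overlapping sooner. The same structure extends to disk-bound data by reading the next chunk into the host buffer while earlier chunks are in flight.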

