sorting - How to partly sort arrays on CUDA? -
problem
provided have 2 arrays:
const int n = 1000000; float a[n]; mystruct *b[n];
the numbers in can positive or negative (e.g. a[n]={3,2,-1,0,5,-2}
), how can make array partly sorted (all positive values first, not need sorted, negative values)(e.g. a[n]={3,2,5,0,-1,-2}
or a[n]={5,2,3,0,-2,-1}
) on gpu? array b should changed according (a keys, b values).
since scale of a,b can large, think sort algorithm should implemented on gpu (especially on cuda, because use platform). surely know thrust::sort_by_key
can work, muck work since not need array a&b sorted entirely.
has come across kind of problem?
thrust example
thrust::sort_by_key(thrust::device_ptr<float> (a), thrust::device_ptr<float> ( + n ), thrust::device_ptr<mystruct> ( b ), thrust::greater<float>() );
thrust's documentation on github not up-to-date. @jaredhoberock said, thrust::partition
way go since supports stencils. may need copy github repository:
git clone git://github.com/thrust/thrust.git
then run scons doc
in thrust folder updated documentation, , use these updated thrust sources when compiling code (nvcc -i/path/to/thrust ...
). new stencil partition, can do:
#include <thrust/partition.h> #include <thrust/execution_policy.h> #include <thrust/iterator/zip_iterator.h> #include <thrust/tuple.h> struct is_positive { __host__ __device__ bool operator()(const int &x) { return x >= 0; } }; thrust::partition(thrust::host, // if want test on host thrust::make_zip_iterator(thrust::make_tuple(keyvec.begin(), valvec.begin())), thrust::make_zip_iterator(thrust::make_tuple(keyvec.end(), valvec.end())), keyvec.begin(), is_positive());
this returns:
before: keyvec = 0 -1 2 -3 4 -5 6 -7 8 -9 valvec = 0 1 2 3 4 5 6 7 8 9 after: keyvec = 0 2 4 6 8 -5 -3 -7 -1 -9 valvec = 0 2 4 6 8 5 3 7 1 9
note 2 partitions not sorted. also, order may differ between original vectors , partitions. if important you, can use thrust::stable_partition
:
stable_partition differs partition in stable_partition guaranteed preserve relative order. is, if x , y elements in [first, last), such pred(x) == pred(y), , if x precedes y, still true after stable_partition x precedes y.
if want complete example, here is:
#include <thrust/host_vector.h> #include <thrust/device_vector.h> #include <thrust/partition.h> #include <thrust/iterator/zip_iterator.h> #include <thrust/tuple.h> struct is_positive { __host__ __device__ bool operator()(const int &x) { return x >= 0; } }; void print_vec(const thrust::host_vector<int>& v) { for(size_t = 0; < v.size(); i++) std::cout << " " << v[i]; std::cout << "\n"; } int main () { const int n = 10; thrust::host_vector<int> keyvec(n); thrust::host_vector<int> valvec(n); int sign = 1; for(int = 0; < n; ++i) { keyvec[i] = sign * i; valvec[i] = i; sign *= -1; } // copy host device thrust::device_vector<int> d_keyvec = keyvec; thrust::device_vector<int> d_valvec = valvec; std::cout << "before:\n keyvec = "; print_vec(keyvec); std::cout << " valvec = "; print_vec(valvec); // partition key-val on device thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyvec.begin(), d_valvec.begin())), thrust::make_zip_iterator(thrust::make_tuple(d_keyvec.end(), d_valvec.end())), d_keyvec.begin(), is_positive()); // copy result host keyvec = d_keyvec; valvec = d_valvec; std::cout << "after:\n keyvec = "; print_vec(keyvec); std::cout << " valvec = "; print_vec(valvec); }
update
i made quick comparison thrust::sort_by_key
version, , thrust::partition
implementation seem faster (which naturally expect). here obtain on nvidia visual profiler, n = 1024 * 1024
, sort version on left, , partition version on right. may want same kind of tests on own.
Comments
Post a Comment