Atomic operators, SSE/AVX, and OpenMP
I'm wondering if SSE/AVX operations such as addition and multiplication can be atomic operations. The reason I ask is that in OpenMP the atomic construct only works on a limited set of operators. It does not work on, for example, SSE/AVX additions.
Let's assume I had a datatype float4 that corresponds to an SSE register, and that the addition operator is defined for float4 as an SSE addition. In OpenMP I could do a reduction over an array with the following code:
float4 sum4 = 0.0f; // sets all 4 values to 0
#pragma omp parallel
{
    float4 sum_private4 = 0.0f;
    #pragma omp for nowait
    for (int i = 0; i < n; i += 4) {
        float4 val = float4().load(&array[i]); // load 4 floats into an SSE register
        sum_private4 += val;                   // sum_private4 = _mm_addps(val, sum_private4)
    }
    #pragma omp critical
    sum4 += sum_private4;
}
float sum = horizontal_sum(sum4); // sum4[0] + sum4[1] + sum4[2] + sum4[3]
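For readers without the float4 class, here is a minimal sketch of the same critical-section reduction written directly with SSE intrinsics (`_mm_setzero_ps`, `_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps` from `<xmmintrin.h>`). The function name `sum_sse_critical` is hypothetical; it assumes an x86 target and that `n` is a multiple of 4. Compile with `-fopenmp` to get threads; without it the pragmas are ignored and the loop simply runs serially.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...
#include <cassert>

// Hypothetical helper: sums a[0..n), assuming n is a multiple of 4.
float sum_sse_critical(const float* a, int n) {
    __m128 sum4 = _mm_setzero_ps();               // shared accumulator (4 lanes)
    #pragma omp parallel
    {
        __m128 sum_private4 = _mm_setzero_ps();   // per-thread accumulator
        #pragma omp for nowait
        for (int i = 0; i < n; i += 4) {
            __m128 val = _mm_loadu_ps(&a[i]);     // unaligned load of 4 floats
            sum_private4 = _mm_add_ps(val, sum_private4);
        }
        // _mm_add_ps is not atomic, so the update of the shared
        // accumulator has to be serialized.
        #pragma omp critical
        sum4 = _mm_add_ps(sum4, sum_private4);
    }
    float lanes[4];
    _mm_storeu_ps(lanes, sum4);                   // spill the 4 lanes to memory
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];  // horizontal sum
}
```

The critical section runs once per thread rather than once per element, so its cost is small compared with the loop itself.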
But atomic is faster than critical in general, and my instinct tells me that SSE/AVX operations should be atomic (even if OpenMP does not support it). Is this a limitation of OpenMP? Could I use, for example, Intel Threading Building Blocks or pthreads to do this as an atomic operation?
Edit: Based on Jim Cownie's comment, I created a new function which I think is the best solution. I verified that it gives the correct result.
float sum = 0.0f;
#pragma omp parallel reduction(+:sum)
{
    vec4f sum4 = 0.0f;
    #pragma omp for nowait
    for (int i = 0; i < n; i += 4) {
        vec4f val = vec4f().load(&a[i]); // load 4 floats into an SSE register
        sum4 += val;                     // sum4 = _mm_addps(val, sum4)
    }
    sum += horizontal_add(sum4);
}
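The same reduction(+:sum) pattern can be sketched with raw SSE intrinsics. Here `hsum_ps` is a hypothetical stand-in for the `horizontal_add()` helper, and `sum_sse_reduction` is likewise an illustrative name; the sketch assumes an x86 target and that `n` is a multiple of 4. Each thread keeps a private `__m128` accumulator, collapses it to one float, and OpenMP's `reduction` clause combines the per-thread scalars.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>

// Hypothetical horizontal add: sums the 4 lanes of an SSE register.
static float hsum_ps(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Hypothetical helper: sums a[0..n), assuming n is a multiple of 4.
float sum_sse_reduction(const float* a, int n) {
    float sum = 0.0f;
    #pragma omp parallel reduction(+:sum)
    {
        __m128 sum4 = _mm_setzero_ps();   // private to each thread
        #pragma omp for nowait
        for (int i = 0; i < n; i += 4)
            sum4 = _mm_add_ps(sum4, _mm_loadu_ps(&a[i]));
        sum += hsum_ps(sum4);             // thread's private copy of sum
    }
    // At the end of the region OpenMP adds the private copies into sum.
    return sum;
}
```

No critical or atomic construct is needed here, because only plain floats cross thread boundaries and the reduction clause handles that combination.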
Edit: Based on comments by Jim Cownie and by Mysticial in the thread "OpenMP atomic _mm_add_pd", I realize that the reduction implementation in OpenMP does not necessarily use atomic operators, and that it's best to rely on OpenMP's reduction implementation rather than try to do it with atomics.
SSE and AVX operations are, in general, not atomic (but a multiword CAS would sure be sweet).
You can use the combinable class template in TBB or PPL for more general-purpose reductions and thread-local initializations. Think of it as a synchronized hash table indexed by thread ID; it works fine with OpenMP and doesn't spin up any threads on its own.
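To make the "synchronized hash table indexed by thread ID" picture concrete, here is a minimal sketch using only the standard library. `ThreadLocalCombiner`, `local()` and `combine()` are hypothetical names chosen for illustration; this is not the TBB or PPL API, whose real interfaces should be taken from their documentation. Each value lives in its own heap allocation, so a thread can update its slot without holding the lock while other threads add new slots.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for tbb::combinable / concurrency::combinable:
// a mutex-protected hash table keyed by thread id.
template <typename T>
class ThreadLocalCombiner {
public:
    explicit ThreadLocalCombiner(T init) : init_(init) {}

    // Returns this thread's slot, creating it from init_ on first use.
    // The reference stays valid because the T is heap-allocated separately.
    T& local() {
        std::lock_guard<std::mutex> lock(mu_);
        std::unique_ptr<T>& slot = slots_[std::this_thread::get_id()];
        if (!slot) slot.reset(new T(init_));
        return *slot;
    }

    // Folds all per-thread values with op; call after the workers finish.
    // Assumes init_ is the identity of op (e.g. 0 for addition).
    template <typename Op>
    T combine(Op op) {
        std::lock_guard<std::mutex> lock(mu_);
        T result = init_;
        for (auto& kv : slots_) result = op(result, *kv.second);
        return result;
    }

private:
    T init_;
    std::mutex mu_;
    std::unordered_map<std::thread::id, std::unique_ptr<T>> slots_;
};
```

Each worker takes the lock once to fetch its slot and then accumulates into it freely; the lock is taken again only for the final `combine`, after the workers have been joined.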
You can find examples on the TBB site and on MSDN.
Regarding your comment, consider this code:
x = x + 5
You should think of it as the following, particularly when multiple threads are involved:
while (true) {
    oldValue = x;
    desiredValue = oldValue + 5;
    // the check and the assignment together form one atomic compare-and-swap
    if (x == oldValue) {
        x = desiredValue;
        break;
    }
}
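In real C++ that loop can be written with `std::atomic<T>::compare_exchange_weak`, which performs the "if x still equals oldValue, store desiredValue" step atomically and, on failure, reloads the current value into `oldValue`. The helper name `atomic_add` is hypothetical. Note that this makes each scalar update atomic on its own; it does not make all four lanes of an SSE register update together, which would require a 16-byte CAS that `std::atomic` is not guaranteed to provide lock-free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: atomically performs x = x + delta with a CAS retry
// loop. Also works for std::atomic<float>, which has no fetch_add before C++20.
template <typename T>
T atomic_add(std::atomic<T>& x, T delta) {
    T oldValue = x.load();
    // If another thread changed x in the meantime, compare_exchange_weak
    // fails, writes the current value into oldValue, and we retry.
    while (!x.compare_exchange_weak(oldValue, oldValue + delta)) {
    }
    return oldValue + delta;  // the value this call stored
}
```

For example, four threads each calling `atomic_add(f, 1.0f)` a thousand times on a shared `std::atomic<float> f(0.0f)` leave `f` at 4000, because no update can be lost between the read and the write.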
make sense?