Atomic operators, SSE/AVX, and OpenMP


I'm wondering if SSE/AVX operations such as addition and multiplication can be atomic operations. The reason I ask is that in OpenMP the atomic construct only works on a limited set of operators. It does not work on, for example, SSE/AVX additions.

Let's assume I had a datatype float4 that corresponds to an SSE register, and that the addition operator is defined for float4 to do an SSE addition. In OpenMP I could do a reduction over an array with the following code:

float4 sum4 = 0.0f; // sets all 4 values to 0
#pragma omp parallel
{
    float4 sum_private = 0.0f;
    #pragma omp for nowait
    for (int i = 0; i < n; i += 4) {
        float4 val = float4().load(&array[i]); // load 4 floats into an SSE register
        sum_private += val; // sum_private = _mm_add_ps(val, sum_private)
    }
    #pragma omp critical
    sum4 += sum_private;
}
float sum = horizontal_sum(sum4); // sum4[0] + sum4[1] + sum4[2] + sum4[3]

But atomic is faster than critical in general, and my instinct tells me that SSE/AVX operations should be atomic (even if OpenMP does not support it). Is this a limitation of OpenMP? Could I use, for example, Intel Threading Building Blocks or pthreads to do this as an atomic operation?

Edit: Based on Jim Cownie's comment I created a new function, which is the best solution. I verified that it gives the correct result.

float sum = 0.0f;
#pragma omp parallel reduction(+:sum)
{
    vec4f sum4 = 0.0f;
    #pragma omp for nowait
    for (int i = 0; i < n; i += 4) {
        vec4f val = vec4f().load(&a[i]); // load 4 floats into an SSE register
        sum4 += val; // sum4 = _mm_add_ps(val, sum4)
    }
    sum += horizontal_add(sum4);
}

Edit: Based on comments from Jim Cownie, and also comments by Mystical at the thread "OpenMP atomic _mm_add_pd", I realize now that the reduction implementation in OpenMP does not necessarily use atomic operators, and it's best to rely on OpenMP's reduction implementation rather than try to do it with atomic.
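For readers without the vec4f wrapper, here is a minimal sketch of the same reduction written directly against the SSE intrinsics. The names sum_array and horizontal_add are made up for illustration, and the scalar tail loop for lengths that are not a multiple of 4 is an addition that the code above does not have. Unaligned loads are assumed so that the input needs no special alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ...

// Hypothetical helper: adds the four lanes of an SSE register.
static float horizontal_add(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

// Sums n floats. Each thread accumulates into its own private SSE
// register; OpenMP then combines the per-thread scalar results through
// reduction(+:sum), so no atomic or critical section is written by hand.
float sum_array(const float* a, std::ptrdiff_t n) {
    assert(n >= 0);
    const std::ptrdiff_t n4 = n - (n % 4);  // largest multiple of 4 <= n
    float sum = 0.0f;
    #pragma omp parallel reduction(+:sum)
    {
        __m128 sum4 = _mm_setzero_ps();
        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < n4; i += 4) {
            sum4 = _mm_add_ps(sum4, _mm_loadu_ps(a + i));
        }
        sum += horizontal_add(sum4);
    }
    // Scalar tail for the last n % 4 elements (an assumption of this sketch).
    for (std::ptrdiff_t i = n4; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

If the file is compiled without OpenMP enabled, the pragmas are ignored and the function runs serially with the same result up to floating-point rounding order.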

SSE & AVX in general are not atomic operations (but a multiword CAS would sure be sweet).

You can use the combinable class template in TBB or PPL for more general purpose reductions and thread-local initializations. Think of it as a synchronized hash table indexed by thread id; it works just fine with OpenMP and doesn't spin up any threads on its own.

You can find examples on the TBB site and on MSDN.
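To make the "synchronized hash table indexed by thread id" picture concrete, here is a rough sketch of that idea in standard C++. The class simple_combinable and the function sum_with_combinable are hypothetical stand-ins written for this post. They are not the TBB or PPL implementation, which is more sophisticated; they only borrow the local() and combine() naming.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a combinable-style container: one slot per
// thread, stored in a mutex-protected hash table keyed by thread id.
template <typename T>
class simple_combinable {
public:
    // Returns the calling thread's slot, value-initialized on first use.
    // References into std::unordered_map stay valid when other threads
    // insert, so the caller may keep the reference and use it lock-free.
    T& local() {
        std::lock_guard<std::mutex> lock(mutex_);
        return slots_[std::this_thread::get_id()];
    }

    // Folds all per-thread slots together with op. Call this only after
    // the worker threads have finished updating their slots.
    template <typename BinaryOp>
    T combine(BinaryOp op) const {
        std::lock_guard<std::mutex> lock(mutex_);
        T result{};
        bool first = true;
        for (const auto& entry : slots_) {
            result = first ? entry.second : op(result, entry.second);
            first = false;
        }
        return result;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::thread::id, T> slots_;
};

// Sums the vector using OpenMP threads and one private slot per thread.
long sum_with_combinable(const std::vector<int>& data) {
    simple_combinable<long> partial;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    #pragma omp parallel
    {
        long& mine = partial.local();  // one locked lookup per thread
        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            mine += data[i];
        }
    }
    return partial.combine(std::plus<long>());
}
```

The lock is taken once per thread to find its slot and once for the final combine, so the hot loop itself runs without synchronization.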

Regarding your comment, consider this code:

x = x + 5

You should really think of it as the following, particularly when multiple threads are involved:

while (true) {
    oldValue = x
    desiredValue = oldValue + 5
    // this conditional is an atomic compare-and-swap
    if (x == oldValue) {
        x = desiredValue
        break
    }
}

Make sense?
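The retry loop above can be sketched in real C++ with std::atomic, where compare_exchange_weak performs the "compare and swap" step as one atomic operation. The function names add_with_cas and run_cas_demo are made up for this illustration, and a plain int stands in for x.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Adds delta to x with an explicit compare-and-swap retry loop and
// returns the value that this call stored.
int add_with_cas(std::atomic<int>& x, int delta) {
    int old_value = x.load();
    // If x no longer equals old_value, compare_exchange_weak fails and
    // writes the current contents of x into old_value, so every retry
    // recomputes the desired value from fresh data.
    while (!x.compare_exchange_weak(old_value, old_value + delta)) {
        // retry
    }
    return old_value + delta;
}

// Hypothetical driver: `threads` threads each add 5 `iterations` times.
int run_cas_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::atomic<int> x{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&x, iterations] {
            for (int i = 0; i < iterations; ++i) {
                add_with_cas(x, 5);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return x.load();
}
```

Because no update is lost, run_cas_demo(4, 1000) returns 20000. A plain `x = x + 5` on a non-atomic int would be a data race and could drop increments.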

