there is something else I do not understand.
why is mohamed’s simple_loop_sum() slower than parallel other versions, specifically sharedarray_parallel_sum(), even with one processor (nprocs()==1)?? (I also confirmed it on my own computer.)
there is something else I do not understand.
why is mohamed’s simple_loop_sum() slower than parallel other versions, specifically sharedarray_parallel_sum(), even with one processor (nprocs()==1)?? (I also confirmed it on my own computer.)