There is something else I do not understand.
Why is Mohamed’s simple_loop_sum() slower than the parallel versions, specifically sharedarray_parallel_sum(), even with one processor (nprocs() == 1)? (I also confirmed it on my own computer.)