Simple Parallel Examples for Embarrassingly Simple Problems

There is something else I do not understand.

Why is mohamed’s simple_loop_sum() slower than the other parallel versions, specifically sharedarray_parallel_sum(), even with one processor (nprocs()==1)? (I also confirmed this on my own computer.)