Can you provide an MWE?
10 ns is a very small value. It looks like the arrays are very small, so the overhead from the barriers dominates the actual computation. Is that always the case? For large enough arrays the difference should be negligible, so what to do next depends on your use case.
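To see why array size matters, here is a toy cost model. It is only a sketch that assumes each parallel call pays a fixed synchronization cost (barriers, thread wake-up) plus a per-element compute cost. Both numbers, `FIXED_OVERHEAD_NS` and `PER_ELEMENT_NS`, are hypothetical placeholders, not measurements from your code:

```python
# Toy cost model (assumption): fixed per-call synchronization cost
# plus a cost proportional to the number of elements processed.
FIXED_OVERHEAD_NS = 10.0  # hypothetical: on the order of the 10 ns gap discussed
PER_ELEMENT_NS = 0.5      # hypothetical per-element compute cost


def overhead_fraction(n, fixed_ns=FIXED_OVERHEAD_NS, per_elem_ns=PER_ELEMENT_NS):
    """Share of total runtime spent on the fixed overhead for n elements."""
    total = fixed_ns + n * per_elem_ns
    return fixed_ns / total


# The fixed cost dominates for tiny arrays and vanishes for large ones.
for n in (10, 1_000, 1_000_000):
    print(f"n={n:>9}: overhead = {overhead_fraction(n):.2%}")
```

Under these assumed numbers the overhead is most of the runtime for 10 elements but a negligible share for a million, which is why a benchmark on tiny arrays can make the barrier-based version look much worse than it is in practice.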