Does @simd hang Base.Threads.@spawn?

Holy @#*@! All I had to do was deepcopy(p_Ds_v7), the parameters vector, in each Base.Threads.@spawn call.

I had thought I wasn’t writing anything to the parameters vector, but I now see that I had added 4 blank floats to it as workspace for the simd calculations. Those values are never read back out, but the writes alone were enough: with every spawned task sharing the same parameters vector, the tasks were racing on that workspace, which explains why the old parallel version was both slow and inaccurate.
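
For reference, here is a minimal sketch of the pattern that fixed it (everything except p_Ds_v7 is a hypothetical stand-in, not the actual code from my script): each spawned task gets its own deepcopy of the parameters object, so the workspace slots written inside the @simd loop are no longer shared between tasks.

# Sketch of the per-task deepcopy pattern. core_solve! is a hypothetical
# placeholder for the actual ODE solve; the point is only that each task
# works on a private copy of p_Ds_v7, so the @simd workspace floats are
# never written by two tasks at once.
function parallel_solves_sketch(tspan, p_Ds_v7, solve_results; number_of_solves=10)
    tasks = Vector{Task}(undef, number_of_solves)
    for i in 1:number_of_solves
        tasks[i] = Base.Threads.@spawn begin
            p_local = deepcopy(p_Ds_v7)               # private parameters copy per task
            core_solve!(solve_results, i, tspan, p_local)
        end
    end
    foreach(fetch, tasks)                             # wait for every task to finish
    return solve_results
end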

Now it’s blazingly fast and accurate:

tspan = (0.0, 1.0)
parallel_with_plain_v5(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# Faster than serial plain version
# (duration, sum_of_solutions)
# (0.351, 8.731365050398926)
# (0.343, 8.731365050398926)
# (0.366, 8.731365050398926)


serial_with_simd_v7(tspan, p_Ds_v7, solve_results1; number_of_solves=number_of_solves)
# (5.819, 8.731365050398926)
# (0.069, 8.731365050398926)
# (0.071, 8.731365050398926)
# (0.07, 8.731365050398926)

parallel_with_simd_v7(tspan, p_Ds_v7, solve_results2; number_of_solves=number_of_solves)
# New version with deepcopy of parameters input to solve
# (0.023, 8.731365050398926)
# (0.016, 8.731365050398926)
# (0.017, 8.731365050398926)

# Old parallel_with_simd_v7 was dramatically slower than serial simd version
# (and inaccurate)
# (duration, sum_of_solutions)
# (136.966, 9.61313614002137)
# (141.843, 9.616688089683372)

Thank you, thank you, Chris! And thanks for your amazing Julia work!
