I am getting inconsistent results from multiplying the same two matrices together with OpenBLAS on Julia master:
julia> y = randn(400,400);
julia> x = randn(400,400);
julia> cond(x * y)
3108.7389106095816
julia> cond(x * y)
250095.81630383938
julia> cond(x * y) # same
250095.81630383938
julia> cond(x * y)
16627.89658977373
julia> cond(x * y) # again
250095.81630383938
julia> cond(x * y)
2631.777572954922
julia> z1 = x * y;
julia> z2 = x * y;
julia> maximum(abs, z1 .- z2)
81.02332984524476
81 is not a trivial difference!
I understand that if threading causes numbers to be added together in a different order (depending on when threads finish), slightly different answers are expected. But 81 is very large compared to the magnitude of the entries in the matrix. argmax points to the 81, and I see a large block of incorrect answers.
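For comparison, here is the size of discrepancy that reordering alone should produce — a minimal sketch (no BLAS involved, just summing the same numbers in two different orders):

```julia
# Floating-point addition is not associative, so summation order matters.
# Summing the same vector forward and in reverse gives slightly different
# results, but the difference is tiny relative to the magnitudes involved.
v = randn(10_000)
s_forward = sum(v)
s_reverse = sum(reverse(v))
println(abs(s_forward - s_reverse))  # on the order of 1e-12, nothing like 81
```

That is the scale of error a different reduction order can explain; a difference of 81 in entries of order 10 cannot be attributed to it.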
When I look at zδ, the matrix of differences, I notice that it is almost entirely 0.0.
julia> zδ[1:15,350:365]
0.0 3.40016 5.51059 15.5744 38.3682 7.10866 38.1058 0.791143 27.7809 15.6203 13.285 1.89272 17.4851 0.0 0.0 0.0
0.0 21.2102 0.686299 7.07578 38.1467 21.3853 13.6936 22.3834 20.9474 12.5285 19.6346 7.20672 18.6198 0.0 0.0 0.0
0.0 5.87573 4.10078 31.6043 2.47654 19.1434 19.9269 40.6902 8.60271 17.1051 38.5678 13.5105 39.141 0.0 0.0 0.0
0.0 29.5379 17.8031 19.7143 5.82252 12.9395 81.0233 8.24332 9.7846 6.86419 34.6936 40.1017 3.81325 0.0 0.0 0.0
0.0 7.74442 30.1483 1.19507 19.9153 24.768 3.15936 26.7861 8.8477 25.36 4.08705 34.1401 25.1399 0.0 0.0 0.0
0.0 25.1696 6.23755 5.07073 22.7564 22.1431 9.66137 18.8559 18.284 34.299 8.87634 11.4519 19.5225 0.0 0.0 0.0
0.0 0.0837383 10.7374 1.09023 31.3011 12.3954 17.2506 34.3372 33.3534 5.56733 3.04148 0.278859 6.39959 0.0 0.0 0.0
0.0 65.448 16.9498 8.4026 19.6995 11.1045 52.3888 23.052 2.5545 27.4019 2.33811 1.74398 5.8446 0.0 0.0 0.0
0.0 31.9391 0.708929 4.41991 12.3908 10.8966 8.13072 11.3948 8.15043 6.74772 19.3858 41.8931 37.9043 0.0 0.0 0.0
0.0 3.92232 2.29414 20.2639 30.3701 17.8275 8.54815 0.0279554 1.83999 35.9216 3.36693 33.7164 11.1285 0.0 0.0 0.0
0.0 3.98999 17.6255 33.2399 1.99359 26.5441 24.0701 31.2736 11.5732 15.1464 23.3285 24.5417 0.35802 0.0 0.0 0.0
0.0 22.4708 8.45022 11.09 22.609 25.0343 21.2055 9.91141 11.274 5.06574 17.0936 7.48366 36.9995 0.0 0.0 0.0
0.0 0.112206 25.5591 20.8131 1.24503 46.9793 18.4077 24.4956 5.83157 51.9475 0.893717 5.15207 12.8931 0.0 0.0 0.0
0.0 14.986 9.28381 16.6061 35.9068 1.0082 5.79184 9.15659 2.57992 23.2148 33.4547 11.6657 13.8442 0.0 0.0 0.0
0.0 16.46 28.1464 35.3354 4.86666 26.4538 4.80481 36.36 37.221 27.8764 25.0792 4.25533 25.3219 0.0 0.0 0.0
Restricting BLAS to a single thread makes the problem go away.
julia> BLAS.set_num_threads(1)
julia> zot = x*y;
julia> maximum(abs, z1 .- zot)
84.05177973473988
julia> maximum(abs, z2 .- zot)
83.6115213251184
julia> maximum(abs, z3 .- zot)
83.6115213251184
From a few tests against my own matmul functions, single-threaded BLAS seems to be getting the correct answer (maximum absolute difference closer to 10^-13).
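For reference, that check can be done with a naive triple-loop multiply — a sketch, where mymul is just an illustrative name for my own reference implementation, not a library function:

```julia
# Naive O(n^3) reference multiply: no threading, no blocking, plain
# accumulation in column-major order.
function mymul(A::Matrix{Float64}, B::Matrix{Float64})
    m, k = size(A)
    k2, n = size(B)
    @assert k == k2
    C = zeros(m, n)
    for j in 1:n, i in 1:m
        acc = 0.0
        for l in 1:k
            acc += A[i, l] * B[l, j]
        end
        C[i, j] = acc
    end
    return C
end

zref = mymul(x, y)
# Single-threaded BLAS agrees with this reference to rounding error;
# the multithreaded result does not.
maximum(abs, zot .- zref)
```
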
Both MKL and Julia 0.6 + OpenBLAS get the correct answer.
Should I file an issue with OpenBLAS on GitHub?
I reproduced this on two computers (one AMD Ryzen, the other Intel Skylake), so it seems to be a cross-platform problem. Please let me know if you can or cannot reproduce it.