eigvals() results being slightly different when Distributed.jl is used

Hi friends, I have recently been working on some eigenvalue problems in quantum physics. I need to construct and diagonalize quite a few real symmetric matrices, so I am using Distributed, the standard library.

Here is a minimal example:

using Distributed
@everywhere using LinearAlgebra

# build a random real symmetric matrix
function symrand(n)
  A = rand(n, n)
  A += transpose(A)
  A
end

mats = [symrand(10) for _ in 1:1]

# distributed version: send each matrix to a worker, round-robin
function exec1()
  temp = Future[]
  for (p, m) in Iterators.zip(Iterators.cycle(workers()), mats)
    push!(temp, remotecall(eigvals, p, m))
  end
  for j in temp
    @show fetch(j)
  end
end

# serial version: diagonalize on the master process
function exec2()
  for m in mats
    @show eigvals(m)
  end
end

@time exec1() # ...1
@time exec1() # ...2
@time exec1()

@time exec2() # ...3
@time exec2()
@time exec2()

We can save the code above as eigs.jl and run it with julia -p 2 eigs.jl.
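(Alternatively, a minimal sketch of starting the workers from within the script instead of on the command line:)

using Distributed
addprocs(2)  # equivalent to launching julia with -p 2
@everywhere using LinearAlgebra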

I find that the distributed results of eigvals() (also eigvecs and eigen) differ from the serial ones (i.e., outputs 1 and 3 differ slightly), while running the same function several times always gives the same result (i.e., 1 and 2 produce exactly the same output).
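To make the comparison concrete, here is a minimal sketch (assuming the mats and worker setup from the example above) that fetches one distributed result and compares it against the serial one:

serial = eigvals(mats[1])                                   # computed on the master process
remote = fetch(remotecall(eigvals, workers()[1], mats[1]))  # computed on a worker

@show maximum(abs.(serial .- remote))  # tiny but possibly nonzero
@show isapprox(serial, remote)         # true under the default tolerance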

It should be mentioned that the difference is very small. I have read a bit about floating-point arithmetic, but I am not sure whether it applies here. Did I write something wrong, is it a possible bug in the library, or can it simply not be avoided?

The error is around 10e-15, so it is surely a floating-point effect; I just don't know "where" it comes from.

Unless you need high-precision results, this difference is small enough to be ignored :grinning:


Usually there is no guarantee that linear algebra operations behave in exactly the same way across machines and implementations. For instance, even on the same machine and with the same compiler optimizations, changing the BLAS library you use will slightly alter your results. You can look into projects such as ReproBLAS if you wish to have exact reproducibility (warning: that may come at a large performance cost).
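For example, here is a minimal sketch (the matrix size of 500 is arbitrary) showing that merely changing the number of BLAS threads on a single machine can already flip the low-order bits:

using LinearAlgebra

A = rand(500, 500); A += transpose(A)  # a random real symmetric matrix

BLAS.set_num_threads(1)
v1 = eigvals(A)

BLAS.set_num_threads(4)
v4 = eigvals(A)

@show maximum(abs.(v1 .- v4))  # often nonzero, at roughly machine-epsilon scale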
