Julia is significantly slower (~18 x) than Matlab in vector and matrix algebra

Because matt1 and matt2 are (without purpose) defined before the loop? Otherwise, I don’t see it. (To be safe, declare vec1, mat, matt1 and matt2 as locals.)

Working on a blog-post about it, but here’s the unpublished draft: www.julialang.org/blog/2023/06/PSA-dont-use-threadid.md at 65e25e87d01111cc45a477c8072c9f5dc5878a39 · JuliaLang/www.julialang.org · GitHub

2 Likes

Interesting. I always thought that each task in a for-loop was running separately, in the sense of not sharing local variables with the other iterations. Is that the problem in last instance?

Yes, they run separately. The problem is that to do a summation, you need to somehow merge those separate values, and the various ways people often try to merge those values is incorrect.

1 Like

But is the problem you mention arising here? if each thread is running separately and you store the result in Arg1[i], then you can sum Arg1 out of the loop.

Yes, that one is fine because you haven’t used threadid. But if you want to be efficient, you should avoid allocating an N element vector Arg1 and can get away with nthreads() different elements, and that’s where people start getting themselves into trouble.

1 Like

To get things back on topic, here’s the fastest version I could make (essentially just what @gdalle already wrote):

#+begin_src julia
function vec_prod1(N,n)
    Arg = 0.0
    mat = Array{Float64}(undef, n, n)
    matt1 = Array{Float64}(undef, n, n)
    matt2 = Array{Float64}(undef, n, n)
    for i ∈ 1:N
        vec1 = 1:n;
        mat .= vec1 .* vec1'
        matt1 .= (mat .* mat) ./ n^2;
        matt2 .= mul!(matt2, mat, mat) ./ n^2
        Arg += (matt1 ⋅ matt2)/N;
    end
    Arg
end

@btime vec_prod1(10, 1000)
#+end_src
#+RESULTS:
:   84.133 ms (6 allocations: 22.89 MiB)
: 2.094817739604174e19

I found this was actually faster than using explicit multithreading since the matrix multiplication is already multithreaded for 1000 x 1000 matrices and BLAS multithreading is very efficient.

Here’s the explicitly multithreaded version for comparison

#+begin_src julia
function vec_prod3(N,n)
    chunks = Iterators.partition(1:N, max(1, N ÷ Threads.nthreads()))
    tasks = map(chunks) do chunk
        Arg = 0.0
        mat = Array{Float64}(undef, n, n)
        matt1 = Array{Float64}(undef, n, n)
        matt2 = Array{Float64}(undef, n, n)
        for i ∈ chunk
            vec1 = 1:n;
            mat .= vec1 .* vec1'
            matt1 .= (mat .* mat) ./ n^2;
            matt2 .= mul!(matt2, mat, mat) ./ n^2
            Arg += (matt1 ⋅ matt2)/N;
        end
        return Arg
    end
    Arg = sum(fetch, tasks)
end

@btime vec_prod3(10, 1000)
#+end_src
#+RESULTS:
:   97.034 ms (84 allocations: 228.88 MiB)
: 2.094817739604174e19
3 Likes

Sorry, I dont know why I said this was fine. You’ll still get race conditions from different threads overwriting mat, matt1 and matt2 (assuming you were actually calculating new values each iteration instead of reusing the exact same values like is done here)

1 Like

Now I see what you mean. The variables inside the loop weren’t local, because they were already defined outside the loop.

Yeah, redefining variables inside a function is prone to err. For example for type stability:

#STABLE
function type_stable()
    e = 1
    parameter() = e

    return e 
end

#UNSTABLE
function type_unstable()
    e = 1    
    parameter() = e
    
    e = 1
    return e
end

#UNSTABLE
function type_unstable()
    e = 1    
    parameter() = e
    
    e::Int64 = 1
    return e
end

yeah, this is my least favorite issue in Julia: https://github.com/JuliaLang/julia/issues/15276. It is one of the most annoying performance issues in Julia.

3 Likes

Type instability isnt the problem here, it’s a race condition Race condition - Wikipedia

I know, I meant that redefining variables can lead to several problems (e.g. race conditions, type instability)

1 Like

In this case, the variable is never re-assigned, it’s just that the memory of the array it points to is over-written