What is the cause of this performance difference between Julia and Cython?

tkf · April 9, 2021, 1:17am

mcabbott:

julia> function compute_array_threads(m, n=m)
           x = Array{Int32}(undef, (m, n))
           @inbounds Threads.@threads for j = 0:n - 1
               for i = 0:m - 1
                   x[i+1, j+1] = Int32(i*i + j*j)
               end
           end
           return x
       end;

FYI, since @threads creates a closure and @inbounds does not penetrate closures, you need to put @inbounds inside @threads (as in @inbounds x[i+1, j+1] = Int32(i*i + j*j)).
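As a minimal sketch of that fix applied to the quoted function (the name `compute_array_threads_inner` is made up here to avoid clashing with the original), the only change is where `@inbounds` sits:

```julia
# Hypothetical renamed variant of the quoted function: @inbounds is placed on the
# indexing expression itself, inside the closure that Threads.@threads creates.
function compute_array_threads_inner(m, n=m)
    x = Array{Int32}(undef, (m, n))
    Threads.@threads for j = 0:n - 1
        for i = 0:m - 1
            # Bounds checks are elided here because @inbounds is inside the loop body.
            @inbounds x[i+1, j+1] = Int32(i*i + j*j)
        end
    end
    return x
end
```

The results are identical to the original version; only the generated code for the inner loop should differ.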

The difference is probably very hard to observe for this kind of code, though. It's easier to observe when a somewhat heavier SIMD-able computation is inside the loop. For example:

julia> function f!(ys ,xs)
           @inbounds Threads.@threads for i in eachindex(xs, ys)
               ys[i] = Base.FastMath.sqrt_fast(xs[i])
           end
       end;

julia> function g!(ys ,xs)
           Threads.@threads for i in eachindex(xs, ys)
               @inbounds ys[i] = Base.FastMath.sqrt_fast(xs[i])
           end
       end;

julia> xs = ones(2^25); ys = similar(xs);

julia> @btime f!(ys ,xs)
  13.981 ms (41 allocations: 3.64 KiB)

julia> @btime g!(ys ,xs)
  5.630 ms (41 allocations: 3.64 KiB)

julia> Threads.nthreads()
8
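If you want to try this yourself, here is a rough self-contained sketch. The names `sqrt_outer!` and `sqrt_inner!` are made up for this script, and it uses Base's `@elapsed` rather than `@btime` so that it runs without BenchmarkTools. Start Julia with several threads (e.g. `julia -t 8`); the timings will depend on your machine and Julia version.

```julia
# Same two variants as f! and g! above, under hypothetical names.
function sqrt_outer!(ys, xs)
    # @inbounds outside @threads: it does not reach into the closure.
    @inbounds Threads.@threads for i in eachindex(xs, ys)
        ys[i] = Base.FastMath.sqrt_fast(xs[i])
    end
    return ys
end

function sqrt_inner!(ys, xs)
    Threads.@threads for i in eachindex(xs, ys)
        # @inbounds inside the loop body: applies to the indexing directly.
        @inbounds ys[i] = Base.FastMath.sqrt_fast(xs[i])
    end
    return ys
end

xs = rand(2^20)
ys_outer = sqrt_outer!(similar(xs), xs)   # first calls also compile the functions
ys_inner = sqrt_inner!(similar(xs), xs)

# Both variants should compute the same values; only the speed may differ.
results_match = ys_outer ≈ ys_inner

# Crude timing: best of five runs, in seconds.
t_outer = minimum(@elapsed(sqrt_outer!(ys_outer, xs)) for _ in 1:5)
t_inner = minimum(@elapsed(sqrt_inner!(ys_inner, xs)) for _ in 1:5)

println("threads = ", Threads.nthreads())
println("results match = ", results_match)
println("outer @inbounds: ", t_outer, " s")
println("inner @inbounds: ", t_inner, " s")
```

For more reliable numbers, `@btime` from BenchmarkTools with interpolated arguments (`@btime sqrt_inner!($ys_inner, $xs)`) is the usual choice.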
