Why is this Julia code considerably slower than Matlab

zsoerenm · March 1, 2017, 5:04pm

Using Yeppp and devectorization I got it 3 x faster than my initial implementation and almost 5 x faster when parallelized:

@benchmark Testing.performance_test()
BenchmarkTools.Trial: 
  memory estimate:  534.06 MiB
  allocs estimate:  69
  --------------
  minimum time:     1.241 s (9.83% GC)
  median time:      1.246 s (9.52% GC)
  mean time:        1.289 s (12.10% GC)
  maximum time:     1.421 s (19.32% GC)
  --------------
  samples:          4
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

@benchmark Testing.performance_test_parallel()
BenchmarkTools.Trial: 
  memory estimate:  198.43 MiB
  allocs estimate:  2075
  --------------
  minimum time:     417.643 ms (26.01% GC)
  median time:      893.489 ms (11.51% GC)
  mean time:        826.470 ms (12.60% GC)
  maximum time:     921.135 ms (11.17% GC)
  --------------
  samples:          7
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Here is the code:

module Testing

  using Yeppp;

  function performance_test()
    N = 2_000_000;
    range = collect(1:N);

    steering_vectors = complex(ones(4,11), ones(4,11));
    carrier_signal = zeros(Complex{Float64}, N);
    cos_sig, sin_sig = zeros(N), zeros(N);
    sum_signal = zeros(Complex{Float64}, 4, length(range));
    for i = 1:11
        arg = (2π * 1.023e6 / 4e6) .* range .+ (40π/180);
        Yeppp.sin!(sin_sig, arg);
        Yeppp.cos!(cos_sig, arg);
        carrier_signal .= complex.(cos_sig, sin_sig);
        @inbounds for j = 1:4, k = 1:N ; sum_signal[j, k] += steering_vectors[j, i] * carrier_signal[k]; end
    end
    return sum_signal;
  end

  function performance_test_parallel()
    N = 2_000_000;
    range = collect(1:N);

    steering_vectors = complex(ones(4,11), ones(4,11));
    carrier_signal = zeros(Complex{Float64}, N);
    cos_sig, sin_sig = zeros(N), zeros(N);
    sum_signal_temp = zeros(Complex{Float64}, 4, length(range));
    sum_signal = convert(SharedArray, sum_signal_temp);
    @parallel for i = 1:11
        arg = (2π * 1.023e6 / 4e6) .* range .+ (40π/180);
        Yeppp.sin!(sin_sig, arg);
        Yeppp.cos!(cos_sig, arg);
        carrier_signal .= complex.(cos_sig, sin_sig);
        @inbounds for j = 1:4, k = 1:N ; sum_signal[j, k] += steering_vectors[j, i] * carrier_signal[k]; end
    end
    return sum_signal;
  end
end

Yeppp seems not that fast as Apple Accelerate. Without Yeppp the unparallelized code takes 2.2s instead of 1.2s.

Topic		Replies	Views
My julia code is somehow much slower than the matlab code New to Julia question , performance , matlab	55	4271	December 30, 2022
Julia is significantly slower (~18 x) than Matlab in vector and matrix algebra New to Julia	32	1814	June 25, 2023
Matlab versus Julia General Usage	33	4869	July 15, 2021
How to make my Julia code fast like MATLAB? New to Julia question	19	1995	May 25, 2021
Simple speed comparison with Matlab New to Julia question	33	1153	May 6, 2025

Why is this Julia code considerably slower than Matlab

Related topics