I recently did some benchmark testing between Julia and Matlab, focusing on basic for-loops and vectorized numerical operations. To my surprise, Julia performed significantly worse than Matlab in some very simple scenarios. I’m trying to determine:
Is this due to my code implementation (missing flags, wrong usage, etc.)?
Or is this an actual limitation of Julia’s current handling of such math-heavy element-wise loops?
Here’s the test I ran in Julia:
using BenchmarkTools
arr = randn(Int(1e7))
vals = similar(arr)
# for loop
@btime @inbounds @simd for i in eachindex(arr)
    vals[i] = sin(arr[i])
end
# for loop with multi-threading (JULIA_NUM_THREADS = 8)
println("threads: $(Threads.nthreads())")
@btime Threads.@threads for i in eachindex(arr)
    vals[i] = sin(arr[i])
end
# vectorized
@btime vals2 = sin.(arr);
The outputs are:
for loop: 922.618 ms (39998981 allocations: 610.34 MiB)
for loop with multi-threading: 204.304 ms (39999023 allocations: 610.34 MiB)
vectorized: 87.596 ms (4 allocations: 76.29 MiB)
Matlab version:
clear;
clc;
arr = randn(1e7,1);
vals = zeros(size(arr));
tic;
for ii = 1:length(arr)
    vals(ii) = sin(arr(ii));
end
t1 = toc;
tic;
vals2 = sin(arr);
t2 = toc;
The outputs are:
for loop: t1 = 0.1735 s
vectorized: t2 = 0.0285 s
I expected Julia’s loop to be competitive, especially with @inbounds, @simd, and @threads. But in my test, even Matlab’s plain loop beat Julia’s multi-threaded version, and Matlab’s vectorized call was roughly 3x faster than Julia’s broadcast.
I’m evaluating Julia for scientific computing (currently using Matlab heavily), and would appreciate any insights or correction of my assumptions!
Welcome! Yes, this is an artifact of how you’re benchmarking. Julia — unlike Matlab — is very function-centric. You get significantly better performance by putting your code inside functions and passing the data as function arguments. And the way you’re using @btime there is treating arr and vals like non-constant globals — as though you wrote that loop outside a function.
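For example, the function-centric version looks something like this (the name sin_loop! is just illustrative):
using BenchmarkTools

function sin_loop!(out, x)
    # Inside a function, the compiler knows the concrete types of out
    # and x, so this loop compiles to a tight, specialized kernel.
    @inbounds @simd for i in eachindex(out, x)
        out[i] = sin(x[i])
    end
    return out
end

sin_loop!(vals, arr)  # the first call compiles; later calls run at full speed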
Alternatively, you can tell @btime to treat these like function arguments by interpolating them with $. This interpolation syntax is specific to BenchmarkTools' macros, not something you'd write in ordinary Julia code.
julia> @btime @inbounds @simd for i in eachindex(arr)
           vals[i] = sin(arr[i])
       end
  918.916 ms (39998981 allocations: 610.34 MiB)

julia> @btime for i in eachindex($arr)
           $vals[i] = sin($arr[i])
       end
  90.637 ms (0 allocations: 0 bytes)
No need for @inbounds and @simd here: the cost of sin dominates each iteration, and the scalar sin call can't be SIMD-vectorized by @simd anyway, so those annotations make no measurable difference.
I’m not sure what Matlab is doing in its vectorized version, but I’d suspect threads. Broadcast doesn’t use threads by default, but it does provide a speedup:
julia> Threads.nthreads()
6

julia> @btime Threads.@threads for i in eachindex($arr)
           $vals[i] = sin($arr[i])
       end
  16.317 ms (32 allocations: 3.20 KiB)
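Relatedly, the 4 allocations / 76.29 MiB in your vectorized timing are just the freshly allocated output array (10^7 Float64s ≈ 76 MiB). If you want the equivalent of Matlab's vals2 = sin(arr) without allocating, you can broadcast in place with .= (a sketch):
# Fused in-place broadcast: writes into the existing array, no allocation.
vals .= sin.(arr)

@btime $vals .= sin.($arr);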
Also keep in mind that when you call code like this in Matlab, it is directly running a highly optimized C or Fortran implementation, possibly multithreaded. So it’s not Julia vs Matlab, but Julia vs C/Fortran. You should not normally expect Julia to outperform Matlab in those cases.
Keep in mind that adding $ is not a performance tip. It is only a trick to substitute something into the @btime macro by value instead of by name. If anything, I would suggest writing as follows if you don’t want to write a separate function:
@btime (@inbounds for i in eachindex(arr)
    vals[i] = sin(arr[i])
end) setup=(arr=$arr; vals=$vals)
Typically, you want to benchmark function calls:
function set_sin!(target, args)
    @inbounds for i in eachindex(target, args)
        target[i] = sin(args[i])
    end
end
@btime set_sin!(vals, arr)
As for why Matlab is still faster than single-threaded Julia, there are two possible explanations, which may well come into play together.
First, as already mentioned, Matlab may use multiple threads by default.
Second, the algorithm for element-wise sine may be tailored to array inputs: when computing many sines at once, there can be a faster arrangement of the arithmetic (for instance, SIMD kernels that process several elements per instruction) than evaluating each one sequentially.
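If you want that same kind of batched kernel in Julia, one option is LoopVectorization.jl (a sketch, assuming the package is installed; set_sin_turbo! is just an illustrative name). Its @turbo macro vectorizes the loop and substitutes SIMD-friendly implementations of functions like sin:
using BenchmarkTools
using LoopVectorization

function set_sin_turbo!(target, args)
    # @turbo processes several elements per instruction, similar in
    # spirit to what an optimized vendor library does for sin(array).
    @turbo for i in eachindex(target, args)
        target[i] = sin(args[i])
    end
end

@btime set_sin_turbo!($vals, $arr)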
As I wrote in another thread, comparing to optimized library routines (like a vectorized sin(array) call) is not particularly interesting except to verify that Julia can produce good compiled code. Where you hope to get significant performance improvements is in cases where you have to write new performance-critical code, not simply call one or two library functions.
The reason is that in the loop example, Julia had to access untyped global variables on every iteration, so each time through the loop it had no idea what vals or arr might be. In the broadcast example, vals and arr are unknown only once, at the very start, before they are passed into the broadcast machinery; from that point on the inner loop runs knowing exactly what they are.
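One way to see this for yourself (a sketch; the names carr and cvals are just illustrative): declare the globals const, so their types are fixed and known to the compiler, and the loop becomes type-stable even without $ interpolation:
using BenchmarkTools

const carr = randn(10^7)    # a const global has a known, fixed type
const cvals = similar(carr)

# No $ needed: the compiler can infer carr::Vector{Float64} and
# cvals::Vector{Float64}, so the per-iteration overhead disappears.
@btime for i in eachindex(carr)
    cvals[i] = sin(carr[i])
end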