Welcome! Yes, this is an artifact of how you’re benchmarking. Julia — unlike Matlab — is very function-centric. You get significantly better performance by putting your code inside functions and passing the data as function arguments. And the way you’re using @btime there is treating arr and vals like non-constant globals — as though you wrote that loop outside a function.
You can tell @btime to treat these like function arguments by interpolating them with `$`. This `$` syntax is specific to BenchmarkTools.
julia> @btime @inbounds @simd for i in eachindex(arr)
           vals[i] = sin(arr[i])
       end
  918.916 ms (39998981 allocations: 610.34 MiB)
julia> @btime for i in eachindex($arr)
           $vals[i] = sin($arr[i])
       end
  90.637 ms (0 allocations: 0 bytes)
No need for @inbounds and @simd here.
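Equivalently, you can wrap the loop in a function so the arrays arrive as arguments; then no `$` interpolation is needed at all. A minimal sketch (the helper name `sin_loop!` is my own, not from your code):

```julia
# Putting the loop behind a function barrier: vals and arr are now local
# arguments, so the compiler can specialize on their concrete types.
function sin_loop!(vals, arr)
    for i in eachindex(vals, arr)
        vals[i] = sin(arr[i])
    end
    return vals
end

arr = rand(10_000_000)
vals = similar(arr)
sin_loop!(vals, arr)
```

Benchmarking this as `@btime sin_loop!($vals, $arr)` should match the interpolated-loop timing above, since both avoid the non-constant-global penalty.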
I’m not sure what Matlab is doing in its vectorized version, but I’d suspect threads. Broadcast doesn’t use threads by default, but an explicitly threaded loop does provide a further speedup:
julia> Threads.nthreads()
6
julia> @btime Threads.@threads for i in eachindex($arr)
           $vals[i] = sin($arr[i])
       end
  16.317 ms (32 allocations: 3.20 KiB)
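For completeness, the (single-threaded) broadcast form mentioned above looks like this; it fuses into one loop and writes in place, so it allocates nothing:

```julia
# In-place fused broadcast: .= writes into vals without allocating a
# temporary array; semantically the same as the explicit loop.
arr = rand(10_000_000)
vals = similar(arr)
vals .= sin.(arr)
```

You can benchmark it the same way with `@btime $vals .= sin.($arr)`; expect timings in the ballpark of the non-threaded interpolated loop.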