The following two functions seem to do exactly the same thing, but the built-in version (`randn!`) is some 60% slower. Am I missing something, or could `randn!` be implemented more efficiently?

```
using BenchmarkTools  # provides @btime
using Random
broadcast_randn(x) = (x .= randn.())
inplace_randn(x) = randn!(x)
x = zeros(10_000)
@btime broadcast_randn($x)
#   41.680 μs (0 allocations: 0 bytes)
@btime inplace_randn($x)
#   68.003 μs (0 allocations: 0 bytes)
```

Looking at the output of `@code_llvm`, it seems that the broadcasted version takes advantage of SIMD instructions. The base implementation of the in-place version is not too complicated:

```
function $randfun!(rng::AbstractRNG, A::AbstractArray{T}) where T
    for i in eachindex(A)
        @inbounds A[i] = $randfun(rng, T)
    end
    A
end
```

By including a `@simd` annotation, I recover the same performance as the broadcasted version:

```
function myrandn!(x)
    @inbounds @simd for i in eachindex(x)
        x[i] = randn()
    end
    x
end
```

However, `@simd` may not play well with the random number generator (are there memory dependencies based on the state of the RNG?). The specifics are beyond me.
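One rough way to probe the RNG-state question empirically is to seed two generators identically and compare the output of a plain loop against a `@simd` loop. The sketch below assumes an explicit `MersenneTwister` instead of the default RNG, and the helper names `fill_plain!` and `fill_simd!` are made up for illustration. A matching result only says something about this machine and Julia version; it is not a general guarantee.

```julia
using Random

# Plain loop drawing from an explicit RNG (hypothetical helper name).
function fill_plain!(rng, x)
    @inbounds for i in eachindex(x)
        x[i] = randn(rng)
    end
    return x
end

# Same loop with the @simd annotation (hypothetical helper name).
function fill_simd!(rng, x)
    @inbounds @simd for i in eachindex(x)
        x[i] = randn(rng)
    end
    return x
end

# Two generators with the same seed produce the same stream when drawn in order.
a = fill_plain!(MersenneTwister(1234), zeros(10_000))
b = fill_simd!(MersenneTwister(1234), zeros(10_000))

# Equal arrays suggest the annotation did not reorder the RNG draws here.
println("same output: ", a == b)
```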


This will not generate any SIMD instructions, so it must be that the structure of the loop is slightly different with the `@simd` macro, which in this case appears to matter.
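A quick way to check a claim like this yourself is to capture the optimized LLVM IR as a string and search it for vector types. The sketch below assumes a `Vector{Float64}` argument; the regex for `<N x double>` is only a heuristic, and the variable names `ir` and `has_vector_types` are illustrative.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL
using Random

function myrandn!(x)
    @inbounds @simd for i in eachindex(x)
        x[i] = randn()
    end
    return x
end

# Capture the optimized LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, myrandn!, Tuple{Vector{Float64}}))

# Vector types such as `<4 x double>` in the IR hint at vectorized code.
has_vector_types = occursin(r"<\d+ x double>", ir)
println("vector types in IR: ", has_vector_types)
```

For a closer look at what actually runs, `code_native` can be used the same way to inspect the generated assembly.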

Diffing the asm generated in both cases, the only difference is that the simd version calls `julia_randn_unlikely_12356`, whereas the non-simd version calls `randn_unlikely`. No idea what that means, but probably unintended?

edit: well, that’s not the *only* difference, there’s also a `jne` that gets swapped with a `jb`, but that probably shouldn’t matter?

I was confused in the previous post, sorry. That has nothing to do with `simd`:

```
using BenchmarkTools  # provides @btime
using Random
broadcast_randn(x) = (x .= randn.())
inplace_randn(x) = randn!(x)
function myrandn!(x)
    @inbounds @simd for i in eachindex(x)
        x[i] = randn()
    end
    x
end
function myrandn_nosimd!(x)
    @inbounds for i in eachindex(x)
        x[i] = randn()
    end
    x
end
x = zeros(10_000)
@btime broadcast_randn($x);
@btime inplace_randn($x);
@btime myrandn!($x);
@btime myrandn_nosimd!($x);
```

Only `inplace_randn` is slow. The real difference between the two cases is that `randn` isn’t inlined in `randn!`. This is despite `randn` being marked as `@inline`, so it looks like the inlining heuristics are being too conservative here.
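If the missing inlining is indeed the cause, one thing to try is a call-site `@inline`, which asks the compiler to inline that particular call. This is only a sketch: it assumes Julia 1.8 or later (where call-site `@inline` is available), the name `randn_inlined!` is hypothetical, and whether it closes the gap would need to be benchmarked.

```julia
using Random

# Hypothetical variant of the Base loop with a call-site @inline (Julia >= 1.8).
function randn_inlined!(rng::AbstractRNG, A::AbstractArray{T}) where T
    for i in eachindex(A)
        @inbounds A[i] = @inline randn(rng, T)
    end
    return A
end

# Fill a buffer using the default RNG.
x = randn_inlined!(Random.default_rng(), zeros(10_000))
```

Timing it with `@btime randn_inlined!($(Random.default_rng()), $x)` next to `inplace_randn` should show whether inlining accounts for the difference.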