Next, you might notice that we’re creating `cdist` as a matrix of `zeros` and then setting every element. That means we’re wasting time filling the initial array with zeros. We can avoid that by constructing a matrix with `undef`, which tells Julia to just allocate memory but not fill it with anything:

```
function distance_matrix4(c)
    # `undef` allocates the output without initializing it; every element is written below
    cdist = Matrix{eltype(c)}(undef, size(c, 1), triangle_number(size(c, 2) - 1))
    col = 1
    for j in 1:(size(c, 2) - 1)
        for h in (j + 1):size(c, 2)
            for k in 1:size(c, 1)
                cdist[k, col] = abs(c[k, j] - c[k, h])
            end
            col += 1
        end
    end
    return cdist
end
```

```
julia> @btime distance_matrix4($c_cols)
55.658 ns (1 allocation: 224 bytes)
```

Very slightly faster, but the return on effort is diminishing at this point.

There’s more you can do, like using `@inbounds` to avoid bounds checks, using `@simd` to vectorize your innermost loop, and using StaticArrays.jl to represent your data as a collection of small vectors rather than a matrix. I won’t go into detail on that here because it’s been covered in a lot of other posts on this forum, but you can dig as deep as you’re interested.
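For a rough idea of what the first two of those look like, here is a minimal sketch that adds `@inbounds` and `@simd` to the same loop. The name `distance_matrix5` is just a placeholder, and `triangle_number` is redefined here only so the snippet runs on its own (I'm assuming it matches the helper used earlier). Note that `@inbounds` is only safe because the loop ranges come straight from `size(c)` and `c` is assumed to be an ordinary 1-based `Matrix`. I haven't benchmarked this version, so treat any speedup as something to measure rather than assume.

```julia
# Assumed to match the helper used earlier: the n-th triangle number.
# For n columns there are triangle_number(n - 1) unordered column pairs.
triangle_number(n) = n * (n + 1) ÷ 2

function distance_matrix5(c)
    cdist = Matrix{eltype(c)}(undef, size(c, 1), triangle_number(size(c, 2) - 1))
    col = 1
    # @inbounds skips bounds checks for every index expression inside this loop nest
    @inbounds for j in 1:(size(c, 2) - 1)
        for h in (j + 1):size(c, 2)
            # @simd tells the compiler the iterations of this loop may be reordered/vectorized
            @simd for k in 1:size(c, 1)
                cdist[k, col] = abs(c[k, j] - c[k, h])
            end
            col += 1
        end
    end
    return cdist
end
```

For very short columns the inner loop may be too small for `@simd` to matter much, which is part of why StaticArrays.jl tends to be suggested for that case.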

Overall, though, I think the message is that if you:

- Pre-allocate your outputs when possible
- Avoid accidentally creating lots of copies
- Don’t be afraid to write out a loop

then you can write Julia code that is pretty close to optimal in terms of performance without going too far down the rabbit hole.
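If you end up calling this repeatedly, one way to take the pre-allocation point a step further is an in-place variant that fills a buffer you provide. This is only a sketch: `distance_matrix!` is a hypothetical name following the usual `!` convention for mutating functions, and it assumes the caller has sized `cdist` correctly (one column per unordered pair of columns in `c`).

```julia
# Hypothetical in-place variant: writes the pairwise column differences into `cdist`.
# Assumes size(cdist) == (size(c, 1), n * (n - 1) ÷ 2) where n = size(c, 2).
function distance_matrix!(cdist, c)
    col = 1
    for j in 1:(size(c, 2) - 1), h in (j + 1):size(c, 2)
        for k in 1:size(c, 1)
            cdist[k, col] = abs(c[k, j] - c[k, h])
        end
        col += 1
    end
    return cdist
end

# Allocate the buffer once, then reuse it across calls.
c = [1.0 4.0 9.0; 2.0 6.0 3.0]
npairs = size(c, 2) * (size(c, 2) - 1) ÷ 2
buf = Matrix{eltype(c)}(undef, size(c, 1), npairs)
distance_matrix!(buf, c)
```

After the buffer is created, repeated calls should not need to allocate a new output matrix each time.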