Thanks for all the answers so far, @L_Adam, @Tamas_Papp! In the initial post I ordered the matrix entries rowwise, but everybody correctly assumed columnwise ordering

I came up with a quick and dirty way for 1-based indexing and quadratic matrices, which does not need an if-statement:

function offdiag(A::Matrix)
@assert size(A)[1] == size(A)[2]
D = size(A)[1]
v = zeros(D*(D-1))
for i in 1:D
for j in 1:(i-1)
v[(i-1)*(D-1)+j] = A[j,i]
end
for j in (i+1):D
v[(i-1)*(D-1)+j-1] = A[j,i]
end
end
v
end

I know from direct SIMD programming that conditions in your code can decrease performance gains from SIMD by a factor (2?). Would this apply here as well? Or is Julia smart enough or not using SIMD at all?

PS: In my code example, one has off course disable bounds checking to make automatic SIMD available.

If you know something about the structure of the problem, you can of course do better than a filtered collection (which does not know the number of elements in the result ex ante, etc).

That is true. My code looks more like a C example, while yours is a nice, generic oneliner. And it is easy to turn it into an interator over the off-diagonal indices than allocating a new vector.

offdiag_iter(A) = (ι for ι in CartesianIndices(A) if ι[1] ≠ ι[2])

Just for completeness, deleteat! is quite efficient and accepts a collection for indices. The indices of the diagonal elements are simply 1:n+1:n^2. Compared to manual loops, it can even be 30% faster.

A = randn(1000,1000);
@btime offdiag($A); # with bounds checking turned off
3.399 ms (2 allocations: 7.62 MiB)
@btime deleteat!(vec($A), 1:size(A,1)+1:size(A,1)^2);
2.595 ms (8 allocations: 7.63 MiB)

The typical Julia solution is usually writing a generic catch-all method, and adding specialized methods when faster solutions are available for specific types.

This happens for dense matrices as in your example, but eg triangular and banded matrices could also benefit from specialized code. That said, for a one-off application this is probably overkill.