Off diagonal elements of Matrix

Hi,
what is the Julia-way of obtaining off-diagonal elements of a matrix flattend as a vector, e.g.

A = [1 2; 3 4]
offdiag(A) = [2, 3]

I stumbled across diagind but could not figure out how to do it neatly.
Best,
jamblejoe

EDIT: Better example:

A = [1 2 3; 4 5 6; 7 8 9]
offdia(A) = [2, 3, 4, 6, 7, 8]

EDIT2:

EDIT3:

1 Like

For bigger matrices what order do you want?

I do not care right now. I am just interested in the 2D case!

EDIT: Sorry, I should have mentioned that!

by 2D do you mean 2x2?

No, I mean Array{T, 2}.

quick and dirty way for Array{T, 2}

function offdiag(ma)
  m, n = size(ma)
  [ma[i, j] for i = 1:m, j = 1:n if i != j]
end 
3 Likes

Or using the generalized indexing accessors,

function offdiag2(A::AbstractMatrix)
    [A[ι] for ι in CartesianIndices(A) if ι[1] ≠ ι[2]]
end 

which should work for indexes other than 1-based.

2 Likes

Thanks for all the answers so far, @L_Adam, @Tamas_Papp! In the initial post I ordered the matrix entries rowwise, but everybody correctly assumed columnwise ordering :smiley:

I came up with a quick and dirty way for 1-based indexing and quadratic matrices, which does not need an if-statement:

function offdiag(A::Matrix)
    @assert size(A)[1] == size(A)[2]
    D = size(A)[1]
    v = zeros(D*(D-1))
    for i in 1:D
        for j in 1:(i-1)
            v[(i-1)*(D-1)+j] = A[j,i]
        end
        for j in (i+1):D
            v[(i-1)*(D-1)+j-1] = A[j,i]
        end
    end
    v
end

I know from direct SIMD programming that conditions in your code can decrease performance gains from SIMD by a factor (2?). Would this apply here as well? Or is Julia smart enough or not using SIMD at all?

PS: In my code example, one has off course disable bounds checking to make automatic SIMD available.

I would suggest that you simply benchmark, eg with

My code without bounds checking

let
    n = 100
    A = rand(n,n)
    @benchmark offdiag($A)
end

BenchmarkTools.Trial: 
  memory estimate:  77.45 KiB
  allocs estimate:  2
  --------------
  minimum time:     7.667 μs (0.00% GC)
  median time:      8.602 μs (0.00% GC)
  mean time:        8.939 μs (2.83% GC)
  maximum time:     2.540 ms (99.43% GC)
  --------------
  samples:          10000
  evals/sample:     3

@L_Adam’s code

let
    n = 100
    A = rand(n,n)
    @benchmark offdiag1($A)
end

BenchmarkTools.Trial: 
  memory estimate:  256.78 KiB
  allocs estimate:  17
  --------------
  minimum time:     78.077 μs (0.00% GC)
  median time:      80.600 μs (0.00% GC)
  mean time:        82.971 μs (1.17% GC)
  maximum time:     13.512 ms (71.90% GC)
  --------------
  samples:          10000
  evals/sample:     1

and @Tamas_Papp’s code

let
   n = 100
   A = rand(n,n)
   @benchmark offdiag2($A)
end

BenchmarkTools.Trial: 
 memory estimate:  256.77 KiB
 allocs estimate:  17
 --------------
 minimum time:     66.708 μs (0.00% GC)
 median time:      69.133 μs (0.00% GC)
 mean time:        70.352 μs (0.00% GC)
 maximum time:     3.355 ms (0.00% GC)
 --------------
 samples:          10000
 evals/sample:     1

Are the benchmarks done correctly? If yes, then the non-bounds checking, non-conditional code for square matrices is roughly 10x faster.

Yes, the benchmarks look OK.

If you know something about the structure of the problem, you can of course do better than a filtered collection (which does not know the number of elements in the result ex ante, etc).

That is true. My code looks more like a C example, while yours is a nice, generic oneliner. And it is easy to turn it into an interator over the off-diagonal indices than allocating a new vector.

offdiag_iter(A) = (ι for ι in CartesianIndices(A) if ι[1] ≠ ι[2])

Thanks!

Just for completeness, deleteat! is quite efficient and accepts a collection for indices. The indices of the diagonal elements are simply 1:n+1:n^2. Compared to manual loops, it can even be 30% faster.

A = randn(1000,1000);

@btime offdiag($A);   # with bounds checking turned off
  3.399 ms (2 allocations: 7.62 MiB)

@btime deleteat!(vec($A), 1:size(A,1)+1:size(A,1)^2);
  2.595 ms (8 allocations: 7.63 MiB)
2 Likes

Thanks @Seif_Shebl! I thought that this approach would be the slowest and discarded it from the beginning. I am surprised :smiley:

The typical Julia solution is usually writing a generic catch-all method, and adding specialized methods when faster solutions are available for specific types.

This happens for dense matrices as in your example, but eg triangular and banded matrices could also benefit from specialized code. That said, for a one-off application this is probably overkill.

2 Likes