Help needed with some matrix manipulation

I need some code review… :slight_smile:

Suppose you are given a matrix A \in \mathbb{N}^{m \times n} you are to determine all the rows k \leq m of A that have a given maximum interaction j which is given by

j = \sum_{i=0}^{n} \mathbf{1}_{A_{k,i} }, \quad \text{where} \quad \mathbf{1}_{A_{k,i}} = \begin{cases} 1, & A_{k,i} > 0, \\ 0, & \text{otherwise}.\end{cases}

Here’s an example. Take

julia> inds
15×4 Array{Int64,2}:
 0  0  0  0
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
 2  0  0  0
 1  1  0  0
 1  0  1  0
 1  0  0  1
 0  2  0  0
 0  1  1  0
 0  1  0  1
 0  0  2  0
 0  0  1  1
 0  0  0  2

and set j = 2, then

julia> get_interaction(inds,2)
6×4 Array{Int64,2}:
 1  1  0  0
 1  0  1  0
 1  0  0  1
 0  1  1  0
 0  1  0  1
 0  0  1  1

So, the function get_interaction finds all rows of a matrix that have j non-zero entries.

function get_interaction(inds::Matrix{Int},j::Int)
    M = .!(iszero.(inds)) # auxiliary variable that is true for every non-zero element of inds
    J = findall(x->x==j,sum(M;dims=2))
    vcat(map(j->inds[j.I[1],:],J)'...) # return as a matrix of integers
end

This is one way to implement this, but it feels too bulky. There’s probably a more clever way to exploit the CartesianIndices returned by findall.

thanks!

p.s.: The last row could be simplified to the more performant

    map(j->inds[j.I[1],:],J)

which would return an arrray of arrays of integers

julia> get_interaction(inds,2)
6-element Array{Array{Int64,1},1}:
 [1, 1, 0, 0]
 [1, 0, 1, 0]
 [1, 0, 0, 1]
 [0, 1, 1, 0]
 [0, 1, 0, 1]
 [0, 0, 1, 1]

Imho a quite julian way would be:

 julia> interactions(inds,j) = Iterators.filter(x -> count(!iszero,x) == j, 
                                                eachrow(inds))

which quite closely matches your description. Note that this returns an iterator and to get the resulting vectors you need to collect the result or just iterate through it - might be advantageous if you only work on one row at a time.

The standard filter doesn’t work on eachrow unless you collect first - that’s why I used the one from Iterators but there might be a better solution.

edit:
Comparing the performance for your example input I get roughly an order of magnitude speedup. For larger problems still factor 4 but note that even collecting the iterator returns SubArrays. To get the same, you can e.g. collect each vector individually - the performance advantage is smaller but still exists.

Thanks, this looks promising!

It’s not crucial to return a matrix type of integers, SubArrays are fine (as long as I adapt the functions that use get_interactions downstream).

If it’s only about bulkiness and not performance, I would do

get_interaction(inds, j) = inds[vec(sum(inds .!= 0, dims = 2)) .== j, :]

This is quite fast and returns full array:

inds[ [count(!iszero,r) == j for r in eachrow(inds)], : ]
6×4 Array{Int64,2}:
 1  1  0  0
 1  0  1  0
 1  0  0  1
 0  1  1  0
 0  1  0  1
 0  0  1  1
1 Like