How to remove irrelevant array entries?

Hello!

In this case I have:

idc
2×9 Array{Int64,2}:
 1  2  0  1  1  2  0  0  2
 1  1  1  2  0  2  2  0  0

I only care about the columns in which 0 does not appear. Coming from Matlab, I would like to do something that gives me:

idc
2×4 Array{Int64,2}:
 1  2  1  2
 1  1  2  2 

But I cannot figure out a simple way to remove the irrelevant columns. Maybe I shouldn’t alter the Array, but do something else?

Kind regards

I am sure that there should be a better way, but you can do:

# Boolean mask: true for each column in which no value is 0
ids = [all(!=(0), c) for c in eachcol(idc)]
# Filter only these columns
idc = idc[:,ids]
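
A possible variation, not from the posts above: the same mask can be built without a comprehension by letting all reduce along the first dimension. This is only a sketch using the data from the question; the names mask and result are made up for illustration.

```julia
idc = [1 2 0 1 1 2 0 0 2;
       1 1 1 2 0 2 2 0 0]

# 1×9 Bool matrix: true where every entry of the column is nonzero
mask = all(!=(0), idc; dims=1)

# vec turns the 1×9 mask into a length-9 vector usable as a column index
result = idc[:, vec(mask)]
```

This is close to Matlab-style logical indexing, and it still allocates the mask and the result.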

A more compact, but probably much less efficient, solution:

reduce(hcat, c for c in eachcol(idc) if all(!=(0),c))

Thanks to both of you. I was not aware of the “eachcol” functionality.

Kind regards

For anyone interested in doing this kind of thing, I would advise trying to avoid it altogether. Both options that dmolina and yha show do the job, but they use a very large number of allocations, around 250k and 500k respectively. I instead wrote out a bunch of if statements and got a much lower allocation count.

Kind regards

Just treating your data as a Vector of, say, SVector{2}s or tuples may be very efficient and idiomatic.

Could you explain a bit more, why this would be of benefit?

You are right that making them tuples would be more efficient, since the sizes are then fixed.

Kind regards

Well, we can try to allocate less.

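function filter_out_all_zero_columns!(X)
    # Drops every column of X that contains a zero, compacting the kept
    # columns to the left in place, and returns a copy of the kept part.
    cnt = 1  # next destination column
    for i in axes(X, 2)
        zc = false  # does column i contain a zero?
        for j in axes(X, 1)
            if X[j, i] == 0
                zc = true
                break
            end
        end
        if !zc
            # Copy only when the column actually has to move.
            if cnt != i
                for j in axes(X, 1)
                    X[j, cnt] = X[j, i]
                end
            end
            cnt += 1  # count every kept column, moved or not
        end
    end
    X[:, 1:cnt - 1]
end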

And benchmarks

using BenchmarkTools, Random
Random.seed!(2020)
idc = rand(0:2, 2, 9)

@btime filter_out_all_zero_columns!(Y) setup = (Y = copy($idc))
# 58.250 ns (1 allocation: 128 bytes)

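It can be made faster by changing X[:, 1:cnt - 1] to @view X[:, 1:cnt - 1], which avoids the final copy, but whether that is appropriate depends on the problem, because the view shares memory with X.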
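
To illustrate what that trade-off means, here is a small sketch with a made-up matrix (the names X, c and v are only for illustration): slicing copies the data, while a view keeps pointing at the original array.

```julia
X = [1 2 3;
     4 5 6]

c = X[:, 1:2]         # slicing allocates a new 2×2 matrix
v = @view X[:, 1:2]   # a view refers to the memory of X, no data is copied

X[1, 1] = 10          # change the parent array afterwards

c[1, 1]               # still the old value, the copy is independent
v[1, 1]               # sees the new value, the view follows X
```

So the view is cheaper, but it is only safe if X is not reused or modified while the result is still needed.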

Because what you are doing is manipulating columns as units of data, so it is better to store them that way. Using SVectors or tuples, the compiler will know the number of elements in each, and generate very efficient code.

As a bonus, you can filter it like any other vector.
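
A minimal sketch of that idea with plain tuples, using the data from the question (the names cols, kept and back are hypothetical, and SVectors from StaticArrays could be used in the same way):

```julia
idc = [1 2 0 1 1 2 0 0 2;
       1 1 1 2 0 2 2 0 0]

# One tuple per column; every element has the concrete type Tuple{Int, Int}
cols = [(c[1], c[2]) for c in eachcol(idc)]

# Keep only the columns without a zero, using the ordinary filter
kept = filter(c -> all(!=(0), c), cols)

# Convert back to a matrix only if one is really needed
back = reduce(hcat, collect.(kept))
```

If the data is stored this way from the start, the conversion steps disappear and only the filter call remains; filter! would do the same in place.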

Thanks for the explanation.

My experience has always been that using tuples is slightly more efficient than using SVectors, so I will try to stick with them in the future whenever I can.

Sometimes I just forget to think about how I am storing data etc.

Kind regards

Thank you for showcasing that!

One of the things I like about Julia is that one can write code at many different levels of complexity. I had never seen the axes function before now.

Kind regards

That’s not my experience. Are you sure it’s not a benchmarking issue?

If SVectors offer more convenience, I would use them. In many cases you just need a tuple, of course.

An SVector is just a wrapper around a tuple so they should be exactly equivalent in most circumstances (I believe).
