Make function more julian / simpler

ZettEff · March 20, 2021, 2:08pm

Can someone tell me how to simplify (or make more) my following function.

It works. I am just curious if (and quite certain) there is a simpler way.

I have two arrays and I want to remove from first array all instances values that are in the second array.

The following needs to be ensured:

if first array contains multiple instances of values of second array only the number of instances that are in second array should be removed from first array
if second array contains values that are not in first array nothing needs to be removed

So essentially it is like removing cards from a deck where a deck may include multiple instances of the same card.

Example:

array1 = [1,1,1,2,2,3,4,5,3,2]
array2 = [34, 2,4,1,2]

Should yield for array1
[1,1,3,5,3,2]

Here is my function (which is working!)

function remove!(out, deck) 
    for card in out
        idx = findfirst(x -> x == card, deck)
        if isnothing(idx)
            continue
        end
        deleteat!(deck, idx)
    end
end

Thanks for your help!

jules · March 20, 2021, 2:55pm

I think it’s faster for large numbers if you count the elements in arr2 first, then go over arr1 once and update the dict with how many elements you delete. Then do the mask index once. In your version you loop over the array again and again (partially)

julia> function remove(arr1, arr2)
           d = Dict{eltype(arr2), Int}()
           for a in arr2
               d[a] = get!(d, a, 0) + 1
           end
           
           mask = ones(Bool, length(arr1))
           for (i, a) in enumerate(arr1)
               x = get(d, a, 0)
               if x > 0
                   d[a] = x - 1
                   mask[i] = false
               end
           end
           arr1[mask]
       end

julia> remove(array1, array2)
6-element Vector{Int64}:
 1
 1
 3
 5
 3
 2

Jeff_Emanuel · March 20, 2021, 2:58pm

Lowest hanging fruit is to reverse the sense of the if test

function remove!(out, deck) 
    for card in out
        idx = findfirst(x -> x == card, deck)
        if !isnothing(idx)
             deleteat!(deck, idx)
        end
    end

Some people like this shortcut

    !isnothing(idx) && deleteat!(deck, idx)

Jeff_Emanuel · March 20, 2021, 2:59pm

Nice, but this doesn’t change the argument in place as the OP’s code does.

Jeff_Emanuel · March 20, 2021, 3:06pm

Use a list comprehension to find the indices for all cards and then deleteat! them at once

indices = [findfirst(x->x==card, deck) for card in out if card in deck]
deleteat!(deck, indices)

Edit, I realized this doesn’t handle multiple occurences of card correctly

ZettEff · March 20, 2021, 3:11pm

Thanks for the suggestion. It doesn’t exactly do what my code does, because it doesn’t change the original array. But if gives me some great new approaches. Thanks!

ZettEff · March 20, 2021, 3:15pm

Yes, this was my first approach as well but it doesn’t work.

ZettEff · March 20, 2021, 3:16pm

The && is new to me. So something to look into. Thanks.

Peter_Adelman · March 20, 2021, 3:20pm

function remove!(out, deck) 
    removable = countmap(out)
    to_remove = Int[]
    for (i,card) in deck
        if haskey(removable, card)
            push!(i,to_remove)
            removable[card] -= 1
            removable[card] == 0 && deleteat!(removable, card)
        end
    end
    deleteat!.(deck, reverse(to_remove))
    return deck
end

Perhaps something like this? You iterate over out and deck once each and otherwise are hitting hash indices I think.

I think countmap is maybe from statsbase?

On phone so I couldn’t make sure that deleteat! works for dicts

jules · March 20, 2021, 3:30pm

You can do the in place change at the end. I wouldn’t do it with deleteat everytime you find a value because then the whole tail of the array has to be moved. Just mutate the array once at the end.

ZettEff · March 20, 2021, 4:05pm

Thanks to everyone for your suggestions. Also my take away is that there isn’t an easy oneliner that could do the trick. Which is kind of comforting to realize.

cgeoga · March 20, 2021, 5:09pm

I’m not sure if this will speed things up, but if you have the option you might consider sorting the lists and then using things like searchsortedfirst instead of findfirst. In general, I would guess that if you were willing to sort the lists before you do this you could write something that is a bit more manual but pretty quick.

EDIT: okay, didn’t read the docs of searchsortedfirst very closely, sorry. But still, sorting might be a helpful idea.

anon74562486 · March 20, 2021, 8:15pm

I am not an expert, so don’t take this answer too seriously.
I don’t know if this is a bad practice and would like to get the opinion of someone more experienced:

remove!(deck, out) = 
    for i in out
        indx = findfirst(==(i), deck)
        indx |> isnothing ? 
            nothing : 
            deleteat!(deck, indx)
    end

Another solution that seems to work (similar to the one proposed by jules):

find_n_indx(array, number, first_n) =       # get indices of first n occurrences of number in array
    (index for (index, value) in pairs(array) if value == number) |> 
        x -> Iterators.take(x, first_n) |> 
        collect

function remove_2!(deck, out)
    s = ((i, count(==(i), out)) for i in Set(out))      # unique out elements with relative frequencies in out [(out_elem, count out_elem in out), ..]
    indexes_to_remove = vcat((find_n_indx(deck, i[1], i[2]) for i in s)...)
    sort!(indexes_to_remove)
    deleteat!(deck, indexes_to_remove)
    nothing
end

But probably there are better solutions to calculate the indexes to remove

Skoffer · March 20, 2021, 9:28pm

deleteat! is rather heavy operation. Faster is to do some sort of double cursor, when you shift elements as you go. In this case you can achieve O(N) complexity.

function mycountmap(arr)
    res = Dict{Int, Int}()
    for x in arr
        res[x] = get(res, x, 0) + 1
    end
    return res
end

function cleanup!(arr1, arr2)
    cmap = mycountmap(arr2)
    i1 = 1
    @inbounds for (i2, v) in pairs(arr1)
        i1 != i2 && (arr1[i1] = v)
        cval = get(cmap, v, 0)
        if cval > 0
            cmap[v] = cval - 1
        else
            i1 += 1
        end
    end
    resize!(arr1, i1 - 1)
    return arr1
end

julia> array1 = [1,1,1,2,2,3,4,5,3,2];
julia> array2 = [34, 2, 4, 1, 2];
julia> cleanup!(x, array2)
6-element Vector{Int64}:
 1
 1
 3
 5
 3
 2

UPD: This one is slower then initial version with find and delete. mycountmap take it’s time to initialize and generates overhead. So this approach is useful, when the size of the deck and out is big enough, that update and find time is much larger than mycountmap construction.

Peter_Adelman · March 20, 2021, 9:50pm

I took another swing as well, this time with a julia instance to check code against:

function remove!(out::Vector{Int64}, deck::Vector{Int64})
    # Iterate once over `out` to make a DefaultDict
    removable = Dict{eltype(out),Int64}()
    cnt = 0
    for v in out
        if haskey(removable, v)
            removable[v] += 1
        else
            removable[v] = 1
        end
        cnt += 1
    end

    # Iterate once over `deck` to do a one-pass version of findfirst
    to_remove = BitArray(undef,length(deck))
    @inbounds for (i,card) in enumerate(deck)
        res = haskey(removable, card)
        to_remove[i] = res
        if res
            if isone(cnt) 
                to_remove[(i+1):end] .= false
                break
            end
            cnt -= 1
            if isone(removable[card])
                pop!(removable, card)
            else
                removable[card] -= 1
            end
        end
    end
    # remove cards using the BitArray
    deleteat!(deck, to_remove)
end

This seems to perform worse than the OP’s code at small deck sizes and only marginally better at huge out/deck sizes. I am a bit flummoxed as to why. findfirst and deleteat! must both be way more efficient than I expected.

stevengj · March 21, 2021, 12:33am

What you are describing is a multiset difference, and the easiest (and perhaps most efficient) way would probably be to use the Multisets.jl package:

julia> using Multisets

julia> m = Multiset(array1) - Multiset(array2)
{1,1,2,3,3,5}

Although you can convert the result back to an array with collect(m), if this is the kind of operation you want then you might consider not using arrays at all — just leave everything as a Multiset.

ZettEff · March 21, 2021, 6:55pm

This is great! Thanks for the hint.

Although there is no “solution” to my answer, I will set this as the solution for future reference.

Thanks!

Skoffer · March 21, 2021, 7:48pm

It’s actually great, that the best solution to the question “how to write better function?” is “change input data structure”. This is genuine out-of-the-box thinking, and it helps a lot in real world tasks.

ZettEff · March 21, 2021, 8:40pm

Fully agree!

jules · March 21, 2021, 9:25pm

The multiset is actually implemented as a dict with integers, just like in the solutions above. But of course it’s faster to use the pre-made version if it’s available

Topic		Replies	Views
Is there a Julia equivalent for python's .remove() function? New to Julia question	14	3469	January 7, 2020
Newbie delete elements from array question New to Julia	10	697	February 21, 2024
Whats a better way to remove elements from an array based on index?```g = [55, 7 General Usage	9	6583	March 2, 2021
Remove an element from an array without changing the original General Usage array	5	3779	June 17, 2021
Opposite of unique New to Julia sets	19	2398	March 25, 2021

Make function more julian / simpler

Related topics