# Make function more julian / simpler

Can someone tell me how to simplify (or make more) my following function.

It works. I am just curious if (and quite certain) there is a simpler way.

I have two arrays and I want to remove from first array all instances values that are in the second array.

The following needs to be ensured:

• if first array contains multiple instances of values of second array only the number of instances that are in second array should be removed from first array
• if second array contains values that are not in first array nothing needs to be removed

So essentially it is like removing cards from a deck where a deck may include multiple instances of the same card.

Example:

``````array1 = [1,1,1,2,2,3,4,5,3,2]
array2 = [34, 2,4,1,2]
``````

Should yield for array1
`[1,1,3,5,3,2]`

Here is my function (which is working!)

``````function remove!(out, deck)
for card in out
idx = findfirst(x -> x == card, deck)
if isnothing(idx)
continue
end
deleteat!(deck, idx)
end
end

``````

I think itâ€™s faster for large numbers if you count the elements in arr2 first, then go over arr1 once and update the dict with how many elements you delete. Then do the mask index once. In your version you loop over the array again and again (partially)

``````julia> function remove(arr1, arr2)
d = Dict{eltype(arr2), Int}()
for a in arr2
d[a] = get!(d, a, 0) + 1
end

for (i, a) in enumerate(arr1)
x = get(d, a, 0)
if x > 0
d[a] = x - 1
end
end
end

julia> remove(array1, array2)
6-element Vector{Int64}:
1
1
3
5
3
2

``````

Lowest hanging fruit is to reverse the sense of the if test

``````function remove!(out, deck)
for card in out
idx = findfirst(x -> x == card, deck)
if !isnothing(idx)
deleteat!(deck, idx)
end
end
``````

Some people like this shortcut

``````    !isnothing(idx) && deleteat!(deck, idx)
``````

Nice, but this doesnâ€™t change the argument in place as the OPâ€™s code does.

Use a list comprehension to find the indices for all cards and then deleteat! them at once

``````indices = [findfirst(x->x==card, deck) for card in out if card in deck]
deleteat!(deck, indices)
``````

Edit, I realized this doesnâ€™t handle multiple occurences of card correctly

Thanks for the suggestion. It doesnâ€™t exactly do what my code does, because it doesnâ€™t change the original array. But if gives me some great new approaches. Thanks!

Yes, this was my first approach as well but it doesnâ€™t work.

The `&&` is new to me. So something to look into. Thanks.

``````function remove!(out, deck)
removable = countmap(out)
to_remove = Int[]
for (i,card) in deck
push!(i,to_remove)
removable[card] -= 1
removable[card] == 0 && deleteat!(removable, card)
end
end
deleteat!.(deck, reverse(to_remove))
return deck
end
``````

Perhaps something like this? You iterate over `out` and `deck` once each and otherwise are hitting hash indices I think.

I think countmap is maybe from statsbase?

On phone so I couldnâ€™t make sure that deleteat! works for dicts

You can do the in place change at the end. I wouldnâ€™t do it with deleteat everytime you find a value because then the whole tail of the array has to be moved. Just mutate the array once at the end.

1 Like

Thanks to everyone for your suggestions. Also my take away is that there isnâ€™t an easy oneliner that could do the trick. Which is kind of comforting to realize.

Iâ€™m not sure if this will speed things up, but if you have the option you might consider sorting the lists and then using things like `searchsortedfirst` instead of `findfirst`. In general, I would guess that if you were willing to sort the lists before you do this you could write something that is a bit more manual but pretty quick.

EDIT: okay, didnâ€™t read the docs of `searchsortedfirst` very closely, sorry. But still, sorting might be a helpful idea.

I am not an expert, so donâ€™t take this answer too seriously.
I donâ€™t know if this is a bad practice and would like to get the opinion of someone more experienced:

``````remove!(deck, out) =
for i in out
indx = findfirst(==(i), deck)
indx |> isnothing ?
nothing :
deleteat!(deck, indx)
end
``````

Another solution that seems to work (similar to the one proposed by jules):

``````find_n_indx(array, number, first_n) =       # get indices of first n occurrences of number in array
(index for (index, value) in pairs(array) if value == number) |>
x -> Iterators.take(x, first_n) |>
collect

function remove_2!(deck, out)
s = ((i, count(==(i), out)) for i in Set(out))      # unique out elements with relative frequencies in out [(out_elem, count out_elem in out), ..]
indexes_to_remove = vcat((find_n_indx(deck, i[1], i[2]) for i in s)...)
sort!(indexes_to_remove)
deleteat!(deck, indexes_to_remove)
nothing
end
``````

But probably there are better solutions to calculate the indexes to remove

`deleteat!` is rather heavy operation. Faster is to do some sort of double cursor, when you shift elements as you go. In this case you can achieve `O(N)` complexity.

``````function mycountmap(arr)
res = Dict{Int, Int}()
for x in arr
res[x] = get(res, x, 0) + 1
end
return res
end

function cleanup!(arr1, arr2)
cmap = mycountmap(arr2)
i1 = 1
@inbounds for (i2, v) in pairs(arr1)
i1 != i2 && (arr1[i1] = v)
cval = get(cmap, v, 0)
if cval > 0
cmap[v] = cval - 1
else
i1 += 1
end
end
resize!(arr1, i1 - 1)
return arr1
end

julia> array1 = [1,1,1,2,2,3,4,5,3,2];
julia> array2 = [34, 2, 4, 1, 2];
julia> cleanup!(x, array2)
6-element Vector{Int64}:
1
1
3
5
3
2
``````

UPD: This one is slower then initial version with find and delete. `mycountmap` take itâ€™s time to initialize and generates overhead. So this approach is useful, when the size of the deck and out is big enough, that update and find time is much larger than `mycountmap` construction.

I took another swing as well, this time with a julia instance to check code against:

``````function remove!(out::Vector{Int64}, deck::Vector{Int64})
# Iterate once over `out` to make a DefaultDict
removable = Dict{eltype(out),Int64}()
cnt = 0
for v in out
removable[v] += 1
else
removable[v] = 1
end
cnt += 1
end

# Iterate once over `deck` to do a one-pass version of findfirst
to_remove = BitArray(undef,length(deck))
@inbounds for (i,card) in enumerate(deck)
to_remove[i] = res
if res
if isone(cnt)
to_remove[(i+1):end] .= false
break
end
cnt -= 1
if isone(removable[card])
pop!(removable, card)
else
removable[card] -= 1
end
end
end
# remove cards using the BitArray
deleteat!(deck, to_remove)
end
``````

This seems to perform worse than the OPâ€™s code at small deck sizes and only marginally better at huge out/deck sizes. I am a bit flummoxed as to why. `findfirst` and `deleteat!` must both be way more efficient than I expected.

What you are describing is a multiset difference, and the easiest (and perhaps most efficient) way would probably be to use the Multisets.jl package:

``````julia> using Multisets

julia> m = Multiset(array1) - Multiset(array2)
{1,1,2,3,3,5}
``````

Although you can convert the result back to an array with `collect(m)`, if this is the kind of operation you want then you might consider not using arrays at all â€” just leave everything as a `Multiset`.

10 Likes

This is great! Thanks for the hint.

Although there is no â€śsolutionâ€ť to my answer, I will set this as the solution for future reference.

Thanks!

Itâ€™s actually great, that the best solution to the question â€śhow to write better function?â€ť is â€śchange input data structureâ€ť. This is genuine out-of-the-box thinking, and it helps a lot in real world tasks.

2 Likes

Fully agree!

The multiset is actually implemented as a dict with integers, just like in the solutions above. But of course itâ€™s faster to use the pre-made version if itâ€™s available

1 Like