Is it safe to pop! a Dict in a loop like this?

At the moment, the following code gives the result that I want (i.e. the for loop goes through each key in the dict once):

let mydict = Dict([i => rand() for i in 1:10])
    println(mydict)
    for k in keys(mydict)
        if isodd(k)
            val = pop!(mydict, k)
            println("$k\t$val")
        end
    end
    println(mydict)
end

Is this something that I can count on in future releases of Julia or would I need to change the “for” line to for k in copy(keys(mydict)) in order to be sure that I would always run the loop once for each key?

No, this isn’t safe. Any modification of a dict while iterating it can potentially mess things up. Instead you’ll want to collect the keys into a vector and iterate that. Or use filter! which does this safely.

7 Likes

Thanks, so you are saying that it is better to modify the code to:

let mydict = Dict([i => rand() for i in 1:10])
    println(mydict)
    for k in collect(keys(mydict))
        if isodd(k)
            val = pop!(mydict, k)
            println("$k\t$val")
        end
    end
    println(mydict)
end
1 Like

Yes, exactly.

1 Like

A filter! version as was suggested in the answer. This would be faster I think.

let mydict = Dict([i => rand(1:100) for i in 1:10])
    println(mydict)
    filter!(mydict) do (k,v)
        isodd(k) || return true
        println("$k\t$v")
        return false
    end
    println(mydict)
end

or (to reparaphrase code):

filter!(mydict) do (k,v)
    return isodd(k) ? (println("$k\t$v"); false) : true
end

and even shorter:

filter!(((k,v),)->isodd(k) ? (println("$k\t$v"); false) : true, mydict)
1 Like

Interesting, I would also think that it is faster, but it just feels wrong to have a filter that has side effects (in my actual scenario, the side effects are much more dramatic than a println). Am I the only one that feels that way about filter!?

It’s unlikely to be much faster—it’s just implemented with a for loop same as what you can write, but it can avoid the copy of the keys. I don’t think there’s a big issue with printing from a filter but yeah, it’s not strictly pure.

1 Like
foreach(k->delete!(mydict,k), 2:2:10)

Maybe a more general solution to this problem, would be to add a method to the vector splice! for Dicts:

function Base.splice!(d::AbstractDict, p::F) where {F}
    r = Vector{eltype(d)}()
    for (k,v) in d
        if p(k)
            push!(r, k=>v)
        end
    end
    for k in first.(r)
        delete!(d, k)
    end
    return r
end

and then the operation in the OP can look like:

foreach(splice!(mydict, isodd)) do (k,v)
    println("$k\t$v")
end

Some optimizations to the Base method should be applied, but the idea seems sound and faithful to splice!.

Shorter splice! implementation:

function Base.splice!(d::AbstractDict, p::F) where {F}
    r = [k=>v for (k,v) in d if p(k)]
    for k in first.(r) delete!(d, k); end
    r
end

What is the role of F here? How would this differ from:

function Base.splice!(d::AbstractDict, p)
    r = [k=>v for (k,v) in d if p(k)]
    for k in first.(r) delete!(d, k); end
    r
end

For everyone in the thread: Thanks for the suggestions. Until now I still feel that Stefans collect solution is best for my situation. This is because what I posted was a minimal working example. My actual situation is one where the keys are more chaotic than a simple sequence, the values are not simply Float64, the side effect is more tricky than println, the condition is more complex than isodd and it is only part of a function that is about 70 lines of code. I also expect that performance will not be significantly influenced by anything discussed in this thread, so code clarity and correctness are my main concerns.

Yep, it may be redundant here. I’m use to parameterizing functions when in structs to enforce specialization.

In any case, I think the splice! version is very clean conceptually. So given, your needs, I would opt for it (perhaps using a different name, so as to not intrude on Base module).

It would be a nice addition to Base.

Here is another version which uses filter! and may be faster for some implementations of AbstractDict:

function Base.splice!(d::AbstractDict, p)
    r = [k=>v for (k,v) in d if p(k)]
    filter!(!in(r), d)
    r
end

or

function Base.splice!(d::AbstractDict, p)
    r = Dict(k=>v for (k,v) in d if p(k))
    filter!(p->!haskey(r,first(p)), d)
    r
end

Another perhaps useful utility function for Dicts:

function groupby(d::AbstractDict, p)
    T = promote_type(Base.return_types(p,(keytype(d),))...)
    D = Dict{T,Dict{keytype(d), valtype(d)}}()
    for (k,v) in d
        dd = get!(()->(typeof(d)()), D, p(k))
        dd[k] = v
    end
    D
end

with it:

julia> D = groupby(mydict, isodd)
Dict{Bool, Dict{Int64, Int64}} with 2 entries:
  0 => Dict(4=>12, 6=>25, 2=>45, 10=>21, 8=>1)
  1 => Dict(5=>45, 7=>93, 9=>18, 3=>5, 1=>28)

julia> filter!(in(D[false]), mydict)
Dict{Int64, Int64} with 5 entries:
  4  => 12
  6  => 25
  2  => 45
  10 => 21
  8  => 1

julia> foreach(D[true]) do (k,v)
           println("$k\t$v")
       end
5	45
7	93
9	18
3	5
1	28

Our opinions differ on this

I like the functionality, but it does not fit in the pattern of the other splice! methods (to fit, p should be a collection of keys to remove), so I would prefer to give it another name and swap the order of p and d to conform with Julias recommended order of function and collection arguments. For completeness, a version for AbstractArray would also be nice. So, something like

function siphon!(p, d::AbstractDict)
    r = [k=>v for (k,v) in d if p(k)]
    filter!(!in(r), d)
    r
end

would be my preference. If this had existed in the currently stable version of Julia then I would prefer it over the collect solution :smiley:

(If this is going into Julia itself, then it would be important to make a docstring and to link to & from it from the filter! and splice! docstrings)

1 Like

Now that I have a more precise idea (in fact the MWE was too M), I wonder if it doesn’t make sense in this case (but also more generally), rather than building 2 (but more generally many) new dictionaries , have only the “equivalence class” @views you want.
Ultimately, what is important about a dictionary is having the “right” value when needed and this could also be achieved by partitioning only the keys, not the entire dictionary.
or not?

I don’t really understand what you are trying to say, but I have a better solution than the one I wrote in my previous message. Rather than making a completely new function in Base, one could achieve the same by just making an extra option for filter! which would cause it to return the deleted keys-value pairs rather than the retained key-value pairs.

Using my OP as an example of how to use this:

let mydict = Dict([i => rand() for i in 1:10])
    println(mydict)
    deleted_pairs = filter!(mydict; return_deleted_pairs=true) do (k,v)
        !isodd(k)
    end
    [println("$k\t$v") for (k,v) in deleted_pairs]
    println(mydict)
end

This reminds me of an answer to stackoverflow question:

In that question, an answer was:

function separate!(p, x) 
    b, e, L = firstindex(x), lastindex(x), length(x)
    L == 1 && (return p(x[b]) ? (x, eltype(x)[]) : (eltype(x)[], x))
    L == 0 && return (eltype(x)[], eltype(x)[])
    while b < e
        p(x[b]) && (b += 1; continue)
        p(x[e]) || (e -= 1; continue)
        x[b], x[e] = x[e], x[b]
    end
    @view(x[begin:b-1]), @view(x[b:end])
end

which nicely returns a vector separated according to a predicate. It can be used for the goal here in the following way:

let mydict = Dict([i => rand(1:100) for i in 1:10])
    println(mydict)
    t_pairs, f_pairs = separate!(isodd∘first, collect(pairs(mydict)))
    for (k,v) in t_pairs println("$k\t$v"); end
    mydict = Dict(f_pairs)
    println(mydict)
end

Maybe not an ideal solution, but promotes the reuse of that useful primitive (which is related to the groupby idea I’ve raised earlier).

1 Like

I mean something like that

julia>     myDict = Dict([i => rand(1:100) for i in 1:10])
Dict{Int64, Int64} with 10 entries:
  5  => 1
  4  => 98
  6  => 61
  7  => 57
  2  => 92
  10 => 23
  9  => 8
  8  => 94
  3  => 46
  1  => 59

julia> using SplitApplyCombine
julia> groupview(isodd∘first, collect(myDict))
2-element GroupDictionary{Bool, SubArray{Pair{Int64, Int64}, 1, Vector{Pair{Int64, Int64}}, Tuple{Vector{Int64}}, false}, Vector{Pair{Int64, Int64}}, Dictionary{Bool, Vector{Int64}}}
  true │ [5 => 1, 7 => 57, 9 => 8, 3 => 46, 1 => 59]
 false │ [4 => 98, 6 => 61, 2 => 92, 10 => 23, 8 => 94]

julia> groupview(x->mod(first(x),3), collect(myDict))
3-element GroupDictionary{Int64, SubArray{Pair{Int64, Int64}, 1, Vector{Pair{Int64, Int64}}, Tuple{Vector{Int64}}, false}, Vector{Pair{Int64, Int64}}, Dictionary{Int64, Vector{Int64}}}
 2 │ [5 => 1, 2 => 92, 8 => 94]
 1 │ [4 => 98, 7 => 57, 10 => 23, 1 => 59]
 0 │ [6 => 61, 9 => 8, 3 => 46]