Length of Dict is not thread safe

I’m trying to write something providing mysql-like functions: insert one row then return id.

d = Dict{Int64,Int64}()
Threads.@threads for i in 1:10000
  if !haskey(d,i)
    d[i] = length(d)+1
  end
end

julia> length(d)
5670

julia> collect(d)
ERROR: ArgumentError: destination has fewer elements than required
Stacktrace:
 [1] copyto!(dest::Vector{Pair{Int64, Int64}}, src::Dict{Int64, Int64})
   @ Base ./abstractarray.jl:897
 [2] _collect
   @ ./array.jl:715 [inlined]
 [3] collect(itr::Dict{Int64, Int64})
   @ Base ./array.jl:709
 [4] top-level scope
   @ REPL[5]:1

I can use an extra Vector to fix this, but is there another better solution?

How to prevent data races: Multi-Threading · The Julia Language

Thanks a lot!
Struct a type with data and lock can fix this.

found a solution:
https://github.com/wherrera10/ThreadSafeDicts.jl

1 Like

I suggest just protect the plain Dict with a ReentrantLock. Protecting the integrity of the dictionary itself is not sufficient for the correctness of your program. Consider the following scenario involving two tasks T1 and T2:

  • T1 wants a new id for key 123
  • T2 wants a new id for key 456
  • T1 calls length(d), get 41
  • T2 calls length(d), get 41
  • T1 inserts d[123]= 42
  • T2 inserts d[456]= 42

Now two keys refer to the same ID and no keys have id 43.

The code in the OP would be fine if you put a lock around if !haskey(d,i) ... end.

There are a lot of clever techniques to make this efficient. But reasoning about concurrent programs requires a lot more thinking than some comments fit in a discourse post.

2 Likes

I don’t want to be deliberately snarky but “what made you think it would be?”

When entering the world of concurrency, one has to think carefully.