Ordered Dict in Base: Is LittleDict not thread safe, and possibly the reason I can't add it to Base?

I’m working on a PR to Julia, already could add OrderedDict to Base, independent of Dict, or by replacing it. When replacing, it however makes some stuff slower. [EDIT: short answer, dicts are not thread-safe, see solution on package to make them, and ordered dict is not coming to Base, even though I made a PR for it.]

No dictionary will be fastest for everything, but I noticed in: GitHub - JuliaCollections/OrderedCollections.jl: Julia implementation of associative containers that preserve insertion order

also implements LittleDict which is a ordered dictionary, that is much faster than any other AbstractDict (ordered or not) for small collections.

[OrderedRobinDict is another option.]

so I’m now testing replacing Dict with it.

$ make 
[..]
error during bootstrap:
LoadError("sysimg.jl", 19, LoadError("/home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/Pkg/src/Pkg.jl", 3, LoadError("/home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl", 3, LoadError("/home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/options.jl", 56, MethodError(convert, (Union{Dict, Vector{var"#s852"} where var"#s852"<:Dict}, UDict{Any, Any}[]), 0x0000000000006499)))))
jl_method_error_bare at /home/pharaldsson_sym/myjulia/julia/src/gf.c:1767
jl_method_error at /home/pharaldsson_sym/myjulia/julia/src/gf.c:1785
jl_lookup_generic_ at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2355 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2370
Options at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/options.jl:4
#Options#1 at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/options.jl:45
Options at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/options.jl:45
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
jl_apply at /home/pharaldsson_sym/myjulia/julia/src/julia.h:1687 [inlined]
do_call at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:115
eval_value at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:204
eval_stmt_value at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:155 [inlined]
eval_body at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:575
jl_interpret_toplevel_thunk at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:669
top-level scope at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/options.jl:56
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:837
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:785
jl_toplevel_eval_in at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:880
eval at ./boot.jl:360 [inlined]
include_string at ./loading.jl:1026
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
_include at ./loading.jl:1080
include at ./Base.jl:402
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
jl_apply at /home/pharaldsson_sym/myjulia/julia/src/julia.h:1687 [inlined]
do_apply at /home/pharaldsson_sym/myjulia/julia/src/builtins.c:672
jl_f__apply_latest at /home/pharaldsson_sym/myjulia/julia/src/builtins.c:722
include at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:14
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
jl_apply at /home/pharaldsson_sym/myjulia/julia/src/julia.h:1687 [inlined]
do_call at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:115
eval_value at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:204
eval_stmt_value at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:155 [inlined]
eval_body at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:575
jl_interpret_toplevel_thunk at /home/pharaldsson_sym/myjulia/julia/src/interpreter.c:669
top-level scope at /home/pharaldsson_sym/myjulia/julia/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:42
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:837
jl_eval_module_expr at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:197 [inlined]
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:661
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:785
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:785
jl_toplevel_eval_in at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:880
eval at ./boot.jl:360 [inlined]
include_string at ./loading.jl:1026
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
_include at ./loading.jl:1080
include at ./Base.jl:402 [inlined]
_require at ./loading.jl:982
require at ./loading.jl:846
require at ./loading.jl:833
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
jl_apply at /home/pharaldsson_sym/myjulia/julia/src/julia.h:1687 [inlined]
call_require at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:423 [inlined]
eval_import_path at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:460
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:717
jl_eval_module_expr at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:197 [inlined]
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:661
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:785
jl_toplevel_eval_in at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:880
eval at ./boot.jl:360 [inlined]
include_string at ./loading.jl:1026
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
_include at ./loading.jl:1080
include at ./Base.jl:402 [inlined]
_require at ./loading.jl:982
require at ./loading.jl:846
require at ./loading.jl:833
_jl_invoke at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2192 [inlined]
jl_apply_generic at /home/pharaldsson_sym/myjulia/julia/src/gf.c:2374
macro expansion at ./timing.jl:233 [inlined]
macro expansion at ./sysimg.jl:73 [inlined]
macro expansion at ./timing.jl:233 [inlined]
top-level scope at ./sysimg.jl:72
jl_toplevel_eval_flex at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:831
jl_parse_eval_all at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:954
jl_load_ at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:1001
jl_load at /home/pharaldsson_sym/myjulia/julia/src/toplevel.c:1014
exec_program at /home/pharaldsson_sym/myjulia/julia/src/jlapi.c:492
true_main at /home/pharaldsson_sym/myjulia/julia/src/jlapi.c:565
repl_entrypoint at /home/pharaldsson_sym/myjulia/julia/src/jlapi.c:672
main at /home/pharaldsson_sym/myjulia/julia/cli/loader_exe.c:46
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/pharaldsson_sym/myjulia/julia/usr/bin/julia (unknown line)
1 Like

No:

MethodError(convert, (Union{Dict, Vector{var"#s852"} where var"#s852"<:Dict}, UDict{Any, Any}[]

Base rightly assumes that what you’re using is a Dict, if you want to change the type you have to change the hard coded assumptions as well.

2 Likes

In Python 3.6, the dictionary implementation has been changed and now preserves order by default, see e.g. here:

In general, dictionaries are used very often in the CPython internals, therefore much effort has been invested to make them as efficient as possible.
Would it be an option to use the new Python algorithm also for Dicts in Julia?

Also note that LittleDict has O(n) lookup cost compared to O(1) for the current Dict implementation. This means hat if the collection isn’t super small, LittleDict will perform much, much worse than the current implementation. For dicts that are small, Base already has ImmutableDict, which has similar performance characteristics for lookup as LittleDict, so places where you would really see a benefit of using LittleDict should already be using ImmutableDict instead of Dict.

4 Likes

You’re right, for non-small LitleDict is slower, maybe not super small" depending on what you mean by that. I realized that, my code with supstitution should work either way, and was meant to see if faster (indicating most dicts in base are small enough), they I would worry about scalability later.

What the docs on it say:

While theoretically this has expected time complexity O(n),
vs the hash-based OrderDict/Dict’s expected time complexity O(1),
and the search-tree-based SortedDict’s expected time complexity O(log(n)).
In practice it is really fast, because it is cache & SIMD friendly.
It is reasonable to expect it to outperform an OrderedDict,
with up to around 30 elements in general;
or with up to around 50 elements if using a LittleDict backed by Tuples

(see freeze)

However, this depends on exactly how long isequal and hash take,
as well as on how many hash collisions occur etc.

What assumtions? If you’re pointing out UDict there, then actually it’s not in Julia (I added it), when I used OrderedDict (renamed to Dict) all of this worked without UDict. For LittleDict I got similar the error I posted, just longer, and was trying to limit the replacement by introducing UDict for UnorderedDict for use of other dictionaries.

The Methoderror occurs because some function limits the input types to Union{Dict, Vector{var"#s852"} where var"#s852"<:Dict. UDict doesn’t match that, and that’s why the error is thrown. If you want to replace Dict with UDict, you have to change the method signatures of all functions using it to match your new type.

so places where you would really see a benefit of using LittleDict should already be using ImmutableDict instead of Dict .

Yes, and no. It seems to be only used by 1 (or 2?) places in Base, for regex, so (you) implying small dicts not important:

https://github.com/JuliaLang/julia/search?q=ImmutableDict

But the restriction of it being immutable is probably why it’s not used much, rather than you know non-small needed. I really see why you would want to avoid O(n), and LittleDict’s performance can be had while avoiding that. I’m just not going to make a new Dict variant right now, rather want to reuse tested code for Base.

To answer my own question here, in case others are curious, Dict isn’t thread-safe it seems (assuming still), as there’s a package for that (wrapping the regular Dict):

so I rule that out as an issue in my investigation.

1 Like

I do now have a working compiled Julia, and local git commit for my planned change to Julia, but can’t send it in! [EDIT: drop outdated trouble, it took a while to solve, e.g got scary looking in red: “HEAD detached at 9b9cbee4a6”.]

You should fork to a repo you own, then submit a PR. Github has docs on this, and plenty of other blogs etc.

https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo

2 Likes

Figured out PR/git stuff. PR: https://github.com/JuliaLang/julia/pull/37804

1 Like