(Compiler) Performance of Dict

In the thread "Roadmap for a faster time-to-first-plot?", @ChrisRackauckas suggested avoiding Dicts because they slow down compilation. I would love to get suggestions on how to actually do this.

In my application I use TOML files for configuration, and reading a TOML file gives me back a Dict. Any suggestions on how to make compilation faster in this situation? Does it make sense to convert the Dict into a struct for further processing?

Does the type of these Dicts depend on the data, or is it always the same? In the latter case you should be fine.

Dicts are structs:

Deeply nested parametric structs can be hard on the compiler, e.g. Dict{A, Dict{B, Dict{C, Dict{D, E}}}} where {A, B, C, D, E}, but that's not unique to Dicts.
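
To make the nesting concrete: when Dicts are built from concretely typed values, every level of nesting becomes part of the outer Dict's type. A minimal sketch (the keys and values are arbitrary):

```julia
# Each nesting level becomes another type parameter of the outer Dict,
# so the compiler has to specialize on the whole nested type at once.
nested = Dict(1 => Dict("a" => Dict(:x => 1.0)))
T = typeof(nested)  # Dict{Int64, Dict{String, Dict{Symbol, Float64}}} on 64-bit Julia
```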

Yes, but that’s the architecture of TOML.jl. It returns nested Dicts.

Again, my question is whether anybody has tips on how to avoid the long inference times. If I remember correctly, Pkg had the same issue and solved it by compiling those code paths into the system image.

I roughly know the structure of the TOML file, but I don't want to write my own TOML parser. Hence my question: is it possible to use TOML.jl (which returns Dicts) and then convert the result to a struct? Does that help or not?
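
One way to do the conversion is to read every field out of the untyped Dict once, at the boundary, and from then on pass around a struct with concrete field types. The sketch below assumes a hypothetical config layout (`title`, `colors`, and a `[window]` table with `width`/`height`) and uses the TOML standard library (Julia 1.6+), which postdates this thread; the constructor accepts any `AbstractDict`, so it would work equally with the `Dict{AbstractString,Any}` returned by the older TOML.jl.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical configuration layout; adapt the fields to your own file.
struct PlotConfig
    title::String
    width::Int
    height::Int
    colors::Vector{String}
end

# Convert the untyped Dict once; downstream code only sees concrete types.
function PlotConfig(d::AbstractDict)
    window = d["window"]::AbstractDict
    return PlotConfig(
        String(d["title"]),
        Int(window["width"]),
        Int(window["height"]),
        String[String(c) for c in d["colors"]],
    )
end

const EXAMPLE = """
title = "demo"
colors = ["red", "blue"]

[window]
width = 800
height = 600
"""

cfg = PlotConfig(TOML.parse(EXAMPLE))
```

Functions that take a `PlotConfig` can then be inferred from its field types instead of from `Any` lookups.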

Yes, but the types are not nested. There is nothing inherently slow about Dicts. But code like

might be slow to compile (not because of dicts but because of the nesting of types).

That is true. The type is Dict{AbstractString, Any}, and it contains more Dicts of the same type. Not sure why it is AbstractString and not String.
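
The TOML standard library available since Julia 1.6 (not the TOML.jl package discussed here) returns Dict{String, Any} instead. Either way, the type of every table is the same no matter how deeply tables nest, which is why such Dicts do not produce the nested type parameters mentioned above. A quick check, with an arbitrary example document:

```julia
using TOML  # standard library since Julia 1.6

doc = TOML.parse("""
    [a]
    x = 1
    [a.b]
    y = 2
    """)

# All three levels share one type: Dict{String, Any}.
types = (typeof(doc), typeof(doc["a"]), typeof(doc["a"]["b"]))
```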

For reference, I get:

julia> @time a = TOML.parsefile("myfile.toml")
  1.242109 seconds (2.40 M allocations: 120.457 MiB, 11.85% gc time)
Dict{AbstractString,Any} with 6 entries:
  
julia> @time a = TOML.parsefile("myfile.toml")
  0.001002 seconds (2.15 k allocations: 97.641 KiB)
Dict{AbstractString,Any} with 6 entries:

The file is only 2613 bytes, so I'm not sure what the first call is allocating 120 MiB for.

I wonder what would happen if we just added a bunch of Dict{Any,Any} or Dict{Symbol,Any} stuff to the sysimage and told people to prefer that (Any,Any should already have some coverage, but I wonder what would happen if we purposely made it pretty complete)

Let's switch to a concrete example. Here are the numbers I get for Winston:

julia> @time using Winston
  2.744007 seconds (2.26 M allocations: 119.856 MiB, 2.29% gc time)

julia> @time plot(1:100)
  2.838665 seconds (7.31 M allocations: 376.682 MiB, 4.47% gc time)

The last time is not really complete, since the window only starts rendering after about 7 seconds. At its core, Winston uses a Dict for configuration:

https://github.com/JuliaGraphics/Winston.jl/blob/master/src/Winston.jl#L113

My hypothesis, which is not proven but which your post seems to support, is that the performance issues are due to the use of Dicts. Maybe we could learn from this example how either

  • Dicts can be accelerated, or
  • such code can be refactored to run fast.
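
As one possible refactoring (not what Winston actually does), a fixed set of defaults can live in a NamedTuple and overrides can be merged in, so every field keeps a concrete type. The keys below are hypothetical. A trade-off of this design: each distinct combination of overridden keys (and value types) produces a new NamedTuple type, which means new specializations.

```julia
# Hypothetical style defaults (not Winston's actual configuration keys).
const DEFAULTS = (fontsize = 12.0, linewidth = 1.0, color = "black")

# Keyword overrides are merged into a new NamedTuple. Every field keeps a
# concrete type, so code reading s.linewidth infers Float64, not Any.
style(; kwargs...) = merge(DEFAULTS, values(kwargs))

s = style(linewidth = 2.0)
```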

Try compiling just the dict portions to the sysimage and see what happens.

If memory serves, the top three inference times for Revise’s startup are all setindex! functions for different Dict types, which is why I chose that for the example in https://github.com/JuliaLang/julia/pull/31466.
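
A sketch of what "compiling just the dict portions" might look like: a list of `precompile` statements for the Dict operations an application uses, which could be fed to a sysimage build tool such as PackageCompiler.jl (that step is not shown). The set of Dict types is a hypothetical choice, mirroring the Dict{Any,Any} and Dict{Symbol,Any} suggestion above.

```julia
# Hypothetical set of Dict types an application relies on.
const DICT_TYPES = (Dict{String,Any}, Dict{Symbol,Any}, Dict{Any,Any})

results = Bool[]
for D in DICT_TYPES
    K, V = keytype(D), valtype(D)
    # precompile(f, argtypes) compiles f for those argument types without
    # calling it, and reports success as a Bool.
    push!(results, precompile(setindex!, (D, V, K)))   # d[k] = v
    push!(results, precompile(getindex, (D, K)))       # d[k]
    push!(results, precompile(haskey, (D, K)))         # haskey(d, k)
    push!(results, precompile(get, (D, K, Nothing)))   # get(d, k, nothing)
end
```

How much of this work survives into a cached package image (as opposed to a custom sysimage) depends on the Julia version; that limitation is what the PR above is about.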

And I agree with @ChrisRackauckas that certain programming patterns are unnecessarily hard on the compiler. Changing a Dict{AbstractFloat,AbstractString} to a Dict{Any,Any} is probably quite reasonable.

But in other ways, I think this is precisely the wrong time to be distorting your programming style in ways you might later regret to achieve faster startup. While faster compile/interpret performance will in aggregate be a really big job (indeed, its own research problem), areas like getting setindex! to precompile properly seem completely doable. My latest attempt to improve precompilation is https://github.com/JuliaLang/julia/pull/32705, which (as you’ll see from my comment) may not be in exactly the right direction but is attempting to address one of the bigger limitations. I’d guess that Jeff & Jameson haven’t yet been able to switch to making compile time their main priority, but once they do I suspect we’ll see a few gains fairly quickly (and the particular issue in 32705 seems quite “ripe” to me), and others will unfold over much longer time scales.

I want to believe Tim, but since compile time is a non-trivial problem, I have some doubts that there will be a solution soon. This is really not meant as criticism, and I would like you to prove me wrong. My hope is that 1.4 (which I expect in summer 2020) will contain improvements.

Yes, hoping for improvements in 1.4 (and further improvements in later releases) seems reasonable. But at least for your own personal use, I’d guess if you compile Julia yourself from master you might see some major benefits earlier than summer 2020.

Check out

It seems that the road is already paved by the new ORC JIT.
You get concurrent compilation, an object cache, JITDylibs, and the ability to delete compiled functions.

The only problem is the global method table, a.k.a. global multiple dispatch, which makes it very hard to cache compilation. Because as I pointed out in the past it is hard to prove that adding a module will not alter previously compiled code.

Because as I pointed out in the past it is hard to prove that adding a module will not alter previously compiled code.

That’s what our backedges are for. It will always be possible to load one tiny thing and force recompilation of a huge amount of code. For example, loading a module that does this:

import Base: +
+(x::Int, y::Int) = 0   # surprise!

will invalidate any method that adds Ints, so future calls will have to compile everything they need from scratch. The bigger problem is indeed tricky, but questions like “can we at least cache more of the results of inference? will it help in some circumstances?” are certainly going to have affirmative answers.
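
A harmless way to watch backedges at work, using throwaway hypothetical functions `f` and `g` instead of redefining a Base method:

```julia
# Hypothetical functions used only for illustration.
g(x::Integer) = 1
f(x) = g(x) + 1

before = f(1)   # compiles f(::Int), which calls g(::Integer)

# A more specific method g(::Int) invalidates the compiled f(::Int)
# through its backedge, so the next call is recompiled and uses it.
g(x::Int) = 10
after = f(1)
```

Here `before` is 2 and `after` is 11: the stale compiled code for `f` is discarded rather than reused.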

Can Julia use the JIT concurrency and speculative JITing? That seems like it would solve a lot of the issues.

OT: Weren't actions like this (overwriting methods in other modules) the classic counterargument to merging methods across modules? I wasn't aware that this is possible for Base; I always had the impression that you can only extend Base.

That’s what our backedges are for. It will always be possible to load one tiny thing and force recompilation of a huge amount of code.

If type piracy is disallowed, can one still cause invalidations? Would it be enough to forbid type piracy to be able to cache compiled code?
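
To pin down the term: type piracy means adding a method to a function you don't own for argument types you don't own, as the `+(x::Int, y::Int)` example above does. The non-piratical way to extend Base looks like this sketch with a hypothetical `Meters` type; it only illustrates the distinction and does not answer the caching question.

```julia
# Hypothetical type owned by this code.
struct Meters
    value::Float64
end

# Not type piracy: Base.+ is extended only for a type this code owns,
# so calls on Base types such as Int never dispatch to this method.
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)

total = Meters(1.0) + Meters(2.5)
```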

It looks like there was an effort toward "Speculative compilation support in LLVM ORC JIT Infrastructure" for Julia as a Google Summer of Code 2019 project: New LLVM JIT Features
