In my application I am using TOML files for configuration, and when reading them what I get back is a Dict. Any suggestions on how to make the compiler faster in these situations? Does it make sense to convert the Dict into a struct for further processing?
Deeply nested parametric structs can be hard on the compiler, e.g. Dict{A, Dict{B, Dict{C, Dict{D, E}}}} where {A, B, C, D, E}, but that’s not unique to Dicts.
Yes, but that’s the architecture of TOML.jl. It returns nested Dicts.
Again, my question is whether anybody has performance tips for circumventing the long inference times. If I remember correctly, Pkg had the same issue and solved it by compiling those things into the system image.
I roughly know the structure of the TOML file, but I don’t want to write my own TOML parser. Hence the question whether it is possible to use TOML.jl (which returns Dicts) and then convert the result to a struct. Does that help or not?
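To make it concrete, something along these lines is what I have in mind (the struct fields and the file layout are made-up examples, not my actual config):

```julia
import TOML   # TOML.jl, which returns nested Dicts

# Hypothetical config layout, purely for illustration.
struct ServerConfig
    host::String
    port::Int
end

struct AppConfig
    name::String
    server::ServerConfig
end

# Convert the Dict from the parser into concrete structs once,
# so the rest of the code only ever sees concretely typed fields.
function AppConfig(d::AbstractDict)
    s = d["server"]
    return AppConfig(d["name"], ServerConfig(s["host"], s["port"]))
end

cfg = AppConfig(TOML.parsefile("config.toml"))
```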
That is true. The type is Dict{AbstractString, Any}, and it then contains more Dicts of the same type. Not sure why it is AbstractString and not String.
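For reference, this is roughly what I see (the file name and key are placeholders):

```julia
import TOML

cfg = TOML.parsefile("config.toml")
typeof(cfg)             # Dict{AbstractString,Any} with the version I am using
typeof(cfg["section"])  # nested tables come back as Dicts of the same type
```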
I wonder what would happen if we just added a bunch of Dict{Any,Any} or Dict{Symbol,Any} methods to the sysimage and told people to prefer those types (Any,Any should already have some coverage, but I wonder what would happen if we purposely made it pretty complete).
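Concretely, I mean shipping a fairly complete set of precompile statements for those types with the sysimage. A hand-written sketch of the idea (the real list would be generated, not written by hand):

```julia
# A few of the Dict methods one might precompile into the sysimage.
precompile(setindex!, (Dict{Any,Any}, Any, Any))
precompile(getindex,  (Dict{Any,Any}, Any))
precompile(haskey,    (Dict{Any,Any}, Any))
precompile(setindex!, (Dict{Symbol,Any}, Any, Symbol))
precompile(getindex,  (Dict{Symbol,Any}, Symbol))
precompile(iterate,   (Dict{Symbol,Any},))
```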
My hypothesis, which is not proven but which your post seems to indicate might be true, is that the performance issues are due to the usage of Dicts. Maybe we could learn from that example how either
If memory serves, the top three inference times for Revise’s startup are all setindex! functions for different Dict types, which is why I chose that for the example in https://github.com/JuliaLang/julia/pull/31466.
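You can see the cost by timing the first setindex! call for a fresh Dict type in a new session (a rough illustration of first-call inference/compile time, not a careful measurement):

```julia
d = Dict{String,Vector{Float64}}()
@time d["a"] = [1.0]   # first call: dominated by inference and codegen
@time d["b"] = [2.0]   # second call: essentially just the runtime cost
```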
And I agree with @ChrisRackauckas that certain programming patterns are unnecessarily hard on the compiler. Changing a Dict{AbstractFloat,AbstractString} to a Dict{Any,Any} is probably quite reasonable.
But in other ways, I think this is precisely the wrong time to be distorting your programming style in ways you might later regret to achieve faster startup. While faster compile/interpret performance will in aggregate be a really big job (indeed, its own research problem), areas like getting setindex! to precompile properly seem completely doable. My latest attempt to improve precompilation is Change precompilation format to be more amenable to copying by timholy · Pull Request #32705 · JuliaLang/julia · GitHub, which (as you’ll see from my comment) may not be in exactly the right direction but is attempting to address one of the bigger limitations. I’d guess that Jeff & Jameson haven’t yet been able to switch to making compile time their main priority, but once they do I suspect we’ll see a few gains fairly quickly (and the particular issue in 32705 seems quite “ripe” to me), and others will unfold over much longer time scales.
I want to believe Tim, but since compile time is a non-trivial problem, I have some doubts that there will be a solution soon. This is really not meant as a critique, and I want you to prove me wrong. My hope is that 1.4 (which I expect in summer 2020) will contain improvements.
Yes, hoping for improvements in 1.4 (and further improvements in later releases) seems reasonable. But at least for your own personal use, I’d guess if you compile Julia yourself from master you might see some major benefits earlier than summer 2020.
It seems that the road is already paved with the new ORC JIT.
You get concurrent compilation, an object cache, JITDylibs, and the ability to delete compiled functions.
The only problem is the global method table, a.k.a. global multiple dispatch, which makes it very hard to cache compiled code. Because as I pointed out in the past it is hard to prove that adding a module will not alter previously compiled code.
Because as I pointed out in the past it is hard to prove that adding a module will not alter previously compiled code.
That’s what our backedges are for. It will always be possible to load one tiny thing and force recompilation of a huge amount of code. For example, loading a module that does this:
import Base: +
+(x::Int, y::Int) = 0 # surprise!
will invalidate any method that adds Ints, so future calls will have to compile everything they need from scratch. The bigger problem is indeed tricky, but questions like “can we at least cache more of the results of inference? will it help in some circumstances?” are certainly going to have affirmative answers.
OT: Weren’t actions like this (overwriting methods in other modules) the classical counterargument to merging methods across modules? I wasn’t aware that this is possible for Base; I always had the impression you could only extend Base.
Looks like there were efforts to add “Speculative compilation support in LLVM ORC JIT Infrastructure” to Julia as a Google Summer of Code 2019 project: New LLVM JIT Features - #5 by jpsamaroo