Why is Rust compilation faster than Julia pre-compilation?

Ok, but if the package has no init function there’s not much to compile in terms of the package’s functionality, right?

Then compilation will mostly occur at run time (on first use), which is not what’s being compared here.

On the other hand, when there are precompilation directives, they might cover more than what the dependency tree actually requires.

It seems to me that to know whether the machinery of Rust compilation is faster for the same code (it probably is), more careful tests should be performed; the differences here might be due more to the two compilers doing different amounts of work.

Or not?

Actually, I realize I was not propagating the trace correctly.

$ JULIA_DEBUG=loading julia --project -e "Base.PRECOMPILE_TRACE_COMPILE[] = \"Comonicon.trace\"; using Comonicon"
┌ Debug: Skipping mtime check for file /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/LazyArtifacts/src/LazyArtifacts.jl used by /home/mkitti/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/compiled/v1.10/LazyArtifacts/MRP8l_jWrrO.ji, since it is a stdlib
└ @ Base loading.jl:3129
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/ExproniconLite/2CPrV_Jb1xZ.so for ExproniconLite [55351af7-c7e9-48d6-89ff-24e801d99491]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/OrderedCollections/LtT3J_sUOrk.so for OrderedCollections [bac558e1-5e72-5ebc-8fee-abe8a469f55d]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/Configurations/2z6N1_Jb1xZ.so for Configurations [5218b696-f38b-4ac9-8b61-a12ec717816d]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/compiled/v1.10/LazyArtifacts/MRP8l_jWrrO.so for LazyArtifacts [4af54fe1-eca0-43a8-85a7-787d91b784e3]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/Scratch/ICI1U_SjAHO.so for Scratch [6c6a2e73-6563-6170-7368-637461726353]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/RelocatableFolders/Yg3O9_7JKAU.so for RelocatableFolders [05181044-ff0b-4ac5-8273-598c1e38db00]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/Glob/3FzEV_7JKAU.so for Glob [c27321d9-0574-5035-807b-f59d2c89b15c]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/PackageCompiler/MMV8C_Jb1xZ.so for PackageCompiler [9b87118b-4619-50d2-8e1e-99f35a4d4d9d]
└ @ Base loading.jl:1057
┌ Debug: Loading object cache file /home/mkitti/.julia/compiled/v1.10/Comonicon/ylrB3_Jb1xZ.so for Comonicon [863f3e99-da2a-4334-8734-de3dacbe5542]
└ @ Base loading.jl:1057

It looks like

precompile(Tuple{typeof(Base.CoreLogging.shouldlog), Logging.ConsoleLogger, Base.CoreLogging.LogLevel, Module, Symbol, Symbol})
precompile(Tuple{typeof(Base.get), Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Symbol, Nothing})
precompile(Tuple{typeof(Base.CoreLogging.handle_message), Logging.ConsoleLogger, Base.CoreLogging.LogLevel, Vararg{Any, 6}})
precompile(Tuple{typeof(Base.isopen), Base.GenericIOBuffer{Array{UInt8, 1}}})
precompile(Tuple{typeof(Logging.default_metafmt), Base.CoreLogging.LogLevel, Vararg{Any, 5}})
precompile(Tuple{typeof(Base.string), Module})
precompile(Tuple{Type{Base.IOContext{IO_t} where IO_t<:IO}, Base.GenericIOBuffer{Array{UInt8, 1}}, Base.TTY})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :color), Tuple{Bool, Symbol}}, typeof(Base.printstyled), Base.IOContext{Base.GenericIOBuffer{Array{UInt8, 1}}}, String})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :color), Tuple{Bool, Symbol}}, typeof(Base.printstyled), Base.IOContext{Base.GenericIOBuffer{Array{UInt8, 1}}}, String, Vararg{String}})
precompile(Tuple{Base.var"##printstyled#995", Bool, Bool, Bool, Bool, Bool, Bool, Symbol, typeof(Base.printstyled), Base.IOContext{Base.GenericIOBuffer{Array{UInt8, 1}}}, String, Vararg{Any}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :italic, :underline, :blink, :reverse, :hidden), NTuple{6, Bool}}, typeof(Base.with_output_color), Function, Symbol, Base.IOContext{Base.GenericIOBuffer{Array{UInt8, 1}}}, String, Vararg{Any}})
precompile(Tuple{typeof(Base.write), Base.TTY, Array{UInt8, 1}})
precompile(Tuple{typeof(Comonicon.include), String})
precompile(Tuple{typeof(Configurations.option_m), Module, Expr})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:source,), Tuple{Nothing}}, typeof(ExproniconLite.split_struct), Expr})
precompile(Tuple{typeof(ExproniconLite.flatten_blocks), Expr})
precompile(Tuple{typeof(ExproniconLite.flatten_blocks), LineNumberNode})
precompile(Tuple{Type{Array{LineNumberNode, 1}}, UndefInitializer, Tuple{Int64}})
precompile(Tuple{typeof(Base.collect_to_with_first!), Array{LineNumberNode, 1}, LineNumberNode, Base.Generator{Array{Any, 1}, typeof(ExproniconLite.flatten_blocks)}, Int64})
precompile(Tuple{typeof(ExproniconLite.flatten_blocks), Symbol})
precompile(Tuple{Type{Array{Symbol, 1}}, UndefInitializer, Tuple{Int64}})
precompile(Tuple{typeof(Base.collect_to_with_first!), Array{Symbol, 1}, Symbol, Base.Generator{Array{Any, 1}, typeof(ExproniconLite.flatten_blocks)}, Int64})
precompile(Tuple{Type{Array{Expr, 1}}, UndefInitializer, Tuple{Int64}})
precompile(Tuple{typeof(Base.collect_to_with_first!), Array{Expr, 1}, Expr, Base.Generator{Array{Any, 1}, typeof(ExproniconLite._flatten_blocks)}, Int64})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{LineNumberNode, 1}, Expr, Int64})
precompile(Tuple{typeof(Base.collect_to!), Array{Any, 1}, Base.Generator{Array{Any, 1}, typeof(ExproniconLite.flatten_blocks)}, Int64, Int64})
precompile(Tuple{typeof(ExproniconLite._flatten_blocks), Bool})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Expr, 1}, Bool, Int64})
precompile(Tuple{typeof(Base.collect_to!), Array{Any, 1}, Base.Generator{Array{Any, 1}, typeof(ExproniconLite._flatten_blocks)}, Int64, Int64})
precompile(Tuple{typeof(ExproniconLite._flatten_blocks), String})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Expr, 1}, String, Int64})
precompile(Tuple{typeof(ExproniconLite._flatten_blocks), Int64})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Expr, 1}, Int64, Int64})
precompile(Tuple{typeof(ExproniconLite._flatten_blocks), Symbol})
precompile(Tuple{typeof(Base.collect_to_with_first!), Array{Symbol, 1}, Symbol, Base.Generator{Array{Any, 1}, typeof(ExproniconLite._flatten_blocks)}, Int64})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Symbol, 1}, Expr, Int64})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:source,), Tuple{Nothing}}, typeof(ExproniconLite.split_field_if_match), Symbol, LineNumberNode, Bool})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:source,), Tuple{LineNumberNode}}, typeof(ExproniconLite.split_field_if_match), Symbol, Expr, Bool})
precompile(Tuple{Base.var"##s128#247", Vararg{Any, 5}})
precompile(Tuple{typeof(Base._nt_names), Type{NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Symbol, Bool, Expr}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Symbol, Bool, Expr, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Symbol, Bool, Bool}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Symbol, Bool, Bool, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Symbol, Bool, String}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Symbol, Bool, String, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Symbol, Bool, Int64}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Symbol, Bool, Int64, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Expr, Bool, Int64}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Expr, Bool, Int64, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{Type{ExproniconLite.JLKwStruct}, Symbol, Nothing, Bool, Array{Any, 1}, Nothing, Array{ExproniconLite.JLKwField, 1}, Array{ExproniconLite.JLFunction, 1}, Nothing, Nothing, Array{Any, 1}})
precompile(Tuple{typeof(Base.:(==)), Symbol, Type})
precompile(Tuple{typeof(Base.:(==)), Symbol, GlobalRef})
precompile(Tuple{typeof(Base.:(==)), Symbol, Expr})
precompile(Tuple{typeof(Base.:(==)), Expr, Type})
precompile(Tuple{typeof(Base.:(==)), Expr, GlobalRef})
precompile(Tuple{typeof(Base.:(==)), Expr, Expr})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :args, :kwargs, :whereparams, :body), Tuple{Symbol, Array{Any, 1}, Array{Expr, 1}, Nothing, Expr}}, Type{ExproniconLite.JLFunction}})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), Expr})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :args, :kwargs, :whereparams, :body), Tuple{Expr, Array{Expr, 1}, Array{Expr, 1}, Array{Expr, 1}, Expr}}, Type{ExproniconLite.JLFunction}})
precompile(Tuple{ExproniconLite.var"#11#18"{Expr}, Expr})
precompile(Tuple{typeof(Base.isequal), QuoteNode, QuoteNode})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, Expr})
precompile(Tuple{typeof(ExproniconLite.rm_single_block), Expr})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, Bool})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), Bool})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, String})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), String})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, Int64})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), Int64})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :args, :body, :whereparams), Tuple{Expr, Array{Expr, 1}, Expr, Array{Expr, 1}}}, Type{ExproniconLite.JLFunction}})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Expr, Bool, Expr}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Expr, Bool, Expr, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Base.any), Function, Array{Any, 1}})
precompile(Tuple{typeof(Base._any), ExproniconLite.var"#46#50"{Symbol}, Array{Any, 1}, Base.Colon})
precompile(Tuple{ExproniconLite.var"#46#50"{Symbol}, Symbol})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Symbol, Bool, ExproniconLite.NoDefault}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Symbol, Bool, ExproniconLite.NoDefault, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :args, :kwargs, :whereparams, :body), Tuple{Symbol, Array{Any, 1}, Array{Any, 1}, Nothing, Expr}}, Type{ExproniconLite.JLFunction}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :args, :kwargs, :whereparams, :body), Tuple{Expr, Array{Expr, 1}, Array{Any, 1}, Array{Expr, 1}, Expr}}, Type{ExproniconLite.JLFunction}})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, ExproniconLite.NoDefault})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), ExproniconLite.NoDefault})
precompile(Tuple{typeof(ExproniconLite.flatten_blocks), String})
precompile(Tuple{Type{Array{String, 1}}, UndefInitializer, Tuple{Int64}})
precompile(Tuple{typeof(Base.collect_to_with_first!), Array{String, 1}, String, Base.Generator{Array{Any, 1}, typeof(ExproniconLite.flatten_blocks)}, Int64})
precompile(Tuple{ExproniconLite.var"#46#50"{Symbol}, String})
precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Expr, 1}, Symbol, Int64})
precompile(Tuple{typeof(Base.merge), NamedTuple{(:name, :type, :isconst, :default), Tuple{Symbol, Expr, Bool, Symbol}}, NamedTuple{(:doc, :line), Tuple{Nothing, LineNumberNode}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:name, :type, :isconst, :default, :doc, :line), Tuple{Symbol, Expr, Bool, Symbol, Nothing, LineNumberNode}}, Type{ExproniconLite.JLKwField}})
precompile(Tuple{Type{Pair{A, B} where B where A}, Expr, Symbol})
precompile(Tuple{typeof(ExproniconLite.codegen_ast), Symbol})
precompile(Tuple{typeof(Comonicon.AST.include), String})
precompile(Tuple{typeof(Base.promote_typeof), LineNumberNode, Expr, Vararg{Expr}})
precompile(Tuple{typeof(Base.promote_typeof), Expr, Expr})
precompile(Tuple{typeof(Comonicon.Builder.include), String})
precompile(Tuple{typeof(Base._str_sizehint), Tuple{Expr, Symbol}})
precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{Array{UInt8, 1}}, Tuple{Expr, Symbol}})
precompile(Tuple{typeof(Base.hashindex), Tuple{Module, String, Float64}, Int64})
precompile(Tuple{typeof(Base.isequal), Tuple{Module, String, Float64}, Tuple{Module, String, Float64}})
julia> using PkgCacheInspector

julia> info_cachefile("Comonicon")
Contents of /home/mkitti/.julia/compiled/v1.10/Comonicon/ylrB3_Jb1xZ.so:
  modules: Any[Comonicon.Configs, Comonicon.Arg, Comonicon.AST, Comonicon.JuliaExpr, Comonicon.ZSHCompletions, Comonicon.BashCompletions, Comonicon.Tools, Comonicon.Builder, Comonicon]
  208 external methods
  3 new specializations of external methods (Comonicon 33.3%, Comonicon.Builder 33.3%, Comonicon.AST 33.3%)
  file size:   979904 (956.938 KiB)
  Segment sizes (bytes):
  system:      588244 ( 60.72%)
  isbits:      312272 ( 32.23%)
  symbols:      15782 (  1.63%)
  tags:          9033 (  0.93%)
  relocations:  43322 (  4.47%)
  gvars:           72 (  0.01%)
  fptrs:           48 (  0.00%)

Related to the compile times: depending on the design of your Rust crate, linker time might account for a decent share of the build time. Rust changed its default to lld on most systems, which helped there, and more work is being done towards using newer linkers (e.g. you can easily plug in mold, or try a highly experimental linker that aims to become an incremental linker in the future).

I know nothing useful about Rust, but I’d guess the two forms of caching and compilation are not very comparable. Rust presumably does “closed world compilation” in which you know in advance everything that will ever be needed from the code. In Julia we do “open world caching and compilation,” in which we cache not only the assembly but everything (type definitions, method definitions, constants, module definitions) that we need to compile new specializations later. I’d guess there’s really nothing comparable in Rust?

14 Likes

Rust has a solid generics system which is pervasive in the ecosystem so I wouldn’t be surprised if there are some compiler artifacts of generic functions that hang around to speed up compilation of new methods. (But I am not aware what actually gets done here; it could involve no caching at all).
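As a minimal sketch (hypothetical code, not from any crate mentioned here) of what this means: Rust compiles a generic function separately per concrete type, roughly analogous to a Julia method specialization.

```rust
use std::ops::Add;

// Each concrete instantiation of a generic function is compiled
// ("monomorphized") separately, roughly analogous to a Julia
// method specialization for a concrete type signature.
fn double<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

fn main() {
    // Two distinct machine-code instantiations are generated here:
    // double::<i32> and double::<f64>.
    println!("{}", double(21));  // 42
    println!("{}", double(1.5)); // 3
}
```

Whether and how the compiler artifacts of those instantiations are cached across builds is exactly the open question above.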

I guess this^ could be a contributor to the observed difference. If I only want to precompile the code needed to run a very specific script or my test suite, this might be a bit overkill, especially when it ends up precompiling “everything” (much of which will not be used).

Maybe there is some way for Julia to get something in between the two? In a REPL setting I would certainly agree the “open world” paradigm makes much more sense (probably part of why evcxr is much slower than the Julia REPL), since you can have all possible branches of code already compiled for you. But for other applications, like shipping packages or quickly running an entire test suite, it is perhaps not as relevant and would only add bloat and longer compilation times.


But more generally I don’t think it’s the compilation model alone that accounts for the difference. I have found that compile times of Rust projects are much faster than those of comparably complex projects in C/C++. As others have shared in the thread, Rust has been putting years of effort into developing new tricks for speeding this process up, so it’s probably useful to borrow insights from some of them.

For example this one sounds pretty nifty:

1 Like

We do not even have a way to link Julia compile caches together in the traditional sense, although LLD is now part of the Julia standard library: we distribute an lld executable with Julia.

For the most part, Julia code loading is a bit complicated due to multiple dispatch, among other things. The entire compiler architecture is built around the ORCv2 JIT. Caching native code to disk is relatively new (since Julia 1.9) and is really just an approximation of a dynamic library.

The problem is that many of these insights are not applicable because of the aforementioned “open world” paradigm and multiple dispatch. Essentially, we need to sacrifice some dynamism in order to benefit from some of these optimizations.

Something that I think we should consider are some kind of “sealed” modules or “sealed” functions that would allow us to close parts of Julia. If we knew that a certain module were closed or could prove that we have all the methods available, then certain optimizations become much more readily available.

There’s a good reason why multiple dispatch is not very widely adopted: it introduces a fair amount of complexity to the compilation model. A great experiment is underway now that Mojo is entering the field; it purposely does not implement features like multiple dispatch in favor of a simpler compilation model.

2 Likes

I can relate to this right now, because the other day I was wondering how AOT compilation handles generic functions over all supported data types in libraries wrapping C/C++/Rust code, i.e. what’s their equivalent of a bunch of precompile statements? I tried and failed for half an hour to find a search query that would reach this obvious question.

How much do you think this has to do with headers vs modules?
Boost reported some nice improvements from their experiments with C++ modules – at least, when the dependent modules were already built.

To make a Julia parallel: modules mean that if dependencies like ExproniconLite.jl and OrderedCollections.jl are already precompiled, building Comonicon.jl, which depends on them, will be a lot faster; that is the C++-modules situation rather than the C++-headers one.

Modules are still a long way from actually being “production ready” in C++ (e.g., the language server won’t work on a project using C++ modules!).
My point is just that Rust using modules could give comparable Rust projects a quicker rebuild time, while clean builds may be more comparable.

That said, here I observed an initial TTFX time of 20-48 seconds for Julia (excluding all the @time using, let alone precompilation!), while a full C++ build of code doing roughly the same thing took <5s on my computer.

This blog post did some benchmarking and optimization of Rust and C++ compile times. It found incremental builds did better in C++, but in a project where there was a lot more to build incrementally. That is, not the header-only case like from the boost example. The particular example you’re benchmarking matters a great deal.

2 Likes

This is a cool idea. Maybe you could simply assert that any function declared const would not be assigned additional methods? Or, in the case of modules, any function declared within a const module could not receive additional methods outside of the module?

Then with those guarantees maybe you could take advantage of intra-package parallelism in the compilation.

Like

const module A

f(x) = x
f(x::Int) = x^2
f(x::Float64) = x/2

end

# A.f(_) will not receive more methods

So you are basically guaranteeing an absence of future invalidations.

2 Likes

This is the root of it, really: Julia’s dynamism is at odds with the powerful static optimizations that are Rust’s raison d’être. But perhaps Julia can be pared down to a satisfactory subset which is more compiler-friendly?

Julia is effectively a LISP masquerading as Fortran, achieving performance by carrying a whole LLVM around and somehow wrangling it into a JIT. The question here is whether we can somehow determine (or declare) what code ought to be compiled and what ought to be discarded.

With every design choice in that direction, a language is going to look a little more like Rust, or OCaml. Take TypeScript, for instance. TypeScript takes JavaScript, another language pretending not to be a LISP (this time masquerading as C/Java), applies some ML-like restrictions and gains some very nice features as a result of its compiler-friendly syntax.

Julia already has robust types, but no such “restricted” language subset that guarantees static behavior. For a dynamic language, Julia’s type system is unusually expressive via multiple dispatch. And unlike most typed languages, Julia’s execution model is unusual in that it does not exploit types as a means of validating a program’s behavior prior to execution; in Julia, it cannot be said that a function’s definition is “complete”.

In other words, Julia’s use of types is first and foremost a language feature, whereas languages with more traditional “compile-based” execution models like Java or C++ offer types as a language feature and are able to obtain guarantees about program behavior by exploiting type hierarchies.

Is there some tweak that can be made in order to obtain guarantees about a Julia program’s behavior? Almost like the inverse of Rust’s unsafe keyword.
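For reference, Rust’s unsafe is a block-level marker that locally relaxes the compiler’s guarantees; the idea above would be its inverse, a marker that adds guarantees. A minimal sketch of the existing keyword:

```rust
fn main() {
    let x: i32 = 42;
    let p = &x as *const i32;
    // Dereferencing a raw pointer is only allowed inside an `unsafe` block:
    // the programmer, not the compiler, vouches for its validity.
    let y = unsafe { *p };
    println!("{}", y); // 42
}
```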

I haven’t used Julia on GPUs, but I seem to recall that GPUCompiler is limited to a restricted subset (although not guaranteed by language semantics); is this the case?

This reminds me of Kotlin’s sealed classes and interfaces, which restrict inheritance of a class or interface to classes/interfaces defined in the same package (i.e. directory). The reason for the feature in Kotlin is not related to performance (afaik), but is to provide users with sum types, and the compiler with a mechanism to validate and enforce the sum types (or something similar to sum types).

The analogy to Julia would be method instances, rather than classes/interfaces, but the effect of such a feature would be similar; making guarantees to the compiler that the scope of this thing (a function) is restricted to this set (method instances in this file or module), excluding it from an otherwise dynamic default behavior (potential invalidation?).
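For comparison, Rust’s enums already behave like such a closed set: all variants are fixed at the definition, so the compiler can check exhaustiveness. A hypothetical Shape type for illustration:

```rust
// A Rust enum is a closed sum type: the full set of variants is known
// where it is defined, much like Kotlin's sealed classes.
enum Shape {
    Circle(f64),
    Square(f64),
}

fn area(s: &Shape) -> f64 {
    // The compiler verifies this match covers every variant.
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Square(side) => side * side,
    }
}

fn main() {
    println!("{}", area(&Shape::Square(3.0))); // 9
}
```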

Very interesting. One thought I have is that a module as simple as this could (in theory) be outright compiled as a standalone binary, but of course the issue is really about dynamic calls, e.g. if f(x) = B.y(x).

I guess my main questions are:

  • how much Julia code could reasonably be marked as const?
  • is the code that would benefit most in performance from being made “constant” the same code that is most often re-used (hence invalidated)?

Essentially, I wonder whether the code that is most problematic in causing invalidations is precisely the code that is least feasible to make constant, because it is widely used across packages.

5 Likes

I think I’m rehashing what others have said a bit, but in essence I believe the difference in compile-time performance is due to three things.

  1. Rust has spent a lot of dev time improving compile times in the last couple of years, leading to very nice improvements, while Julia has spent most of its developer time in this area on improving caching, and is still doing so.
  2. The caching model Julia uses is similar to other LISPs and Smalltalk: we basically save the session state into a big binary and reload it later. This is done because of the dynamism of the language, and it means Julia potentially saves a lot more code than it will actually use when loaded.
  3. Julia’s compilation unit here is a package, which is potentially bigger than Rust’s compilation units, so precompiling after small changes is potentially a lot more expensive.

Some of the future work is to try and change our compilation model slightly, with benefits to startup time, compilation time, and binary size (they all kind of scale together). One idea is to remove most of the dynamism from specific programs by defining entry points, like main or shared-library entry points, which would allow us to throw away, and potentially not even compile, lots of unnecessary code.

22 Likes

I suppose you’re not really compiling the dependencies, since they’re already precompiled. You just changed clap(?), so why should the rest be recompiled?

Basically, this is my explanation for why Julia precompilation is slow, namely that it compiles too much:

When I see precompilation happening, what’s worse is that completely unrelated code, not just my dependencies, gets compiled. But I assume you watched and saw that only those 8 packages (still all of them) were compiled?

So why does that happen? I blame inlining and/or other issues. A lot of code doesn’t need inlining and isn’t speed-critical like your code is (though some needs it, at least to some degree).

https://doc.rust-lang.org/rustc/codegen-options/index.html

inline-threshold

This option lets you set the default threshold for inlining a function. It takes an unsigned integer as a value. Inlining is based on a cost model, where a higher threshold will allow more inlining.
The default depends on the opt-level:

opt-level   Threshold
0           N/A, only inlines always-inline functions
1           N/A, only inlines always-inline functions and LLVM lifetime intrinsics
2           225
3           275
s           75
z           25
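For experimentation, the threshold can be set explicitly on the command line (the flag is documented on the page linked above; main.rs is just a placeholder):

```shell
# Raise the LLVM inlining threshold above the opt-level=3 default of 275.
rustc -C opt-level=3 -C inline-threshold=500 main.rs
```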

I would like to know a similar table for Julia; is it Julia or LLVM that controls this?

You can try starting Julia with -O0, or disable inlining in other ways.

That seems unfair, since you used -O3 in Rust, so you are inlining. But Rust precompiles and stores the result (I assume), and for a binary does not need to rerun all the steps; at link time it can still inline, i.e. do partial compilation. I just think for now Julia can’t.

That’s also a good guess, but I think the first step is getting rid of the phases, and thus their allocations, entirely. Then Julia could deallocate early, e.g. with Bumper.jl, and with that I think Rust would have no advantage, but it would need to be in Base, or a similar system such as using _malloca (which is not available, nor is alloca). I think LLVM does partially deallocate early for its part; it has an LLVM vector feature, similar to _malloca, to substitute for std::vector.

Profiling, and knowing for sure what actually happens, and is the slow part would be useful…

It’s not exactly the same, but Rust does have a form of this with generics/traits. Any use within the crate can be compiled like a normal function, but generic functions that are crate-public can’t be compiled without a concrete type. This of course isn’t known until dependents are built, so generics wind up having similar characteristics to inline functions. (More below)

Rust compilation works by lowering through a few different stages:

  1. Source code/AST
  2. HIR: high-level IR, basically source code with macros expanded and some desugaring
  3. THIR: typed HIR, HIR after types have been resolved
  4. MIR: mid-level IR, more or less a Rust form of what gets handed to LLVM. This is where most Rust-specific optimizations and transformations happen
  5. LLVM IR/GCC interface/Cranelift IR
  6. Generated code

You can see the output of these different steps on the playground by clicking the “…” dropdown next to the “Run” button in the top left.
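Outside the playground, the intermediate stages can also be dumped locally with rustc’s --emit flag (the flags are real; main.rs is a placeholder):

```shell
# Dump the mid-level IR (stage 4) and the LLVM IR (stage 5) to files.
rustc --edition 2021 --emit=mir main.rs      # writes main.mir
rustc --edition 2021 --emit=llvm-ir main.rs  # writes main.ll
```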

Turning a generic function into a concretely-typed function is the monomorphization phase, which happens at the MIR level. The dev guide gives some background here: Monomorphization - Rust Compiler Development Guide, but the page linked from the API docs is probably more immediately useful to Julia devs; it has a nice description of the tradeoffs in splitting generic code into CGUs for better compilation times.

For reference, anyone is always welcome to drop by the Rust development Zulip to ask specific questions https://rust-lang.zulipchat.com/. Quite a few people have spent a lot of time squeezing the best performance possible out of LLVM, there is definitely some knowledge to be shared. Otherwise, the dev guide and compiler API docs are reasonably comprehensive in describing how everything works.

+1, lld vs ld is a significant improvement and mold is even faster. Somebody is working on an incremental linker which will also be quite interesting (video).
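If anyone wants to try these, a couple of hedged examples (assuming a Linux toolchain where the respective linkers are installed):

```shell
# Wrap the build so mold intercepts the linker invocation (per mold's docs).
mold -run cargo build --release

# Or ask the C compiler driver to use lld for the link step.
RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
```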

4 Likes

Note that tools like JET.jl provide some form of pre-execution validation. Perhaps not “complete” in the theoretical sense, but very useful in practice.

1 Like

It’s a bit funny, since the Rust compiler is famously slow. People have been complaining about it forever; for example, have a look at this article from 2020. The old Rust website has a FAQ entry “Rust compilation seems slow. Why is that?” discussing compilation speed compared to C++ (not a high bar). Compilation speed has improved over time (see Nethercote’s blog linked by @Zentrik), but it’s still one of the main concerns in the 2023 Rust survey.

From the parallel front-end announcement:

Rust compile times are a perennial concern

From Kobzol’s blog (2023) (see also this post from last month):

One of the most common complaints about Rust is that Rust programs are slow to compile. I won’t go into the reasons why that is the case here (short version: cargo compiles the “whole world” from scratch and Rust made several design decisions that favor runtime speed instead of compilation speed), but even though I have some opinions on the definition of “being slow” (more on that below), the fact remains that many Rust developers consider the compiler to be slow enough so that it represents a very real bottleneck to their development workflow, which is a big issue that we cannot afford to ignore.

See also for example here and here.

5 Likes

FWIW, Rust’s compilation unit is the crate, which can include several files.
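A tiny illustration (hypothetical modules): everything below is one crate and therefore one compilation unit, even if the modules were split across separate files.

```rust
// Both modules compile together in a single rustc invocation.
mod greet {
    pub fn hello() -> &'static str {
        "hello"
    }
}

mod shout {
    // Items in one module can freely call into siblings within the crate.
    pub fn loud() -> String {
        crate::greet::hello().to_uppercase()
    }
}

fn main() {
    println!("{}", shout::loud()); // HELLO
}
```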

1 Like

Rust seems to use CGUs (codegen units) at the LLVM stage as its compilation unit, per rustc_monomorphize::partitioning. Though it’s not clear to me how big a CGU is compared to a typical crate with typical compile settings.

The default value, if not specified, is 16 for non-incremental builds. For incremental builds the default is 256 which allows caching to be more granular.

https://doc.rust-lang.org/rustc/codegen-options/index.html
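The trade-off quoted above can be steered explicitly (the flag is documented at that link; main.rs is a placeholder):

```shell
# One CGU maximizes cross-function optimization; more CGUs build faster
# in parallel and cache more granularly for incremental builds.
rustc -C opt-level=3 -C codegen-units=1 main.rs
```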

2 Likes