It’s great to have this option even if it’s slower.
Object boxing can be a performance killer (it’s e.g. why virtual functions in C++ are slow). I doubt bdwgc is the reason for the slowness (checking whether you make more malloc calls, or profiling in other ways, would help); at least you are responsible for the memory layout. A GC can’t improve the situation if object boxing is already happening, but it will need to follow the extra pointers, so it will be slower than regular Julia’s GC, which isn’t handicapped in the same way by your generated memory layout (except for type-unstable code, which often implies more memory allocations).
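To make the type-instability point concrete, here is a minimal sketch in plain Julia (sumit, boxed and unboxed are just made-up names, nothing from your compiler): summing a Vector{Any} re-boxes the running sum on the heap every iteration, while a Vector{Float64} stays unboxed, and those extra heap objects are exactly the pointers a tracing GC then has to chase:

function sumit(xs)
    s = 0.0
    for x in xs
        s += x                    # with x::Any the new sum is boxed on each iteration
    end
    return s
end

const boxed   = Any[1.0 for _ in 1:100_000]       # elements held behind pointers
const unboxed = Float64[1.0 for _ in 1:100_000]   # contiguous, unboxed storage

sumit(boxed); sumit(unboxed)       # compile both first
@show @allocated sumit(boxed)      # many small heap allocations
@show @allocated sumit(unboxed)    # 0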
I can guess why you get object boxing when Julia doesn’t. A generic multiple-dispatch method in Julia implies many functions, i.e. specializations based on the argument types, infinitely many in fact, which is why I asked if you use C++ templates. If you compile one Julia function to one non-templated C++ function, then it must use object boxing.
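For example (a trivial sketch; add1 is just a made-up name here): one generic method gives a separate, fully unboxed specialization per concrete argument type it’s called with, which is roughly what C++ templates give you and what a single non-templated function can’t reproduce:

add1(x) = x + 1                    # one generic method

code_typed(add1, (Int64,))[1]      # one specialization: a pure integer add
code_typed(add1, (Float64,))[1]    # another specialization: a floating-point add
code_typed(add1, (BigInt,))[1]     # and so on, one per concrete type ever used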
Thanks, I didn’t know of the unexported Base.code_typed_by_type. Are you sure you use it directly, not indirectly? Since it’s not part of Julia’s stable API, the compiler could hypothetically break on it. I think you mean you called code_typed, which is part of the API; it calls code_typed_by_type and does little else, and in my case it gave the same result:
julia> tt = Base.signature_type(+, (Float64, Int64,));
julia> Base.code_typed_by_type(tt)
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.sitofp(Float64, y)::Float64
│ %2 = Base.add_float(x, %1)::Float64
└── return %2
) => Float64
Are you sure you're not calling it like this:
julia> Base.code_typed_by_type(tt; optimize=false)
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = Base.:+::Core.Const(+)
│ %2 = Base.promote(x, y)::Tuple{Float64, Float64}
│ %3 = Core._apply_iterate(Base.iterate, %1, %2)::Float64
└── return %3
) => Float64
Similar to [`code_typed`](@ref), except the argument is a tuple type describing
a full signature to query.
"""
function code_typed_by_type(@nospecialize(tt::Type);
[..]
error("code reflection cannot be used from generated functions")
By calling it directly you only bypass a little bit of code_typed (which gives the same answer in my case):
function code_typed(@nospecialize(f), @nospecialize(types=default_tt(f)); kwargs...)
if isa(f, Core.OpaqueClosure)
return code_typed_opaque_closure(f; kwargs...)
end
tt = signature_type(f, types)
return code_typed_by_type(tt; kwargs...)
end
Did you consider compiling to languages other than C++ (I suppose you mean almost C), or to C#? It seems Rust wouldn’t help, since Julia doesn’t (yet) have its semantics. Actually, compiling to C# or Java seems sensible to me (Java isn’t slow when you avoid its object boxing; such code is just very non-idiomatic), since both have very good GCs available; the same goes for Go. If Julia had Rust’s semantics, then compiling to Vale would be interesting (a most interesting language, safer than Rust… easily, and as fast as C++). Also maybe to:
It’s unclear: is this enough to use CxxWrap.jl and PythonCall.jl (in both directions)?
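For context, here is a rough sketch of what “both directions” looks like with PythonCall.jl and its Python-side companion juliacall in a normal Julia session; whether your compiled binaries can support the same is exactly what I’m asking:

using PythonCall
math = pyimport("math")                 # Julia calling Python
pyconvert(Float64, math.sqrt(2))        # 1.4142135623730951

# The other direction goes through the juliacall Python package:
#   from juliacall import Main as jl
#   jl.sqrt(2)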
For .NET 6: “Single-file apps (extraction-free) can be published for Linux, macOS, and Windows (previously only Linux).” Then for .NET 8:
Compile your .NET apps into native code that uses less memory and starts instantly. No need to wait for the JIT (just-in-time) compiler to compile the code at run time. No need to deploy the JIT compiler and IL code. AOT apps deploy just the code that’s needed for your app. Your app is now empowered to run in restricted environments where a JIT compiler isn’t allowed.
[With fewer limitations than in .NET 7, now including macOS; plus experimental AOT support for iOS, tvOS and Android: “Experimental, no built-in Java interop”]
Did/do you compile to C# in the old version? Or to the .NET CLR directly? Why the change to C++? Did you have the same overhead with C#, or more, or less?
syslabcrt.so is your runtime, written in C, at 1 MB. It seems like that’s the minimum size for your compiled programs such as “Hello world”, when you exclude the 5 MB libcrosstrace.so (and build.jl?).
You also use the system libm.so.6; Julia still uses it but has eliminated it as a needed dependency (except for 32-bit Windows), so it’s unclear if you really need it. Also, you don’t seem to support threads(?), so why link libpthread.so.0?
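A quick way to check, as a sketch (./hello stands in for some hypothetical binary produced by your compiler, on a Linux system with ldd available):

# list which of the questionable libraries the compiled binary actually links
for line in eachline(`ldd ./hello`)
    occursin(r"libm|libpthread", line) && println(line)
end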