SyslabCC: Suzhou-Tongyuan's proprietary Julia AOT compiler is now available for free use (personal & educational license only)

For someone trying to use the compiler on a Linux ARM64 or macOS machine:
The current release does not work there. Only Windows x86_64 and Linux amd64 builds are provided.

For Linux ARM64 users:
You could cross-compile to target Linux ARM64, but the compiler itself needs to run on one of the aforementioned platforms.

DM me if you need an ARM build; I could talk with our manager about providing one on Monday. Or you could wait until June 30 for the next release, in which we might add support for a C++ target (generating C++ source code instead of binaries).

For macOS users:

If there’s no commercial push, the compiler itself won’t directly support macOS in the near future. The generated C++ source code may compile on macOS, though.

4 Likes

Well, you are so awesome. This was unexpected (this soon), except coming from you.

This has pros and cons as I see it:

The proprietary license is a con that some would not touch (I guess a pro for you: money; so I'm curious what it costs for those needing to pay). Since this is optional, people can stay with plain Julia or e.g. StaticCompiler.jl, which DOES have some advantages over this compiler (whose edge over it is probably only the Windows support), e.g. smaller size. Do you have any other limitations not shared by it?

  1. SyslabCC converts Julia programs into small binaries (executables or shared libs); these binaries are small (1~2 MB for small projects) and do not depend on libjulia.

That size is great for most users, though StaticCompiler.jl gives you smaller sizes (at least for e.g. Hello world), because it links to neither libjulia nor any of the usual runtime libs, not even OpenBLAS (so it misses out there, though it has a possible alternative).

I understand you compile Julia to C++, i.e. all the licenses of your original code apply, e.g. if you have GPL source code; but the replacement runtime you provide is, I assume, at least royalty-free for all users. Or is it actually freely licensed? Because if it’s proprietary then it will conflict with GPL/copyleft.

C++ uses RAII; it seems you can’t compile to such idiomatic C++ code, so do you add a Boehm-style GC? At least you do not use Julia’s (now multi-threaded) GC, since it’s part of libjulia, unless you ripped it out. This can be (a pro or) a con, missing out on the performance of Julia’s GC.

I doubt your GC is the reason for the minimum size of the compiled code you provide; I guess this is it:

Is the C++ code you generate readable? Does it compile to templated C++ code, since Julia’s code is generic? Even if readable, I doubt you’d want to modify the transpiled C++ code, though you could; there are pros and cons to that. Can you then use any C++ compiler? I suppose so, or maybe not a Windows compiler (why that limitation)?

  1. SyslabCC supports full Julia syntax (this is obvious as we started at the IR level).

I.e. current and future Julia syntax; and really all future Julia semantics (at least in Julia’s standard lib), assuming such Julia code is type-stable and compiles to the same LLVM IR?

  1. SyslabCC supports const global variables

I.e. only if const, though this doesn’t seem like a huge limitation; you want globals const anyway, and only leave them non-const by accident, or maybe in exploratory/REPL programming.

One global variable is the RNG state. You do support rand, just not for multiple threads, which is one of your limitations (that likely can be lifted?).

  1. SyslabCC supports calling into blas, so a * b is supported

Is that a limitation, i.e. are matrix division and anything more complex than +, * not supported, or were you just not being specific? I.e. is all of OpenBLAS actually supported, and e.g. BLIS.jl if opting into that?

Since you compile to C++, you could e.g. use your code from R, I suppose (not just Python)? C++ is commonly used there, though I think with some special interface, so I do not expect drop-in support; this is just a hypothetical. Or do you think it would be easy?

I do not see a good reason for native Julia to be (faster or) slower than code compiled with this compiler:

Can anyone try to compile some of the worse-performing benchmarks, such as the one above? In that case it’s 156 times slower than some other languages, likely because of allocations (for a tree) with freeing deferred to the GC.

The benchmarks game has so far disallowed AOT-compiled code for Julia (but not for C#), since it’s non-idiomatic Julia code when StaticCompiler.jl is used; with this compiler it would likely be ok, since it’s the same code. People might claim the code is fast because it’s C++ (not totally wrongly, but it would be an implementation detail).

2 Likes

Well, Palli, the questions you asked are great. However, I might not respond to them over the weekend. I will answer with more details in a GH repo, but for now I’ll just answer a few of them.

I’m just an employee, and I personally think there would be tremendous benefits to open-sourcing this compiler: people could use the static compiler to enable more use cases for Julia, and it eases TTFX issues, which would also greatly help the company’s products that integrate some open-source Julia libraries. I have shown my attitude to the managers multiple times, but that’s the fact: I’m not a decision maker, and I also understand that a small company adopting Julia needs financial gain to continue.

Respecting licenses is the basic thing. In terms of the compiler itself, several open-source libraries under the MIT license are used (e.g., bdwgc). If users use our compiler to link shared libraries under the GPL or other licenses not compatible with MIT, the mechanics should be similar to GCC or any other compiler.

We use bdwgc.

That is not the reason. The AOT compiler achieves its smaller size by trimming. Besides, we reimplement all Julia intrinsics and wrote our own compilation pipeline based on code_typed_by_type. You can find some design details in my previous reply in this thread.

I will give the design later in a GH repo, with more detailed slides and code examples.

Not readable if you compare it with the source code. I did say that we started at the IR level.
The codegen target currently needs templates and overloading due to our approach of transpiling Julia intrinsics. Hence, we need C++ or a similar language as the target.

Can you then use any C++ compiler? I suppose so, or maybe not a Windows compiler (why that limitation)?

I think there is no hard restriction on the platform or the compiler. In a later version, we’ll produce pure C++ source code (plus the shared libraries mentioned in ccall) with a CMake project. You could even build binaries for Android/iOS.

Right, but no LLVM IR. We do want to target LLVM IR, which would hugely reduce the work of implementing some intrinsics. Unfortunately, the internal goal of our compiler is to support C/C++ codegen. I don’t think LLVM CBE is production-ready.

I mean, when you do static compilation, you should avoid referencing them. Besides, we can in fact allow occurrences of accesses to non-const global variables, but they throw when the execution path gets triggered.
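
A tiny sketch of what this means in practice (illustrative only; the exact error differs):

const tol = 1e-9             # const global: fine, folded at compile time
counter = 0                  # non-const global: may appear in compiled code

ok(x) = abs(x) < tol         # compiles and runs normally

function bump()
    global counter += 1      # compiles, but throws at runtime if this
end                          # execution path is actually triggered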

We also support rand, not via Base.rand but with a random algorithm developed by our Math group in Suzhou-Tongyuan, just to avoid non-const global variables and libuv tasks. Anyway, too many details to expand here; I’ll give more info in the GH repo.
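
For a flavor of that approach (an illustrative stand-in, not our Math group’s actual algorithm): keep the state behind a const binding to a mutable object, so no non-const global is ever referenced:

# Illustrative stand-in, not the algorithm shipped with SyslabCC.
mutable struct RngState
    s::UInt64
end

const GLOBAL_RNG = RngState(0x9E3779B97F4A7C15)  # const binding, mutable state

function next_rand(st::RngState = GLOBAL_RNG)
    x = st.s          # xorshift64: a simple, dependency-free PRNG
    x ⊻= x << 13
    x ⊻= x >> 7
    x ⊻= x << 17
    st.s = x
    return x
end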

a * b may require BLAS in Julia, and our AOT compiler respects this.

In the default case, the generated binaries link to OpenBLAS. However, thanks to JuliaLinearAlgebra/libblastrampoline, linking to another BLAS implementation needs only a few ccalls.
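
In vanilla Julia the same mechanism looks roughly like this (the backend path below is hypothetical):

using LinearAlgebra

# libblastrampoline forwards the BLAS symbols to another backend at runtime.
BLAS.lbt_forward("/usr/local/lib/libblis.so"; clear=true)  # hypothetical path
BLAS.get_config()   # confirms which backend now serves `a * b`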

Well, we export function symbols, just like what we would do in C with a C compiler.

See this code and you might get it immediately.

if @isdefined(SyslabCC)
    # we export the symbol `cfft`
    SyslabCC.static_compile(
        "cfft",
        myfft!,
        (Ptr{ComplexF64}, Ptr{ComplexF64}, Int32, Int32)
    )
end
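
The exported symbol can then be consumed like any C function, e.g. from ordinary Julia via ccall (the library name, the return type, and the roles of the two Int32 arguments are placeholders here):

# Illustrative only: consuming the exported `cfft` from vanilla Julia.
buf_in  = rand(ComplexF64, 1024)
buf_out = zeros(ComplexF64, 1024)
ccall((:cfft, "libcfft"), Cvoid,                 # return type assumed Cvoid
      (Ptr{ComplexF64}, Ptr{ComplexF64}, Int32, Int32),
      buf_out, buf_in, Int32(length(buf_in)), Int32(1))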

Unfortunately, binaries produced by the AOT compiler are usually slightly slower than Julia.

The main reason could be the GC. bdwgc can be too general, and object boxing costs a lot. In some computation-intensive cases, we find the code gets 50% slower without preallocation.
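
By preallocation I mean the usual pattern of reusing output buffers so the hot loop does not allocate at all; a minimal sketch:

step(a, b) = a .+ b            # allocates a fresh array on every call

function step!(out, a, b)      # preallocated variant: no per-call allocation,
    @. out = a + b             # so neither Julia's GC nor bdwgc is stressed
    return out
end

a = rand(1000); b = rand(1000)
out = similar(a)
step!(out, a, b)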

We might be faster on exception handling, but we finally figured out that is due to our simplification of the stacktrace.

I’ll do this on Monday. It’s very interesting to have such a comparison.

24 Likes

It’s great to have this option even if slower.

Object boxing can be a performance killer (it’s e.g. why virtual functions in C++ are). I doubt bdwgc is the reason for the slowness (testing whether you make more malloc calls would help, or profiling in other ways); at least you are responsible for the memory layout, and the GC can’t improve the situation if object boxing is already happening, but it will need to follow pointers, so it will be slower than regular Julia’s, since Julia’s GC isn’t handicapped in the same way by your generated memory layout (except for type-unstable code, which often implies more memory allocations).
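
E.g. on the vanilla-Julia side you can count every allocation like this (Julia ≥ 1.8; the workload below is just a placeholder):

using Profile

# Record every allocation (sample_rate=1) of a placeholder workload.
Profile.Allocs.@profile sample_rate=1 map(_ -> rand(100), 1:1000)
allocs = Profile.Allocs.fetch().allocs
length(allocs)   # compare against the malloc count seen in the AOT binary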

I can guess why you get object boxing when Julia doesn’t. A generic multiple-dispatch method in Julia implies many functions, specializations based on types, infinitely many in fact, which is why I asked if you use C++ templates. If you compile one Julia function to one non-templated C++ function, then it must use object boxing.

Thanks, I didn’t know of the unexported Base.code_typed_by_type. Are you sure you use it directly, not indirectly? Since it’s not part of Julia’s stable API, the compiler could hypothetically break. I think you mean you called code_typed, which is part of the API, which calls it and does little else, and gave the same result in my case:

julia> tt = Base.signature_type(+, (Float64, Int64,));
julia> Base.code_typed_by_type(tt)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.sitofp(Float64, y)::Float64
│   %2 = Base.add_float(x, %1)::Float64
└──      return %2
) => Float64

Are you sure you're not calling it like this:

julia> Base.code_typed_by_type(tt; optimize=false)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.:+::Core.Const(+)
│   %2 = Base.promote(x, y)::Tuple{Float64, Float64}
│   %3 = Core._apply_iterate(Base.iterate, %1, %2)::Float64
└──      return %3
) => Float64
Similar to [`code_typed`](@ref), except the argument is a tuple type describing
a full signature to query.
"""
function code_typed_by_type(@nospecialize(tt::Type);
[..]
        error("code reflection cannot be used from generated functions")

You only bypass a little bit of this (which gives the same answer in my case):

function code_typed(@nospecialize(f), @nospecialize(types=default_tt(f)); kwargs...)
    if isa(f, Core.OpaqueClosure)
        return code_typed_opaque_closure(f; kwargs...)
    end
    tt = signature_type(f, types)
    return code_typed_by_type(tt; kwargs...)
end

Did you consider compiling to languages other than C++ (I suppose you mean almost C), or C#? It seems Rust wouldn’t help, since Julia doesn’t (yet) have its semantics. Actually, to me C# or Java seems sensible (Java isn’t slow when you avoid its object boxing; such code is just very non-idiomatic), since they have very good GCs available; also Go. If Julia had Rust semantics, then Vale would be interesting (a most interesting language, safer than Rust, and as fast as C++). Also maybe to:

It’s unclear: is this enough to use CxxWrap.jl and PythonCall.jl (in both directions)?

For .NET 6 “Single-file apps (extraction-free) can be published for Linux, macOS, and Windows (previously only Linux).” Then for .NET 8:

Compile your .NET apps into native code that uses less memory and starts instantly. No need to wait for the JIT (just-in-time) compiler to compile the code at run time. No need to deploy the JIT compiler and IL code. AOT apps deploy just the code that’s needed for your app. Your app is now empowered to run in restricted environments where a JIT compiler isn’t allowed.

[Including fewer limitations than in .NET 7, now including macOS; and experimental AOT support for iOS, tvOS and Android: “Experimental, no built-in Java interop”.]

Did/do you compile to C# in the old version? Or to the .NET CLR directly? Why change to C++? Did you have the same overhead with C#, or more, or less?

syslabcrt.so is your runtime, written in C, at 1 MB. It seems like that’s the minimum size for your compiled programs such as “Hello world”, when you exclude the 5 MB libcrosstrace.so (and build.jl?).

You also use the system libm.so.6; Julia has eliminated it as a needed dependency except for 32-bit Windows, so it’s unclear if you really need it. Also, you do not support threads(?), so why use libpthread.so.0?

1 Like

The AOT compiler always does code specialization. We generate versioned code for distinct MethodInstances.

Besides, we need templates and overloading in the target language, as given in the previous replies. The reason why we need these features is not related to the specialization of Julia functions, but only related to the support of Julia’s intrinsics.
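
As a concrete illustration (using only the reflection already shown in this thread), one generic method yields one specialized body per concrete signature, and those specializations are what we version:

add(x, y) = x + y

# Two distinct MethodInstances arise from the same generic method:
code_typed(add, (Int, Int))        # body specialized for (Int, Int)
code_typed(add, (Float64, Int))    # a separate specialization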

The point is not the API, but access to the typed & optimized CodeInfo. I said code_typed_by_type here because the compiler also supports AOT compilation for closures (where the callable object is not a singleton).
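
For example, a closure’s type carries its captured data and is not a singleton, so the tuple-type entry point is the natural one (sketch):

f = let a = 2.0
    x -> a * x                       # typeof(f) carries the captured `a`
end

tt = Base.signature_type(f, (Int,))         # Tuple{typeof(f), Int}
ci, rt = only(Base.code_typed_by_type(tt))  # typed & optimized CodeInfo + return type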

We have to focus on the needs of our company’s internal use first. Besides, you might know that the automotive industry generally accepts code generation targeting old-fashioned languages such as C/C++ (even C++ might get rejected), but that is not the case for Rust or other languages not yet verified in that industry.

This is a runtime thing only for debugging, because retrieving stacktraces and emulating vanilla Julia’s stacktrace behavior is generally not reliable for small native binaries. When deploying your applications, I believe you should use some logging system instead of stacktraces created by a trimmed binary. You can use --notraceback to avoid libcrosstrace.

I did miss build.jl, which comes from the example.

build.jl
function simulate!(pU::Ptr{Cdouble}, pu::Ptr{Cdouble}, Nx, Ny, Nt, α, dx, dy, dt)
    try
        U = unsafe_wrap(Array, pU, (Nx, Ny, Nt); own=false)
        u = unsafe_wrap(Array, pu, (Nx, Ny); own=false)
        simulate!(U, u, Nx, Ny, Nt, α, dx, dy, dt)
        return true
    catch
        return false
    end
end

function simulate!(U::Array, u::Matrix, Nx, Ny, Nt, α, dx, dy, dt)
    u = reshape(u, size(u)..., 1)
    u = @view u[:, :, 1]

    # time iteration of the finite difference method
    for n in 1:Nt
        u_new = @view U[:, :, n]
        u_new .= u
        for i in 2:(Nx-1)
            for j in 2:(Ny-1)
                u_new[i, j] = u[i, j] + α * dt / dx^2 * (u[i+1, j] - 2 * u[i, j] + u[i-1, j]) +
                              α * dt / dy^2 * (u[i, j+1] - 2 * u[i, j] + u[i, j-1])
            end
        end
        u = u_new
    end
end

if @isdefined(SyslabCC)
    SyslabCC.static_compile(
        "simulate", simulate!,
        Ptr{Cdouble}, Ptr{Cdouble},         # U and u
        Int64, Int64, Int64,                # Nx, Ny, Nt
        Cdouble, Cdouble, Cdouble, Cdouble, # α, dx, dy, dt
    )
end

We don’t manage things like libm, but math.h is required for the C/C++ part.
It seems that libpthread is required by the parallel Boehm GC, and the compiled binaries do partially support threads (being called from non-main threads in limited cases). We’re actively working to improve thread support.

7 Likes

This would preclude many potential target platforms (e.g. microcontrollers). Are there any benefits compared to the C++ target?

1 Like

This is quite a bad case for our AOT compiler, as it allocates too much.


I made a benchmark with limited modifications (using println for IO and mutable structs for the self-recursive structs).

Code for vanilla Julia: bintree-jl.jl
using Printf

struct Empty # singleton type: Empty() === Empty()
end

mutable struct Node
    left::Union{Node,Empty}
    right::Union{Node,Empty}
end

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        empty
    else
        Node(make(d-1), make(d-1))
    end
end

check(t::Empty) = 0
check(t::Node) = 1 + check(t.left) + check(t.right)

function loop_depths(d, min_depth, max_depth)
    for i = 0:div(max_depth - d, 2)
        niter = 1 << (max_depth - d + min_depth)
        c = 0
        for j = 1:niter
            c += check(make(d)) 
        end
        @printf("%i\t trees of depth %i\t check: %i\n", niter, d, c)
        d += 2
    end
end

function perf_binary_trees(N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1

    # create and check stretch tree
    let c = check(make(stretch_depth))
        @printf("stretch tree of depth %i\t check: %i\n", stretch_depth, c)
    end

    long_lived_tree = make(max_depth)

    loop_depths(min_depth, min_depth, max_depth)
    @printf("long lived tree of depth %i\t check: %i\n", max_depth, check(long_lived_tree))

end

n = parse(Int,ARGS[1])
perf_binary_trees(n)
Code for AOT (SyslabCC): bintree-aot.jl
using Printf

struct Empty # singleton type: Empty() === Empty()
end

mutable struct Node
    left::Union{Node,Empty}
    right::Union{Node,Empty}
end

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        empty
    else
        Node(make(d - 1), make(d - 1))
    end
end

check(t::Empty) = 0
check(t::Node) = 1 + check(t.left) + check(t.right)

function loop_depths(d, min_depth, max_depth)
    for i = 0:div(max_depth - d, 2)
        niter = 1 << (max_depth - d + min_depth)
        c = 0
        for j = 1:niter
            c += check(make(d))
        end
        println("$niter\t trees of depth $d\t check: $c")
        d += 2
    end
end

function perf_binary_trees(N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1

    # create and check stretch tree
    let c = check(make(stretch_depth))
        println("stretch tree of depth $stretch_depth\t check: $c")
    end

    long_lived_tree = make(max_depth)

    loop_depths(min_depth, min_depth, max_depth)
    println("long lived tree of depth $max_depth\t check: $(check(long_lived_tree))")
end

function perf_binary_trees_main(cstr::Cstring)
    perf_binary_trees(parse(Int, unsafe_string(cstr)))
    return Cint(0)
end

if @isdefined(SyslabCC)
    SyslabCC.static_compile(
        "perf_binary_trees",
        perf_binary_trees_main,
        Cstring,
    )
end

Our AOT compiler does not support ARGS, which is dynamic, so we need a main.c for the entrypoint.

/** compile with:
   scc bintree-aot.jl -o libbintree.dll --allow-dynamic
   gcc main.c -o main.exe -L"." -lbintree
*/

#include <stdio.h>

int perf_binary_trees(char *);

int main(int argc, char *argv[]) {
  if (argc != 2) {
    printf("Usage: %s <max tree depth>\n", argv[0]);
    return 1;
  }
  return perf_binary_trees(argv[1]);
}

The result matches:
Vanilla Julia:

> time julia bintree-jl.jl 21
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m8.725s
user    0m0.015s
sys     0m0.000s

AOT:

# AOT
> time ./main.exe 21
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m17.455s
user    0m0.000s
sys     0m0.015s

The performance is relatively stable, so I just timed each run once.

This is the result for this worst case:

Runtime & Compiler    N    Time
Julia                 21   8.73s
AOT/SyslabCC          21   17.5s
Julia                 20   4.23s
AOT/SyslabCC          20   8.64s
Julia                 16   0.649s
AOT/SyslabCC          16   0.467s
Julia                 10   0.427s
AOT/SyslabCC          10   0.036s
Julia                 5    0.419s
AOT/SyslabCC          5    0.032s

Besides, I did one optimization to the code, or else the AOT compiler fails with “Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS”:

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        # instead of `Node(Empty(), Empty())`, 
        # we use a global constant.
        empty
    else
        Node(make(d - 1), make(d - 1))
    end
end
5 Likes

The C++ target is non-idiomatic C++ code, i.e. it needs a GC added, and a GC for C++ must in general be conservative (and I forget if any is real-time; Java actually has hard-realtime GC as an option). Some GC-based languages can have more aggressive GCs. I still think C++ is ok, and the GC is not the problem, since the main problem is likely object boxing, and that’s bad in any language. And it’s wishful thinking that e.g. C# could optimize object boxing away; I think only a hypothetical, non-existing JIT optimizer could do it.


Anyway:
https://www.nanoframework.net/

Making it easy to write C# code for embedded systems.

1 Like

Java/C# targets might be good for the AOT compiler for many reasons.

However, there is one critical thing that prevents us from using Java/C#‘s mature GCs: the JVM/CLR uses a compacting GC (EDIT: a compacting GC prevents you from getting a stable physical address of a heap object), but in Julia, the Stdlib depends on pointer_from_objref or other C functions to access Julia objects’ physical addresses.

This is also a reason why we need the Boehm GC.
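
A minimal illustration of the constraint:

mutable struct Buf
    x::Int
end

b = Buf(42)
p = pointer_from_objref(b)  # raw heap address, as parts of the Stdlib use it
# `p` must stay valid while `b` lives. bdwgc never moves objects, so it is;
# a compacting JVM/CLR GC may relocate `b`, silently invalidating `p`.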

3 Likes

$ hyperfine 'julia -e ""'
Benchmark 1: julia -e ""
  Time (mean ± σ):     970.4 ms ± 2138.8 ms    [User: 205.5 ms, System: 118.1 ms]
  Range (min … max):   197.6 ms … 7024.1 ms    10 runs

Julia (1.10.3) still has a non-small startup cost, but subtracting it you’re actually (0.419-0.1976)/0.032 ≈ 6.9x faster there (and 13x including the overhead), meaning Julia’s allocations are likely worse. You’re less scalable, likely meaning your GC isn’t aggressive enough at freeing. Can it simply be tuned somehow? I suggested mimalloc privately; I think it could also help, as it’s a drop-in replacement (for you, not Julia) with just an ENV var enabling it.

[That benchmark is a worst case for Julia, and I think all the other languages somehow “cheat”, i.e. use arenas. You CAN do the same in Julia, it’s just not idiomatic, and not allowed there, unlike for the other languages. It would also be interesting to see your AOT-compiled code on some of the other benchmarks that are easier, e.g. the short-running ones, where Julia is likely fastest already once startup cost is eliminated, and your AOT compiler would show that.]

2 Likes

Hello, the Suzhou-Tongyuan website mentions Modelica in several places, but I couldn’t find details about what is provided. Does it provide a Modelica simulator/compiler? A custom one, or based on an existing one (OpenModelica’s?)

Hi Pierre, as I have limited knowledge of the Modelica-related work in our company (I work in the Julia team), I submitted your question to the relevant team.

I’m glad to forward their response below:

First of all, thank you for your interest in Suzhou Tongyuan.

MWORKS.Sysplorer is a commercial Modelica simulation environment developed by Suzhou Tongyuan. It is a Modelica-based visual modeling and simulation platform for multi-domain engineering systems. It provides a visual modeling studio, an effective Modelica compiler and symbolic analyzer, as well as powerful postprocessors for curves, schema, and 3D animation. Engineering tools, such as experiment design and multi-object optimization, are included in MWorks. MWorks supports interfaces with CAD, FEM, Matlab/Simulink, and FMI. In particular, it can import general CAD files and FEM modal data into the 3D animation postprocessor, including STL, SAT, HSF, 3DS, DXF, and MNF formats. It is also convenient to customize and expand MWorks through C/C++ interfaces, COM components, and Python scripts. For more details, please visit the Modelica official website: https://modelica.org/tools/

We apologize for the inconvenience, but Suzhou Tongyuan’s English website is still under construction. However, you can download MWORKS.Sysplorer from the Suzhou Tongyuan Chinese website. We have provided MWORKS.Sysplorer with English support. After installation, you can set the software language to English through the menu “Tool” → “Language”. Download link: https://www.tongyuan.cc/download

You are welcome to try it out, and if needed, please contact us for a trial license.

For your questions:

Yes.

Not at all. You can find details in the Modelica official site mentioned above.

Hello, thanks for your hard work. We have tested your compiler on our own Julia code, but we encountered some errors. The error displays:

scc WEM3D.jl -o libWEM3D.dll --mode shared --allow-dynamic --bundle
ERROR: LoadError: ArgumentError: Package Base does not have Distributed in its dependencies:
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have Base checked out for development and have
  added Distributed as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with Base
Stacktrace:
 [1] macro expansion
   @ .\loading.jl:1634 [inlined]
 [2] macro expansion
   @ .\lock.jl:267 [inlined]
 [3] require(into::Module, mod::Symbol)
   @ Base .\loading.jl:1611
 [4] include
   @ .\Base.jl:457 [inlined]
 [5] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
   @ Base .\loading.jl:2049
 [6] top-level scope
   @ stdin:3
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:2
in expression starting at stdin:3
System.Exception: Julia error: LoadError: LoadError: Failed to precompile WEM3D [c412bc1b-78d3-4f95-9579-377fcaf123f6] to "C:/Users/Public/TongYuan/.julia\\compiled\\v1.9\\WEM3D\\jl_BB47.tmp".  
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base .\loading.jl:2300
  [3] compilecache
    @ .\loading.jl:2167 [inlined]
  [4] _require(pkg::Base.PkgId, env::String)
    @ Base .\loading.jl:1805
  [5] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1660
  [6] macro expansion
    @ .\loading.jl:1648 [inlined]
  [7] macro expansion
    @ .\lock.jl:267 [inlined]
  [8] require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1611
  [9] include(fname::String)
    @ Base.MainInclude .\client.jl:478
 [10] top-level scope
    @ D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:7
 [11] include(fname::String)
    @ Base.MainInclude .\client.jl:478
 [12] JLCallImpl(out::Ptr{TyJuliaCAPI.JV}, funcProxy::TyJuliaCAPI.JV, argProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.JV}, kwargProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.TyTuple{TyJuliaCAPI.JSym, TyJuliaCAPI.JV}}, dotcall::Bool)
    @ TyJuliaCAPI C:\Users\Public\TongYuan\.julia\packages\TyJuliaCAPI\nIboO\src\core.jl:120     
 [13] JLCall(out::Ptr{TyJuliaCAPI.JV}, funcProxy::TyJuliaCAPI.JV, argProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.JV}, kwargProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.TyTuple{TyJuliaCAPI.JSym, TyJuliaCAPI.JV}})
    @ TyJuliaCAPI C:\Users\Public\TongYuan\.julia\packages\TyJuliaCAPI\nIboO\src\core.jl:147     
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\CSEMFwdSolver\CSEMFwdSolver.jl:14
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:7
   at Syslab.Common.JV.juliaCallEx(JV[], Dictionary`2, Boolean) + 0x1fe
   at Syslab.Compiler.EntryPoint.MainImpl(EntryPoint.Options, CmdParser`1) + 0x209
   at Syslab.Compiler.EntryPoint.Main(String[]) + 0x43
# our own package uses the Distributed package
using Distributed
using DistributedArrays

Our package uses the Distributed package for parallel computing; does this error mean your compiler isn’t prepared for the Distributed package yet?

2 Likes

Distributed.jl is not supported yet and might not get supported in the near future.
I think you need to start a real Julia process to use Distributed.jl.
The AOT compiler might help by compiling the computation logic into shared libraries and reducing runtime dependencies.
If you tell me more about your concrete use case, I may be able to give useful suggestions (e.g., if it is fine for your program to use a C/C++ solution for multi-processing instead of Distributed.jl, you can then call shared libraries compiled with the Julia AOT compiler).
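
For instance, the driver can stay a plain Julia process using Distributed.jl, while only the numeric kernel is AOT-compiled (library and symbol names below are hypothetical):

using Distributed
addprocs(4)

# `libkernel`/`solve` are hypothetical: a shared library produced by
# SyslabCC.static_compile, exporting a C symbol as in the examples above.
@everywhere solve(x::Float64) =
    ccall((:solve, "libkernel"), Cdouble, (Cdouble,), x)

results = pmap(solve, 0.0:0.1:1.0)  # Distributed stays in vanilla Julia;
                                    # only the kernel is AOT-compiled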

1 Like