SyslabCC: Suzhou-Tongyuan's proprietary Julia AOT compiler is now available for free use (personal & educational license only)

For someone trying to use the compiler on a Linux ARM64 or macOS machine:
The current release does not work there. Only Windows x86_64 and Linux amd64 builds are provided.

For Linux ARM64 users:
You could cross-compile to target Linux ARM64, but the compiler itself needs to run on one of the aforementioned platforms.

DM me if you need an ARM build; I could talk with our manager about providing one on Monday. Or you could wait until June 30 for the next release, in which we might add support for a C++ target (generating C++ source code instead of binaries).

For macOS users:

If there’s no commercial push, the compiler itself won’t directly support macOS in the near future. The generated C++ source code may compile on macOS, though.

4 Likes

Well, you are so awesome. This was unexpected (this soon), except coming from you.

This has pros and cons as I see it:

The proprietary license is a con that some would not touch (I guess a pro for you: money; so I'm curious what it costs for those needing to pay). Since this is optional, people can stay with plain Julia or e.g. StaticCompiler.jl, which DOES have some advantages over this compiler (whose edge over it is probably only the Windows support), e.g. smaller size. Do you have any other limitations not shared by it?

  1. SyslabCC converts Julia programs into small binaries (executables or shared libs); these binaries are small (1~2 MB for small projects) and do not depend on libjulia.

That size is great for most users, though StaticCompiler.jl gives you smaller sizes (at least for e.g. Hello world), because it links to neither libjulia nor any of the usual runtime libs, not even OpenBLAS (so it misses out there, though it has a possible alternative).

I understand you compile Julia to C++, i.e. all the licenses of your original code apply, e.g. if you have GPL source code; but the replacement runtime you provide is, I assume, at least royalty-free for all users. Or is it actually freely licensed? Because if it’s proprietary then it will conflict with GPL/copyleft.

C++ uses RAII; it seems you can’t compile to such idiomatic C++ code, so do you add a Boehm-style GC? At least you do not use Julia’s (now multi-threaded) GC, since it’s part of libjulia, unless you ripped it out. This can be (a pro or) a con, missing out on the performance of Julia’s GC.

I doubt your GC is the reason for the minimum size of the compiled code you provide; I guess this is it:

Is the C++ code you generate readable? Does it compile to templated C++ code, since Julia’s code is generic? Even if readable, I doubt you’d want to modify the transpiled C++ code, though you could; there are pros and cons to that. Can you then use any C++ compiler? I suppose so, or maybe not a Windows compiler (why that limitation)?

  1. SyslabCC supports full Julia syntax (this is obvious as we started at the IR level).

I.e. current and future Julia syntax; and really all future Julia semantics (at least in Julia’s standard lib), assuming such Julia code is type-stable and compiles to the same LLVM IR?

  1. SyslabCC supports const global variables

I.e. only if const, though this doesn’t seem like a huge limitation; you want globals const anyway, and only leave them non-const by accident, or maybe in exploratory/REPL programming.

One global variable is the RNG state. You do support rand, just not for multiple threads, which is one of your limitations (that likely can be lifted?).

  1. SyslabCC supports calling into blas, so a * b is supported

Is that a limitation, i.e. are matrix division and anything more complex than +, * not supported, or were you just not being specific? I.e. is all of OpenBLAS actually supported, and e.g. BLIS.jl if opting into that?

Since you compile to C++, you could e.g. use your code from R, I suppose (not just Python)? C++ is commonly used there, though I think with some special interface, so I do not expect drop-in support; this is just a hypothetical. Or do you think it would be easy?

I do not see a good reason for native Julia to be (faster or) slower than code compiled with this compiler:

Can anyone try to compile some of the worse-performing benchmarks, such as the one above? In that case it’s 156 times slower than some other languages, likely because of allocations (for a tree) with freeing deferred to the GC.

The benchmarks game has so far disallowed AOT-compiled code for Julia (but not for C#), since it’s non-idiomatic Julia code when StaticCompiler.jl is used; with this compiler it would likely be ok, since it’s the same code. People might claim the code is fast because it’s C++ (not totally wrongly, but it would be an implementation detail).

2 Likes

Well, Palli, the questions you asked are great. However, I might not respond to them over the weekend. I will answer with more details in a GH repo, but for now I’ll just answer a few of them.

I’m just an employee, and I personally think there would be tremendous benefits to open-sourcing this compiler: people could use the static compiler to enable more use cases for Julia, and it eases TTFX issues, which would also greatly help the company’s products that integrate some open-source Julia libraries. I have shown my attitude to the managers multiple times, but that’s the fact: I’m not a decision maker, and I also understand that a small company adopting Julia needs financial gain to continue.

Respecting licenses is the basic thing. In terms of the compiler itself, several open-source libraries under the MIT license are used (e.g., bdwgc). If users use our compiler to link shared libraries under the GPL or other licenses not compatible with MIT, the mechanics should be similar to GCC or any other compiler.

We use bdwgc.

That is not the reason. The AOT compiler achieves its smaller size by trimming. Besides, we reimplement all Julia intrinsics and wrote our own compilation pipeline based on code_typed_by_type. You can find some design details in my previous reply in this thread.

I will give the design later in a GH repo, with more detailed slides and code examples.

Not readable if you compare it with the source code. I did say that we started at the IR level.
The codegen target currently needs templates and overloading due to our approach of transpiling Julia intrinsics. Hence, we need C++ or a similar language as the target.

Can you then use any C++ compiler? I suppose so, or maybe not a Windows compiler (why that limitation)?

I think there is no hard restriction on the platform or the compiler. In a later version, we’ll produce pure C++ source code (plus the shared libraries mentioned in ccall) with a CMake project. You could even build binaries for Android/iOS.

Right, but no LLVM IR. We do want to target LLVM IR, which would hugely reduce the work of implementing some intrinsics. Unfortunately, the internal goal of our compiler is to support C/C++ codegen. I don’t think LLVM CBE is production-ready.

I mean, when you do static compilation, you should avoid referencing them. Besides, we can in fact allow occurrences of accesses to non-const global variables, but they throw when the execution path gets triggered.
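
A tiny sketch of what this means in practice (illustrative only; the exact error differs):

const tol = 1e-9             # const global: fine, folded at compile time
counter = 0                  # non-const global: may appear in compiled code

ok(x) = abs(x) < tol         # compiles and runs normally

function bump()
    global counter += 1      # compiles, but throws at runtime if this
end                          # execution path is actually triggered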

We also support rand, not via Base.rand but with a random algorithm developed by our Math group in Suzhou-Tongyuan, just to avoid non-const global variables and libuv tasks. Anyway, too many details to expand here; I’ll give more info in the GH repo.
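
For a flavor of that approach (an illustrative stand-in, not our Math group’s actual algorithm): keep the state behind a const binding to a mutable object, so no non-const global is ever referenced:

# Illustrative stand-in, not the algorithm shipped with SyslabCC.
mutable struct RngState
    s::UInt64
end

const GLOBAL_RNG = RngState(0x9E3779B97F4A7C15)  # const binding, mutable state

function next_rand(st::RngState = GLOBAL_RNG)
    x = st.s          # xorshift64: a simple, dependency-free PRNG
    x ⊻= x << 13
    x ⊻= x >> 7
    x ⊻= x << 17
    st.s = x
    return x
end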

a * b may require BLAS in Julia, and our AOT compiler respects this.

In the default case, the generated binaries link to OpenBLAS. However, thanks to JuliaLinearAlgebra/libblastrampoline, linking to another BLAS implementation needs only a few ccalls.
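
In vanilla Julia the same mechanism looks roughly like this (the backend path below is hypothetical):

using LinearAlgebra

# libblastrampoline forwards the BLAS symbols to another backend at runtime.
BLAS.lbt_forward("/usr/local/lib/libblis.so"; clear=true)  # hypothetical path
BLAS.get_config()   # confirms which backend now serves `a * b`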

Well, we export function symbols, just like what we would do in C with a C compiler.

See this code and you might get it immediately.

if @isdefined(SyslabCC)
    # we export the symbol `cfft`
    SyslabCC.static_compile(
        "cfft",
        myfft!,
        (Ptr{ComplexF64}, Ptr{ComplexF64}, Int32, Int32)
    )
end
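
The exported symbol can then be consumed like any C function, e.g. from ordinary Julia via ccall (the library name, the return type, and the roles of the two Int32 arguments are placeholders here):

# Illustrative only: consuming the exported `cfft` from vanilla Julia.
buf_in  = rand(ComplexF64, 1024)
buf_out = zeros(ComplexF64, 1024)
ccall((:cfft, "libcfft"), Cvoid,                 # return type assumed Cvoid
      (Ptr{ComplexF64}, Ptr{ComplexF64}, Int32, Int32),
      buf_out, buf_in, Int32(length(buf_in)), Int32(1))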

Unfortunately, binaries produced by the AOT compiler are usually slightly slower than Julia.

The main reason could be the GC. bdwgc can be too general, and object boxing costs a lot. In some computation-intensive cases, we find the code gets 50% slower without preallocation.
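
By preallocation I mean the usual pattern of reusing output buffers so the hot loop does not allocate at all; a minimal sketch:

step(a, b) = a .+ b            # allocates a fresh array on every call

function step!(out, a, b)      # preallocated variant: no per-call allocation,
    @. out = a + b             # so neither Julia's GC nor bdwgc is stressed
    return out
end

a = rand(1000); b = rand(1000)
out = similar(a)
step!(out, a, b)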

We might be faster on exception handling, but we finally figured out that is due to our simplification of the stacktrace.

I’ll do this on Monday. It’s very interesting to have such a comparison.

24 Likes

It’s great to have this option even if slower.

Object boxing can be a performance killer (it’s e.g. why virtual functions in C++ are). I doubt bdwgc is the reason for the slowness (testing whether you make more malloc calls would help, or profiling in other ways); at least you are responsible for the memory layout, and the GC can’t improve the situation if object boxing is already happening, but it will need to follow pointers, so it will be slower than regular Julia’s, since Julia’s GC isn’t handicapped in the same way by your generated memory layout (except for type-unstable code, which often implies more memory allocations).
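
E.g. on the vanilla-Julia side you can count every allocation like this (Julia ≥ 1.8; the workload below is just a placeholder):

using Profile

# Record every allocation (sample_rate=1) of a placeholder workload.
Profile.Allocs.@profile sample_rate=1 map(_ -> rand(100), 1:1000)
allocs = Profile.Allocs.fetch().allocs
length(allocs)   # compare against the malloc count seen in the AOT binary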

I can guess why you get object boxing when Julia doesn’t. A generic multiple-dispatch method in Julia implies many functions, specializations based on types, infinitely many in fact, which is why I asked if you use C++ templates. If you compile one Julia function to one non-templated C++ function, then it must use object boxing.

Thanks, I didn’t know of the unexported Base.code_typed_by_type. Are you sure you use it directly, not indirectly? Since it’s not part of Julia’s stable API, the compiler could hypothetically break. I think you mean you called code_typed, which is part of the API, which calls it and does little else, and gave the same result in my case:

julia> tt = Base.signature_type(+, (Float64, Int64,));
julia> Base.code_typed_by_type(tt)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.sitofp(Float64, y)::Float64
│   %2 = Base.add_float(x, %1)::Float64
└──      return %2
) => Float64

Are you sure you're not calling it like this:

julia> Base.code_typed_by_type(tt; optimize=false)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.:+::Core.Const(+)
│   %2 = Base.promote(x, y)::Tuple{Float64, Float64}
│   %3 = Core._apply_iterate(Base.iterate, %1, %2)::Float64
└──      return %3
) => Float64
Similar to [`code_typed`](@ref), except the argument is a tuple type describing
a full signature to query.
"""
function code_typed_by_type(@nospecialize(tt::Type);
[..]
        error("code reflection cannot be used from generated functions")

You only bypass a little bit of this (which gives the same answer in my case):

function code_typed(@nospecialize(f), @nospecialize(types=default_tt(f)); kwargs...)
    if isa(f, Core.OpaqueClosure)
        return code_typed_opaque_closure(f; kwargs...)
    end
    tt = signature_type(f, types)
    return code_typed_by_type(tt; kwargs...)
end

Did you consider compiling to languages other than C++ (I suppose you mean almost C), or C#? It seems Rust wouldn’t help, since Julia doesn’t (yet) have its semantics. Actually, to me C# or Java seems sensible (Java isn’t slow when you avoid its object boxing; such code is just very non-idiomatic), since they have very good GCs available; also Go. If Julia had Rust semantics, then Vale would be interesting (a most interesting language, safer than Rust, and as fast as C++). Also maybe to:

It’s unclear: is this enough to use CxxWrap.jl and PythonCall.jl (in both directions)?

For .NET 6 “Single-file apps (extraction-free) can be published for Linux, macOS, and Windows (previously only Linux).” Then for .NET 8:

Compile your .NET apps into native code that uses less memory and starts instantly. No need to wait for the JIT (just-in-time) compiler to compile the code at run time. No need to deploy the JIT compiler and IL code. AOT apps deploy just the code that’s needed for your app. Your app is now empowered to run in restricted environments where a JIT compiler isn’t allowed.

[Including fewer limitations than in .NET 7, now including macOS; and experimental AOT support for iOS, tvOS and Android: “Experimental, no built-in Java interop”.]

Did/do you compile to C# in the old version? Or to the .NET CLR directly? Why change to C++? Did you have the same overhead with C#, or more, or less?

syslabcrt.so is your runtime, written in C, at 1 MB. It seems like that’s the minimum size for your compiled programs such as “Hello world”, when you exclude the 5 MB libcrosstrace.so (and build.jl?).

You also use the system libm.so.6; Julia has eliminated it as a needed dependency except for 32-bit Windows, so it’s unclear if you really need it. Also, you do not support threads(?), so why use libpthread.so.0?

1 Like

The AOT compiler always does code specialization. We generate versioned code for distinct MethodInstances.

Besides, we need templates and overloading in the target language, as given in the previous replies. The reason why we need these features is not related to the specialization of Julia functions, but only related to the support of Julia’s intrinsics.
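
As a concrete illustration (using only the reflection already shown in this thread), one generic method yields one specialized body per concrete signature, and those specializations are what we version:

add(x, y) = x + y

# Two distinct MethodInstances arise from the same generic method:
code_typed(add, (Int, Int))        # body specialized for (Int, Int)
code_typed(add, (Float64, Int))    # a separate specialization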

The point is not the API, but access to the typed & optimized CodeInfo. I said code_typed_by_type here because the compiler also supports AOT compilation for closures (where the callable object is not a singleton).
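
For example, a closure’s type carries its captured data and is not a singleton, so the tuple-type entry point is the natural one (sketch):

f = let a = 2.0
    x -> a * x                       # typeof(f) carries the captured `a`
end

tt = Base.signature_type(f, (Int,))         # Tuple{typeof(f), Int}
ci, rt = only(Base.code_typed_by_type(tt))  # typed & optimized CodeInfo + return type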

We have to focus on the needs of our company’s internal use first. Besides, you might know that the automotive industry generally accepts code generation targeting old-fashioned languages such as C/C++ (even C++ might get rejected), but that is not the case for Rust or other languages not yet verified in that industry.

This is a runtime thing only for debugging, because retrieving stacktraces and emulating vanilla Julia’s stacktrace behavior is generally not reliable for small native binaries. When deploying your applications, I believe you should use some logging system instead of stacktraces created by a trimmed binary. You can use --notraceback to avoid libcrosstrace.

I did miss build.jl, which comes from the example.

build.jl
function simulate!(pU::Ptr{Cdouble}, pu::Ptr{Cdouble}, Nx, Ny, Nt, α, dx, dy, dt)
    try
        U = unsafe_wrap(Array, pU, (Nx, Ny, Nt); own=false)
        u = unsafe_wrap(Array, pu, (Nx, Ny); own=false)
        simulate!(U, u, Nx, Ny, Nt, α, dx, dy, dt)
        return true
    catch
        return false
    end
end

function simulate!(U::Array, u::Matrix, Nx, Ny, Nt, α, dx, dy, dt)
    u = reshape(u, size(u)..., 1)
    u = @view u[:, :, 1]

    # time iteration of the finite difference method
    for n in 1:Nt
        u_new = @view U[:, :, n]
        u_new .= u
        for i in 2:(Nx-1)
            for j in 2:(Ny-1)
                u_new[i, j] = u[i, j] + α * dt / dx^2 * (u[i+1, j] - 2 * u[i, j] + u[i-1, j]) +
                              α * dt / dy^2 * (u[i, j+1] - 2 * u[i, j] + u[i, j-1])
            end
        end
        u = u_new
    end
end

if @isdefined(SyslabCC)
    SyslabCC.static_compile(
        "simulate", simulate!,
        Ptr{Cdouble}, Ptr{Cdouble},         # U and u
        Int64, Int64, Int64,                # Nx, Ny, Nt
        Cdouble, Cdouble, Cdouble, Cdouble, # α, dx, dy, dt
    )
end

We don’t manage things like libm, but math.h is required for the C/C++ part.
It seems that libpthread is required by the parallel Boehm GC, and the compiled binaries do partially support threads (being called from non-main threads in limited cases). We’re actively working to improve thread support.

7 Likes

This would preclude many potential target platforms (e.g. microcontrollers). Are there any benefits compared to the C++ target?

1 Like

This is quite a bad case for our AOT compiler, as it allocates too much.


I made a benchmark with limited modifications (using println for IO and mutable structs for the self-recursive structs).

Code for vanilla Julia: bintree-jl.jl
using Printf

struct Empty # singleton type: Empty() === Empty()
end

mutable struct Node
    left::Union{Node,Empty}
    right::Union{Node,Empty}
end

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        empty
    else
        Node(make(d-1), make(d-1))
    end
end

check(t::Empty) = 0
check(t::Node) = 1 + check(t.left) + check(t.right)

function loop_depths(d, min_depth, max_depth)
    for i = 0:div(max_depth - d, 2)
        niter = 1 << (max_depth - d + min_depth)
        c = 0
        for j = 1:niter
            c += check(make(d)) 
        end
        @printf("%i\t trees of depth %i\t check: %i\n", niter, d, c)
        d += 2
    end
end

function perf_binary_trees(N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1

    # create and check stretch tree
    let c = check(make(stretch_depth))
        @printf("stretch tree of depth %i\t check: %i\n", stretch_depth, c)
    end

    long_lived_tree = make(max_depth)

    loop_depths(min_depth, min_depth, max_depth)
    @printf("long lived tree of depth %i\t check: %i\n", max_depth, check(long_lived_tree))

end

n = parse(Int,ARGS[1])
perf_binary_trees(n)
Code for AOT (SyslabCC): bintree-aot.jl
using Printf

struct Empty # singleton type: Empty() === Empty()
end

mutable struct Node
    left::Union{Node,Empty}
    right::Union{Node,Empty}
end

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        empty
    else
        Node(make(d - 1), make(d - 1))
    end
end

check(t::Empty) = 0
check(t::Node) = 1 + check(t.left) + check(t.right)

function loop_depths(d, min_depth, max_depth)
    for i = 0:div(max_depth - d, 2)
        niter = 1 << (max_depth - d + min_depth)
        c = 0
        for j = 1:niter
            c += check(make(d))
        end
        println("$niter\t trees of depth $d\t check: $c")
        d += 2
    end
end

function perf_binary_trees(N::Int=10)
    min_depth = 4
    max_depth = N
    stretch_depth = max_depth + 1

    # create and check stretch tree
    let c = check(make(stretch_depth))
        println("stretch tree of depth $stretch_depth\t check: $c")
    end

    long_lived_tree = make(max_depth)

    loop_depths(min_depth, min_depth, max_depth)
    println("long lived tree of depth $max_depth\t check: $(check(long_lived_tree))")
end

function perf_binary_trees_main(cstr::Cstring)
    perf_binary_trees(parse(Int, unsafe_string(cstr)))
    return Cint(0)
end

if @isdefined(SyslabCC)
    SyslabCC.static_compile(
        "perf_binary_trees",
        perf_binary_trees_main,
        Cstring,
    )
end

Our AOT compiler does not support ARGS, which is dynamic, so we need a main.c for the entrypoint.

/** compile with:
   scc bintree-aot.jl -o libbintree.dll --allow-dynamic
   gcc main.c -o main.exe -L"." -lbintree
*/

#include <stdio.h>

int perf_binary_trees(char *);

int main(int argc, char *argv[]) {
  if (argc != 2) {
    printf("Usage: %s <max tree depth>\n", argv[0]);
    return 1;
  }
  return perf_binary_trees(argv[1]);
}

The result matches:
Vanilla Julia:

> time julia bintree-jl.jl 21
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m8.725s
user    0m0.015s
sys     0m0.000s

AOT:

# AOT
> time ./main.exe 21
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m17.455s
user    0m0.000s
sys     0m0.015s

The performance is relatively stable, so I just timed each run once.

This is the result for this worst case:

Runtime & Compiler    N    Time
Julia                 21   8.73s
AOT/SyslabCC          21   17.5s
Julia                 20   4.23s
AOT/SyslabCC          20   8.64s
Julia                 16   0.649s
AOT/SyslabCC          16   0.467s
Julia                 10   0.427s
AOT/SyslabCC          10   0.036s
Julia                 5    0.419s
AOT/SyslabCC          5    0.032s

Besides, I did one optimization to the code, or else the AOT compiler fails with “Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS”:

const empty = Node(Empty(), Empty())

function make(d)
    if d == 0
        # instead of `Node(Empty(), Empty())`, 
        # we use a global constant.
        empty
    else
        Node(make(d - 1), make(d - 1))
    end
end
5 Likes

The C++ target is non-idiomatic C++ code, i.e. it needs a GC added, and a GC for C++ must in general be conservative (and I forget if any is real-time; Java actually has hard-realtime GC as an option). Some GC-based languages can have more aggressive GCs. I still think C++ is ok, and the GC is not the problem, since the main problem is likely object boxing, and that’s bad in any language. And it’s wishful thinking that e.g. C# could optimize object boxing away; I think only a hypothetical, non-existing JIT optimizer could do it.


Anyway:
https://www.nanoframework.net/

Making it easy to write C# code for embedded systems.

1 Like

Java/C# targets might be good for the AOT compiler for many reasons.

However, there is one critical thing that prevents us from using Java/C#‘s mature GCs: the JVM/CLR uses a compacting GC (EDIT: a compacting GC prevents you from getting a stable physical address of a heap object), but in Julia, the Stdlib depends on pointer_from_objref or other C functions to access Julia objects’ physical addresses.

This is also a reason why we need the Boehm GC.
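
A minimal illustration of the constraint:

mutable struct Buf
    x::Int
end

b = Buf(42)
p = pointer_from_objref(b)  # raw heap address, as parts of the Stdlib use it
# `p` must stay valid while `b` lives. bdwgc never moves objects, so it is;
# a compacting JVM/CLR GC may relocate `b`, silently invalidating `p`.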

3 Likes

$ hyperfine 'julia -e ""'
Benchmark 1: julia -e ""
  Time (mean ± σ):     970.4 ms ± 2138.8 ms    [User: 205.5 ms, System: 118.1 ms]
  Range (min … max):   197.6 ms … 7024.1 ms    10 runs

Julia (1.10.3) still has a non-small startup cost, but subtracting it you’re actually (0.419-0.1976)/0.032 ≈ 6.9x faster there (and 13x including the overhead), meaning Julia’s allocations are likely worse. You’re less scalable, likely meaning your GC isn’t aggressive enough at freeing. Can it simply be tuned somehow? I suggested mimalloc privately; I think it could also help, as it’s a drop-in replacement (for you, not Julia) with just an ENV var enabling it.

[That benchmark is a worst case for Julia, and I think all the other languages somehow “cheat”, i.e. use arenas. You CAN do the same in Julia, it’s just not idiomatic, and not allowed there, unlike for the other languages. It would also be interesting to see your AOT-compiled code on some of the other benchmarks that are easier, e.g. the short-running ones, where Julia is likely fastest already once startup cost is eliminated, and your AOT compiler would show that.]

2 Likes

Hello, the Suzhou-Tongyuan website mentions Modelica in several places, but I couldn’t find details about what is provided. Does it provide a Modelica simulator/compiler? A custom one, or based on an existing one (OpenModelica’s?)

Hi Pierre, as I have limited knowledge of the Modelica-related work in our company (I work in the Julia team), I submitted your question to the relevant team.

I’m glad to forward their response below:

First of all, thank you for your interest in Suzhou Tongyuan.

MWORKS.Sysplorer is a commercial Modelica simulation environment developed by Suzhou Tongyuan. It is a Modelica-based visual modeling and simulation platform for multi-domain engineering systems. It provides a visual modeling studio, an effective Modelica compiler and symbolic analyzer, as well as powerful postprocessors for curves, schema, and 3D animation. Engineering tools, such as experiment design and multi-object optimization, are included in MWorks. MWorks supports interfaces with CAD, FEM, Matlab/Simulink, and FMI. In particular, it can import general CAD files and FEM modal data into the 3D animation postprocessor, including STL, SAT, HSF, 3DS, DXF, and MNF formats. It is also convenient to customize and expand MWorks through C/C++ interfaces, COM components, and Python scripts. For more details, please visit the Modelica official website: https://modelica.org/tools/

We apologize for the inconvenience, but Suzhou Tongyuan’s English website is still under construction. However, you can download MWORKS.Sysplorer from the Suzhou Tongyuan Chinese website. We have provided MWORKS.Sysplorer with English support. After installation, you can set the software language to English through the menu “Tool” → “Language”. Download link: https://www.tongyuan.cc/download

You are welcome to try it out, and if needed, please contact us for a trial license.

For your questions:

Yes.

Not at all. You can find details in the Modelica official site mentioned above.

Hello, thanks for your hard work. We have tested your compiler on our own Julia code, but we encountered some errors. The error displays:

scc WEM3D.jl -o libWEM3D.dll --mode shared --allow-dynamic --bundle
ERROR: LoadError: ArgumentError: Package Base does not have Distributed in its dependencies:
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have Base checked out for development and have
  added Distributed as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with Base
Stacktrace:
 [1] macro expansion
   @ .\loading.jl:1634 [inlined]
 [2] macro expansion
   @ .\lock.jl:267 [inlined]
 [3] require(into::Module, mod::Symbol)
   @ Base .\loading.jl:1611
 [4] include
   @ .\Base.jl:457 [inlined]
 [5] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
   @ Base .\loading.jl:2049
 [6] top-level scope
   @ stdin:3
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:2
in expression starting at stdin:3
System.Exception: Julia error: LoadError: LoadError: Failed to precompile WEM3D [c412bc1b-78d3-4f95-9579-377fcaf123f6] to "C:/Users/Public/TongYuan/.julia\\compiled\\v1.9\\WEM3D\\jl_BB47.tmp".  
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base .\loading.jl:2300
  [3] compilecache
    @ .\loading.jl:2167 [inlined]
  [4] _require(pkg::Base.PkgId, env::String)
    @ Base .\loading.jl:1805
  [5] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base .\loading.jl:1660
  [6] macro expansion
    @ .\loading.jl:1648 [inlined]
  [7] macro expansion
    @ .\lock.jl:267 [inlined]
  [8] require(into::Module, mod::Symbol)
    @ Base .\loading.jl:1611
  [9] include(fname::String)
    @ Base.MainInclude .\client.jl:478
 [10] top-level scope
    @ D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:7
 [11] include(fname::String)
    @ Base.MainInclude .\client.jl:478
 [12] JLCallImpl(out::Ptr{TyJuliaCAPI.JV}, funcProxy::TyJuliaCAPI.JV, argProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.JV}, kwargProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.TyTuple{TyJuliaCAPI.JSym, TyJuliaCAPI.JV}}, dotcall::Bool)
    @ TyJuliaCAPI C:\Users\Public\TongYuan\.julia\packages\TyJuliaCAPI\nIboO\src\core.jl:120     
 [13] JLCall(out::Ptr{TyJuliaCAPI.JV}, funcProxy::TyJuliaCAPI.JV, argProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.JV}, kwargProxies::TyJuliaCAPI.TyList{TyJuliaCAPI.TyTuple{TyJuliaCAPI.JSym, TyJuliaCAPI.JV}})
    @ TyJuliaCAPI C:\Users\Public\TongYuan\.julia\packages\TyJuliaCAPI\nIboO\src\core.jl:147     
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\CSEMFwdSolver\CSEMFwdSolver.jl:14
in expression starting at D:\julia_code_w\lib_compiled_test\WEM3D\src\WEM3D.jl:7
   at Syslab.Common.JV.juliaCallEx(JV[], Dictionary`2, Boolean) + 0x1fe
   at Syslab.Compiler.EntryPoint.MainImpl(EntryPoint.Options, CmdParser`1) + 0x209
   at Syslab.Compiler.EntryPoint.Main(String[]) + 0x43
# our own package uses the Distributed package
using Distributed
using DistributedArrays

Our package uses the Distributed package for parallel computing; does this error mean your compiler isn’t prepared for the Distributed package yet?

2 Likes

Distributed.jl is not supported yet and might not get supported in the near future.
I think you need to start a real Julia process to use Distributed.jl.
The AOT compiler might help by compiling the computation logic into shared libraries and reducing runtime dependencies.
If you tell me more about your concrete use case, I may be able to give useful suggestions (e.g., if it is fine for your program to use a C/C++ solution for multi-processing instead of Distributed.jl, you can then call shared libraries compiled with the Julia AOT compiler).
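
For instance, the driver can stay a plain Julia process using Distributed.jl, while only the numeric kernel is AOT-compiled (library and symbol names below are hypothetical):

using Distributed
addprocs(4)

# `libkernel`/`solve` are hypothetical: a shared library produced by
# SyslabCC.static_compile, exporting a C symbol as in the examples above.
@everywhere solve(x::Float64) =
    ccall((:solve, "libkernel"), Cdouble, (Cdouble,), x)

results = pmap(solve, 0.0:0.1:1.0)  # Distributed stays in vanilla Julia;
                                    # only the kernel is AOT-compiled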

1 Like