Jlrs v0.19: many performance optimizations

jlrs is a crate for the Rust programming language that lets you embed Julia in Rust applications and export Rust functions and types to Julia. Version 0.19 puts a major focus on performance, which has led to the introduction of several new features. This release is also compatible with the recent first beta of Julia 1.10, except on Windows.

Fast TLS

Julia’s fast TLS is enabled when a runtime feature is selected. Libraries that are loaded by Julia, e.g. rustfft-jl, must not enable any runtime features. For this to work correctly, applications that embed Julia must be compiled with the following flag: -Clink-args=-Wl,-export-dynamic.
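One way to pass that flag without exporting RUSTFLAGS by hand is a Cargo build script in the embedding application. This is a sketch that assumes a Linux target with a GNU-compatible linker driver. The `cargo:rustc-link-arg` directive forwards the argument to the linker for that crate's binaries, tests, benches and examples. It belongs only in the application, never in a library that Julia loads.

```rust
// build.rs of the application that embeds Julia (assumption: Linux, GNU-style linker).
// Forwards `-Wl,-export-dynamic` to the linker, which has the same effect as
// compiling with `-Clink-args=-Wl,-export-dynamic` for this crate's binaries.
fn main() {
    println!("cargo:rustc-link-arg=-Wl,-export-dynamic");
}
```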

Local scopes

Local scopes, with the local targets LocalGcFrame, LocalOutput, and LocalReusableSlot, have been added. Local scopes are very similar to the dynamic scopes and targets that have existed for a while, with a few key differences: a LocalGcFrame has a constant size and is allocated on the stack, and local scopes can be created from arbitrary targets. Thanks to the latter, functions that used to take an ExtendedTarget now take a Target and create a local scope.

Overall, dynamically-sized scopes and constantly-sized local scopes have similar performance characteristics, with one major exception: creating the “first” dynamic scope and resizing its backing storage is expensive. For embedders this shouldn’t be a major problem because this happens when Julia is initialized, but for exporters it can be a major issue if an exported function creates a dynamic scope: the cost of creating that initial dynamic scope must be paid every time the function is called. For this reason it’s strongly recommended that exporters limit themselves to local scopes.
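The difference in allocation behavior can be sketched in plain Rust. This is not jlrs's API; `LocalFrame`, `DynamicFrame` and the `usize` handles standing in for GC roots are hypothetical. The fixed-capacity frame keeps its slots inline, so creating one never allocates. The dynamic frame stores roots in a growable `Vec`, so its first push allocates and later pushes may reallocate.

```rust
/// Hypothetical fixed-capacity frame: N root slots stored inline (on the stack
/// when the frame is a local variable), so creating it never allocates.
struct LocalFrame<const N: usize> {
    roots: [Option<usize>; N],
    len: usize,
}

impl<const N: usize> LocalFrame<N> {
    fn new() -> Self {
        LocalFrame { roots: [None; N], len: 0 }
    }

    /// Roots `handle` and returns its slot index; panics when all N slots are
    /// used, because the size is fixed when the frame is created.
    fn root(&mut self, handle: usize) -> usize {
        assert!(self.len < N, "local frame is full");
        self.roots[self.len] = Some(handle);
        self.len += 1;
        self.len - 1
    }

    fn get(&self, slot: usize) -> Option<usize> {
        self.roots.get(slot).copied().flatten()
    }
}

/// Hypothetical dynamic frame: roots live in a growable Vec, so the first
/// push allocates backing storage and later pushes may have to resize it.
struct DynamicFrame {
    roots: Vec<usize>,
}

impl DynamicFrame {
    fn new() -> Self {
        DynamicFrame { roots: Vec::new() }
    }

    fn root(&mut self, handle: usize) -> usize {
        self.roots.push(handle);
        self.roots.len() - 1
    }
}
```

In this model the capacity of a local frame is a compile-time constant, which is why its cost is the same on every call.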

Caches

Symbols and types are cached after they’ve been constructed. A global in a module can also be cached by calling Module::typed_global_cached, but you have to ensure that the cached global is never replaced by other data in Julia. These caches are typically implemented as HashMaps protected by an RwLock; the hash functions they use have been selected with care based on benchmarks. If a type has no type parameters, it’s cached in a separate, local static variable rather than in the global cache to avoid the overhead of locking and looking it up.
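A minimal sketch of the two caching strategies, assuming plain `usize` handles in place of Julia objects and using std's `HashMap` with its default hasher (the hash functions jlrs selected are not reproduced here). `Cache` and `cached_in_static` are hypothetical names. The global cache takes a shared read lock on the fast path and a write lock only on a miss; the per-type static slot needs neither a lock nor a hash lookup.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical global cache keyed by name, e.g. for symbols or parametric types.
struct Cache {
    map: RwLock<HashMap<String, usize>>,
}

impl Cache {
    fn new() -> Self {
        Cache { map: RwLock::new(HashMap::new()) }
    }

    fn get_or_insert_with(&self, key: &str, make: impl FnOnce() -> usize) -> usize {
        // Fast path: many threads can hold the read lock at the same time.
        if let Some(&handle) = self.map.read().unwrap().get(key) {
            return handle;
        }
        // Slow path: `entry` re-checks the key, because another thread may have
        // inserted it between releasing the read lock and taking the write lock.
        let mut map = self.map.write().unwrap();
        let handle = *map.entry(key.to_owned()).or_insert_with(make);
        handle
    }
}

/// Hypothetical per-type static slot for a type without parameters.
/// 0 means "not constructed yet"; no lock and no hashing are involved.
fn cached_in_static(slot: &AtomicUsize, make: impl FnOnce() -> usize) -> usize {
    let handle = slot.load(Ordering::Acquire);
    if handle != 0 {
        return handle;
    }
    let new = make();
    // If two threads race, both construct a value but only the first store wins.
    match slot.compare_exchange(0, new, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => new,
        Err(existing) => existing,
    }
}
```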

GC-safe synchronization primitives

While the caches mentioned above are great for performance, they introduce a pretty big issue: what happens if several threads are waiting for access to a cache and the thread that holds the lock starts waiting for garbage to be collected? The answer is, unsurprisingly, that the whole process grinds to a halt because we’ve reached a deadlock: the cache won’t be unlocked until garbage has been collected, and garbage can’t be collected while the other threads are blocked waiting for the cache to be unlocked. To prevent this deadlock, wrappers have been added around RwLock, Mutex and FairMutex from parking-lot, and around the sync version of OnceCell from once-cell, which allow garbage to be collected while they block. OnceCell is used rather than OnceLock from the standard library to avoid having to increase the minimum supported version of Rust, which is still 1.65.
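The idea behind these wrappers can be sketched with `std::sync::Mutex`. This is not the jlrs implementation: `enter_gc_safe` and `leave_gc_safe` are hypothetical hooks that only flip a thread-local flag, standing in for switching the thread's GC state in Julia. The wrapper first tries to take the lock without blocking; only if that fails does it mark the thread as GC-safe before blocking, so a collection can proceed while it waits.

```rust
use std::cell::Cell;
use std::sync::{Mutex, MutexGuard, TryLockError};

thread_local! {
    // Hypothetical per-thread flag standing in for Julia's GC state.
    static GC_SAFE: Cell<bool> = Cell::new(false);
}

/// Hypothetical hook: mark this thread as GC-safe and return the previous state.
fn enter_gc_safe() -> bool {
    GC_SAFE.with(|s| s.replace(true))
}

/// Hypothetical hook: restore the previous state. In Julia, leaving a GC-safe
/// region can itself block until a running collection has finished.
fn leave_gc_safe(prev: bool) {
    GC_SAFE.with(|s| s.set(prev));
}

fn is_gc_safe() -> bool {
    GC_SAFE.with(|s| s.get())
}

/// A mutex that lets the GC run while a thread is blocked waiting for it.
struct GcSafeMutex<T> {
    inner: Mutex<T>,
}

impl<T> GcSafeMutex<T> {
    fn new(value: T) -> Self {
        GcSafeMutex { inner: Mutex::new(value) }
    }

    fn lock(&self) -> MutexGuard<'_, T> {
        // Uncontended fast path: no GC state transition needed.
        match self.inner.try_lock() {
            Ok(guard) => return guard,
            Err(TryLockError::Poisoned(_)) => panic!("mutex poisoned"),
            Err(TryLockError::WouldBlock) => {}
        }
        // Contended: we may block for a while, so allow the GC to run meanwhile.
        let prev = enter_gc_safe();
        let guard = self.inner.lock().expect("mutex poisoned");
        leave_gc_safe(prev);
        guard
    }
}
```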

Catching exceptions

Exceptions can be caught with catch_exceptions, which takes two arguments: a closure that is called in a try-block, and a second closure that is called with the caught exception if one is thrown.

This is probably the single most dangerous function you can use in jlrs today due to the way exceptions work in Julia: with setjmp and longjmp. When an exception is thrown, longjmp is called and control flow jumps to the place where setjmp was last called. If a Rust function whose frame is jumped over has cleanup code associated with it, that cleanup code won’t be called. To what degree this is sound is not completely clear to me, but from what I can tell it should be fine if the stack frames of all Rust functions that are jumped over are Plain Old Frames (POFs), which can be trivially deallocated because they have no cleanup code associated with them. A stack frame is a POF if there are no pending drops, so you must ensure that there are no pending drops before calling a function that might throw.
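A small sketch of what "no pending drops" means in practice, using only std. `std::mem::needs_drop` reports whether a type has drop glue (the docs describe it as a hint that may be conservative, returning true for some types that don't actually need dropping). `call_without_pending_drops` is a hypothetical example that ends the lifetime of its `String` explicitly before invoking the callback that stands in for a call that might throw, so only plain values are alive in the frame at that point.

```rust
use std::mem::needs_drop;

/// Hypothetical helper: would keeping a `T` alive across a call that may
/// longjmp leave a pending drop (i.e. cleanup code that gets skipped)?
fn leaves_pending_drop<T>() -> bool {
    needs_drop::<T>()
}

/// Hypothetical example: nothing with drop glue is alive when `may_throw` runs.
fn call_without_pending_drops(may_throw: impl FnOnce(usize) -> usize) -> usize {
    let label = String::from("scratch buffer"); // String has drop glue
    let len = label.len(); // usize has none
    drop(label); // from here on the frame has no pending drops
    may_throw(len)
}
```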

If you do depend on some data that implements Drop and use it across a call to a function that may throw, you can attach a parachute to it by calling AttachParachute::attach_parachute, provided the data has no lifetimes and is thread-safe. This moves ownership of the data to Julia and lets the GC clean it up.
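The ownership transfer can be modeled without Julia. In this sketch `FakeGc` is a hypothetical stand-in for the GC that owns type-erased values and drops them when it is itself dropped; it is not how AttachParachute is implemented. The `Any + Send + Sync` bound mirrors the two requirements above: `Any` implies `'static` (no lifetimes) and `Send + Sync` means thread-safe. Because the stack frame only holds a reference afterwards, it has no pending drop for that data.

```rust
use std::any::Any;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the GC: owns values and drops them on "collection".
struct FakeGc {
    owned: Vec<Box<dyn Any + Send + Sync>>,
}

impl FakeGc {
    fn new() -> Self {
        FakeGc { owned: Vec::new() }
    }

    /// Moves `data` into the GC-owned storage and hands back a reference to it.
    fn attach<T: Any + Send + Sync>(&mut self, data: T) -> &mut T {
        self.owned.push(Box::new(data));
        let obj: &mut (dyn Any + Send + Sync) = &mut **self.owned.last_mut().unwrap();
        obj.downcast_mut::<T>().expect("just inserted a T")
    }
}

/// A value with a destructor that counts how often it has been dropped.
struct Tracked(Arc<AtomicUsize>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```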

The advantage of catch_exceptions over functions that catch exceptions themselves is that if you need to call multiple functions that may throw, you only create one try-catch block rather than one for every function you call, and creating a try-catch block is quite expensive. It’s fine to create local scopes in the fallible closure and jump out of them because they don’t depend on Drop; jumping out of dynamic scopes is unsound.
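The cost argument can be illustrated with Rust's `Result` and `?` standing in for Julia exceptions; no setjmp or longjmp is involved here, only the number of handler setups is modeled. `try_block` is a hypothetical function that counts how often an (assumed expensive) handler is created. Wrapping each call separately creates three handlers, while one handler around all three calls creates one, and the first error still short-circuits the rest.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many hypothetical try-catch blocks were set up.
    static SETUPS: Cell<u32> = Cell::new(0);
}

/// Hypothetical stand-in for setting up an expensive try-catch block.
fn try_block<T, E>(body: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
    SETUPS.with(|s| s.set(s.get() + 1));
    body()
}

fn may_fail(x: i32) -> Result<i32, String> {
    if x < 0 { Err(format!("negative input: {x}")) } else { Ok(x * 2) }
}

/// One handler per fallible call: three setups.
fn per_call(xs: [i32; 3]) -> Result<i32, String> {
    let a = try_block(|| may_fail(xs[0]))?;
    let b = try_block(|| may_fail(xs[1]))?;
    let c = try_block(|| may_fail(xs[2]))?;
    Ok(a + b + c)
}

/// One handler around all fallible calls: a single setup.
fn batched(xs: [i32; 3]) -> Result<i32, String> {
    try_block(|| -> Result<i32, String> {
        Ok(may_fail(xs[0])? + may_fail(xs[1])? + may_fail(xs[2])?)
    })
}

fn setups() -> u32 {
    SETUPS.with(|s| s.get())
}

fn reset_setups() {
    SETUPS.with(|s| s.set(0));
}
```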

Export improvements

Exported methods and functions can now be annotated with #[gc_safe] and #[untracked_self]. The first can be used by long-running functions that don’t need to call into Julia, to allow the GC to collect garbage while the function is running. The second only affects methods that take self in some way: when it’s used, the self parameter is not tracked before it’s accessed. While tracking is useful to enforce that Rust’s borrowing rules are respected, it is quite expensive and depends on Drop, so it’s nice to have the option to skip it.
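To see why tracking costs something and relies on Drop, here is a minimal sketch of borrow tracking; it is not jlrs's own mechanism, and `Ledger` and `TrackedRef` are hypothetical names. Every tracked access takes a lock and updates a hash map keyed by address, and the borrow is only released when the guard's destructor runs. An untracked access is just a plain reference.

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::Mutex;

/// Hypothetical ledger that counts active shared borrows per address.
struct Ledger {
    borrows: Mutex<HashMap<usize, usize>>,
}

/// Guard for a tracked borrow; dropping it releases the borrow,
/// which is why tracking depends on Drop actually running.
struct TrackedRef<'a, T> {
    ledger: &'a Ledger,
    value: &'a T,
}

impl Ledger {
    fn new() -> Self {
        Ledger { borrows: Mutex::new(HashMap::new()) }
    }

    /// Tracked access: one lock and one hash-map update now, another in Drop.
    fn track<'a, T>(&'a self, value: &'a T) -> TrackedRef<'a, T> {
        let addr = value as *const T as usize;
        *self.borrows.lock().unwrap().entry(addr).or_insert(0) += 1;
        TrackedRef { ledger: self, value }
    }

    fn active<T>(&self, value: &T) -> usize {
        let addr = value as *const T as usize;
        self.borrows.lock().unwrap().get(&addr).copied().unwrap_or(0)
    }
}

impl<T> Drop for TrackedRef<'_, T> {
    fn drop(&mut self) {
        let addr = self.value as *const T as usize;
        let mut map = self.ledger.borrows.lock().unwrap();
        let remaining = match map.get_mut(&addr) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        if remaining == 0 {
            map.remove(&addr);
        }
    }
}

impl<T> Deref for TrackedRef<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value
    }
}
```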

Exported types with type parameters can be created by implementing ParametricVariant and ParametricBase. You can export the implementations for each type you care about by iterating over an array of types; this also applies to exported methods and functions:

julia_module! {
    for T in [f32, f64] {
        for U in [T, i32] {
            struct HasParams<T, U>;
            in HasParams<T, U> fn get(&self) -> T as has_params_get;            
        }

        fn has_generic(t: T) -> T;
    }
}
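For reference, the plain-Rust items that the macro above refers to could look like the following sketch. The field names are assumptions, and the jlrs-specific parts (the ParametricBase and ParametricVariant implementations that make HasParams exportable) are deliberately left out. With the loops as written, the macro requests four variants of HasParams (`<f32, f32>`, `<f32, i32>`, `<f64, f64>`, `<f64, i32>`) and two of has_generic (`f32` and `f64`).

```rust
/// Plain-Rust side of the exported type; the jlrs trait impls are omitted.
struct HasParams<T, U> {
    t: T,
    u: U,
}

impl<T: Copy, U> HasParams<T, U> {
    /// Exported to Julia as `has_params_get` by the macro above.
    fn get(&self) -> T {
        self.t
    }
}

/// Generic function exported once per type in the outer loop.
fn has_generic<T>(t: T) -> T {
    t
}
```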

It’s also possible to throw an exception from an exported function. This used to require creating a RustResult to return the exception to Julia before throwing it, to avoid jumping over Rust code. This is no longer necessary: an exported function can return either JlrsResult<T> or Result<T, ValueRet>. If an error is returned, it’s converted to a JlrsCore.JlrsError and thrown in the first case, or thrown directly in the second. It’s guaranteed that this exception only jumps over POFs. RustResult has been deprecated.

IsBits and HasLayout derive traits

When bindings are generated with JlrsCore.Reflect, two new traits are derived when applicable: IsBits and HasLayout. The first indicates that the type is an isbits type if all type parameters that affect its layout are isbits types. The second connects implementations of ConstructType to ValidLayout; it’s only derived for types that have separate layout and type-constructor types.

The main purpose of IsBits is that it can be used with Value::new_bits. This is a more generic variation of Value::new: it accepts any type that implements IsBits, rather than only types that implement IntoJulia, which can’t have any type parameters.
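The rule "isbits if all type parameters that affect the layout are isbits" maps naturally onto conditional trait impls. The sketch below uses a hypothetical marker trait, `IsBitsSketch`, rather than jlrs's IsBits, and boxing in `new_bits_sketch` only stands in for allocating a Julia object. `Pair` requires both parameters to be bits types because both are stored in it, while `Tagged` leaves `Tag` unconstrained because a `PhantomData` field does not affect the layout.

```rust
use std::marker::PhantomData;

/// Hypothetical marker (not jlrs's IsBits): "this type's layout is plain bits".
trait IsBitsSketch {}

impl IsBitsSketch for i32 {}
impl IsBitsSketch for f64 {}

/// Both parameters are stored inline, so both must be bits types.
#[repr(C)]
struct Pair<A, B> {
    a: A,
    b: B,
}

impl<A: IsBitsSketch, B: IsBitsSketch> IsBitsSketch for Pair<A, B> {}

/// `Tag` does not affect the layout, so it needs no bound.
#[repr(C)]
struct Tagged<T, Tag> {
    value: T,
    _tag: PhantomData<Tag>,
}

impl<T: IsBitsSketch, Tag> IsBitsSketch for Tagged<T, Tag> {}

/// Stand-in for a new_bits-style constructor that accepts parametric bits types.
fn new_bits_sketch<T: IsBitsSketch>(value: T) -> Box<T> {
    Box::new(value)
}
```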

Docs.rs
Crate
GitHub