[ANN]: WrappedUnions.jl: Wrap a Union for enhanced type-stability

Hi all,

I’d like to announce the package WrappedUnions.jl. It grew out of a better understanding on my part of the type instabilities caused by using unions of types, and it offers a better interface than my previous attempt, LightSumTypes.jl.

I think its interface is the most Julian across all the packages which work with sum types, and it’s actually also very powerful: it can easily accommodate a lot of use cases where one wants to force the compiler to emit type-stable code. I think it could also be very useful for trimming pipelines, e.g. the one described in this post: [Guide] Using sum types to define dynamic C APIs for Julia 1.12 `--trim`.

Two main macros provide the backbone of this package: @wrapped and @unionsplit. The first accepts any parametric struct which has a single field union::Union and whose supertype is a subtype of WrappedUnion; apart from that, it supports any standard struct feature, e.g. inner constructors. @unionsplit instead takes a function call and executes it with union splitting on the wrapped-union arguments, making the call type-stable.

Here is a little example of its main features. First, we automatically split the sum function so that the code is inferred correctly:

julia> using WrappedUnions, Test, BenchmarkTools

julia> xs_nowrap = (false, 1, [true, false], [1,2]) # what happens if no wrapping is used?
(false, 1, Bool[1, 0], [1, 2])

julia> @inferred (xs -> sum.(xs))(xs_nowrap);
ERROR: return type NTuple{4, Int64} does not match inferred return type NTuple{4, Any}
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ REPL[3]:1

julia> @btime sum.($xs_nowrap)
  136.393 ns (5 allocations: 240 bytes)
(0, 1, 1, 3)

julia> @wrapped struct X
           union::Union{Bool, Int, Vector{Bool}, Vector{Int}}
       end

julia> xs = (X(false), X(1), X([true, false]), X([1,2]))
(X(false), X(1), X(Bool[1, 0]), X([1, 2]))

julia> splitsum(x) = @unionsplit sum(x)
splitsum (generic function with 1 method)

julia> @inferred (xs -> splitsum.(xs))(xs); # no error!

julia> @btime splitsum.($xs)
  8.494 ns (0 allocations: 0 bytes)
(0, 1, 1, 3)
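To give an idea of what the macro automates, this is roughly the hand-written union split that splitsum corresponds to (an illustrative sketch only, not the code the package actually generates):

# illustrative sketch: a hand-written union split equivalent to
# `splitsum(x) = @unionsplit sum(x)` for the wrapped type X defined above
function splitsum_byhand(x::X)
    u = x.union
    if u isa Bool
        sum(u)
    elseif u isa Int
        sum(u)
    elseif u isa Vector{Bool}
        sum(u)
    else          # u isa Vector{Int}
        sum(u)
    end
end

Inside each branch the compiler knows the concrete type of u, so every sum(u) call is statically dispatched and the whole function infers as returning an Int.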

Let’s do something fancier: what if the output is type unstable? Another wrapping comes to our rescue:

julia> @btime prod.($xs_nowrap)
  164.485 ns (5 allocations: 240 bytes)
(false, 1, false, 2)

julia> splitprod(x::X) = @unionsplit prod(x)
splitprod (generic function with 1 method)

julia> @inferred  (xs -> splitprod.(xs))(xs) # not enough
ERROR: return type Tuple{Bool, Int64, Bool, Int64} does not match inferred return type NTuple{4, Union{Bool, Int64}}
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ REPL[20]:1

julia> @btime splitprod.($xs)
  97.827 ns (1 allocation: 48 bytes)
(false, 1, false, 2)

julia> @wrapped struct Y <: WrappedUnion
           union::Union{Bool, Int}
       end

julia> splitprod(x::X) = Y(@unionsplit prod(x));

julia> @inferred (xs -> splitprod.(xs))(xs); # But now it is!

julia> @btime splitprod.($xs)
  14.536 ns (0 allocations: 0 bytes)
(Y(false), Y(1), Y(false), Y(2))
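As a small follow-up (a sketch, this is not in the README): the wrapped results can then be consumed type-stably with the same trick, since both component types support the operations you need:

# follow-up sketch: consuming the wrapped results type-stably
ys = splitprod.(xs)                          # (Y(false), Y(1), Y(false), Y(2))
splitiszero(y::Y) = @unionsplit iszero(y)
@inferred (ys -> splitiszero.(ys))(ys)       # passes: both branches return Bool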

For more information, see the README of the package.

The package is still being registered, so use ]dev https://github.com/Tortar/WrappedUnions.jl.git if you want to try it out.

Let me know if you have any kind of question/suggestion about this approach!

16 Likes

Pretty cool to put the big branch in our hands with @unionsplit. Some feedback:

  • It doesn’t take Base’s special syntax like property access, indexing, or broadcasting, and unlike in LightSumTypes, the former two are not automatically implemented for WrappedUnions’ sum types.
  • The tuple broadcasting instability example is more of an artifact of the broadcasting implementation. As far as I can tell, it boils down to an ntuple call that evaluates sum(xs_nowrap[i]) in a loop, which is an inherently type-unstable expression. On the other hand, the explicit broadcast higher-order function is type-stable:
julia> @inferred broadcast(sum, xs_nowrap)
(0, 1, 1, 3)

because it forwards to map, which is currently implemented to unroll up to 31 inlined calls from that loop into the tuple. We can adjust the example to exceed map’s tuple limit:

julia> xs = Tuple(Iterators.cycle((false, 1, [true, false], [1,2]), 9));

julia> @inferred (x -> sum.(x))(xs);
ERROR: return type NTuple{36, Int64} does not match inferred return type NTuple{36, Any}
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] top-level scope
   @ REPL[180]:1

julia> @inferred broadcast(sum, xs);
ERROR: return type NTuple{36, Int64} does not match inferred return type Tuple
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] top-level scope
   @ REPL[181]:1

julia> xs2 = Tuple(X(el) for el in xs);

julia> @inferred (x -> splitsum.(x))(xs2); # passed
1 Like

Thanks for fixing my example!

It doesn’t take Base’s special syntax like property access, indexing, or broadcasting, and unlike in LightSumTypes, the former two are not automatically implemented for WrappedUnions’ sum types.

I made this choice on purpose, so that users define what those functions need to return, or don’t define them at all if they don’t need them. Is your feedback on this negative or positive? I can’t tell from your wording.

To clarify, I think it’d be great if I could write something like @unionsplit composite.property instead of manually lowering it to @unionsplit getproperty(composite, :property). Broadcasting actually doesn’t lower to one convenient call, and as demonstrated earlier, broadcast is implemented a bit separately.
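For concreteness, the manual lowering I mean looks roughly like this (made-up types, just a sketch):

# made-up types, sketching the manual lowering of property access
struct A; time::Float64; end
struct B; time::Float64; end

@wrapped struct AB <: WrappedUnion
    union::Union{A, B}
end

gettime(ab::AB) = @unionsplit getproperty(ab, :time)   # the manual lowering
# what I’d like to be able to write instead:
# gettime(ab::AB) = @unionsplit ab.time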

It appears that the call restriction is due to passing the callable and the arguments to a generated function unionsplit that builds the if-elseif type branch. Is there a reason for avoiding a direct macro expansion of that branch at @unionsplit’s call site? It’d allow an arbitrary input expression to be duplicated across the branches. The downside would be that each call site always inserts that big branch, whereas the current generated function approach would “only” contribute that big branch per unionsplit call signature; however, a user could be warned to avoid repeated macro calls by calling helper functions that contain the @unionsplit call site, similar to the function expressions made by LightSumTypes.@sumtype.

2 Likes

It appears that the call restriction is due to passing the callable and the arguments to a generated function unionsplit that builds the if-elseif type branch. Is there a reason for avoiding a direct macro expansion of that branch at @unionsplit’s call site? It’d allow an arbitrary input expression to be duplicated across the branches. The downside would be that each call site always inserts that big branch, whereas the current generated function approach would “only” contribute that big branch per unionsplit call signature; however, a user could be warned to avoid repeated macro calls by calling helper functions that contain the @unionsplit call site, similar to the function expressions made by LightSumTypes.@sumtype.

Not sure I’m getting this: without the generated approach, how would you envision this working? It seems to me that I need the types of the arguments in order to split into the branches, and those are only available at @generated time, particularly because @unionsplit supports functions with multiple arguments, mixing standard types with wrapped types at will.

1 Like

I see now, WrappedUnions.unionsplit detects WrappedUnion sum type arguments at compile time and constructs the nested branches checking each of those arguments. That’s indeed not possible at macro expansion time, when we only have unevaluated symbols and expressions. LightSumTypes.@sumtype didn’t need generated functions to make the single-level branches because you manually specified that one argument v, and the symbols for the component types were input at the @sumtype call, whereas @unionsplit is separated from @wrapped.
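For two wrapped arguments, I picture the constructed branches looking roughly like this (minimal made-up types, not the actual generated code):

# minimal made-up types, just to picture the nested branches
@wrapped struct P <: WrappedUnion
    union::Union{Int, Float64}
end

@wrapped struct Q <: WrappedUnion
    union::Union{Bool, String}
end

g(a, b) = string(a, '/', b)    # some function of the unwrapped values

# roughly what `@unionsplit g(p, q)` boils down to: one level of branching per wrapped argument
function g_byhand(p::P, q::Q)
    a = p.union
    b = q.union
    if a isa Int
        b isa Bool ? g(a, b) : g(a, b)    # b is narrowed to Bool / String in each branch
    else                                  # a isa Float64
        b isa Bool ? g(a, b) : g(a, b)
    end
end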

It’s technically possible to make the correct branches separately from @wrapped, but it’d be more writing, like a traditional match statement (e.g. MLStyle.@match) minus the differing branch expressions. The user would have to specify the arguments involved with sum types separately from a more complex branch expression that might repeat the variables. Match statements tend to manually specify the component types, but that’s more difficult for multiple sum type arguments, and it’s not necessary for duplicate branches trying to restore type stability. Instead, the sum type’s name could be specified in, let’s say, @unionsplit2, and if the @wrapped expression has already been evaluated (warn against putting @wrapped and @unionsplit2 calls in the same top-level expression, like a begin block), eval can get the type and thus its component types. Note that eval in macros is discouraged, but it is occasionally done, e.g. in StaticArrays.@SVector, and it would be a bit safer here given that the type should exist in the global scope. Ultimately, it could look something like @unionsplit2 (x::X, z::Z) f.(x, y, z) + x.time - z[]. Obviously, that’s more to manually handle and write than @unionsplit g(x, y, z).

1 Like

Ah, now I get what you mean, but I still don’t see many advantages in this approach, because as far as I understand it will still need to compile the big branch expression. Do you think there are advantages to doing this in a macro rather than in a generated function?

I’m pretty much just thinking about arbitrary expressions because the function call restriction is quite steep. As we both pointed out, a direct macro expansion to the branch and the generated function’s branch are both metaprogramming approaches; the former just happens at every macro call site while the generated function makes the branch for each call signature (so fewer times actually). Either way, a big type-stabilizing branch may not even help performance at some scale, that would need some testing and maybe spur some alternative implementations closer to vtables holding FunctionWrappers.

After staring at the last macro calls in my previous comment though, I’m now thinking that the function call restriction could be worked around with nested function expressions e.g. @unionsplit ((x, y, z) -> f.(x, y, z) + x.time - z[])(x, y, z). It could use extra metaprogramming to look a bit nicer, maybe another macro that takes similar work to the @unionsplit2 call i.e. @unionsplit3 (x,y,z) f.(x, y, z) + x.time - z[]. It’s not the same as a directly expanded if branch because of the introduced local scope, but that can be either documented or even hinted at by mandating a function definition, which can be easily introduced to this example with an extra ->. You could allow @unionsplit to have this different behavior specifically for input function definitions.

Small aside I probably should’ve mentioned earlier, but while we’re on the topic of macro call conveniences, some of the @wrapped call is redundant because it mandates the struct, WrappedUnion subtyping, and single union::Union{...} field. Could probably simplify that macro call to the essentials, like @wrapped [mutable] X [<: SubTypeWrappedUnion] (Bool, Int, Vector{Bool}, Vector{Int}), with [] indicating optional inputs.

1 Like

Either way, a big type-stabilizing branch may not even help performance at some scale, that would need some testing and maybe spur some alternative implementations closer to vtables holding FunctionWrappers.

Yes, it would be interesting to know when; do you have suggestions for a benchmark? Maybe just adding more and more types vs. runtime dispatch? Though I think it should always be faster, to be honest (as long as compilation time doesn’t explode).

I’m now thinking that the function call restriction …

Is this really a big restriction, though? Let’s say you have a function f(args...) for your unwrapped arguments; then you just need to create a new one, f2(args...) = @unionsplit f(args...), and use f2 instead of f.

Small aside I probably should’ve mentioned earlier, but while we’re on the topic of macro call conveniences, some of the @wrapped call is redundant because it mandates the struct, WrappedUnion subtyping, and single union::Union{...} field. Could probably simplify that macro call to the essentials, like @wrapped [mutable] X [<: SubTypeWrappedUnion] (Bool, Int, Vector{Bool}, Vector{Int}), with [] indicating optional inputs.

Inner constructors can still be defined, though. I think that the standard struct syntax, even if restricted to a single field union::Union, is better because of this. Also, because it’s standard syntax in the end, the struct is just validated by the macro (along with other small things added to the block), which keeps the code shorter.
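For instance, something like this should keep working precisely because the standard syntax is kept (a quick sketch, not taken from the README):

# quick sketch: an inner constructor enforcing an invariant on the wrapped value
@wrapped struct Positive <: WrappedUnion
    union::Union{Int, Float64}
    function Positive(x)
        x > 0 || throw(ArgumentError("expected a positive number"))
        new(x)
    end
end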

1 Like

Initially, I was under the impression that f2 would have to be defined completely separately in the global scope, and that would be inconvenient scoping. However, I eventually realized the function can be defined in the same scope and can capture any outer local variables, though putting code with many local variable assignments into a function would take a bit more adjustment to set up the return statement and the multiple assignment. Might be worth adding that tip to the docstrings, since at least one person (me) got stuck for a while.
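For instance, this is the kind of thing that eventually clicked for me (hypothetical names, just a sketch):

# hypothetical sketch: the helpers live in the same local scope and can close
# over outer locals (here `scale`), so no global definitions are needed
function scaledtotal(xs, scale)
    scaledsum(u, s) = sum(u) * s                           # acts on the unwrapped values
    splitscaledsum(x) = @unionsplit scaledsum(x, scale)    # `scale` is a captured local
    return sum(splitscaledsum, xs)
end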

A @unionsplit option to take an input definition and automatically call it for arguments of the same name would just be a convenience, not a necessity. If the function had multiple methods, I would actively avoid that convenience and the implication that the input method is specially invoked. Actually, the current @unionsplit usage wouldn’t even be that important: unionsplit(f, (args...,)) is already easy to edit from a function call f(args...) and idiomatic, e.g. map(f, args...), sum(f, args...), etc. I suppose a route for serious improvement could be handling keyword arguments without users having to manually make a new nested function to move them to positional ones.
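For reference, the manual keyword-argument workaround I mean looks something like this (made-up names):

# made-up names: moving a keyword argument to a positional one by hand
weightedsum(v; w = 1) = w * sum(v)                  # the function you actually want to call
weightedsum_pos(v, w) = weightedsum(v; w)           # positional shim for @unionsplit
splitweightedsum(x; w = 1) = @unionsplit weightedsum_pos(x, w)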

Good point, worth adding one to the example if you can.

I don’t have any good ideas, but yeah, using this package in practice and benchmarking against plain runtime dispatch should go a long way. A good benchmark would be a call that doesn’t do much but won’t be optimized away; then any additional runtime over the baseline call should just be the dispatch mechanism.
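Something along these lines could be a starting point (made-up names, just a sketch I haven’t run):

# sketch of the benchmark idea: a cheap call that still has to execute,
# comparing plain runtime dispatch against the union-split wrapper
using WrappedUnions, BenchmarkTools

@wrapped struct W <: WrappedUnion
    union::Union{Int, Float64, Bool}
end

cheap(x) = x + one(x)
splitcheap(w::W) = @unionsplit cheap(w)

raw = Any[rand((1, 1.0, true)) for _ in 1:10_000]   # Vector{Any} forces runtime dispatch
wrapped = W.(raw)                                   # concrete Vector{W}

@btime sum(x -> Float64(cheap(x)), $raw)            # baseline: runtime dispatch
@btime sum(w -> Float64(splitcheap(w)), $wrapped)   # union splitting via the wrapper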

1 Like

yeah, will add all your suggestions as soon as I find the time, thanks for your help! I will also post the benchmark against runtime dispatch here, let’s see what happens :slight_smile:

Right, keyword arguments: I forgot about those when writing @unionsplit; shouldn’t be too hard to add (I suppose).