Swizzling the super type of a foreign Julia type (or: how evil must I be?)

Hi,

so this is a super low-level question for the people who know about or are working on the Julia kernel.

tl;dr

I need/want to create a “foreign type” MPtr (via the C function jl_new_foreign_type) with no super type, and then, for reasons explained below, later modify it by swizzling its super pointer to point at some abstract type. Before this swizzling happens, the only parts of the Julia kernel that are “made aware” of MPtr and of allocations with it as their type are the Julia GC and, of course, the code creating the type; no Julia functions or methods involving MPtr in their signature are ever declared before this (so, to my understanding, code involving method dispatch should never have seen that type). Of course this is somewhat evil, but in a prototype it seems to work perfectly well.

Questions:

  1. Is this actually working? Or am I just fooled into thinking so because I have not yet tested the right things? In other words, can any Julia kernel expert think of ways this could, say, corrupt internal data structures? I tried to trace everything jl_new_foreign_type (and the code it calls, including jl_new_datatype) does, and my impression was that no reference to the new type is retained by the Julia kernel. I couldn’t find anything that would seem to care about this isolated type, which is never passed to any Julia code beyond the GC. But I may easily have missed lots of things (sigh). Anyway, if I am right, what I am proposing above should be fine, no?

  2. How likely is this to keep working? Sorry, I know that’s rather vague; perhaps it’d be better phrased as: Can you think of any mechanism that might at some point be introduced (e.g. as an optimization) that could break this?

  3. Can anybody think of an alternative that is less evil? For this you’ll need to understand the actual problem I have to solve, and so I am afraid you’ll have to read my ramblings below to answer it.

For point 2, I was wondering whether, for example, Julia might introduce (or already has!) a list of “all types without super type” (I don’t have a clue why it might do that; perhaps for some clever optimizations?). If that were the case, then of course my hack would break this invariant. But my hope is that this won’t happen, or that perhaps an exception could be made at least for “foreign types” (so that e.g. they are not added to that list-of-types-without-supertypes). We’d of course be willing to contribute patches to the Julia kernel for any such thing; but this hinges on the questions of (a) whether it would even be possible without hurting something, and (b) whether it would be acceptable in general.


Long version:

Some background

This question is motivated by our work on GAP.jl, which is an interface between Julia and the GAP computer algebra system, as part of the OSCAR project. To enable this interface, we modified GAP to be able to use the Julia garbage collector (GC) instead of its own GC. The super nice Julia team merged some low-level patches from us into Julia 1.1 to make that possible; in particular the code in julia_gcext.h and the notion of a “foreign type” (injected into the Julia runtime via the C function jl_new_foreign_type). That allows us to declare a few low-level types that we need to make things work; the most important one of these, and the only one visible to regular users, is ForeignGAP.MPtr, or MPtr for short. Objects of this type are exposed from GAP to Julia.

One other important point: this is actually a bidirectional integration. It can be used in two ways (or three, depending on how you count, but I’ll focus on the two relevant ones)

  1. To access GAP from Julia: using GAP launches the GAP interpreter, which then among other things injects the foreign type MPtr into the Julia runtime; finally, GAP.jl loads the GAP package JuliaInterface which provides a few further C level functions to complete the interface between the two systems
  2. To access Julia from GAP: You start GAP (compiled against Julia), which during its startup very early on also initializes Julia (via jl_init) – it has to, because it uses the Julia GC. It then also injects the MPtr type early on
    • at this point, the user might stop, and not interact with Julia further
    • or you can load the GAP package JuliaInterface to get full access to all Julia features, packages etc. That package during its startup detects that GAP.jl is not yet loaded in Julia, and loads it.

Summary

There are two possible sequences in which things get loaded and initialized:

  1. Julia -> GAP.jl -> GAP -> GAP creates MPtr type -> JuliaInterface
  2. GAP -> Julia -> GAP creates MPtr type -> JuliaInterface -> GAP.jl

What’s the original problem?

Our Julia code in GAP.jl needs to interact with GAP objects of type MPtr. But in scenario 1, that type is not yet known to Julia at the time it tries to load/(pre)compile GAP.jl: after all, MPtr only gets injected when GAP is initialized, which is done by GAP.jl’s __init__ function – but that’s not yet been run, as we are trying to (pre)compile GAP.jl. Boom. Hence, no references to MPtr are allowed in the GAP.jl Julia code, other than in purely dynamic constructs (we have one use of Base.MainInclude.eval(:(ForeignGAP.MPtr)) in __init__ right now, but that’s just for compatibility with Julia < 1.3, so once we switch to requiring Julia 1.3, it can go).

How did we achieve this? Well, we introduced an empty abstract type GapObj and then used that as super type for MPtr. This way, our Julia code can reference GapObj instead of MPtr, and be precompiled. Easy peasy.

Except there is also scenario 2 to consider… Our new sequences of initialization look like this:

  1. Julia -> GAP.jl -> GAP.jl creates GapObj type -> GAP -> GAP creates MPtr type with super type GapObj -> JuliaInterface
  2. GAP -> Julia -> GAP creates MPtr type -> JuliaInterface -> GAP.jl -> GAP.jl creates GapObj type

Ooops: MPtr is created before GapObj. Can’t have a type that does not yet exist as super type, can we? And we can’t fix this, because we are rather restricted on the order things are loaded: We must have:

  • GAP before JuliaInterface
  • Julia before GAP.jl
  • JuliaInterface must either be loaded before GAP.jl, or during initialization of GAP.jl
  • but GAP cannot load packages like JuliaInterface before it has fully initialized its memory manager, and that already requires the type MPtr.
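To make the ordering problem concrete, here is a small sketch in C that writes down the two sequences from above and checks the one property that matters: whether GapObj exists before MPtr is created. The step labels and helper names (`step_index`, `happens_before`, `from_julia`, `from_gap`) are made up for this illustration and do not correspond to anything in GAP or Julia.

```c
#include <stddef.h>
#include <string.h>

/* Position of `step` in a NULL-terminated load sequence, or -1 if absent. */
static int step_index(const char *const *seq, const char *step)
{
    for (int i = 0; seq[i] != NULL; i++)
        if (strcmp(seq[i], step) == 0)
            return i;
    return -1;
}

/* 1 if both steps occur in `seq` and `first` comes strictly before `second`. */
int happens_before(const char *const *seq, const char *first, const char *second)
{
    int a = step_index(seq, first);
    int b = step_index(seq, second);
    return a >= 0 && b >= 0 && a < b;
}

/* Scenario 1: the user starts Julia and runs `using GAP`. */
const char *const from_julia[] = {
    "Julia", "GAP.jl", "create GapObj", "GAP", "create MPtr",
    "JuliaInterface", NULL
};

/* Scenario 2: the user starts GAP, which initializes Julia itself. */
const char *const from_gap[] = {
    "GAP", "Julia", "create MPtr", "JuliaInterface", "GAP.jl",
    "create GapObj", NULL
};
```

Here `happens_before(from_julia, "create GapObj", "create MPtr")` is 1, while the same query on `from_gap` is 0, which is exactly the situation where the super type does not exist yet. Both sequences still satisfy "GAP before JuliaInterface" and "Julia before GAP.jl".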

Our current “solution” (and the new problems it causes)

So to overcome this, we decided to try and break the cycle, by introducing a tiny Julia package GAPTypes.jl which basically just consists of the definition of the type GapObj. With that, we get these initialization sequences:

  1. Julia -> GAPTypes.jl (loaded as dep of GAP.jl) -> GAP.jl -> GAP -> GAP creates MPtr type with super type GAPTypes.GapObj -> JuliaInterface
  2. GAP -> Julia -> GAPTypes.jl (loaded by the GAP kernel) -> GAP creates MPtr type with super type GAPTypes.GapObj -> JuliaInterface -> GAP.jl

OK, problem solved, right? Well, yes, if all packages in Julia were always loaded into the same global namespace. But as you folks know better than me, that’s not the case; there is in general a difference between the GAPTypes.jl loaded on the global level (e.g. triggered by the GAP kernel) vs. one loaded as a dependency of a package like GAP.jl. In fact GAPTypes.jl might not even be installed in the global Julia environment, meaning sequence 2 above could fail. To work around this, we did some pretty evil things (please don’t hate on me for this, I never liked this, it was simply a quick & dirty hack to get things working now, not meant as a permanent solution): GAP actually installs GAPTypes.jl into the global environment during its compile time (yes, I know, this is fragile and bad for many reasons, sigh), and GAP.jl also tries to do that in its deps/build.jl (yup, yup, gross, nasty, evil – please don’t tar & feather us :frowning_face: we want to reform our sinning ways, that’s why I am writing this post).

What else can we do?

Finally we arrived at the idea described at the top: we drop GAPTypes.jl again, and modify sequence 2 as follows: when GAP starts, it creates the Julia foreign type MPtr with no super type. Then, once GAP.jl has loaded and created the abstract type GapObj, it uses a ccall to notify the GAP kernel about the new type; the GAP kernel then swizzles the super pointer of MPtr to point at GapObj. Prior to this, no MPtr instance created in the GAP kernel was ever visible at the Julia language level; only to the “plumbing” (the GC, and the code creating datatypes).
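To make the proposed swizzle concrete, here is a minimal toy model in C. Everything in it is made up for illustration: `toy_type` stands in for a runtime type descriptor and is not Julia's jl_datatype_t, and `toy_set_mptr_super` is a hypothetical name for the entry point GAP.jl would reach via ccall. It only shows the intended ordering and the guards one would probably want around the pointer update; it says nothing about what the real runtime may cache.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a runtime type descriptor; NOT Julia's jl_datatype_t. */
typedef struct toy_type {
    const char *name;
    struct toy_type *super;   /* NULL means "no super type (yet)" */
    int is_abstract;
} toy_type;

/* Created early, during GAP startup, with no super type. */
static toy_type mptr_type = { "MPtr", NULL, 0 };

/* Assumed flag: set as soon as MPtr is handed to anything beyond the GC,
 * i.e. anything that might cache its place in the type hierarchy. */
static int mptr_exposed = 0;

/* Hypothetical notification entry point (what the ccall would invoke).
 * Returns 0 on success, -1 if the swizzle is refused. */
int toy_set_mptr_super(toy_type *new_super)
{
    if (new_super == NULL || !new_super->is_abstract)
        return -1;                 /* only an abstract type may be a super type */
    if (mptr_exposed)
        return -1;                 /* too late: the type may have been observed */
    if (mptr_type.super != NULL)   /* already swizzled: accept only a repeat */
        return mptr_type.super == new_super ? 0 : -1;
    mptr_type.super = new_super;   /* the actual "swizzle" */
    return 0;
}

/* Walk the super chain, as a subtype query would. */
int toy_is_subtype(const toy_type *t, const toy_type *of)
{
    assert(of != NULL);
    for (; t != NULL; t = t->super)
        if (t == of)
            return 1;
    return 0;
}
```

With an abstract `gapobj` descriptor, `toy_is_subtype(&mptr_type, &gapobj)` is 0 before the call and 1 after `toy_set_mptr_super(&gapobj)` succeeds; once `mptr_exposed` is set, further calls are refused. Whether the real runtime offers an equally clean "not yet observed" point is precisely what questions 1 and 2 are asking.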

Of course this is still evil, but I think considerably less evil than messing with the global environment and stuff (but see questions 1 and 2 at the top of this post). That said, if anybody has suggestions for how we could solve our problem in a different way, I am all ears (that’s my question 3 at the top).

This sounds like an interesting problem and I wish I could help you answer your question directly.

Maybe just as a sanity check – what was the reason for needing to implement this foreign type back when you did it? For example, compared to the GMP library integration (where the Julia GC is also injected into the library), what makes GAP different?

If you don’t mind me mentioning this – As an outside observer, I get a small feeling that there’s maybe a sunk cost fallacy associated to this particular solution path. If you need scary hacks (or if you have questions that only a few oversubscribed people can answer), that’s a good moment to revisit your earlier trade-offs and see if they still look reasonable.

I wish it were that simple, but no, this isn’t a case of “sunk cost fallacy”.

The situations for GAP and GMP are radically different:

  • GMP is “just” a library (a very nice and useful one); GAP is a full-blown language with a large library and its own package ecosystem, and can be used as a standalone program
  • GMP essentially has one kind of allocation / object: bigints. GAP has a complex type system of its own, with bigints being just one of its types (in fact, GAP uses GMP itself)
  • GMP is built around a simple allocation model, with explicit allocations and deallocations; GAP was built around a stack-scanning garbage collector, and its kernel relies on that; so in order to be able to use the Julia GC, we either had to rewrite the whole GAP kernel (not a realistic option), or patch Julia to allow us to install various hooks to be called during garbage collection, and to create a special kind of “foreign” type, with special features to support our use case.

So, yeah, I am pretty sure we can’t just do “something simpler”. It is unfortunate that we have to do this low-level messing around, but there’s no realistic alternative for our highly specialized needs… and I think the fact that we managed to convince the Julia core devs to accept our patches to add this stuff in the first place gives at least some indication that we are not completely nuts :slight_smile:
