Compile-Time Dictionary

Suppose I have a function that looks somewhat like the following:

function foo(x::T) where {T}
    v1 = lookup(T)  # look up some value specified by the type T
    # perform work with v1
    # return the result
end

My question is: can I compile away the lookup? A given type T results in a specific value, so when specializing on type T the compiler could just hardcode that value. If so, what is the proper way to do it?

I don’t think you can get significantly better than an IdDict.

If you know that you’ll only ever put in types or other compile-time constants, you can dispatch on them to return your specific constants. If they only depend on your types, though, you could just as well precompute them.
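
For example (a minimal sketch; the types and values here are placeholders):

lookup(::Type{Int}) = 42      # each method returns a constant for its type
lookup(::Type{Float64}) = 2.5

Since each method body is just a literal, a call like lookup(Int) can constant-fold when the caller is specialized.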

Do you have a specific usecase in mind?

If the lookup always returns the same object for the same type, this will happen naturally. You just need to be aware of when Julia avoids specializing.

In my use case the type T is also a parametric type, which might in turn be parameterized by other parametric types. Writing a dispatch method for each of them would be too tedious.

Yeah, this is what my use case would demand. Is there a way to verify that the specialization is happening?

Note that a lookup in some global dictionary or other global constant will in general not compile away - if the object is mutable (like a Dict is), there’s no guarantee that its elements don’t change. Using dispatch instead avoids that issue.
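
For instance (a sketch with placeholder entries):

# even a const binding doesn't help here: the Dict object itself is mutable,
# so TABLE[T] has to be looked up at run time on every call
const TABLE = IdDict{DataType, Int}(Int => 1, Float64 => 2)
lookup_dict(::Type{T}) where {T} = TABLE[T]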

Is something like this an option?

for T in types  # your collection of types
    @eval lookup(::Type{$T}) = $(some_computation(T))
end

This will do the computation once for the T you’re interested in and define the lookup methods for you. For more details, I think I’d need more information about what your types look like.
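
As a concrete (and entirely hypothetical) instance of that pattern:

some_computation(T) = 2 * sizeof(T)  # stand-in for the real computation
for T in (Int32, Int64, Float64)
    @eval lookup(::Type{$T}) = $(some_computation(T))
end

lookup(Int64)  # == 16, computed once when the method was defined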

Hmm, that actually could be an option. So define the methods I need for dispatch dynamically? Coming from any other language, that sounds like bad practice, but is this something more standard in Julia?

Metaprogramming should always be approached with care, because it is often harder to debug and to maintain. However, sometimes, if the alternative is writing a lot of boilerplate code, metaprogramming can end up being more maintainable.

I think @code_native should be able to show you whether the value is inlined or not. Sorry, I do not go down to this level often.
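
For example (a sketch; @code_llvm from InteractiveUtils is usually easier to read than native assembly):

using InteractiveUtils

lookup(::Type{Int}) = 42
bar(x::T) where {T} = lookup(T) + x

# if the lookup folded away, the printed IR contains the constant 42
# directly and no call to lookup
@code_llvm bar(1)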

Yes, in a sense. I don’t usually recommend doing this (avoiding eval and all that), except in cases like this where it will save you a lot of boilerplate. As long as you’ve defined those methods ahead of actually using them, it should be fine.

Note the $T in the signature: T has to be interpolated into the expression as well, but other than that this should work just fine. Still, some more info about the context surrounding your question would be nice - if possible I’d like to avoid an XY problem.

Either way, there is currently no mechanism to guarantee that anything will be inlined or completely eliminated by the compiler at compile time. Usually, though, a single call is rather inexpensive, and if the called function is small it will probably be inlined anyway.

The use case is creating a DAG workflow. The DAG consists of nodes, where the node is an abstract type extended by many concrete mutable structs. The type of each node in the DAG is unique (the behavior of a node is entirely encoded by its type and type parameters).

Each node contains a val, and the type of val is a type parameter for many of the nodes (it could also be fixed, in which case it is not a parameter). The nodes are mutable because in each iteration of an event loop the nodes’ val are updated according to a calc function that is dispatched on the node’s type.

There are some nodes which are meta nodes; that is, their val is a description of the val of another node. For example, you may want to keep track of the maximum value ever taken by another node. This Max node is thus parameterized by the type of the node whose max it tracks (recall that the type of every node in the DAG is unique).

The calc method for the Max node will need to look up the val of the node that the Max node is computed over. Rather than perform a lookup in a dictionary (I am very interested in extracting as much performance as possible, hence the heavy use of type parameterization), I am looking to intern the memory location of the relevant dependency’s val right into a calc function that is specialized for this particular Max node.

I want to write only one calc method that handles all Max nodes (there may be more than one Max node, since you might want to track the max of multiple nodes) but that specializes based on the exact parameterization of the Max type; that is, each Max node in the DAG should get its own specialized method without having to resort to dispatch. It should have direct access to its dependency without having to go through an intermediary.

In general, once the DAG is created and all nodes have been clocked by the event loop at least once, each node should have a fully specialized code path dedicated to its exact type that is entered when calc is called on it. There will be an Engine, parameterized by a tuple type consisting of the types of all the nodes, which constructs the DAG. The Engine is what is actually responsible for calling calc on each node.
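
In rough outline, what I have in mind looks something like this (a sketch; all names are placeholders):

abstract type Node end
calc(n::Node) = nothing  # stub; real methods dispatch on concrete node types

struct Engine{N<:Tuple}
    nodes::N  # concretely-typed tuple of nodes; its type encodes the whole DAG
end

# iterating a concretely-typed tuple lets the compiler specialize each
# calc call (small tuples are even unrolled)
run!(e::Engine) = foreach(calc, e.nodes)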

Based on your suggestion, I am thinking I can insert the creation of the lookup methods into the constructor of the Engine or into the run method that takes an Engine.

Is this how specialization is implemented in Julia? When I call a polymorphic function with a specific type, does it just generate a new method that will be called by dispatch in the future when provided with the same type?

Methods and specializations are two different things. Each function body with a distinct signature (i.e., number/types of parameters) but the same name is a method. However, a method can be generic (if you did not specify each parameter as having a concrete type), and when it is called with some specific set of objects as arguments, a specialization for that method is generated. The programmer in general does not interact with specializations directly; they are created on the fly when methods are called.

So the hierarchy is function → method → specialization: each function can have multiple methods, and each method multiple specializations. If you have a function with a single method and the method has every parameter annotated with a concrete type, then there is only one function, one method, and one specialization.
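
To illustrate (a minimal sketch):

f(x) = x + 1            # a generic method, f(::Any)
f(x::String) = x * "!"  # a second, more specific method

f(1)     # compiles a specialization of f(::Any) for Int
f(1.0)   # another specialization of the same method, for Float64
f("hi")  # uses the f(::String) method

methods(f)  # lists the two methods; specializations are created on demand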

I understand that (perhaps I did not demonstrate it as concisely as you did).

calc is a function. The function will have a polymorphic method for each node type (ignoring the parameterization of the type). And then each method will have a specialization for each parameterization of said type.

The problem at hand is that I would like to intern the memory location of the val dependencies directly into the specialization, which should be possible because the dependencies of a node are given in its type parameterization. This is as opposed to using a dictionary that all nodes share to look up their specific dependencies (in which case the lookup is not specialized).

Note that in my use case calc will be called on each node on the order of ten million times, and there could be around 1,000 nodes, most with 1-2 direct dependencies. Hence the desire to specialize as much as possible.

I really do not understand what you mean by interning the memory location of the val dependencies directly into the specialization. Do you mean to inline the memory location of a val? (What exactly is this val? Do you mean an object of type Val, or can it be any value?)

Every node in the DAG has a field val. A node may have dependencies it needs to look up in order to perform its calc. However, the calc method could be specialized so that it does not have to perform the lookup for its dependencies’ val, since a node’s dependencies are specified in its type parameterization and the concrete type of each node in the DAG is unique.

For example, I could have two max nodes, m1 and m2, and two other nodes, n1 and n2. The val of m1 should be the maximum value that the val of n1 has ever taken, and similarly for m2 and n2. m1 and m2 will have different types: the type of m1 will be parameterized by the type of n1, and the type of m2 by the type of n2 (although both are still Max nodes).

There is a calc method that operates on Max nodes; that is, both m1 and m2 would be dispatched to this method. However, it could be specialized, since m1 and m2 track the max of different nodes, which are in turn of different concrete types.
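
In rough code, what I am describing looks something like this (a sketch with placeholder names; here the dependency is held as a concretely-typed field, which is one way to give the specialized calc direct access to it):

abstract type Node end

mutable struct Input{V} <: Node
    val::V
end

mutable struct Max{V, N<:Node} <: Node
    val::V
    dep::N  # direct reference to the node whose max is tracked
end

# one generic calc method covers all Max nodes; Julia specializes it
# for each concrete Max{V, N}
function calc(m::Max)
    m.val = max(m.val, m.dep.val)
    return m.val
end

n1 = Input(0.0); n2 = Input(0)
m1 = Max(-Inf, n1)          # Max{Float64, Input{Float64}}
m2 = Max(typemin(Int), n2)  # Max{Int, Input{Int}}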

Let’s take a step back. It sounds like you’re trying to lift all computation over your DAG into the type domain, which is probably not a good idea, if the number of unique types you have is as large as you say. The compiler/type machinery is not designed for general computation.

Based on this, I think you’ve misunderstood what a “method” and a “specialization” are in Julia (and I think that’s why @Henrique_Becker wrote their explanation). In Julia, a function is just a name. A method is a concrete implementation of that function for some (possibly parameterized) typed arguments. A specialization is a concrete compiled version of a specific method for some concrete argument types.

Never in this chain does anything have access to fields of any runtime objects - only when the code is actually run later on does the memory backing a given object exist in the first place.

No, that’s not what I was thinking of when I made that suggestion. In fact, I’d suggest not relying on type domain computations for your problem at all - they will become a bottleneck, due to having to compile a new method for each individual object. This will greatly increase compile time and may not even eliminate all dispatch like you hope to.

Like I said above, at the point where types exist, values don’t. The core difficulty of your problem is the DAG reduction (which sounds like a symbolic execution engine to me?) – you can’t compile that away.


In general, if you find yourself wanting to create (tens of) thousands of specializations in quick succession, you will very quickly run into the reality of having to compile those specializations. At that point, keeping your code a bit more generic and taking another approach is probably a good idea (though admittedly I’d have to think more about what approach would be fastest here and what sort of problems your exact code has - as a first guess, eating the dispatch cost and not parametrizing Max on its value will probably be beneficial).

The computation for the val of each node is kicked off by receiving a message. These messages are spaced out in time over the whole day; the critical aspect is to respond as quickly as possible. Biting the bullet on one compilation (even if it took minutes) would be fine for me, because I could just warm up the engine prior to receiving the actual messages.

Hence I think I would rather eat the compilation cost than the cost of failing to specialize everything that can be specialized, keeping the structure of the DAG in the type domain rather than as a runtime object.

Does that message potentially result in a new parameterized Max? If so, you will have to eat the specialization, i.e. new compilation cost. If not, why bother with types in the first place? You could just as well have a proper precomputed, constant lookup table, which will certainly be faster than dispatching.

The process of parsing your message and constructing the object to then pass to lookup sounds inherently dynamic and will be type unstable, resulting in dynamic dispatch yet again, no matter whether you’ve conditioned your engine to compile the methods ahead of time. You could save the dispatch time by using the message you get as a key into a precomputed Dict or similar directly, which is just a regular memoization technique.
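
Something like this minimal sketch (the key and value types, and the computation, are all hypothetical):

const RESULTS = Dict{String, Float64}()  # memoized result per message

expensive_computation(msg) = Float64(length(msg))  # stand-in for the real work

function handle(msg::String)
    # computed once per distinct message, then reused; no dispatch on types
    get!(RESULTS, msg) do
        expensive_computation(msg)
    end
end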

Maybe I’m misunderstanding though.