MultiFunctions: Context dispatch and binary cache

TsurHerman · June 10, 2018, 12:12pm

I have been writing this draft for several days , and got a little mixed up verbalising these ideas , which are simpler to understand using code and examples…
If you want to jump straight into code, there is a link at the end.

@Jameson @mbauman
tl;dr
There is a way to make Julia completely binary cacheable(between sessions), so it can become an effective way of developing and distributing .so .dll

Background
Function = a module wide constant symbol ( e.g :sum) without any specifications regarding types of input variables or output

Method = a function definition with upper bounds on types of input argument

MethodTable = each Function has a MethodTable that encapsulates all methods associated with that Function

MethodInstance = each Method may compile to several MethodInstance depending on the input types

Julia lets you define different methods (pieces of code) for the same function(a module wide const symbol) within a module. The method themselves may cover a range of input arguments. each set of runtime arguments gets jited on demand to an actual binary called a MethodInstance.

This in itself is not that flexible since you are limited to only the methods in the module. So Julia lets you
add methods to the method table of functions in other modules.

This fact introduces great flexibility for the programmer and is one of the characteristics of Julia.

So where is the problem?
Since importing a module may change other modules (for example if that module adds a method to another module) , caching the binary MethodInstances of a previous use of a module becomes impossible … or at least very volatile.

For small projects this is ok, code gets jitted on demand, first call to any function is slow, but things get quick
pretty fast.

Moreover parts of Base functions , are “baked” into the sys.so , calling those functions will use the cache
instead of re-jitting. This is a kind of a “cheat” because it is legal to do only for @pure functions … however it is
necessary in order to get the REPL responsive quickly.

For large projects , the time cost of jitting all dependencies can become a show stopper for using Julia.

Oh my that does sound sub-optimal…
It is unfortunatley, but there is a way to make Julia completely cachable

Completely?
yes completely , meaning that as long as you don’t change a module(define a new global name in it) all
the jitted code can be cached , so Julia will become: slow the first you use a function only if it is the first time ever!
Julia can become an effective mean to distribute cross-platform binaries , with source code attached and easily editable … I can imagine the equivalent of huge projects like openCV and tensorFlow.
And it will be more responsive the more you use it.
|
|
|
|

O.K how?
Dispatch according the context of the call.
If module M imports module B and each defines a function with the same name e.g addxy
where M.addxy is meant as an extension to B.addxy. Then instead of adding M.addxy to the method table of B.addxy (which is the way it is currently) and thus invalidating B’s binary cache, do the opposite.
Any call to M.addxy will both B and M methods and will see dispatch to the most specific method.

Wait, thats not good enough, what if B.addxy calls another function in B which is also imported in M,
How does it know to consider module M?
To solve this we need to introduce a new concept: context call:
calling function f from module B with the context of module M, will resolve to the most appropriate method
visible from module B and module M if M.f imports B.f.

Another change that is needed to achieve this is passing the context down the AST, so functions g,h etc which are called from within function f will also dispatch according to the parent function context.

It must be too complicated to implement…
Since Julia is such an expressive language , I was able to put up a small POC(~100 lines of code) using standard Julia constructs without going deeper into the C-code. The code proves that multiple dispatch according to context is feasible.

the Proof Of Concept is in https://github.com/TsurHerman/MultiFunctions
And it lets you test the ideas presented here.

foobar_lv2 · June 10, 2018, 12:55pm

Ok. Let me see that I get this straight. If I make your examples slightly more interesting, consider the following; how would I do that with MultiFunctions?

module Ring_base
export addxy, mulxy
function addxy end
function mulxy end
end

module Ring_int
using Reexport
@reexport using Ring_base
export foo_int
struct foo_int x::Int end
Ring_base.addxy(x::foo_int,y::foo_int) = foo_int(x.x+y.x)
Ring_base.mulxy(x::foo_int,y::foo_int)= foo_int(x.x*y.x)
end

module Ring_Q
using Reexport
@reexport using Ring_base
export foo_Q
struct foo_Q x::Rational{Int} end
Ring_base.addxy(x::foo_Q,y::foo_Q) = foo_Q(x.x+y.x)
Ring_base.mulxy(x::foo_Q,y::foo_Q)= foo_Q(x.x*y.x)
end

module Ring_fma
using Reexport
@reexport using Ring_base
export ring_fma
ring_fma(x,y,z)=addxy(x, mulxy(y,z))
end

begin #main code
using Ring_int
using Ring_Q
using Ring_fma

@show ring_fma(foo_int(3), foo_int(2), foo_int(5))
@show ring_fma(foo_Q(3//1), foo_Q(2), foo_Q(5//1))
end;
#ring_fma(foo_int(3), foo_int(2), foo_int(5)) = ring_int.foo_int(13)
#ring_fma(foo_Q(3 // 1), foo_Q(2), foo_Q(5 // 1)) = ring_Q.foo_Q(13//1)

TsurHerman · June 10, 2018, 1:40pm

I fixed a small thing where @fusing should do also using and not import … for better compatibality with “regular” code

anyway you’r example is now on GitHub
it may look like this:


module Ring_int
using MF

struct foo_int x::Int end
export foo_int

@mf addxy(x::foo_int,y::foo_int) = foo_int(x.x+y.x)
@mf mulxy(x::foo_int,y::foo_int)= foo_int(x.x*y.x)

@fexport addxy
@fexport mulxy

end

module Ring_Q
using MF

struct foo_Q x::Rational{Int} end
export foo_Q

@mf addxy(x::foo_Q,y::foo_Q) = foo_Q(x.x+y.x)
@mf mulxy(x::foo_Q,y::foo_Q)= foo_Q(x.x*y.x)

@fexport addxy
@fexport mulxy
end

module Ring_fma
using MF

@mf addxy(x,y) = error("addxy unimplemented in context for args $((typeof(x),typeof(y)))")
@mf mulxy(x,y) = error("mulxy unimplemented in context for args $((typeof(x),typeof(y)))")

@mf ring_fma(x,y,z) = addxy(x, mulxy(y,z))

@fexport addxy
@fexport mulxy
@fexport ring_fma
end


using MF
@fusing Ring_int

@fusing Ring_Q

@fusing Ring_fma


ring_fma(foo_int(3), foo_int(2), foo_int(5))
ring_fma(foo_Q(3//1), foo_Q(2), foo_Q(5//1))

ring_fma(2.2,3.2,2.3) #should print an error

jrevels · June 10, 2018, 6:14pm

Another change that is needed to achieve this is passing the context down the AST, so functions g,h etc which are called from within function f will also dispatch according to the parent function context.

This functionality is provided by Cassette’s contextual dispatch mechanism, might be worth checking out if you’re interested in this kind of stuff.

jameson · June 10, 2018, 8:31pm

MethodTable = each Function has a MethodTable that encapsulates all methods associated with that Function

The implementation may superficially have this appearance, but the “MethodTable” Type is really just an optimization hack. Comceptually, there’s actually only one method table in the system.

caching the binary MethodInstances of a previous use of a module becomes impossible

We want to listen to new ideas, but when you repeatedly state something as “impossible” which is actually already working quite well, it greatly undermines your credibility. Improving the quantity and quality of the cache is of high priority, but is mostly a matter of tedium and hard work, not complexity. We certainly have not yet hit the limit of the improvements that we can make just against the current model.

So where is the problem?
[Using the sys.so file] is a kind of a “cheat” because it is legal to do only for @pure functions

This absolutely is legal to do for all functions, without exception. The sys.so image proves that, since it contains a wide variety of code of all sorts, and most of that code is not pure.
However, looking at your package, I’m not sure you understand this term that you are using: your @generated functions do not appear to be pure (in the broadest sense of not observing or mutating external state, not the stricter @pure sense), despite that being a very strong and absolutely mandatory requirement.

TsurHerman · June 11, 2018, 3:44am

I mean caching between sessions. How is this already working quite well?

TsurHerman · June 11, 2018, 6:39am

What sys.so proves is that it is possible (although a bit painstakingly) to save a binary state of the runtime. However this binary sate is baked into the runtime in a way that does not respect the language rules…

I will explain what I mean:


module M
f(x::Float64,y::Float64) = x+y
end

M.f(1.0,2.0) # 3.0

module N
import Base.+
(+)(x::Float64,y::Float64) = 0
end
M.f(1.0,2.0) # 0

in the following example If I were to add module M into the sysimg then calling M.f(1.0,2.0) would still produce
3.0 even after I imported module N.

So there is a difference whether code is baked into sys.so or loaded at runtime.

Edit: My bad, I just tested it again, and changes to method table are visible in the modules which are compiled
as part of the runtime,

I seriously don’t understand why you all get so jinxed when I point that out, you said it yourself in a previous post,
The way multiple dispatch works right now , a method instance is dependent on the order in which modules were
added to the runtime … even modules which are not strictly visible from the module “containing” the method instance.

Another issue is purity: I believe that the term pure , in the model I propose for multiple dispatch , can be relaxed greatly to mean any function that does not declare a new global…

Well maybe a better term would be: Eventually Pure

Since the methods which a function see does not change once the module finished loading.

jameson · June 11, 2018, 3:32pm

Such a condition is neither necessary or sufficient for caching compilation. The actual requirement is that the data the function sees does not change once the module is finished loading.

Because, as you found after trying it, this is simply false. There might be cases where it fails, but that’s just a bug. A common workflow now is to use the package Revise.jl, since this problem simply doesn’t exist anymore (it was issue #265).

Sure, you could use a different version of dispatch, but note that the referenced issue is closed – we already can handle cache invalidation with the current model. If you’re going to propose a new model, at least demonstrate that you know what problem you’re solving. All of the statements in your initial post section “So where is the problem?” are false.

As Jeff mentioned in the other issue: the current, context-free dispatch semantics actually make it very easy to build this cache and guarantee that it can be statically compiled.

The sysimg build demonstrates that the technique works. All that remains is implementing it for more general use, which is expected to be a lot of work, but not especially complex.

TsurHerman · June 11, 2018, 6:37pm

I keep referring to caching compilation between sessions , and not in the same session.
Revise solves that perfectly and I am not interested in that since it is already working great.

so please no more references to issue #265 or Revise or whatnot.

About “purity” (The flawed way flawed me understands it) → If I treat any global variable referenced as an implicit function argument, I think it is sufficient that the type of the global data doesn’t change, not the value… so that compiling that function again will produce the exact same binary.

In context-free dispatch this means that only hyper-pure functions are pure, because a future change in the method table may change the way a function gets compiled to binary.

In context dispatch, things may be more relaxed.

When you’ll start working on that expected lots of work , you’ll understand more the angle I am coming from.

jameson · July 16, 2018, 5:22pm

I’m trying to subtly give you my resumé. To be more explicit, I’ve been tackling this problem on-and-off for several years now. There are now various “proofs of concept” to demonstrate that the current approach will work, there just remains the issue of putting it all together. For example, we already have binary caching (sysimg dynamic library), independent incremental compilation (modules), method invalidation (e.g. #265 and Revise.jl) – they just currently don’t communicate across processes or across restarts. That does bring in additional issues, but completely unrelated to those you mentioned here (such as the need to avoid global counters and reference and avoid pointers and pointer-based object-identities).