"Meaning", type-piracy, and method merging

Whether a given method of + (or any function) can be separately compiled is not related to whether the method is part of Base.+. What matters is

  1. The dependencies of that method.
  2. Call sites that can see the method.

If a given a + b call site can see method X — either because it’s using Base.+ which contains X, or because X is callable via ADL — then compiling that call site requires information about X. This is really the key to the whole thing. For each call site you only need to know “which methods might this call site end up calling?” With ADL, the method would still be reachable from the same (or mostly the same) call sites, so the separate compilation story doesn’t change. (Note again here I’m only talking about separate compilation, not language ergonomics or other issues.)

4 Likes

Thank you for the explanation. I am still having trouble with thinking through the Julia compilation model because my brain is too tied to the C++ style “compiling only occurs once you have a complete set of concrete types for a given generic interface”. But besides my own interest on this stuff, I don’t think I have much to add on the organization of the compiler.

Back to language (or macro) ergonomics for me, after the hijacking of this “meaning” thread for compiler details (karma and a fair response to me hijacking the last compiler thread for philisophical dispatch concerns !)

Do I understand it right that instruction selection generates “Machine IR” that can also be cached? I don’t see native codegen in the list, so I guess MIR to native is really cheap?

Then a persistent second level cache would look quite attractive, and (at least conceptually) relatively simple (using cryptographic hashes to encode the world-state of the dependency-DAG, and simply LRU without any attempts of GC, ever).

I think doing semantically clever things is very hard, because a persistent cache needs to store different versions of foo(x::Int) depending on how dependencies like Base.:+ changed, in a possibly load-order dependent way, different sessions and package versions (developers churn out a lot of versions in a single sesssion!).

Can you point me to where (source file) the dependency DAG is currently handled? Unfortunately I don’t have a good overview yet :frowning:

1 Like

I did not say that constant recompilation is taking place … why do you suggest that @StefanKarpinski ?

It makes this discussion less productive.

I said that in the current definition of multiple dispatch it is impossible to gurrantee that code loaded and compiled on the spot. Will behave the same if loaded from the cache. See my example above.

This makes caching between sessions impossible without further “rules” and compromises to the system.

Regarding your qsort example. The MethodInstance for qsorting MyType belongs probably to MyTypes module.
So it should be cached there, however if every module imported potentially changes the definition of qsort , the binary cache for MyTypes is not valid without recompilation re inference etc.

Someone pointed out that if we take note on all The modules that is imported in a certain module, and “seal” the binary cache there then we are probably safe to assume that the cache will not change.
And I agree with that.

I have a POC of forward dispatch or more precisely context dispatch,on my github account under MultiFunctions.

I will later tonight , when in front of the computer , write a detailed explanation, if you are curious you can have a look now.

And yes it involves a Julia level linker.

I don’t know this part of the code, but I feel like I do because of this:

1 Like

I think doing semantically clever things is very hard, because a persistent cache needs to store different versions of foo(x::Int) depending on how dependencies like Base.:+ changed, in a possibly load-order dependent way, different sessions and package versions (developers churn out a lot of versions in a single sesssion!).

That seems like a fairly nice summary of some of the challenges someone might encounter if they wanted to resolve https://github.com/JuliaLang/issues/265 for example. But unfortunately, as you mentioned above, this “caching between sessions [is] impossible”.

But unfortunately, as you mentioned above, this “caching between sessions [is] impossible”.

Using a hash-tree (DAG), it is afaik possible, and I outlined a possible way above.

Yes, we now turned every hash-collision into possible remote code-execution; hence, the need for crypto-hashes, and possibly command-line options to freeze/disable the persistent cache. But you have the same issue when touching a dedup-enabled zfs volume.

Examples where it seemed to me that you are saying this:

Perhaps I’m misunderstanding what you’re saying but it seems a lot like your position is that the way method extension and dispatch works in Julia either prevents us from caching compiled code or causes us to have to recompile code all the time (or both). You don’t seem to have directly addressed the various technical points that @yuyichao, @jameson, @jeff.bezanson or myself have raised. It is of course possible that we have all misunderstood your position, but if so, then it would be helpful to clarify what your actual position is.

1 Like

I believe this is just false. But it depends somewhat on the details — perfect dependency resolution is in general undecidable. For example, it’s also impossible to “guarantee” that a makefile accurately reflects the dependencies in a project, and it’s also impossible to guarantee that a C header file matches a certain binary. But none of that makes separate compilation impossible. If you want separate compilation, you just do it anyway and ignore those problems. We could do that in julia too with appropriate compiler tooling.

Not true. In fact we already resolve dependencies with method-level granularity (in the fix for issue 265).

2 Likes

I always refer to caching between sessions. I apologize if it came out differently.

Julia runtime and design is genius , in my view,
Multiple dispatch and the type system changed the way I think… the simplicity is just beatifull.

And I want to see it overcome its infancy and become a major entity in the realm of software design and development.

And I believe I found a simple and elegant way to make Julia completely cacheable … so instead of depending on huge C/C++ projects with ccalls , this code could be a Julia module.

Currently this is not practical because of the startup time cost of first jitting large projects.

After everything is “warm” then the workflow is just amazing.
This is what I am targeting.

Regarding the runtime jitting compiling inference Base the type system , interfaces broadcasting comprehensions do-notation and more , you have created something remarkable.

1 Like

So your whole aim is to make caching compiled code between Julia sessions easier? That does clarify significantly. Thank you. It had seemed to me that you were talking about recompilation of code during a single Julia session.

I still think that you haven’t addressed @jeff.bezanson’s sum example, which really cuts to the heart of the matter.

if f is exported name of JeffModule or if ‘f’ has been specifically imported in my module then calling ‘f’ from MyModule will “see” sum with the methods available in Jeff’s Module and MyModule as if they were both importing a more basic version of sum.

and dispatch will dispatch according to standard rules

Ok, so the visible sum methods within f depend on who called f? That strikes me as much less compatible with separate compilation than what we do today. How can you statically compile JeffModule if it needs to do something different based on who calls it?

I’m not sure what this means. What does “basic” mean here?

I believe I found a solution to that … or at least a promising direction

calling ‘f’ from the context of MyModule is part of MyModule static compilation , not Jeff’s.

Module basic
function sum end
end

with MyModule and JeffModule declaring
import basic.sum

Yes, the call to f is part of MyModule, but the body of f is inside JeffModule. To me, separate compilation means that the body of f is compiled once and then left alone, and then anybody who wants to can call it. If the body of f has to be re-examined in any way when compiling MyModule, that’s no longer separate compilation.

usually in multiple dispatch you pass types that do not belong to a module ,you pass them through a module that do not know of them, manipulate them using some intermediate common reasoning , with actual values coming from the calling context that do know those types.

So anyway in that case the “door” or exported function ‘f’ is not meant to be cached at its defining module.

I think a good choice would be to say that if f(args…) is compilable and callable from module M then calling
that function should look no further.

Sure, your module could pass the equivalent of a vtable pointer to f to tell it what definition of sum to use, but doesn’t that require compiling the call to sum inside f as an indirect call? That would be slow. We can easily get separate compilation at the expense of performance by compiling every call site to a dynamic dispatch.

1 Like

no, if f is compiled then f is called , otherwise if it is called from function g() in MyModule
it would be jitted at runtime.

if g() exists in the cache g will be called. you would still need dynamic dispatch in places of type uncertainty.
but you could use this context concept to limit what dynamic dispatch will see, potentially inlining it completely

POC