State of machine learning in Julia

ChrisRackauckas · January 11, 2022, 10:14pm

It’s not the same as, or even similar to, an E-graph; it’s similar to the interfaces that E-graphs act on. Maybe the easiest way to describe it is by saying what it is similar to. Python bytecode is like “the Julia IR”. Of course, since Julia has an optimizing compiler, there isn’t a single IR; there are stages: the untyped IR, the typed IR, and the LLVM IR. Cassette and IRTools, the tools on which Zygote.jl was built (some notable others are AutoPreallocation.jl, SparsityDetection.jl, etc.), are probably the most similar to TorchDynamo, in that each is a tool that takes untyped syntactic IR and transforms it into another untyped syntactic IR.

It turns out that for Julia this was a bad idea because (a) the meaning of code can depend (heavily) on types, and (b) this happens before compiler optimizations, so mixing compiler optimizations with automatic differentiation is impossible at that level. Thus Julia v1.7 added an AbstractInterpreter interface to Julia Base itself for acting on typed IR, which is then used by packages like EscapeAnalysis.jl and Diffractor.jl to write compiler passes on typed IR. And of course LLVM IR has standard interpretation techniques, along with Enzyme.jl, which is an AD written on LLVM IR.
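To make point (a) concrete, here is a minimal Python sketch (the function names are made up for illustration and are not from any of the packages above). It applies the same syntactic rewrite, `x + y` to `y + x`, with no type information. The rewrite is harmless for numbers but changes the meaning for strings, which is the kind of thing a pass on untyped IR cannot tell apart.

```python
# Hypothetical illustration: one untyped rewrite, two different outcomes.

def original(x, y):
    return x + y

def rewritten(x, y):
    # What an untyped pass might emit if it assumed that + always commutes.
    return y + x

# For integers, + is addition, so the rewrite preserves the result.
ints_agree = original(1, 2) == rewritten(1, 2)

# For strings, + is concatenation, so the rewrite changes the result.
strs_agree = original("ab", "cd") == rewritten("ab", "cd")

print(ints_agree, strs_agree)  # True False
```

A pass that runs on typed IR can check the argument types first and apply the rewrite only where it is valid.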

So TorchDynamo is probably most similar to Cassette/IRTools, but you could also say it’s like AbstractInterpreter in that it’s acting on “the true IR of Python”: the true IR of Julia is the typed one, since that is when it has all of its information, while in Python it is not typed. This story is why Zygote has its compile-time issues and higher-order AD issues, and why all of the tooling is moving not just to a new AD tool but to an entirely different IR target and compiler tool stack. (Note that this doesn’t imply the same will happen to TorchDynamo, unless they start rewriting their AD to be source-to-source on Python bytecode, but there’s precedent for that in tangent, which didn’t find a nice home.) Note that these tools aren’t just for AD. For example, there are PRs to Julia’s Base, written using the AbstractInterpreter compiler plugin interface, that automatically analyze loops and remove repeated allocations of immutable arrays.

github.com/JuliaLang/julia

ImmutableArrays

JuliaLang:master ← ianatol:kf/immutablearray

opened 12:13AM - 02 Oct 21 UTC

ianatol

+3132 -164

This PR extends #41777 to provide a dynamically sized immutable array `Immutable…Array`. The `ImmutableArray` constructor creates an immutable copy of another array, allowing users to get the performance of a mutable array locally, but with the compositionality and safety of an immutable array at the inter-procedural level. In the cases where the compiler can prove (using info from a novel [escape analysis](https://github.com/aviatesk/EscapeAnalysis.jl) pass) that the original array is dead after copying, this benefit comes at no cost to the user.

See the following for an example of a function that utilizes performant, mutating operations while only exposing an immutable array:

```
function simple()
    a = Vector{Float64}(undef, 5)
    for i = 1:5
        a[i] = i
    end
    return ImmutableArray(a)
end
```

Using information gathered by the escape analysis pass, the compiler can prove that `a` is dead after the return, and thus this function is neatly optimized to have the same memory allocation as one that returns a mutable object.

So that still doesn’t answer how the heck E-graphs come into the story, because I haven’t described how you write a compiler pass. It doesn’t matter what level of IR you’re on: a pass is basically just a function IR -> IR. So in their blog post, this is where they say “just add code here”:

from typing import Callable

import torch
import torchdynamo

def custom_compiler(graph: torch.fx.GraphModule) -> Callable:
    # do cool compiler optimizations here
    return graph.forward

with torchdynamo.optimize(custom_compiler):
    # any PyTorch code
    # custom_compiler() is called to optimize extracted fragments
    # should reach a fixed point where nothing new is compiled
    ...

# Optionally:
with torchdynamo.run():
    # any PyTorch code
    # previously compiled artifacts are reused
    # provides a quiescence guarantee, without compiles
    ...

Well, that’s true in any of these systems, just like in macros. But if you’ve ever written a macro, you’ll know that walking expression graphs is tedious to get correct. Wouldn’t it be nice if compiler optimizations for mathematical ideas could be expressed mathematically, and the associated compiler pass could be generated? It turns out that all symbolic tooling really is, is tooling that performs rewrites on some IR. So Symbolics.jl has an IR that uses SymbolicUtils.jl’s rewriters and Metatheory.jl’s E-graphs to transform symbolic IR → symbolic IR, but what we have done is make those rewrite tools generic to the IR, and boom, now it’s a compiler optimization pass generator.
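For a feel of the tedious hand-written version, here is a rough Python sketch of a pass as a plain function IR -> IR. The IR is a toy I made up for illustration (numbers, variable names as strings, and `(op, left, right)` tuples); it is not `torch.fx` or Julia IR, and `simplify` is a hypothetical name.

```python
# Toy IR (an assumption for this sketch): an expression is a number,
# a variable name (str), or a tuple (op, left, right) with op "+" or "*".

def simplify(expr):
    """Hand-written pass: takes IR, returns IR with x*1, 1*x, x+0, 0+x removed."""
    if not isinstance(expr, tuple):
        return expr  # leaves (numbers, variables) are returned unchanged
    op, left, right = expr
    # Simplify the children first, then look at this node.
    left, right = simplify(left), simplify(right)
    if op == "*":
        if left == 1:
            return right
        if right == 1:
            return left
    if op == "+":
        if left == 0:
            return right
        if right == 0:
            return left
    return (op, left, right)

# (x * 1) + 0 simplifies all the way down to x.
print(simplify(("+", ("*", "x", 1), 0)))  # x
```

Every new identity means another hand-coded branch in the walker, which is the part you would rather have generated.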

That means you can, say, define an E-graph that acts on Julia typed IR and spits out the typed IR with the desired simplifications described mathematically. This is what we mean by “democratization of writing compiler passes”: we are trying to use this to build a system so that people who want to add a new linear algebra simplification pass to the Julia typed IR do not need to learn all of the details of the AbstractInterpreter and the Julia typed IR definition. Instead, they just write a few mathematical equalities, and boom, it generates a compiler pass which then generates the transformed IR. So think of the E-graphs as replacing the requirement that someone write a function like def custom_compiler(graph: torch.fx.GraphModule) -> Callable: that digs through some expression graph. Instead, you just write the equalities.
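As a rough illustration of “write equalities, get a pass”, here is a small Python sketch. To be clear about the assumptions: this is plain greedy term rewriting applied left to right, not E-graph equality saturation, and it is not the Metatheory.jl or SymbolicUtils.jl API. The toy IR, the `?name` pattern-variable convention, and the names `match`, `substitute`, and `make_pass` are all made up for this example.

```python
# Toy IR (assumed): numbers, variable names (str), or tuples (op, left, right).
# Pattern variables are strings starting with "?" (a convention for this sketch).

def match(pattern, expr, env):
    """Match expr against pattern, extending env; return the env or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == expr else None
        return {**env, pattern: expr}
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            env = match(p, e, env)
            if env is None:
                return None
        return env
    return env if pattern == expr else None

def substitute(template, env):
    """Fill the pattern variables of template with their bindings from env."""
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template

def make_pass(rules):
    """Turn (lhs, rhs) equalities, read left to right, into a function IR -> IR."""
    def rewrite(expr):
        if isinstance(expr, tuple):
            expr = tuple(rewrite(e) for e in expr)  # rewrite children first
        for lhs, rhs in rules:
            env = match(lhs, expr, {})
            if env is not None:
                return rewrite(substitute(rhs, env))  # keep going until no rule fires
        return expr
    return rewrite

rules = [
    (("*", "?a", 1), "?a"),                                   # a * 1 = a
    (("+", "?a", 0), "?a"),                                   # a + 0 = a
    (("+", ("*", "?a", "?b"), ("*", "?a", "?c")),
     ("*", "?a", ("+", "?b", "?c"))),                         # a*b + a*c = a*(b + c)
]
optimize = make_pass(rules)

# x*(y + 0) + x*(z*1)  becomes  x*(y + z)
result = optimize(("+", ("*", "x", ("+", "y", 0)), ("*", "x", ("*", "z", 1))))
print(result)  # ('*', 'x', ('+', 'y', 'z'))
```

The person adding an optimization only touches the `rules` list; the tree walking lives in `make_pass` and is written once. An E-graph goes further by keeping all equivalent forms around at once instead of committing to one rewrite order, which this sketch does not attempt.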

Man, this came out longer than expected. But since it describes why Zygote is being replaced with Diffractor and Enzyme, I guess it’s a useful description for many reasons beyond the original question.
