I saw that MLIR technology is already driving a new programming language called Mojo, which claims to be very fast, to perform optimizations that are otherwise difficult, and to target many kinds of hardware such as CPUs, GPUs, TPUs, etc.
Does Julia have plans to adopt MLIR in the future as an intermediate representation, or something similar?
Generating MLIR from Julia was experimented with in a project called Brutus (JuliaLabs/brutus on GitHub); those were the early days of MLIR, so performance wasn’t very good. Currently I don’t see any plans to switch Julia from its LLVM IR backend to an MLIR one, but one could use MLIR for specialized codegen, maybe through MLIR.jl (JuliaLabs/MLIR.jl on GitHub). It’s all at an early stage, though.
> claims to be very fast, to perform optimizations that are otherwise difficult, and to target many kinds of hardware such as CPUs, GPUs, TPUs, etc.
I don’t think switching to MLIR should ever become a priority, since you could say mostly the same about Julia (it runs on TPUs and GPUs, and is very fast on CPUs)!
It would be a huge project, with lots of problems and not that many improvements over the current state of Julia.
What @gbaraldi linked is a much more likely route to integration, if someone sees worth in it.
I think MLIR is cool, but Julia has been getting similar features via other means so far (e.g. GPUCompiler.jl, LoopVectorization.jl, and all the automatic differentiation packages).
To add onto this, it’s worth pointing out that MLIR and Mojo are not magic. The “ML” in MLIR stands for “multi-level”; think Inception for LLVM rather than machine learning. In fact, LLVM IR is an MLIR “dialect”. Most Mojo code running on CPU (and likely a lot running on GPU) is “lowered” from a higher-level dialect into LLVM IR, while all Julia code is lowered into LLVM IR from a higher-level IR which is not an MLIR dialect.
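You can watch Julia’s own multi-level lowering happen from the REPL; each macro below prints one stage of the pipeline just described (a minimal sketch; any simple function works):

```julia
using InteractiveUtils  # provides the inspection macros in scripts; preloaded in the REPL

f(x) = 2x + 1

@code_lowered f(1)   # Julia's own high-level IR (not an MLIR dialect)
@code_typed f(1)     # after type inference and Julia-level optimization
@code_llvm f(1)      # the LLVM IR it is lowered to
@code_native f(1)    # the machine code LLVM emits
```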
This should offer some insight into why just making Julia use MLIR instead of LLVM IR directly wouldn’t change much. Does that mean there’s no benefit to using MLIR? No. You could imagine how a library like Coil.jl could benefit if Julia IR were an MLIR dialect. Other, non-LLVM MLIR dialects help Mojo achieve functionality like auto-vectorization at the language level, while the Julia ecosystem has to deal with issues like “Why is LoopVectorization deprecated?”, because trying to integrate with the compiler from the outside is far more fragile.
Lastly, I should point out that neither Mojo nor the Julia ecosystem supports TPUs right now. There have been attempts to get Julia code running on TPUs, but see above about fragile compiler integration. Additionally, TPUs speak a very limited set of high-level array operations and nothing else. Given that most Julia and Mojo code is written at a significantly lower level, using constructs (e.g. complex loops, lazy conditionals with side effects) that are not supported in the XLA IR that TPUs use, it’s unlikely either will get great language-level support any time soon. The best path is likely via some high-level DSL, which is basically what JAX is for Python.
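To make that distinction concrete, here is a minimal Julia sketch (illustrative names only): the first computation maps naturally onto XLA-style whole-array operations, while the second depends on constructs XLA doesn’t express:

```julia
W, b = rand(Float32, 64, 64), rand(Float32, 64)
x = rand(Float32, 64)

# XLA-friendly: fixed shapes, whole-array operations, no data-dependent control flow
y = tanh.(W * x .+ b)

# Not XLA-friendly: a scalar loop with a side effect and a data-dependent early exit
function first_negative!(log, v)
    for (i, vi) in pairs(v)
        if vi < 0
            push!(log, i)   # side effect
            return i        # exit point depends on runtime data
        end
    end
    return nothing
end

first_negative!(Int[], randn(100))
```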
Is it possible to mix Julia’s SSA IR with other MLIR dialects? If not, would it be possible to adapt Julia’s SSA IR into an MLIR dialect so it could be mixed with other MLIR dialects?
At last year’s FOSDEM there was a presentation on how to build your own MLIR dialect.
Hi Xiaoxi, Julia is lowered to LLVM IR (see code_llvm in the Julia docs), which exists as an MLIR dialect (https://mlir.llvm.org/docs/Dialects/LLVM/), so it’s inevitably compatible with other MLIR dialects.
I don’t think MLIR is needed for speed, for Julia (or for Mojo?).
Mojo was recently claimed to be 50% faster than Rust for some code, and then Mojo was beaten by Julia (I believe Julia can always match it, depending on how good the programmer is and whether they use good tools, e.g. Bumper.jl, sketched just below, which sort of amends a downside versus Mojo). Julia has the same performance ceiling as C/C++, i.e. as fast as possible, and in practice it is often faster. You often get less if you’re new to Julia or haven’t read the performance section of the manual.
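For example, Bumper.jl gives you a bump allocator, so temporaries never reach the GC. A minimal sketch (the rule is that @alloc’ed buffers must not escape the @no_escape block):

```julia
using Bumper

function colsums(A)
    @no_escape begin
        tmp = @alloc(eltype(A), size(A, 1))  # bump-allocated scratch, freed when the block exits
        tmp .= 0
        for j in axes(A, 2)
            tmp .+= @view A[:, j]
        end
        sum(tmp)  # return a plain value, never the buffer itself
    end
end

colsums(rand(100, 100))
```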
I’m mostly ignorant of MLIR, but not of what’s needed for speed. I’m less ignorant of Mojo (and of Rust, which inspired it and its borrow checker).
Allocations aren’t slow in Julia (or Mojo), but they imply GC pressure, which can be a performance killer; that, however, is (fully) avoidable.
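The usual trick, as a minimal sketch: allocate once up front, then reuse the buffer with in-place operations so the hot loop allocates nothing:

```julia
using LinearAlgebra

A = rand(100, 100)
x = rand(100)
y = similar(x)          # one allocation, up front

for _ in 1:1_000
    mul!(y, A, x)       # in-place matrix-vector product: no per-iteration allocation
    y .= 2 .* y .+ 1    # fused in-place broadcast: also allocation-free
end
```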
I’m a bit ignorant of GPU use with Julia, but GPUs/CUDA.jl don’t have the GC-pressure problem, and Julia is already about 3% faster than CUDA C. Julia is already easier to code for GPUs than many (most? all?) languages.
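For a taste of that ease: with CUDA.jl a kernel is just a Julia function. A minimal sketch (axpy_kernel! is a made-up name; the calls are CUDA.jl’s):

```julia
using CUDA

function axpy_kernel!(y, a, x)  # an ordinary Julia function, compiled for the GPU
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * x[i] + y[i]
    end
    return nothing
end

x = CUDA.rand(Float32, 1024)
y = CUDA.rand(Float32, 1024)
@cuda threads=256 blocks=cld(length(y), 256) axpy_kernel!(y, 2f0, x)
```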
[In a recent benchmark Mojo was actually slower than Python because of its Dict implementation; Python’s is good. That problem might apply to Julia too, though I doubt it; Julia’s should be good. It came down strictly to the Strings used as Dict keys, and those could be improved in Julia (and in Mojo, I guess).]
To be fair (while it can still be used, and for years to come, since 1.10 will likely become the LTS), this is non-ideal (LoopModels will likely replace this project, if it hasn’t already):
> LoopVectorization only works for Julia 1.3 through 1.10. For 1.11 and newer, it simply uses @inbounds @fastmath instead, so it should still get roughly the same answer, but both runtime and compile time performance may change dramatically.
It was my favourite Julia package, given what (speed) it enabled, and others built on it, e.g. Gaius.jl. @Elrod deprecated it, but not (yet?) VectorizationBase.jl, which it depends on and which I haven’t looked into much. Gaius depends on both, but some packages, e.g. AccurateArithmetic.jl, depend only on VectorizationBase.jl.
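For reference, typical usage looked like the sketch below; on Julia 1.3–1.10 @turbo emitted carefully SIMD-optimized code for the loop, while on 1.11+ it degrades to @inbounds @fastmath as quoted above:

```julia
using LoopVectorization

function dot_turbo(a, b)
    s = zero(eltype(a))
    @turbo for i in eachindex(a, b)  # vectorized on Julia <= 1.10
        s += a[i] * b[i]
    end
    return s
end

dot_turbo(rand(1000), rand(1000))
```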
> I don’t think MLIR is needed for speed, for Julia (or for Mojo?). … Julia has the same performance ceiling as C/C++, i.e. as fast as possible, and in practice it is often faster.
I’ll highlight the first sentence of the article:
> Mojo is built on the latest compiler technology in MLIR, an evolution of LLVM which Rust lowers to, and so it can be faster.
The performance in question is not just that of Julia; the complete picture is Julia + compiler (IR) + hardware. Let’s remove hardware from the picture (MLIR in fact abstracts hardware away, more so than LLVM); then, to compare the performance of languages, you compare how well each language synthesizes performant IR.
In other words, you can compare Julia with C/C++ (or any other language) as long as the compiler (IR) is the same. Julia vs C/C++ is valid assuming both use LLVM IR, which they do if you use Clang as your C/C++ compiler.
Julia vs Mojo isn’t even an apples-to-apples comparison. You have to say Julia + LLVM vs Mojo + MLIR.
Now there is a caveat to this. LLVM IR itself exists as an MLIR dialect, heavily optimized for CPUs. But there are some shortcomings of LLVM on the CPU that Chris Lattner (creator of LLVM and MLIR) has pointed out. See the statement below from the linked article:
> LLVM (and therefore Rust) has automatic vectorization optimization passes, but they’ll never be able to reach the same level of performance as the programmer expressing exactly what they intended, because LLVM cannot change memory layout or other important details for SIMD.
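(For a concrete picture of the memory-layout point, here is a hedged Julia sketch of the array-of-structs vs struct-of-arrays distinction; StructArrays.jl is just one way to opt into an SoA layout by hand:)

```julia
using StructArrays

struct Point
    x::Float64
    y::Float64
end

aos = [Point(rand(), rand()) for _ in 1:10^6]  # array of structs: x and y interleaved in memory
soa = StructArray(aos)                         # struct of arrays: one contiguous vector per field

sum(p -> p.x, aos)  # strides over interleaved memory
sum(soa.x)          # scans one contiguous vector, which is what SIMD wants
```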
Comparing actual performance results of Mojo at this time is premature in my opinion; the language still has a lot to work out, and MLIR (and its various dialects) needs to be proven and more battle-tested. But the fundamental ideas behind the stack are very sound, and I believe MLIR will be revolutionary technology.
Answering the title: I believe that if Julia wants to be a truly heterogeneous compute language, it must use MLIR; otherwise we run the risk of implementing the same work for multiple architectures at the language level, when in reality it should be done at the compiler (IR) level. I don’t want GPUArrays, CuArrays, ROCArrays and oneArrays (https://juliagpu.org/) in my language ecosystem. I just want Arrays.
CUDA.jl, AMDGPU.jl and oneAPI.jl all work at the IR level and use LLVM to do the rest, which is pretty efficient and elegant.
I’m quite sure that moving Julia to MLIR would be much more work than improving Julia’s current LLVM-based GPU compiler packages.
The benefits are also uncertain (we have no idea whether it would be faster, and it would most certainly change some fundamentals of the language, etc.), so it could also be a massive time sink, since we’d only find out once Julia runs on MLIR and can execute some larger Julia programs.
> I don’t want GPUArrays, CuArrays, ROCArrays and oneArrays (https://juliagpu.org/) in my language ecosystem. I just want Arrays.
You will still need hints like that to tell the language where to allocate the memory, which CuArray(array) does quite elegantly.
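That is, a minimal sketch of the explicit placement in question:

```julia
using CUDA

x  = rand(Float32, 10^6)  # lives in host (CPU) memory
dx = CuArray(x)           # the explicit hint: copy it to GPU memory
dy = 2 .* dx .+ 1         # this broadcast now runs on the GPU
y  = Array(dy)            # and an explicit copy back to the host
```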
Automatically moving things to the GPU where it fits is an incredibly hard problem that nobody has completely solved yet.
There are some prototypes for automatically moving memory to the GPU when that improves performance, but they usually don’t scale well to arbitrary code, so they don’t really have a place in a general-purpose language (yet) and are a much better fit for a package.
The comparison should be: Julia + Julia IR + LLVM vs Mojo + MLIR + LLVM.
Julia takes advantage of tools developed for LLVM IR, such as Enzyme, but may miss out on tools developed for MLIR. If adopting MLIR is too expensive, could Julia IR adopt some MLIR dialects? For example, could Julia IR adopt the MLIR dialect for homomorphic encryption? https://heir.dev
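(For context, a minimal sketch of the kind of LLVM-level tool meant here: Enzyme.jl differentiates the LLVM IR Julia already emits.)

```julia
using Enzyme

f(x) = x^2 + 3x

# Reverse-mode AD over the LLVM IR of f; the derivative of the Active argument is returned.
autodiff(Reverse, f, Active, Active(2.0))  # ((7.0,),) since f'(x) = 2x + 3
```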
Most likely. This talk shows the overhead of using MLIR.
This is an interesting experiment: a team from TU Munich opted to skip LLVM IR entirely and generate machine IR (MIR) directly, yielding a ~20% performance improvement in a just-in-time (JIT) compilation workload.
Julia has bindings for Enzyme-MLIR (Enzyme’s MLIR variant, in the same repo as Enzyme) as well as XLA-based fusion and linear-algebra optimization. See GitHub - EnzymeAD/Reactant.jl for a preview.
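A minimal sketch of what that looks like, based on the Reactant.jl README at the time of writing (the API is young, so names like ConcreteRArray and @compile may change):

```julia
using Reactant

f(x) = sum(abs2, x)

x = Reactant.ConcreteRArray(rand(Float32, 100))  # array type taken from the package README
f_xla = @compile f(x)                            # traces f and compiles it through MLIR/XLA
f_xla(x)
```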