Interesting thoughts from Chris Rackauckas
It shouldn’t be too hard to make one. Octavian.jl and I think also Tullio(?) already have interfaces for changing tile sizes, and Gaius.jl, which is cache-oblivious, has some somewhat analogous heuristics.
So you could basically just use those heuristic knobs and JuliaLinearAlgebra/BLASBenchmarksCPU.jl (a package for benchmarking BLAS libraries) to solve for ideal parameters.
We did this a bit already in Gaius.jl but I can’t remember if there was ever code published for it. @Elrod do you happen to remember?
Edit: Also, I think I remember Chris saying that LoopModels will support tiling, and that’d presumably be configurable too.
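To make that concrete, here’s a minimal sketch of what such a parameter search could look like. The tiled kernel below is a made-up stand-in for the real tuning knobs in Octavian.jl / Tullio.jl (I’m not reproducing their actual parameter interfaces here):

```julia
using BenchmarkTools

# Hypothetical stand-in kernel: a naive tiled matmul whose tile size is the
# tunable knob (square matrices assumed, for brevity).
function tiled_matmul!(C, A, B, tile)
    fill!(C, zero(eltype(C)))
    n = size(A, 1)
    for jj in 1:tile:n, kk in 1:tile:n, ii in 1:tile:n
        for j in jj:min(jj + tile - 1, n), k in kk:min(kk + tile - 1, n)
            @inbounds @simd for i in ii:min(ii + tile - 1, n)
                C[i, j] += A[i, k] * B[k, j]
            end
        end
    end
    return C
end

# Sweep the knob and keep the fastest setting, BLASBenchmarksCPU.jl-style.
n = 512
A, B, C = rand(n, n), rand(n, n), zeros(n, n)
results = [(tile, @belapsed tiled_matmul!($C, $A, $B, $tile)) for tile in (16, 32, 64, 128)]
best_tile, best_time = argmin(last, results)
```

BLASBenchmarksCPU.jl would then give you proper baselines (OpenBLAS, MKL, etc.) to compare each candidate against.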
I get your point that there are objects that change the behavior of functions, but those are not syntactic macros, those are higher-order functions, which the Mojo docs do say. Python’s @decorator syntax is just syntactic sugar for a function call and a reassignment of a type or function definition; it’s not even required to return a type or function. In typical uses, a decorator just embeds the type or function in a bigger one, but it is possible to extract the bytecode and compile it to something else: this is what Numba’s decorators do to functions.
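In Julia terms the distinction looks roughly like this (a minimal sketch; `logged` and `@show_expr` are made-up examples, not anything from the Mojo or Numba docs):

```julia
# A "decorator" is just a higher-order function plus a rebinding of the name:
logged(f) = (args...) -> (println("calling ", f, " with ", args); f(args...))

square(x) = x^2
square_logged = logged(square)  # Python's `@logged` sugar rebinds `square` itself

# A syntactic macro, by contrast, receives the unevaluated *expression*:
macro show_expr(ex)
    println("macro sees: ", ex)  # runs at expansion time, not at run time
    return esc(ex)
end

@show_expr 1 + 2  # prints `macro sees: 1 + 2`, then evaluates to 3
```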
EDIT: Mojo’s decorators may be more capable; there is an example of @parameter decorating an if-elif statement, which is not something you can do in Python. However, the docs never call it a macro, and other examples only show decorators preceding type or function definitions as usual.
For this I’m just taking Chris Lattner’s word. I’m not usually the type to do that, but I think in this case Chris has built up enough credibility that appeal to authority is more of a useful heuristic than a fallacy (as it can be when relying on trusted institutions… we don’t check everything ourselves).
This was the hope of Julia, but generated functions have been less than satisfactory (see Zygote, LoopVectorization, Soss, etc.), and compiler plugins have proven technically challenging.
Reading the Mojo docs on Parameterization: compile time meta-programming, it doesn’t seem that different from what is done in Julia, just more AOT and explicit (I’m not a C++ user, but it sounds like constexpr). I don’t really see anything regarding compiler extensions; it’s more about type parameters and how computations on them can be specified for compile time. As you pointed out, we have generated functions for explicit computations on compile-time information. But the Julia compiler already does simple type/parameter/constant computation for generic functions, so we don’t resort to explicit control very often. Mojo is a WIP, so I expect more features will come, and they could be something Julia’s metaprogramming learns from.
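For reference, this is the kind of explicit compile-time computation meant above, as a toy generated function (`tuple_sum` is a made-up name):

```julia
# The body of a @generated function runs at compile time, once per concrete
# argument type; here N is a compile-time constant taken from the tuple type.
@generated function tuple_sum(t::NTuple{N,T}) where {N,T}
    ex = :(t[1])
    for i in 2:N
        ex = :($ex + t[$i])  # build a fully unrolled sum expression
    end
    return ex
end

tuple_sum((1, 2, 3, 4))  # compiles to t[1] + t[2] + t[3] + t[4], returns 10
```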
No, but Octavian has code based on searching for parameters defining its tiling behavior.
Unfortunately, the model it uses has turned out to be fairly poor empirically, in that the optimal parameters vary greatly between architectures, meaning the model does not successfully describe architectural details.
LoopModels will support tiling, but not in the initial release. I’ll try to rush out something minimally viable, and then will probably need to develop it further in my spare time.
There is the technical side which we all usually gravitate towards.
Yet I think Modular and Mojo are important on another level.
Modular is a single entity that promises a specific set of features in the language.
They are not talking about potential but committing to deliver those abilities.
This is how Mojo and Modular will be measured: the more they show capability, the more they will gain credibility and resources from the community and their investors.
It is a top-down approach to managing the effort of developing the language and a community. The features are similar to Julia in concept, but fundamentally different in their execution plan, since Julia’s development is managed bottom-up.
I guess this is probably one of the reasons they chose a new path over an existing one. They needed the ability to move things according to plan.
Moreover, regarding Julia, it is really hard to change the masses’ point of view on something, and Julia is not the new kid on the block anymore. It is mostly perceived as a language that has not fulfilled its promise yet.
Take BLAS for instance: from the early days I remember statements by the core developers about the potential of having the whole of BLAS / LAPACK implemented in Julia. In practice, it remains potential on the table; no one has ever taken ownership of the effort.
Regarding the Matrix Multiplication discussion, we compare the best of Julia vs. a simple demo of Mojo.
Have a look at The world’s fastest unified matrix multiplication.
I believe the implementation in the post is in Mojo, as they say it can run on many platforms and accelerators, which matches the ideas behind Mojo. Pay attention: they beat everything. They beat Intel MKL on x64 in the multi-threaded scenario. If they have done this so early in the development, they are surely onto something.
I don’t think beating their 3-minute blog post is the challenge. I believe the challenge is:
- Generate a function with performance on par with a highly tuned BLAS library (or any other low-level algorithm).
- Integrate it into a higher-level function, say an iterative solver that uses some BLAS / LAPACK procedures (see the sketch after this list).
- Be able to deploy the high-level function in the form of a DLL and integrate it into an existing pipeline.
- Be able to be efficient on all platforms: CPU (x64 / ARM) / GPU (AMD, NVIDIA, Intel) / Accelerators (TPU, NPU, FPGA?).
- Support all major OS’s.
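As a sketch of the second point, this is roughly what a higher-level function built on BLAS / LAPACK procedures looks like in Julia: a toy conjugate-gradient solver (assuming a symmetric positive-definite A), composed from BLAS-backed primitives:

```julia
using LinearAlgebra

# Toy conjugate-gradient solver; mul!, dot, and axpy! all lower to BLAS calls.
function cg!(x, A, b; tol = 1e-8, maxiter = length(b))
    r = b - A * x            # initial residual (A * x dispatches to BLAS gemv)
    p = copy(r)
    Ap = similar(p)
    rs = dot(r, r)
    for _ in 1:maxiter
        mul!(Ap, A, p)       # in-place matrix-vector product
        α = rs / dot(p, Ap)
        axpy!(α, p, x)       # x += α * p
        axpy!(-α, Ap, r)     # r -= α * Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end

Asym = let M = randn(200, 200); M'M + 200I end   # SPD test matrix
b = randn(200)
x = cg!(zeros(200), Asym, b)
norm(Asym * x - b)   # ≈ 0
```

Whether the BLAS underneath is OpenBLAS, MKL, or a pure-Julia Octavian.jl is invisible at this level; that composability is what the list above asks of a Mojo-generated kernel.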
I think this is Modular’s main promise. They will be measured on delivering it. If they do, the community will follow.
Update
It is now official that all the GEMM kernels in the post The world’s fastest unified matrix multiplication are indeed written in Mojo.
This was the answer to my own question on their official Discord channel, given by an official member of Modular.
So Mojo’s first baby step was indeed building the world’s fastest GEMM. That is impressive, and should be a non-trivial advantage for their inference engine.
not when his money is on the line… that’s why we have conflict-of-interest declarations when submitting papers – who’s not an expert in their field if they’re publishing in Nature?
I think we can agree the Modular information is very marketing-y (not open source, can’t run code ourselves)
IIRC it’s not that simple? Yes, technically it’s “just the step between user code and native instructions”, but Julia is not passing the information useful to MLIR in any way, and I imagine it would be a huge effort that only a handful of people in the community could do, and none of them has time to do it?
So I think effectively no we can’t use it
What is the “concept” that makes MLIR special, or different, from the Julia compiler?
Probably related: what makes it possible for it to provide, for instance, a matrix multiplication method that does not deal with hardware specificities while achieving top performance? (Is that even true? Or a naive interpretation of their announcement?)
I don’t know how it’s perceived or by whom, but as a user of Julia for data analysis and mathematical model building it’s fulfilled everything I wanted from it really. I never use R anymore, and my stuff both runs faster and is easier to understand than any of the tidyverse R code you read these days. It connects to databases, reads and writes modern formats, has a nice VScode integration, and continuous reduction in latency. I couldn’t be happier with it.
@dlakelan, I wasn’t saying I feel that way.
I am here, hence I find Julia useful (though currently only for my own, just-for-joy projects).
I also try to spread the word in my professional circles, because I think it is great.
I only expressed my evaluation of how it is perceived by those who know it yet don’t use it.
I might be wrong; it is just an opinion, a reasonable one I think, yet not a fact.
Mojo has another big thing going in its favor. Sure, upon closer inspection, idiomatic Mojo code looks different enough from Python to be a new language and uses very different programming concepts, e.g. a Rust-like borrow checker. However, if they uphold their promise to run any existing Python code, presumably without performance regressions, it would cost nothing for Python users to migrate to Mojo and acclimate gradually. I suppose it’s more accurate to say that Mojo is reinventing Cython. We’ll have to see how the implementation pans out; there is some skepticism at this stage (no classes yet, not open-source yet, performance comparisons write Python in a way nobody actually would).
I interpret this to mean run any existing Python code with the performance of the hypothetical fast Mojo (or, say, today’s Rust or Julia). Enormous resources and time have been spent over the past 10 or 20 years trying to do just this, with little success. If Modular has a secret, they didn’t hint at it. MLIR is not magic pixie dust that you can sprinkle on CPython to plug all the leaky abstractions.
No, the Mojo docs make it pretty clear this is not possible. Their examples do boast performance improvements of idiomatic Python compared to CPython, but it’s still orders of magnitude slower than idiomatic Mojo. To clarify: if Python users could move to the Mojo compiler without losing performance, they could continue to do all their work as usual while slowly learning and incorporating Mojo code. This is a much lower barrier to entry than starting from nothing in an entirely separate language, finding equivalent libraries if those are even available, and possibly learning language interop.
Looks like I need to add another jet engine to my slide…
The analogy being:
Python and its numeric libraries are like a Victorian-era stagecoach with jet engines duct-taped to it, each pointing in a different direction (= mutually incompatible).
Perhaps. But is there a chance the analogy might be more like Typescript as a superset of Javascript?
Not really, for TypeScript transpiles to JS, but is not a superset per se.
Indeed, they present Mojo as a superset.
Assuming that it’s true, when using the “subset” of Mojo that is Python, you’ll run into the same problems that Numba et al. tried to sort out.
It was quite clear from the matmul example that the performance was obtained at the expense of a very specific (and, to my eyes, cumbersome) syntax.
The question being, will they be able to hide this added complexity behind some macros?
Without further knowledge of Mojo’s macro mechanics, I’d say it’s possible in principle (well, that’s a wild guess here), and would correspond to a small “step” beyond the Python “subset”.
Is it possible that we are sort of reaching the point where the speed differences, if any, are moot and that what is most important is other language features like developer experience, etc?
It feels to me like benchmarking to find which language is nanoseconds faster is less meaningful than saying, umm… just look at the code. Which would you like to write? Which is more readable? How many iterations did it take before you reached that level of performance? So for power users of Python, Mojo is a big win. They know the syntax and the quirks, and can fit this into their pipeline. For someone starting from scratch, I think that is less true.