I just looked at VectorizationBase tests and they seem to be broken even on 1.10, with the same failures and errors happening on nightly. Are the expected failures on 1.11 different from those and are they not visible in the nightly run?
The tests have been failing for a while. This, however, seems to be a regression:
Since I don't see much movement on this, does anyone know if there are alternatives to LoopVectorization for the upcoming deprecation? I am really not looking forward to the upcoming performance drop on Julia 1.11 in some of my packages
(`@inbounds @simd` vs `@turbo` in one of my benchmarks)
Presumably there is some sort of future-proof alternative out there, with worse speeds than LV but still better than `@inbounds @simd`?
I was in a similar boat, but managed to circumvent the use of `@turbo` by a full rewrite of my code (in such a way that threading loops through LoopVectorization was no longer possible). I don't think it is reasonable to expect this of everyone who depends heavily on it (like I did), so I imagine that as the clock ticks closer and Julia v1.11 is released, one of the following happens:
- People realize that LoopVectorization.jl is too big to fail, and it gets updated for 1.11
- Some parts of the ecosystem version-lock themselves to 1.10, going against Julia's "philosophy" of always being quite easily upgradeable and gaining new features quickly (perhaps my personal understanding)
- LoopVectorization.jl is not updated and is allowed to deprecate. This will leave people always wondering "what if I had used `@turbo`?" until something new and improved comes out.
Personally I think this is quite an interesting dilemma for Julia as a community, since it has always been promoted as a language for writing the fastest numerical code, and suddenly one huge backbone of achieving this is ripped out rather abruptly.
Sorry for not being able to provide any reasonable solution, but I hope it helps to know that a lot of us are or have been in your position; I was worried as well when my code still heavily depended on it.
Kind regards
Why are we focusing so much on Julia 1.11 for package development at the moment? Julia 1.11 is the beginning of a new development cycle for Julia with many large changes internally. We just had a feature freeze, and most of my attention there is on working on internals or making former internals work better. I expect to see performance regressions there in the near term across the board as we learn to adapt and optimize for the new architecture. It seems to be too early to be optimizing for Julia 1.11 when the dust has barely settled there. I'm waiting for a beta before I start thinking about package performance on 1.11+.
Julia 1.10 is a likely candidate for a LTS release for many reasons. From a pure performance perspective in the near term, I would be thinking about how to make packages work as well as possible on Julia 1.10.x until post-1.10 Julia is clearly superior for performance.
It also looks like LoopModels isn't really active: Branches · LoopModels/LoopModels · GitHub, which is a bummer.
So, what other options are there for getting decent vectorization in Julia post-1.10? Maybe one option is to use SIMD.jl and vectorize things by hand?
I guess another is to use JAX via PythonCall? Or even OpenXLA? I suppose you wouldn't be able to write pure Julia code, but for the most expensive kernels perhaps it's something to consider.
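To make the SIMD.jl suggestion above concrete, here is a minimal sketch of hand-vectorizing an elementwise addition with SIMD.jl's `VecRange` indexing. The function name and the fixed vector width of 4 are illustrative choices, and for brevity the sketch assumes the array length is a multiple of the width (a real kernel would need a scalar tail loop):

```julia
using SIMD

# Hand-vectorized a .+= b. `VecRange{N}(0)` is SIMD.jl's helper for
# loading/storing N contiguous lanes at an offset index.
function vadd!(a::Vector{Float64}, b::Vector{Float64})
    N = 4                      # vector width; tune for the target CPU
    @assert length(a) == length(b) && length(a) % N == 0
    lane = VecRange{N}(0)
    @inbounds for i in 1:N:length(a)
        # a[lane + i] loads/stores a Vec{4,Float64} starting at index i
        a[lane + i] += b[lane + i]
    end
    return a
end
```

The appeal of this style is that the vectorization is explicit rather than left to LLVM's autovectorizer, which is part of what `@turbo` did automatically.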
I don't know about others, but this is the first Julia alpha since I joined the community where I'm seeing 50%-80% drops in performance… (due to the LV deprecation). At worst it's been around 5% in the past. The `@turbo`-ified loops are the bottleneck of my code.
So I'm trying to fix this early (it seems like it will take more time than usual) before it slows down downstream applications for all my users on the latest Julia.
Probably a dummy suggestion: have you added `@fastmath` to the loop? LV, if I'm not mistaken, assumes it. (PS: can you share the code of the loop in question? It might be of interest to other people trying to solve similar regressions.)
I'm developing a library where the core is a simple loop that does a bunch of `sincos` calls. This is the main bottleneck, and without LoopVectorization.jl there's going to be a 3-5x performance loss overall (a guesstimate as of now). It's pretty bad. There are some other, less important, places in the code where missing LV will bite, too.
This will be a long-term project, so looking ahead to 1.11 and beyond is reasonable; I'm not sure I understand the argument of @mkitti that only 1.10 is relevant.
I hope it will be possible for me to get the speedup by hand, using SIMD.jl; so far `sin`/`cos` calls on vectors are disappointingly slow, though. The way forward is probably learning how LV does its magic. My case is pretty straightforward, so I'm relatively optimistic after all.
Please teach me how to do the magic after you learn the way
Years ago I also tried to make my packages fast for CPU.
But these days I focus on GPU performance only.
Many people in the scientific community have problems where GPUs speed things up massively (>10x).
So they use GPUs either remotely (e.g. Google Colab) or simply buy a cheap one (RTX 3060, 4060) because the performance boost is huge.
Of course this is not applicable to all problems. But for many problems where the runtime is more than a couple of seconds, GPUs help a lot.
In the department of crazy ideas: could the needed parts of Julia 1.10 be made into an artifact to be used by LV on Julia >= 1.11?
IMHO this approach is fine when the computation suits a GPU (branchless). The most obvious limitation is available RAM (VRAM), which is rather limited on a GPU compared to a CPU.
Apple hardware offers GPUs with huge RAM capacity (up to 192 GB) for a relatively cheap price compared to NVIDIA. Unfortunately, KernelAbstractions.jl is not really usable on Apple hardware right now.
I would be very interested to see how you manage to convert your `@turbo` loops to vectorised versions manually, for my use as a reference. Maybe when you do, you could link the PR here? I'm sure it would be highly appreciated.
For my use-case it's not easy. Actually, for most inputs the CPU will be faster. See Native GPU support by MilesCranmer · Pull Request #65 · SymbolicML/DynamicExpressions.jl · GitHub for my current attempt. For typical input sizes, an H100 GPU is not as fast as my MacBook Pro CPU, to give you a picture.
Anyway, that's a different discussion. I would strongly prefer to make CPU speeds fast with Julia without needing to switch hardware. My library is used much more by downstream users than by myself, so I need to make all possible hardware fast.
I did not say Julia 1.11 is not relevant. I'm saying it is not the priority at the moment for improving package performance. There are other priorities with regard to Julia 1.11 itself that need to be addressed. I would rather have a solid foundation in Julia 1.11 than try to build on top of one that is still under construction.
Julia 1.10 will likely be the Long Term Support release when Julia 1.11 is released. At that point, users will have a choice between Julia 1.10 and Julia 1.11, both of which will be supported with patches. If that happens and Julia 1.10 is faster for you, by all means use Julia 1.10.
From my perspective, the higher priorities at the moment are as follows.
- Security issues. The XZ backdoor is an acute problem. A chronic issue is mbedTLS long-term support for Julia 1.10.
- Julia 1.11 package compatibility for those using stable interfaces. For example, fixing libuv so that Cthulhu.jl's use of pipes during precompilation does not fail. Bump Libuv by Keno · Pull Request #8347 · JuliaPackaging/Yggdrasil · GitHub
- Improving Julia 1.11 performance and latency. For example, making sure that loading Pkg.jl version 1.11 does not invalidate code in the Julia 1.11 system image: Pkg.BinaryPlatforms invalidates Base.BinaryPlatforms · Issue #3702 · JuliaLang/Pkg.jl · GitHub
As you can see there are still a bunch of moving pieces to make Julia 1.11 a viable release.
This does not seem abrupt to me at all. Chris has provided a transition off-ramp. You will still have long-term support (3+ years) on Julia 1.10, which you can continue to use. At worst, you just will not have access to some new features. Some of those new features are the very ones that break LoopVectorization, such as fundamentally changing how Julia arrays work under the hood.
The main underlying philosophy here is semantic versioning. Your code should still run on future Julia 1.x versions, and that remains true in this case. What is not guaranteed is that performance will monotonically increase with successive Julia versions. Sometimes the underlying mechanisms need to change to make things better. Julia 1.11 is the beginning of another cycle. Performance may get worse before it gets better.
In this case that is not true, right? LoopVectorization.jl does not run anymore. But that is because it used Julia internals, which are not guaranteed not to change.
Package pinning would fix this though, right? There are versions of LoopVectorization.jl that work on Julia 1.9, for example, so as long as you had the correct versions it would still work.
I'm not trying to influence the development of 1.11 one way or the other, nor to criticize the work being done or its prioritization. I'm simply reacting to the apparent attitude that this is not really a big deal. I'm not proposing anything "actionable".
Unfortunately, while somewhat dreading the deprecation of LV, I am simultaneously eagerly awaiting the advances made towards compilation of executables, since my library must be distributed like that. It's a dilemma.
I am sure there is a good reason, but I confess I don't feel this way when I stare at my 2-5x slower benchmarks.
I guess I just wish LV had somehow been integrated into the Julia standard library a long time ago so this wouldn't happen, and any internal changes would be carried over regularly. Julia promises speed; good vectorization seems key to that promise.
Perhaps some of these major internal changes should have been held back from release until built-in Julia vectorization caught up to the same ballpark as LV, since it's evidently not yet close. (This also doesn't seem like a niche use-case, no? Aren't tons of people doing heavy vectorized array operations in Julia?)
I don't see the argument about avoiding 1.11. I can do that personally, but my downstream users won't. They will feel the slowness and not really know why. And I also don't see a 1.11 beta suddenly fixing this, given that regular `@inbounds @simd` has never been as good as `@turbo`, and now `@turbo` is gone.
But all this being said, I don't want to be too pessimistic; like @DNF, I am excited about the directions in static compilation. I'm just not sure they make up for the huge performance hits yet.
Well, this could easily be solved by taking upper bounds on the Julia version in a package's Project.toml seriously, and just not allowing installation of a package with a Julia version that is not listed as valid in Project.toml.
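For concreteness, the kind of bound being described is a `[compat]` entry in a package's Project.toml. This hypothetical fragment pins the supported Julia range to versions the package is known to work on; today the resolver treats the `julia` entry as a hard constraint only when it is present and accurate:

```toml
# Hypothetical Project.toml [compat] section for a package that
# depends on LoopVectorization and only supports Julia 1.9-1.10.
[compat]
LoopVectorization = "0.12"
julia = "1.9 - 1.10"
```

Pkg's compat syntax supports hyphen ranges like `"1.9 - 1.10"`, which here excludes 1.11 and later; the suggestion in the post amounts to making such upper bounds mandatory rather than optional.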