Why is LoopVectorization deprecated?

I just looked at VectorizationBase tests and they seem to be broken even on 1.10, with the same failures and errors happening on nightly. Are the expected failures on 1.11 different from those and are they not visible in the nightly run?

The tests have been failing for a while. This, however, seems to be a regression:

Since I don't see much movement on this, does anyone know if there are alternatives to LoopVectorization, given the upcoming deprecation? I am really not looking forward to the performance drop on Julia 1.11 in some of my packages :frowning:

(@inbounds @simd vs @turbo in one of my benchmarks)

Presumably there is some sort of future-proof alternative out there, with worse speeds than LV but still better than @inbounds @simd?
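
For reference, the shape of the comparison is roughly the following (a simplified, hypothetical kernel rather than my actual benchmark):

```julia
using LoopVectorization, BenchmarkTools

# Plain Julia: rely on the compiler's own auto-vectorization.
function dot_simd(x, y)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += x[i] * y[i]
    end
    return s
end

# Same kernel, but let LoopVectorization generate the vectorized code.
function dot_turbo(x, y)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += x[i] * y[i]
    end
    return s
end

x, y = rand(10_000), rand(10_000)
@btime dot_simd($x, $y)
@btime dot_turbo($x, $y)
```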

9 Likes

I was in a similar boat, but managed to circumvent the use of @turbo by fully rewriting my code (in such a way that threading loops through LoopVectorization was no longer possible). I don't think it is reasonable to expect everyone who depends heavily on it (like I did) to do this, so I imagine that, as the clock ticks down and Julia v1.11 is released, one of the following happens:

  • People realize that LoopVectorization.jl is too big to fail, and it gets updated for 1.11
  • Some parts of the ecosystem version-lock themselves to 1.10, going against Julia's "philosophy" of always being quite easily upgradeable and gaining new features quickly (perhaps just my personal understanding)
  • LoopVectorization.jl is not updated and is allowed to deprecate. This will leave people always wondering "what if I had used @turbo?" until something new and improved comes out.

Personally I think this is quite an interesting dilemma for Julia as a community, since it has always been promoted as a language for writing the fastest numerical code, and suddenly one huge backbone of achieving this is ripped out rather abruptly.

Sorry for not being able to provide any reasonable solution, but I hope it helps to know that a lot of us are in, or have been in, your position, and I was worried as well when my code still heavily depended on it.

Kind regards

8 Likes

Why are we focusing so much on Julia 1.11 for package development at the moment? Julia 1.11 is the beginning of a new development cycle for Julia with many large internal changes. We just had a feature freeze, and most of my attention there is on working on internals or making former internals work better. I expect to see performance regressions across the board in the near term as we learn to adapt and optimize for the new architecture. It seems too early to be optimizing for Julia 1.11 when the dust has barely settled there. I'm waiting for a beta before I start thinking about package performance on 1.11+.

Julia 1.10 is a likely candidate for an LTS release for many reasons. From a pure performance perspective in the near term, I would be thinking about how to make packages work as well as possible on Julia 1.10.x until post-1.10 Julia is clearly superior for performance.

It also looks like LoopModels isn't really active: Branches · LoopModels/LoopModels · GitHub which is a bummer.

So, what other options are there for getting decent vectorization in Julia post-1.10? Maybe one option is to use SIMD.jl and vectorize things by hand?
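
For reference, hand-vectorization with SIMD.jl looks roughly like this (a minimal sketch using its VecRange indexing; it assumes the array length is a multiple of the vector width, so real code would need a remainder loop):

```julia
using SIMD

# Hand-vectorized a .+= b, processing 8 Float64 lanes per iteration.
function vadd8!(a::Vector{Float64}, b::Vector{Float64})
    lane = VecRange{8}(0)              # indexes a whole SIMD lane at once
    @inbounds for i in 1:8:length(a)   # assumes length(a) % 8 == 0
        a[lane + i] = a[lane + i] + b[lane + i]
    end
    return a
end

a, b = rand(8_000), rand(8_000)
vadd8!(a, b)
```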

I guess another option is to use JAX via PythonCall? Or even OpenXLA directly? I suppose you wouldn't be able to write pure Julia code, but for the most expensive kernels perhaps it's something to consider.

I don't know about others, but this is the first Julia alpha since I joined the community where I'm seeing 50-80% drops in performance (due to the LV deprecation). At worst it's been about 5% in the past. The @turbo-ified loops are the bottleneck of my code.

So I'm trying to fix this early (it seems like it will take more time than usual) before it slows down the downstream applications of all my users on the latest Julia.

2 Likes

Probably a silly suggestion: have you added @fastmath to the loop? LV, if I'm not mistaken, assumes it. (PS: can you share the code of the loop in question? It might be of interest to other people trying to solve similar regressions.)
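
The combination would look something like this (a toy kernel just to illustrate, not the loop in question):

```julia
# Toy kernel: @fastmath on top of the usual @inbounds @simd annotations.
function kernel!(y, x)
    @fastmath @inbounds @simd for i in eachindex(x, y)
        y[i] = 2x[i] + sin(x[i])
    end
    return y
end
```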

1 Like

I'm developing a library where the core is a simple loop that does a bunch of sincos calls. This is the main bottleneck, and without LoopVectorization.jl there's going to be a 3-5x performance loss overall (guesstimate as of now). It's pretty bad. There are some other, less important, places in the code where missing LV will bite, too.

This will be a long-term project, so looking ahead to 1.11 and beyond is reasonable; I'm not sure I understand @mkitti's argument that only 1.10 is relevant.

I hope it will be possible to get the speedup by hand using SIMD.jl, but so far sin/cos calls on vectors are disappointingly slow. The way forward is probably learning how LV does its magic. My case is pretty straightforward, so I'm relatively optimistic after all.
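
To give an idea of the shape of the kernel (heavily simplified and hypothetical, not the actual library code), the hot loop is essentially the one below; as far as I understand, @turbo swapped the scalar sincos call for a vectorized implementation (via SLEEFPirates), while @simd leaves it as a scalar call, which is where the slowdown comes from:

```julia
# One sincos per element feeding a cheap accumulation.
function phases!(out, phase, amp)
    @inbounds @simd for i in eachindex(out, phase, amp)
        s, c = sincos(phase[i])
        out[i] = amp[i] * (c + s)
    end
    return out
end
```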

10 Likes

Please teach me how to do the magic after you learn the way :grinning:

1 Like

Years ago I also tried to make my packages fast for CPU.

But these days I focus on GPU performance only.
Many people in the scientific community have problems where GPUs speed things up massively (>10x).
So they use GPUs either remotely (e.g. Google Colab) or simply buy a cheap one (RTX 3060, 4060) because the performance boost is huge.

Of course this is not applicable to all problems. But for many problems where the runtime is more than a couple of seconds, GPUs help a lot.

1 Like

In the department of crazy ideas: could the needed parts of Julia 1.10 be made into an artifact to be used by LV on Julia >= 1.11?

3 Likes

IMHO this approach is fine when the computation suits the GPU (branchless). The most obvious limitation is available RAM (VRAM), which is rather limited on a GPU compared to a CPU.
Apple hardware offers GPUs with huge RAM capacity (up to 192 GB) at a relatively cheap price compared to NVIDIA. Unfortunately, KernelAbstractions.jl is not really usable on Apple hardware right now.

2 Likes

I would be very interested to see how you manage to convert your @turbo loops to vectorised versions manually, to use as a reference. Maybe when you do, you could link the PR here? I'm sure it would be highly appreciated.

For my use case it's not easy. Actually, for most inputs the CPU will be faster. See Native GPU support by MilesCranmer · Pull Request #65 · SymbolicML/DynamicExpressions.jl · GitHub for my current attempt. For typical input sizes, an H100 GPU is not as fast as my MacBook Pro CPU, to give you a picture.

Anyway, that's a different discussion. I would strongly prefer to keep CPU speeds fast with Julia without needing to switch hardware. My library is used much more by downstream users than by myself, so I need to make it fast on all possible hardware.

5 Likes

I did not say Julia 1.11 is not relevant. I'm saying it is not the priority at the moment for improving package performance. There are other priorities with regard to Julia 1.11 itself that need to be addressed. I would rather have a solid foundation in Julia 1.11 than try to build on top of one that is still under construction.

Julia 1.10 will likely be the Long Term Support release when Julia 1.11 comes out. At that point, users will have a choice between Julia 1.10 and Julia 1.11, both of which will be supported with patches. If that happens and Julia 1.10 is faster for you, by all means use Julia 1.10.

From my perspective, the higher priorities at the moment are as follows.

  1. Security issues. The XZ backdoor is an acute problem; a chronic issue is mbedTLS long-term support for Julia 1.10.
  2. Julia 1.11 package compatibility for those using stable interfaces. For example, fixing libuv so that Cthulhu.jl's use of pipes during precompilation does not fail: Bump Libuv by Keno · Pull Request #8347 · JuliaPackaging/Yggdrasil · GitHub
  3. Improving Julia 1.11 performance and latency. For example, making sure that loading Pkg.jl version 1.11 does not invalidate code in the Julia 1.11 system image: Pkg.BinaryPlatforms invalidates Base.BinaryPlatforms · Issue #3702 · JuliaLang/Pkg.jl · GitHub

As you can see there are still a bunch of moving pieces to make Julia 1.11 a viable release.

1 Like

This does not seem abrupt to me at all. Chris has provided a transition off-ramp. You will still have long-term support (3+ years) on Julia 1.10, which you can continue to use. At worst, you just will not have access to some new features. Some of those new features are the very ones that break LoopVectorization, such as fundamentally changing how Julia arrays work under the hood.

The main underlying philosophy here is semantic versioning. Your code should still run on future Julia 1.x versions, and that remains true in this case. What is not guaranteed is that performance will monotonically increase with successive Julia versions. Sometimes the underlying mechanisms need to change to make things better. Julia 1.11 is the beginning of another cycle. Performance may get worse before it gets better.

4 Likes

In this case that is not true, right? LoopVectorization.jl does not run anymore. But that is because it used Julia internals, which are not guaranteed not to change.

1 Like

Package pinning would fix this though, right? There are versions of LoopVectorization.jl that work on Julia 1.9, for example, so as long as you have the correct versions pinned, it will still work.
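
For example (the version number here is only illustrative, i.e. whatever release last worked on that Julia version):

```julia
using Pkg

# Pin LoopVectorization to a known-good release so the resolver never moves it.
Pkg.pin(PackageSpec(name = "LoopVectorization", version = v"0.12.165"))
```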

1 Like

I'm not trying to influence the development of 1.11 one way or the other, nor to criticize the work being done or its prioritization. I'm simply reacting to the apparent attitude that this is not really a big deal. I'm not proposing anything "actionable".

Unfortunately, while somewhat dreading the deprecation of LV, I am simultaneously eagerly awaiting the advances being made towards compilation of executables, since my library must be distributed that way. It's a dilemma.

6 Likes

I am sure there is a good reason, but I confess I don't feel this way when I stare at my 2-5x slower benchmarks :confused:

I guess I just wish LV had somehow been integrated into the Julia standard library a long time ago so this wouldn't happen, and any internal changes would be carried over regularly. Julia promises speed; good vectorization seems key to that promise.

Perhaps some of these major internal changes should have been held back from release until built-in Julia vectorization caught up to the same ballpark as LV, since it's evidently not yet close. (This also doesn't seem like a niche use case, no? Aren't tons of people doing heavy vectorized array operations in Julia?)

I don't see the argument for simply avoiding 1.11. I can do that personally, but my downstream users won't. They will feel the slowness and not really know why. And I also don't see a 1.11 beta suddenly fixing this, given that regular @inbounds @simd has never been as good as @turbo, and now @turbo is gone.

But all this being said, I don't want to be too pessimistic; like @DNF, I am excited about the directions in static compilation. I'm just not sure they make up for the huge performance hits yet.

16 Likes

Well, this could easily be solved by taking the upper bound on the Julia version in a package's Project.toml seriously, and simply not allowing a package to be installed on a Julia version that its Project.toml does not list as valid.
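
For example, if a package shipped a compat section like the one below, the resolver (as I understand it) would then refuse to install that version on Julia 1.11 (the bounds here are just an illustration):

```toml
# Project.toml of the package (illustrative bounds only)
[compat]
julia = "1.6 - 1.10"
```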