I just looked at VectorizationBase tests and they seem to be broken even on 1.10, with the same failures and errors happening on nightly. Are the expected failures on 1.11 different from those and are they not visible in the nightly run?
The tests have been failing for a while. This, however, seems to be a regression:
Since I don't see much movement on this, does anyone know if there are alternatives to LoopVectorization for the upcoming deprecation? I am really not looking forward to the upcoming performance drop on Julia 1.11 in some of my packages
(`@inbounds @simd` vs `@turbo` in one of my benchmarks)
Presumably there is some sort of future-proof alternative out there, with worse speeds than LV but still better than `@inbounds @simd`?
I was in a similar boat, but managed to circumvent the use of `@turbo` by a full rewrite of my code (in such a way that threading loops through LoopVectorization was no longer possible). I don't think it is reasonable to expect this of everyone who depends heavily on it (like I did), so I imagine that as the clock ticks closer and Julia v1.11 is released, one of the following happens:
- People realize that LoopVectorization.jl is too big to fail, and it gets updated for 1.11
- Some parts of the ecosystem version-lock themselves to 1.10, going against Julia's "philosophy" of always being quite easily upgradeable and gaining new features quickly (perhaps my personal understanding)
- LoopVectorization.jl is not updated and is allowed to deprecate. This will leave people always wondering "what if I had used `@turbo`?" until something new and improved comes out.
Personally I think this is quite an interesting dilemma for Julia as a community, since it has always been promoted as a language for writing the fastest numerical code, and suddenly one huge backbone of achieving this is ripped out rather abruptly.
Sorry for not being able to provide any reasonable solution, but I hope it helps to know that a lot of us are or have been in your position; I was worried as well when my code still heavily depended on it.
Kind regards
Why are we focusing so much on Julia 1.11 for package development at the moment? Julia 1.11 is the beginning of a new development cycle for Julia with many large changes internally. We just had a feature freeze, and most of my attention there is on working on internals or making former internals work better. I expect to see performance regressions there in the near term across the board as we learn to adapt and optimize for the new architecture. It seems to be too early to be optimizing for Julia 1.11 when the dust has barely settled there. I'm waiting for a beta before I start thinking about package performance on 1.11+.
Julia 1.10 is a likely candidate for a LTS release for many reasons. From a pure performance perspective in the near term, I would be thinking about how to make packages work as well as possible on Julia 1.10.x until post-1.10 Julia is clearly superior for performance.
It also looks like LoopModels isn't really active: Branches · LoopModels/LoopModels · GitHub, which is a bummer.
So, what other options are there for getting decent vectorization in Julia post-1.10? Maybe one option is to use SIMD.jl and vectorize things by hand?
I guess another is to use JAX via PythonCall? Or even OpenXLA? I suppose you wouldn't be able to write pure Julia code, but for the most expensive kernels perhaps it's something to consider.
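To make the SIMD.jl suggestion above concrete, here is a minimal sketch of hand-vectorizing an elementwise addition with SIMD.jl's `VecRange` indexing. The function name and the fixed vector width of 4 are illustrative choices, and for brevity the sketch assumes the array length is a multiple of the width (a real kernel would need a scalar tail loop):

```julia
using SIMD

# Hand-vectorized a .+= b. `VecRange{N}(0)` is SIMD.jl's helper for
# loading/storing N contiguous lanes at an offset index.
function vadd!(a::Vector{Float64}, b::Vector{Float64})
    N = 4                      # vector width; tune for the target CPU
    @assert length(a) == length(b) && length(a) % N == 0
    lane = VecRange{N}(0)
    @inbounds for i in 1:N:length(a)
        # a[lane + i] loads/stores a Vec{4,Float64} starting at index i
        a[lane + i] += b[lane + i]
    end
    return a
end
```

The appeal of this style is that the vectorization is explicit rather than left to LLVM's autovectorizer, which is part of what `@turbo` did automatically.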
I don't know about others, but this is the first Julia alpha since I joined the community where I'm seeing 50%-80% drops in performance… (due to the LV deprecation). At worst it's been around 5% in the past. The `@turbo`-ified loops are the bottleneck of my code.
So I'm trying to fix this early (it seems like it will take more time than usual) before it slows down downstream applications for all my users on the latest Julia.
Probably a dummy suggestion: have you added `@fastmath` to the loop? LV, if I'm not mistaken, assumes it. (PS: can you share the code of the loop in question? It might be of interest to other people trying to solve similar regressions.)
I'm developing a library where the core is a simple loop that does a bunch of `sincos` calls. This is the main bottleneck, and without LoopVectorization.jl there's going to be a 3-5x performance loss overall (a guesstimate as of now). It's pretty bad. There are some other, less important, places in the code where missing LV will bite, too.
This will be a long-term project, so looking ahead to 1.11 and beyond is reasonable; I'm not sure I understand the argument of @mkitti that only 1.10 is relevant.
I hope it will be possible for me to get the speedup by hand, using SIMD.jl; so far `sin`/`cos` calls on vectors are disappointingly slow, though. The way forward is probably learning how LV does its magic. My case is pretty straightforward, so I'm relatively optimistic after all.
Please teach me how to do the magic after you learn the way
Years ago I also tried to make my packages fast for CPU.
But these days I focus on GPU performance only.
Many people in the scientific community have problems where GPUs speed things up massively (>10x).
So they use GPUs either remotely (e.g. Google Colab) or simply buy a cheap one (RTX 3060, 4060) because the performance boost is huge.
Of course this is not applicable to all problems. But for many problems where the runtime is more than a couple of seconds, GPUs help a lot.
In the department of crazy ideas: could the needed parts of Julia 1.10 be made into an artifact to be used by LV on Julia >= 1.11?
IMHO this approach is fine when the computation suits a GPU (branchless). The most obvious limitation is available RAM (VRAM), which is rather limited on a GPU compared to a CPU.
Apple hardware offers GPUs with huge RAM capacity (up to 192 GB) for a relatively cheap price compared to NVIDIA. Unfortunately, KernelAbstractions.jl is not really usable on Apple hardware right now.
I would be very interested to see how you manage to convert your `@turbo` loops to vectorised versions manually, for my use as a reference. Maybe when you do, you could link the PR here? I'm sure it would be highly appreciated.
For my use-case it's not easy. Actually, for most inputs the CPU will be faster. See Native GPU support by MilesCranmer · Pull Request #65 · SymbolicML/DynamicExpressions.jl · GitHub for my current attempt. For typical input sizes, an H100 GPU is not as fast as my MacBook Pro CPU, to give you a picture.
Anyway, that's a different discussion. I would strongly prefer to make CPU speeds fast with Julia without needing to switch hardware. My library is used much more by downstream users than by myself, so I need to make all possible hardware fast.
I did not say Julia 1.11 is not relevant. I'm saying it is not the priority at the moment for improving package performance. There are other priorities with regard to Julia 1.11 itself that need to be addressed. I would rather have a solid foundation in Julia 1.11 than try to build on top of one that is still under construction.
Julia 1.10 will likely be the Long Term Support release when Julia 1.11 is released. At that point, users will have a choice between Julia 1.10 and Julia 1.11, both of which will be supported with patches. If that happens and Julia 1.10 is faster for you, by all means use Julia 1.10.
From my perspective, the higher priorities at the moment are as follows.
- Security issues. The XZ backdoor is an acute problem. A chronic issue is mbedTLS long-term support for Julia 1.10.
- Julia 1.11 package compatibility for those using stable interfaces. For example, fixing libuv so that Cthulhu.jl's use of pipes during precompilation does not fail. Bump Libuv by Keno · Pull Request #8347 · JuliaPackaging/Yggdrasil · GitHub
- Improving Julia 1.11 performance and latency. For example, making sure that loading Pkg.jl version 1.11 does not invalidate code in the Julia 1.11 system image: Pkg.BinaryPlatforms invalidates Base.BinaryPlatforms · Issue #3702 · JuliaLang/Pkg.jl · GitHub
As you can see there are still a bunch of moving pieces to make Julia 1.11 a viable release.
This does not seem abrupt to me at all. Chris has provided a transition off-ramp. You will still have long-term support (3+ years) on Julia 1.10, which you can continue to use. At worst, you just will not have access to some new features. Some of those new features are the very ones that break LoopVectorization, such as fundamentally changing how Julia arrays work under the hood.
The main underlying philosophy here is semantic versioning. Your code should still run on future Julia 1.x versions, and that remains true in this case. What is not guaranteed is that performance will monotonically increase with successive Julia versions. Sometimes the underlying mechanisms need to change to make things better. Julia 1.11 is the beginning of another cycle. Performance may get worse before it gets better.
In this case that is not true, right? LoopVectorization.jl does not run anymore. But that is because it used Julia internals, which are not guaranteed not to change.
Package pinning would fix this though, right? There are versions of LoopVectorization.jl that work on Julia 1.9, for example, so as long as you had the correct versions it would still work.
I'm not trying to influence the development of 1.11 one way or the other, nor to criticize the work being done or its prioritization. I'm simply reacting to the apparent attitude that this is not really a big deal. I'm not proposing anything "actionable".
Unfortunately, while somewhat dreading the deprecation of LV, I am simultaneously eagerly awaiting the advances made towards compilation of executables, since my library must be distributed like that. It's a dilemma.
I am sure there is a good reason, but I confess I don't feel this way when I stare at my 2-5x slower benchmarks.
I guess I just wish LV had somehow been integrated into the Julia standard library a long time ago so this wouldn't happen, and any internal changes would be carried over regularly. Julia promises speed; good vectorization seems key to that promise.
Perhaps some of these major internal changes should have been held back from release until built-in Julia vectorization caught up to the same ballpark as LV, since it's evidently not yet close. (This also doesn't seem like a niche use-case, no? Aren't tons of people doing heavy vectorized array operations in Julia?)
I don't see the argument about avoiding 1.11. I can do that personally, but my downstream users won't. They will feel the slowness and not really know why. And I also don't see a 1.11 beta suddenly fixing this, given that regular `@inbounds @simd` has never been as good as `@turbo`, and now `@turbo` is gone.
But all this being said, I don't want to be too pessimistic; like @DNF, I am excited about the directions in static compilation. I'm just not sure they make up for the huge performance hits yet.
Well, this could easily be solved by taking upper bounds on the Julia version in a package's Project.toml seriously, and just not allowing installation of a package with a Julia version that is not listed as valid in Project.toml.
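For concreteness, the kind of bound being described is a `[compat]` entry in a package's Project.toml. This hypothetical fragment pins the supported Julia range to versions the package is known to work on; today the resolver treats the `julia` entry as a hard constraint only when it is present and accurate:

```toml
# Hypothetical Project.toml [compat] section for a package that
# depends on LoopVectorization and only supports Julia 1.9-1.10.
[compat]
LoopVectorization = "0.12"
julia = "1.9 - 1.10"
```

Pkg's compat syntax supports hyphen ranges like `"1.9 - 1.10"`, which here excludes 1.11 and later; the suggestion in the post amounts to making such upper bounds mandatory rather than optional.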