[ANN]: PaddedMatrices.jl, Julia BLAS and partially sized arrays

LOL. Thanks for trying it out and sharing the result. Let’s hope it’ll be fixed at some point…

I was mentioning \tilde B_p because, IIUC, sharing \tilde B_p introduces a non-parallelized part of the code (or one more synchronization, if it makes sense to parallelize the copying), is that right? I wondered if it’d be nice to somehow avoid it. I may be saying something very stupid, as this is only O(KN), but since loading cold memory is much slower than floating-point arithmetic, and using more threads may not help with it, I thought I’d ask about it.

I usually fetch or wait on tasks for synchronization. I guess it may be possible to come up with a better queue than Julia is using that is specific to the program you have. I don’t know what it’s like though.

Do you mean nested parallelism or something else? (IIUC nested parallelism doesn’t matter for your case because your function is at the “leaf” of the call tree. OTOH, something like threaded mapreduce needs to care about it since it may be calling something like threaded GEMM.)

1 Like

I made tweaks that, unfortunately, appear to have caused a performance regression for larger arrays.

I did however add specialized methods for arrays with static strides of 2 and 4, and added PtrArray to the sized benchmarks. (EDIT: These don’t actually seem to be better than just what LLVM does with unrolled code, so I’ve turned these back off on my local master).
As far as I understand, I should have more or less identical performance to PaddedArray. As you can see from the graph, my understanding is limited.

For 10,000 x 10,000 arrays, I’m not currently beating the GFLOPS I get for an 8x8 PtrArray.
But the larger kernel being used for those arrays should be much more efficient.
I think I need to add real prefetch support to LoopVectorization, so I can have it add them at better places. That’s the current reason I don’t have prefetching with AVX2 yet.

I also experimented a little with rearranging the elements of A while packing, into an order so that we load elements contiguously, hoping that triggers the hardware prefetcher and eliminates the need for software prefetching. Performance wasn’t as good.

I think we should parallelize the packing. Each thread that shares a \tilde{B}_p could pack a chunk, and then they must synchronize before resuming the multiplication.
If we don’t parallelize packing, we’d still need a synchronization step for the other threads to wait until packing is complete before reading from it.

The synchronization wouldn’t be needed if each thread had their own \tilde{B}_p, without packing. But synchronization should be cheaper than packing \tilde{A}_p, which is why I suspect this to be the better approach.

Packing \tilde{B}_p is O(KN), while packing \tilde{A}_p is O(MKN). While A itself is only MxK elements, it gets repacked for each \tilde{B}_p, i.e. roughly ⌈N/n_c⌉ times over the whole multiplication – hence the desire to make \tilde{B}_p larger.

I should really spend some time looking through some of the open-source BLAS libraries to see if I can figure out how they do it. I’m probably already doing some things wrong (I’m slower), and MKL especially has exceptional multithreaded performance.

Hmm. How will this work with subdividing the problem?

Referring back to packing \tilde{B}_p, I’d like several threads to be able to pack it, then proceed to start packing \tilde{A}_p before checking to confirm everyone has finished their chunk of \tilde{B}_p, and continuing on with calculating \tilde{A}_p \times \tilde{B}_p.

I could YOLO it and just assume that of course they’ll be done by the time \tilde{A}_p is done, but that’s bound to cause problems eventually.

If instead one thread runs things by @spawn and then fetch/wait, and then @spawns for the next task, those new @spawns may not end up on the same physical core?
Because \tilde{B}_p is supposed to live in (shared-between-cores) L3 instead of (core-local) L2, maybe that isn’t much of a problem. We just won’t be able to pack \tilde{A}_p before checking that \tilde{B}_p is done.

Alternatively, maybe I could implement it with a few refs or atomic variables, and busy-wait with while loops.
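Something like this minimal sketch is what I have in mind, with an atomic counter as the barrier and a hypothetical pack_B_chunk! doing the actual copying:

using Base.Threads

# Hypothetical sketch: each thread packs its chunk of B̃ₚ, bumps a shared
# atomic counter, then busy-waits until every chunk is done before reading
# from B̃ₚ. pack_B_chunk! is a placeholder, not a real function, and `done`
# must be reset to 0 before each packing phase.
function pack_and_sync!(B̃ₚ, B, done::Atomic{Int}, nchunks, chunkid)
    pack_B_chunk!(B̃ₚ, B, chunkid, nchunks)  # pack this thread's chunk
    atomic_add!(done, 1)                     # announce completion
    while done[] < nchunks                   # spin until all chunks are packed
        GC.safepoint()                       # keep the spinning thread GC-cooperative
    end
end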

Yeah, you’re right. The only real concern there is that we’d want it to be able to nest inside other parallel code. The other parallel code may also call parallel instances of jmult! that shouldn’t step on each other. But it doesn’t sound like that should be a real problem. The preallocated memory for packing is already set up to be thread local, for example.

1 Like

Nim’s laser package might be the easiest reading:

4 Likes

Yeah, that sounds like a much better approach than going off my guesses :+1:

Is \tilde A_p an array that concatenates \tilde A_i? So the plan is to create \tilde A_p as well as \tilde B_p for each iteration of the 3rd loop, while a sub-array \tilde A_i is loaded to L1 for each iteration of the 2nd loop? If that’s the case, why not spawn the tasks for creating \tilde A_p and \tilde B_p at once and then wait for them (2 * nthreads() tasks, I guess)?
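Something like this rough sketch is what I was imagining; pack_A_chunk! and pack_B_chunk! are hypothetical placeholders that pack the t-th of nt chunks into the preallocated buffers:

# Spawn all the packing tasks up front, then wait for every one of them
# before the macrokernel starts. The chunking helpers are made up for
# illustration.
function pack_all!(Ãₚ, A, B̃ₚ, B)
    nt = Threads.nthreads()
    tasks = Task[]
    for t in 1:nt
        push!(tasks, Threads.@spawn pack_B_chunk!(B̃ₚ, B, t, nt))
        push!(tasks, Threads.@spawn pack_A_chunk!(Ãₚ, A, t, nt))
    end
    foreach(wait, tasks)   # 2 * nthreads() tasks, all synchronized here
end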

There is no public API for even pinning a task to a thread. I guess you can write your own “task pool” to give some hints to partr:

julia> function demo()
           chans = map(1:Threads.nthreads()) do _
               Channel(1, spawn=true) do ch
                   for f in ch
                       f === nothing || f()
                   end
               end
           end
           push!(chans[1], () -> @show Threads.threadid())
           push!(chans[1], () -> @show Threads.threadid())
           return chans
       end
demo (generic function with 1 method)

julia> chans = demo()
       sleep(1)
Threads.threadid() = 1
Threads.threadid() = 1

julia> foreach(close, chans)

Though my impression is that Channel has higher overhead than @spawn/fetch.

1 Like

Thanks, worth taking a look at. @YingboMa has been walking me through the OpenBLAS source. On my desktop, it does very well over the size range 200-10,000.
MKL eventually overtakes it (and achieves mysteriously impressive speeds below 190x190, insensitive to factors such as memory layout), but OpenBLAS’s single-threaded performance over that range is what captured our interest/focus.

To clarify, the BLIS algorithm, which I’ve been using, is:

# C is M x N
# A is M x K
# B is K x N
for n in 1:n_c:N            # n_c-column block of B and C
    for k in 1:k_c:K        # k_c × n_c block of B
        # pack B_p (k_c × n_c, kept in L3, shared between cores)
        for m in 1:m_c:M    # m_c × k_c block of A
            # pack A_p (m_c × k_c, kept in a core's L2)
            # call macrokernel: C[m-block, n-block] += A_p * B_p
        end
    end
end

B_p is in L3, and A_p in L2. The macrokernel iterates over slices of B_p that fit in the L1 cache.
Because B_p is preferably very large, taking up more than a 1/num_cores fraction of the shared L3 cache, it makes sense to let it be large and share it among threads.

Because A_p is in the L2 cache (and thus local to one core), these should probably be unique per thread.

However, this is not the algorithm OpenBLAS uses. It seems to use finer-grained packing, closer to the microkernel. Aside from getting the best single-threaded performance (at least on this computer, over a large size range), this is also more compatible with your suggestions. I’ll have to look more closely, but it looks like they pack A_p into L2, and then pack B_p into L1 within the macrokernel (and don’t pack B outside of it).
Although, I haven’t actually looked at OpenBLAS’s threading code yet. Also worth pointing out that MKL’s threading is far better than OpenBLAS’s (while OpenBLAS’s would also be extremely difficult to match).

My primary interests here are

  1. Small-matrix performance
  2. Not suffering at large sizes, so that someone looking for small matrix performance doesn’t have to be concerned with a major trade-off.
  3. Enabling high-performance Julia Linear Algebra libraries well suited for smaller problems, such as RecursiveFactorization.jl
  4. Proof of concept that these libraries can perform well.

In terms of low-hanging fruit, large-matrix linear algebra offers limited pickings.

YingboMa has been working on MaBLAS.jl.
Benchmarks over the range 2:256:


Rather than being based on LoopVectorization (which it uses for clean-up loops), MaBLAS.jl uses custom code to generate kernels. Its API makes it easy to specify different kernels and BLIS-style blocking factors, with fine-grained control over packing.
I added three instances of MaBLAS to the mix, each with different kernels, as well as Gaius, OpenBLAS, MKL, and PaddedMatrices.
MaBLAS will probably switch to just using the clean-up loops below a certain size, as the LoopVectorization kernels do very well at handling dynamic sizes.

MKL is ahead from 40-190, while PaddedMatrices stays in second place through this range and in first from there through 256. MaBLAS’s kernels generate better assembly than LoopVectorization does: LoopVectorization’s kernels load pointers and/or offsets from memory on every iteration, and then use these to load the elements being operated on, while MaBLAS’s kernels do exactly what you want – no reloading of pointers, no incrementing of any integers aside from the loop induction variable.
For the life of me, I haven’t been able to get LLVM to do this correctly, but it doesn’t seem to be too big a deal.
The performance is also notably higher than it was in the opening post, e.g. PaddedMatrices now staying above 90 GFLOPS from 200-256, instead of dipping into the low 80s.

Larger arrays:


It’s hard to separate most of them over most of this range, while they hover around the 100 GFLOPS mark. OpenBLAS comes out ahead, while the 32x6 and 40x5 MaBLAS routines aren’t far behind.

I think this is a great showing from Julia BLAS!

9 Likes

This is great progress.
I wonder how much of the wonder of @avx should be imported into @simd (which, in my opinion, should be more conservative).

Regarding optimized kernels for small matrices, have a look at BLASFEO.
For single-threaded use they are much better than MKL up to sizes of ~400.

  1. I’d have to see benchmarks to believe that.
  2. They don’t support my architecture, so I’d be forced to use the Haswell target, which would limit my CPU to a theoretical peak of 70 GFLOPS, below the above results.

EDIT:
I’d like to add:
I tuned those libraries on the computer I ran the benchmarks on.
How well will they generalize to other computers? Hopefully reasonably well, but I wouldn’t necessarily expect them to so closely match OpenBLAS and MKL over that range on different architectures. Similarly, everyone’s benchmarks claim they’re the fastest. Polly claims it can optimize gemm to be close to OpenBLAS, but in my benchmarks Polly took an eternity to compile, maybe achieving a third the speed of OpenBLAS, not even matching icc or ifort applied to triple loops with gemm-detection disabled (and with more or less instantaneous compile times), so I quit bothering with it and removed Polly from my LoopVectorization benchmarks. Similarly, Eigen claims to have competitive performance, but doesn’t do well at all either. Unlike Polly, Eigen is pleasant to compile, so it gets to stay in the LoopVectorization benchmarks.
I’d have to run benchmarks locally to “believe” them. Same should apply to you or anyone reading these. Sure, I got good performance on my computer after tuning them for it, but will they get good performance on yours? I’d be skeptical until these have matured much more.

1 Like

They have benchmarks:


They don’t have AVX512 but they do have AVX2.
You may see that I compared them to multi-threaded MKL (MATLAB) and got a factor of ~2 on a 6-core machine for sizes up to N = K = M = 400. I think the results are impressive for the single-threaded case. Anyhow, they have a BLAS API, so you can try it yourself. I’d be interested to see a comparison of the single-threaded case made by you.

They also have an interesting packing of data for small matrices which could be useful. Give it a deeper look. It’s worth it.

Okay, yeah, may be worth taking a look at. Their dgemm_nt performance using the blasfeoapi is very good compared to their theoretical peak, and almost unbelievably constant as a function of matrix size.

Did they count packing time when measuring blasfeoapi?
I strongly suspect that they did not.

Unfortunately, I can’t really benchmark it on my computer, because of course it’ll look bad when running Haswell kernels. On my computer, the theoretical peak with a Haswell kernel is 68.8 GFLOPS (4.3 GHz * 2 fma/cycle * 4 numbers/vector * 2 ops/vector/fma), while the AVX512 theoretical peak is 131.2 GFLOPS (4.1 * 2 * 8 * 2).
I guess I could compare vs theoretical peak, instead of number of GFLOPS, but that also isn’t quite fair, because memory stalls aren’t as much of a problem with slower kernels.
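For reference, the arithmetic behind those two peak numbers:

# double-precision peak per core = clock * FMAs/cycle * doubles/vector * 2 flops/FMA
peak_gflops(ghz, fmas_per_cycle, doubles_per_vector) = ghz * fmas_per_cycle * doubles_per_vector * 2

peak_gflops(4.3, 2, 4)   # Haswell-style AVX2 kernel on this CPU:  68.8
peak_gflops(4.1, 2, 8)   # native AVX512 kernel:                  131.2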

I think that’s Eigen’s problem as well: they claim to support AVX-512, but they don’t have proper kernels for it. OpenBLAS finally seems to have proper support with release 0.3.9 (which I’ve been using, and which will ship with Julia 1.5).
One of the problems with these C and assembly libraries is that adding such support is harder.

Looking briefly at their source, one of the things they do that OpenBLAS does not is adjust their packing strategy based on the input matrix sizes.
PaddedMatrices already does this, as does MKL, starting in 2017. But MKL obviously does it better than PaddedMatrices.

Currently, what it’s doing is:

if K * N > n_c * k_c
    # pack A and B into memory-access order
elseif M <= 72
    # do not pack
elseif M <= 2.5m_c && iszero(M % VECTOR_WIDTH)
    # do not pack
else
    # pack A into column-major order, aligning the columns
end

Where

M, N = size(C)
K = size(B,1)

and m_c, k_c, and n_c are the tiling parameters (corresponding to the size of the packed parts of the arrays).

In the link I shared about performance, you may see the performance of the BLAS flavor, meaning the API is classic BLAS and the measurements take into account the overhead of packing and unpacking. That one is called blasapi, while blasfeoapi means the data is assumed to already be in the special packed format.

While Skylake is wider, with more ports than Haswell, something tuned for Haswell should run very well on Skylake. Moreover, for them Haswell is a codename for AVX2 more than for a specific architecture.

You have a system with AVX512, but most users don’t. Moreover, all modern compilers in their latest versions currently emit 256-bit ops even for AVX512 (utilizing what’s really important in AVX512, its flexibility, and not just the doubling of the width).

So I still think you should compile it and test its single-threaded performance against other implementations. To make it fair, you can disable the AVX512 code path in MKL.
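(If I recall correctly, MKL’s dispatch can be constrained with the MKL_ENABLE_INSTRUCTIONS environment variable, set before MKL is loaded; the exact value names are worth double-checking against the MKL documentation.)

# Possible way to restrict MKL to AVX2 code paths for a fair comparison;
# must be set before MKL initializes, and value names should be verified.
ENV["MKL_ENABLE_INSTRUCTIONS"] = "AVX2"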

1 Like

One additional OSS BLAS implementation that might have useful ideas (although it’s focused on mobile, it’s also pretty small and self-contained) is Google’s ruy.

2 Likes

Performance below a few hundred in each dimension should be better now.
Benchmarks vs SMatrix and MMatrix from StaticArrays (note that MMatrix switches to BLAS when M*K*N >= 14^3). A Haswell laptop where that BLAS is MKL (AVX2 system):


Skylake laptop where that BLAS is OpenBLAS (AVX2 system):

Skylake/Cascadelake-X where that BLAS is OpenBLAS (AVX512 system):

The blue line is PaddedMatrices.jmul! applied to base Array{Float64,2} arrays.

With AVX2, it was faster than SMatrix starting from 7x7 and beyond, while with AVX512 SMatrix had one last hurrah at 8x8. The statically sized arrays from PaddedMatrices are at least as fast in the sizes I tested, especially with AVX512.
Unfortunately, I don’t support an equivalent of SMatrix yet, meaning escaping FixedSizeArrays will cause heap-allocation.
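If anyone wants to reproduce the comparison locally, something like this should work (assuming jmul! takes the usual mul!-style (C, A, B) argument order, as used for the blue line above):

using PaddedMatrices, StaticArrays, BenchmarkTools

N = 8
C = zeros(N, N); A = rand(N, N); B = rand(N, N)
@btime PaddedMatrices.jmul!($C, $A, $B)   # plain Array{Float64,2}, the blue line

Sa = @SMatrix rand(8, 8)
Sb = @SMatrix rand(8, 8)
@btime $Sa * $Sb                          # StaticArrays SMatrix for comparison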

Most compilers suck at generating AVX512 code, which is why it makes sense not to bother. LLVM is much worse at this than GCC or ICC. For example, dot product benchmarks:


LLVM is very slow at cleaning up non-vectorized remainders, therefore the length of these remainders is the primary driver of performance. If you pick a bunch of sizes at random, cutting the width of vectors in half will also cut the average remainder length in half, probably boosting LLVM’s average performance.
A far better solution to that problem would be not being so slow at handling non-vectorized remainders in the first place.
GCC fails above because it doesn’t break up the dependency chain, even with -funroll-loops. Only the Intel compilers and LoopVectorization actually generate good code for something as simple as a sum or dot product.
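For concreteness, these are the kinds of loops being compared; the @avx version is where LoopVectorization picks the vector width, the unrolling, and the remainder handling itself:

using LoopVectorization

# Plain-Julia dot product: LLVM decides the vectorization, unrolling,
# and how to handle the scalar remainder.
function dot_simd(a, b)
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a)
        s += a[i] * b[i]
    end
    s
end

# Same loop under @avx: LoopVectorization splits the reduction across
# several accumulators and handles the remainder with masked operations.
function dot_avx(a, b)
    s = zero(eltype(a))
    @avx for i in eachindex(a)
        s += a[i] * b[i]
    end
    s
end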

A justification for this may be that lots of applications mostly aren’t AVX – lots of logic is hard to vectorize/very branchy – and if you have a few random short loops sprinkled in, you’d be better off not encouraging transitions. This is just speculation on my behalf. I don’t know the reasoning.

LLVM does a bad job of taking advantage of AVX512’s flexibility, regardless of vector width.
Note that on the AVX512 machine that SMatrix{4,4} multiplication was actually slower than on the other two, while FixedSizeMatrix{4,4} multiplication was much faster.

Don’t mistake the compiler’s limitations for hardware limitations.
While Itanium turned out to be the Itanic because of how hard it was to write good compilers for it, I don’t really think that’s the case here.

EDIT:
ruy being optimized for ARM would also make it interesting to look at eventually, given that ARM optimization will become important in Julia once Apple starts shipping ARM MacBooks.

14 Likes

Since there were quite a few posts about BLASFEO (and I’m its main developer), I would like to clarify a little what its intended contribution is.

As the name states, the intended application domain is embedded optimization, hence the focus on performance for small matrices and single threaded applications.
In designing BLASFEO, I took the freedom to depart from the standard BLAS API conventions, and explored how alternative APIs could help in improving performance.
In embedded optimization applications, parallelization is generally better handled at the algorithmic level (instead of the linear algebra level), and it is generally possible to pack all matrices into any arbitrary format beforehand, outside the critical loop.
In that sense, BLASFEO defines a matrix format (named panel-major) which is used for all input and output arguments to linear algebra routines. Therefore, all packing of data is performed offline, and there is no need to perform any packing/unpacking between different linear algebra routines calls.
Hence, in such a framework it makes no sense to include the packing time when computing the linear algebra performance, as this is not its practical usage anyway.
If you like, you can find more details in this paper https://arxiv.org/abs/1704.02457

By leveraging my experience in implementing BLASFEO’s own API, I tried to apply it to the implementation of a standard BLAS API with column-major matrix format for the routine arguments, still focusing on performance for small matrices and single-threaded routines. The main ideas and some results are in another paper https://arxiv.org/abs/1902.08115

The approach used in the BLAS API of BLASFEO (as well as the BLASFEO API itself) turned out to give very good performance on a wide range of computer architectures and ISAs.
The code is in no way optimized to a specific machine (i.e. my computer), and (by focusing on small matrix performance) is also irrespective of e.g. cache size.
The currently “fastest” target is named Haswell, for the simple reason that this was the first computer architecture implementing AVX2 and FMA, and the code would have been optimized in the exact same way on more recent computer architectures such as Intel Skylake (client) or AMD Zen.
About AVX512 support, the availability of this ISA is still very limited (especially in embedded devices), and compared to AVX2+FMA you would get any speed up only on a processor with two 512-bit FMA units, which as of now is limited to some high-end Xeon processors (and no laptop or desktop processor).
Therefore (and unfortunately), after 7 years from its introduction the Haswell target is still the best target on the vast majority of computers.

As already mentioned by RoyiAvital, BLASFEO is open source on a GitHub repo, so I would be very happy if you tried it out yourself; I’m fully confident that the results will be as expected. :wink:

13 Likes

What’s the difference between Gaius.jl, PaddedMatrices.jl and OpenBlas?
When should we use one or the other?

Which one uses less memory for the same operation?

I’m the author of Gaius.jl. As it says on the tin, don’t use it for anything important. It was really just a little experiment I ran where I wanted to understand what it would take to get multi-threading working with LoopVectorization.jl matmul kernels. Once things got sufficiently complicated and I got sufficiently busy, I backed off and stopped working on the project.

@Elrod’s work on PaddedMatrices.jl is strictly more promising than Gaius.jl, even if its multi-threading isn’t as good as Gaius.jl’s (last I checked, maybe this changed?). I don’t expect that there are any fundamental problems with its approach to multi-threading, and with some elbow grease it should be great across many cores.

OpenBLAS is an external project to Julia, but Julia farms out its linear algebra work to OpenBLAS when advantageous. There is at least some interest in replacing OpenBLAS with something pure Julia, mostly out of ideological convictions that Julia solves the two-language problem and so we shouldn’t need something like OpenBLAS. That said, there are practical reasons to want to get rid of OpenBLAS despite its fantastic performance. Linking to OpenBLAS contributes at least somewhat to the time it takes to start up a Julia session, plus it does not handle arbitrary matrix eltypes, only hardware floating-point numbers and complex numbers.

For the time being, I would not expect any of these pure Julia implementations to replace OpenBLAS for at least a couple of years. These projects are more precursors of things to come, as well as existence proofs that LoopVectorization.jl is scalable to rather hard problems.

13 Likes

Hi, the desktop I use to run most of my benchmarks has 2 AVX512 FMA units/core. For this reason, single threaded dgemm results over the size range 2-300 were about as expected:

For reference, this computer runs AVX512 code at 4.1 GHz and AVX2 code at 4.3 GHz, making the respective theoretical peaks 131.2 and 68.8 double precision GFLOPS.

I used default compilation options (except that I set MACRO_LEVEL=2 in the Makefile.local). I compiled with gcc 10.1.1.
The wrapper is here. I was using the BLAS API (i.e., using column major arguments rather than panel major).
I’m sure performance would have improved a bit via using BLASFEO API.

Maybe you could create a BLASFEO_jll library for distributing binaries, and then write a wrapper library providing a convenient Julia API ;).

I am a fan of the idea of using performance considerations for choosing the default data layout of our data structures. It would be nice to have broader support for panel-major arrays.

I also have a Haswell laptop:

Same compilation options, except I used gcc 9.3.0.
Here, BLASFEO did much better relative to the competition than on my desktop, but performance was highly erratic for 55x55 matrices and below.

I can only tune the performance on computers I have access to, and appreciate it when folks point out that they’re seeing something different than I am, so that I can investigate it.
That said, because everyone compiles the code locally, I can make use of hardware features queried by CpuId.jl to guide optimization decisions in the library. E.g., the M_c, K_c, and N_c blocking parameters are a function of the cache sizes and vector width:

julia> PaddedMatrices.matmul_params(Float64)
(240, 328, 4941)

Are these optimal? Far from it. Even on my own computers, I’ve only tuned haphazardly.
LoopVectorization.jl chooses M_r and N_r.
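To illustrate the idea (this is not the actual implementation), block sizes can be derived from cachesize() roughly as below; every constant and name here is a made-up placeholder, not PaddedMatrices’ real heuristic:

using CpuId

# Illustrative only: BLIS-style blocking parameters from cache sizes,
# roughly "a k_c × n_r slice of B in L1, an m_c × k_c block of A in L2,
# a k_c × n_c block of B in L3".
function rough_matmul_params(::Type{T}, mᵣ, nᵣ) where {T}
    l1, l2, l3 = cachesize()                       # data-cache sizes in bytes
    kc = (l1 ÷ 2) ÷ (sizeof(T) * nᵣ)               # k_c × n_r slice of B in half of L1
    mc = ((l2 ÷ 2) ÷ (sizeof(T) * kc) ÷ mᵣ) * mᵣ   # m_c × k_c block of A in half of L2
    nc = ((l3 ÷ 2) ÷ (sizeof(T) * kc) ÷ nᵣ) * nᵣ   # k_c × n_c block of B in half of L3
    mc, kc, nc
end

rough_matmul_params(Float64, 40, 5)   # e.g. with a 40 × 5 register tile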

AVX512 doubles the size and number of registers, which allows you to use a larger microkernel. This means a wider micro-tile from B, thus reducing the number of times you have to pass over the elements of the A-tile, reducing bandwidth requirements.
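The register-budget arithmetic behind that, using the 40x5 AVX512 kernel mentioned earlier as an example:

# Vector registers consumed by an Mᵣ × Nᵣ microkernel with vector width W:
# the C accumulator tile, one column of A vectors, and one register for
# broadcasting B. Back-of-the-envelope only.
regs_used(Mᵣ, Nᵣ, W) = (Mᵣ ÷ W) * Nᵣ + (Mᵣ ÷ W) + 1

regs_used( 8, 6, 4)   # AVX2,   W = 4: 15, fits in the 16 available registers
regs_used(40, 5, 8)   # AVX512, W = 8: 31, fits in the 32 available registers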

This is kind of a self-fulfilling prophecy. AVX512 isn’t common because it doesn’t provide much performance benefit for most workloads, and most workloads don’t support AVX512 because it isn’t common.
MacBooks have recently begun shipping with it (until their switch to ARM).

I don’t know much about embedded, but I’m going to guess x86 isn’t a popular choice in that area.

The main contribution of PaddedMatrices.jl is that this is what the basic macrokernel looks like:

function loopmul!(C, A, B, ::Val{1}, ::Val{0}, (M, K, N) = matmul_sizes(C, A, B))
    @avx for m ∈ 1:M, n ∈ 1:N
        Cₘₙ = zero(eltype(C))
        for k ∈ 1:K
            Cₘₙ += A[m,k] * B[k,n]
        end
        C[m,n] = Cₘₙ
    end
    nothing
end

While I’m not convinced it’s a good idea, I also have macro-kernels that pack A:

function packaloopmul!(
    C::AbstractStrideMatrix{Mc,Nc},
    Ãₚ::AbstractStrideMatrix{Mc,Kc},
    A::AbstractStrideMatrix{Mc,Kc},
    B::AbstractStrideMatrix{Kc,Nc},
    ::Val{1}, ::Val{0}, (M, K, N) = matmul_sizes(C, A, B)
) where {Mc,Kc,Nc}
    Nᵣrange = VectorizationBase.StaticUnitRange{1,nᵣ}()
    @avx for m ∈ 1:M, n ∈ Nᵣrange
        Cₘₙ = zero(eltype(C))
        for k ∈ 1:K
            Aₘₖ = A[m,k]
            Cₘₙ += Aₘₖ * B[k,n]
            Ãₚ[m,k] = Aₘₖ 
        end
        C[m,n] = Cₘₙ
    end
    Nrange = VectorizationBase.StaticLowerUnitRange{1+nᵣ}(N)
    @avx for m ∈ 1:M, n ∈ Nrange
        Cₘₙ = zero(eltype(C))
        for k ∈ 1:K
            Cₘₙ += Ãₚ[m,k] * B[k,n]
        end
        C[m,n] = Cₘₙ
    end    
    nothing
end

Depending on the size of the arrays, it chooses between not packing, packing A, or packing both A and B.

The primary point of interest here is as a proof of concept for LoopVectorization.jl’s ability to generate reasonable macro-kernels, before applying it to other problems. It is, however, still a little limited in the kinds of loops it supports, but it is a work in progress.

It’s also not too difficult to achieve better performance than competing BLAS libraries at small sizes, e.g. via not packing at all. Most of my own work involves matrices in that size range, so optimizing for it and still getting reasonable performance elsewhere is another part of the motivation.

I experimented with tile-packing, e.g:

function packaloopmul!(
    C::AbstractStrideArray{Tuple{Mᵣ,Mᵢ,Nᵣ,Nᵢ}},
    Ãₚ::AbstractStrideArray{Tuple{Mᵣ,K,Mᵢ}},
    A::AbstractStrideArray{Tuple{Mᵣ,Mᵢ,K}},
    B::AbstractStrideArray{Tuple{Nᵣ,K,Nᵢ}},
    α, β
) where {Mᵣ,Mᵢ,K,Nᵣ,Nᵢ}
    Mᵣrange = VectorizationBase.StaticUnitRange{1,Mᵣ}()
    Nᵣrange = VectorizationBase.StaticUnitRange{1,Nᵣ}()
    @avx for mᵢ ∈ axes(Ãₚ,3), mᵣ ∈ Mᵣrange, nᵣ ∈ Nᵣrange
        Cₘₙ = zero(eltype(C))
        for k ∈ axes(Ãₚ,2)
            Aᵣₖᵢ = A[mᵣ,mᵢ,k]
            Cₘₙ += Aᵣₖᵢ * B[nᵣ,k]
            Ãₚ[mᵣ,k,mᵢ] = Aᵣₖᵢ
        end
        C[mᵣ,mᵢ,nᵣ] = α * Cₘₙ + β * C[mᵣ,mᵢ,nᵣ]
    end
    Nᵢrange = VectorizationBase.StaticLowerUnitRange{2}(size(C,4))
    @avx for nᵢ ∈ Nᵢrange, mᵢ ∈ axes(Ãₚ,3), mᵣ ∈ Mᵣrange, nᵣ ∈ Nᵣrange
        Cₘₙ = zero(eltype(C))
        for k ∈ axes(Ãₚ,2)
            Cₘₙ += Ãₚ[mᵣ,k,mᵢ] * B[nᵣ,k,nᵢ]
        end
        C[mᵣ,mᵢ,nᵣ,nᵢ] = α * Cₘₙ + β * C[mᵣ,mᵢ,nᵣ,nᵢ]
    end
    nothing
end

(I’m treating panel-major matrices as 3-d arrays: M×K -> panel_width × K × number_panels.)
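As a concrete picture of that layout, here is a naive (illustrative only) conversion from column-major, assuming M is a multiple of the panel width:

# Convert a column-major M×K matrix into the panel-major layout described
# above: a panel_width × K × npanels array, where consecutive panels cover
# consecutive row blocks. Naive reference version, not optimized.
function to_panel_major(A::AbstractMatrix{T}, panel_width::Integer) where {T}
    M, K = size(A)
    @assert M % panel_width == 0
    npanels = M ÷ panel_width
    Ap = Array{T}(undef, panel_width, K, npanels)
    @inbounds for p in 1:npanels, k in 1:K, r in 1:panel_width
        Ap[r, k, p] = A[(p - 1) * panel_width + r, k]
    end
    return Ap
end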

However, I’d need to make changes somewhere for the above to work as intended, because around a month ago LoopVectorization began checking for large strides between memory accesses, and inserting prefetches if it found them. It also incorporates the cost of those prefetches into choosing M_r and N_r. If memory accesses are contiguous, it implicitly assumes the hardware prefetcher should handle it, and ends up with very square microkernels, because it models vloadp(s/d) as equally costly as vbroadcasts(s/d), and therefore chooses parameters to yield a roughly equal number of each.
I’m inclined to not think that’s quite correct, because while it may maximize arithmetic instructions / load instructions, it decreases the width of the B-micropanels, increasing the number of times it has to pass over the A-micropanel, thus increasing memory pressure. LoopVectorization currently doesn’t incorporate memory into its cost modeling.
The implication for the above tiling code is that it might choose different blocking parameters for both loops, which I don’t really intend.

But maybe it won’t; I haven’t looked at it. I could probably also be reasonable and not combine packing A with multiplication, since I don’t really think that improves performance much - if at all - anyway. Maybe there’s a better way.

Anyway, one of my conclusions has been that at small sizes, column major performs very well. If the data already fits in cache, the strides between memory accesses aren’t so important.

3 Likes

It’s changed…in that PaddedMatrices no longer supports multi-threading.
I may add multithreaded support again, but that would probably be motivated by the same “proof of concept” desire of seeing how efficient I could get it, with gold standards to optimize against.

MKL’s multithreading is extremely efficient, and also kicks in at a very small size. Importantly, it also helps when it does, unlike OpenBLAS which tends to trip over its own threads.

4 Likes

Hi Elrod,

first of all, thanks for trying BLASFEO out, and for reporting the erratic behavior you observed.
Since I haven’t observed that before, I investigated the matter, and it turned out it was due to denormals somehow being present in the memory around the matrices in your benchmark routines.
In case of the matrix size not multiple of the SIMD width, the kernel would load and perform FMA on the full SIMD width, and disregard the extra element when storing back to the result matrix.
However, it turned out that in case of denormals in the background memory, this would trigger the super-slow computation on denormals.
Now I have fixed the issue, and the code in the current BLASFEO master looks like this (for matrix sizes 2:24) on my machine (Intel Core i7 4810MQ):

Then, thanks for the interesting introduction to your work!
About the packing of A, I agree that in the implementation of the NN variant of dgemm it is not necessary as long as the data fits in the L1 or L2 cache, but you already start seeing some improvement for L3.
But it gets more beneficial when you implement other dgemm variants, especially the ones with A transposed. How do you handle such cases in your framework?
And how is such framework handling linear algebra routines with less regular access pattern, such as when one matrix is triangular, or factorizations?

In my work on BLASFEO, I also saw that packing A and/or B is much more beneficial on less powerful architectures than Intel Haswell/Skylake.
There, lower cache associativity, smaller TLBs and simpler hardware prefetchers imply that a packed matrix format such as panel-major can give sizeably better performance even for small matrices.

About the chicken-and-egg issue with AVX512, I think there is more to it than this.
It is also due to Intel’s extreme market segmentation strategy: as an example, contemporary Celerons and Pentiums even have AVX disabled, and Atoms do not physically have it.
AVX512 is just considered an additional tier on top, with additional market segmentation between 1 and 2 512-bit FMA units.
By the way, you seem to have a nice “desktop” development machine :wink:

3 Likes

@giaf, this is nice progress!

By the way, AVX512 is now available in the latest generation of laptops and will be available in the next generation of desktop CPUs (Intel’s).

Though I still think AMD’s approach is better: just have 2x the ports of 256-bit vector units and build a bigger, wider front end to handle the extra op pressure.

Regarding BLASFEO, I think what’s missing is a batch mode.
In that batch mode, if we could send a batch of small matrix multiplications and BLASFEO handled it with different threads, it would be a great way to enjoy BLASFEO’s great single-threaded performance when multiplying many matrices.

Then it could be incorporated into Julia’s broadcasting for such cases.

What do you think?

Ah, now I’ve noticed a regression in my latest benchmark results and fixed it:

There was a fairly expensive check that used to be evaluated at compile time, but was instead evaluated at runtime in PaddedMatrices v0.1.6. I just released v0.1.7, which moved the check back to compile time. The overhead of the check is O(1), but it was fairly expensive at small sizes.

I pulled blasfeo master, and reran the benchmarks:
i9-10980XE (Cascadelake-X):


i5-8350U CPU (Skylake):

i3-4010U CPU (Haswell):

Now PaddedMatrices is much faster over the 2:24 size range on the three computers I tested.

due to denormals somehow being present in the memory around the matrices in your benchmark routines.

This is normal in Julia. If you read out of bounds memory, it will probably be junk. If you interpret the memory as floating point numbers, denormals are likely.
I use @llvm.masked.load and @llvm.masked.store intrinsics to avoid touching out of bounds memory, while still using SIMD instructions. These get lowered to vmaskmovp* instructions with AVX. The mask consumes a floating point register. With AVX512, it uses normal vmovup* instructions, but applies a bitmask, using one of the opmask registers instead of a floating point register.
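To make that concrete, reinterpreting arbitrary bytes as Float64 lands in the subnormal range every so often, which is presumably what the BLASFEO kernels were hitting:

# Junk memory reinterpreted as Float64 sometimes falls in the subnormal
# range, triggering the slow denormal path. The count below is often
# nonzero for recycled memory; freshly zeroed pages give 0.
buf = Vector{UInt8}(undef, 1 << 16)
xs  = reinterpret(Float64, buf)
count(x -> 0 < abs(x) < floatmin(Float64), xs)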

Also, when it comes to such small sizes (e.g., 2:24) I’d recommend taking advantage of Julia’s JIT if possible.
Benchmarks that allowed specializing on the size of the arrays:

NN

CascadelakeX:


Skylake:

Haswell:

TN

CascadelakeX:


Skylake:

Haswell:

TT

CascadelakeX:


Skylake:

Haswell:

Of course, I don’t think Julia/a massive runtime is a good choice for embedded devices. But for someone already using Julia, it can provide a nice benefit.

I did not include NT because BLASFEO produced an incorrect result at 13x13.

The relevant code from PaddedMatrices does the following:

  1. If the first stride of A is not contiguous, pack.
  2. Heuristically, it will start packing if 73 > M with AVX512F, or 53 > M without it, unless the base of the array is aligned and stride(A,2) is also a multiple of the SIMD vector width.
  3. if mc * kc > M * K, where A is M x K and mc and kc are the blocking parameters, do pack A.

If it is packing A, it will also pack B if kc * nc ≤ K * N.

Therefore, it will generally pack A in T*. The exception there is if A's size is known at compile time (e.g., the FixedSizeArray), in which case it won’t pack A if the number of rows is less than or equal to twice the SIMD vector width. I should probably make that check more sophisticated, e.g. with respect to the number of times the same elements from A will actually have to be reloaded.

When I get there, I’ll decide how data should be laid out. On the code-gen side of things, I still need to add support for triangular loops and handling dependencies between iterations.
But this focus on code gen means that once LoopVectorization understands these things, the macrokernels should be easily specified via simple loops, making details such as memory layout relatively easy to change.

I am a fan of the idea of alternative layouts. I’ve slowly been working on a DSL for probabilistic programming. One optimization I’d like to support some day is having it choose the memory layout of all underlying data structures to optimize performance.
Perhaps I’d be better off making internal arrays default to some sort of tile-major layout instead of column-major?

That is unfortunate, but I like the idea of people having the ability to choose something aligned with their usecase. E.g., reviews are currently overwhelmingly better for AMD, but I am happy to still have the choice to buy AVX512, because I can write software to actually take advantage of it.

But most people don’t/won’t, so why should they pay for silicon they’re not using?
But in terms of segmentation, I’d love to see something like a broad adoption of ARM’s Scalable Vector Extension (at least, broadly adopted among ARM CPUs), with different segments simply supporting different vector widths. Those who need HPC can buy wide vectors, and those who don’t can buy 128-bit, all within the same instruction set.
Unfortunately, as far as I know, the A64FX is the only CPU currently supporting SVE.

Thanks. The CPU is a 10980XE, which is marketed as “high end desktop”. It hits >2.1 teraflops with MKL’s dgemm, can build software from source quickly, and looks stellar in all the heavily-SIMD benchmarks I run, so IMO worth the expense for a hobbyist. AMD CPUs do well on compiling software, but none currently come close to the >120 double-precision GFLOPS/core it achieves.
It’s worth more than my car, if you’re curious where my priorities are :wink:.

8 Likes