Is there anything users can do to help move Apple Silicon support from Tier 3 to Tier 1?

I have found an alternative for testing on M1 (for package developers):

Scaleway: Apple silicon M1 as-a-Service. Cloud Mac | Scaleway

  • Minimum 1-day cost: 2.4 EUR (8 GB RAM, M1)
    • Availability zone: Paris (EU)
    • macOS Monterey 12
    • €0.10/hour; “As required by Apple License, you must keep this instance for at least 24 hours. You will be able to delete it only after 24 hours.”

For ad-hoc testing (1-2 days/month) it is perfect for me.

It is also perfect for benchmarking specific Julia code on Apple Silicon (before investing in the hardware).

4 Likes

Doesn’t Apple have its own linear algebra libraries?

They do, but it does not seem to be trivial to get them to work. See

https://github.com/JuliaLang/julia/issues/42312

FYI:

Some good news is that Apple hardware is now Tier 2.

5 Likes

Doesn’t that predate the M1? Googling around turns up threads stating that NumPy supports Accelerate again now, and that building against it does give a performance boost.

Can you post some links?

I just Googled “numpy accelerate m1”. A bunch of threads pop up on Reddit and Stack Overflow, going back to November, about building NumPy successfully against Accelerate/vecLib. (Note, the ones I skimmed were about NumPy, so perhaps it still won’t work for SciPy.) Here is one example: https://www.reddit.com/r/Python/comments/qog8x3/if_you_are_using_apples_m1_macs_compiling_numpy/

1 Like

What about using something like Octavian.jl (GitHub - JuliaLinearAlgebra/Octavian.jl: multi-threaded BLAS-like library that provides pure Julia matrix multiplication) as the BLAS on M1 Macs? Has anyone tried it?

We should be able to switch to calling Accelerate through libblastrampoline. A little bit of work needs to be done on building LAPACK in Yggdrasil, appropriately patched for ILP64 and all that.

The main reason work is needed is that Accelerate ships an ancient LAPACK, and we use functions from recent versions. So when we use LBT to switch to Accelerate’s BLAS, we don’t want to use its LAPACK, and instead provide our own.
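
For concreteness, a rough sketch of that forwarding from the REPL (not the eventual packaged solution): the Accelerate framework path is the standard system location, but the LAPACK_jll symbol names and suffix handling are assumptions.

```julia
using LinearAlgebra
using LAPACK_jll   # assumed: the newly registered JLL with a recent LAPACK and 64_ suffixes

# 1. Point the BLAS forwards at Apple's Accelerate (standard framework path).
#    Note: Accelerate's classic interface is LP64 (32-bit ints); wiring it up to
#    Julia's ILP64 interface is part of the "little bit of work" mentioned above.
accelerate = "/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate"
BLAS.lbt_forward(accelerate; clear = true)

# 2. Layer a modern LAPACK on top without clearing, so Accelerate's ancient LAPACK
#    is never used. `liblapack_path` and the suffix hint are assumed names here.
BLAS.lbt_forward(LAPACK_jll.liblapack_path; clear = false, suffix_hint = "64_")

BLAS.get_config()   # inspect which libraries LBT currently forwards to
```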

One downside is that OpenBLAS patches LAPACK to provide multi-threaded versions of common LAPACK functions, which one would lose in the configuration I describe above.

Eventually, we hope to have native Julia solutions like Octavian for clean, high-performance multi-threaded linear algebra kernels.
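
Octavian is already usable directly for matrix multiplication today; trying it looks something like this (a minimal sketch; the matrix sizes are arbitrary):

```julia
using LinearAlgebra
using Octavian   # pure-Julia, multithreaded matrix multiplication

A = rand(1_000, 1_000); B = rand(1_000, 1_000)

C_oct  = Octavian.matmul(A, B)   # Octavian's multithreaded kernel
C_blas = A * B                   # default BLAS (OpenBLAS)

C_oct ≈ C_blas                   # should agree up to floating-point roundoff
```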

8 Likes

This question may be above your pay grade (and that of anyone else who does not work for Apple). I looked at the very limited documentation for Accelerate and saw no evidence that it supports Float16. Have you seen any such support? The reading I’ve done suggests that Apple has not updated Accelerate in many years.

The line in your post, “A little bit of work needs to be done”, sounds encouraging for Float64 and Float32 work anyhow.

I’ve done a little bit of Float64/32 testing (lu …) of OpenBLAS vs. @Elrod’s AppleAccelerate.jl package on an M1 MacBook. Accelerate seems to be as fast as or faster than threaded OpenBLAS, without explicit threading. Of course, nobody outside of Apple knows what Accelerate really does, so it may use threads (or not).
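
The comparison was along these lines (a sketch: the size is arbitrary, and whether `using AppleAccelerate` forwards everything through LBT on your Julia version is an assumption):

```julia
using LinearAlgebra, BenchmarkTools

A = rand(2_000, 2_000)

@btime lu($A);            # default: OpenBLAS BLAS/LAPACK

using AppleAccelerate     # assumed to re-point BLAS (via LBT) at Accelerate when loaded
@btime lu($A);            # same factorization, now going through Accelerate
```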

For gemm it looks like the CPU (and GPU) cores are not used at all (see the perf monitor) and the CPU temperature stays super low (<50 °C without the fan). The so-called AMX co-processor (see “The Secret Apple M1 Coprocessor” by Erik Engheim on Medium) is supposed to be in use…

One could try to saturate the tensor unit with a constant AI load while evaluating an Accelerate gemm, to see whether AMX is related to the tensor unit.
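
For anyone who wants to reproduce the observation, a small sketch that keeps gemm busy long enough to watch the perf monitor (the size and iteration count are arbitrary choices):

```julia
using LinearAlgebra

N = 4_096
A = rand(Float32, N, N); B = rand(Float32, N, N); C = zeros(Float32, N, N)

# Sustained sgemm: watch Activity Monitor / powermetrics while this runs.
t = @elapsed for _ in 1:50
    mul!(C, A, B)     # dispatches to whatever BLAS LBT currently forwards to
end

println("≈ ", round(50 * 2 * N^3 / t / 1e9; digits = 1), " GFLOP/s sustained")
```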

I just registered LAPACK_jll with LAPACK 3.10 and the right 64_ suffixes for 64-bit systems. So the path to making Accelerate as easy to use as MKL.jl is to have LBT point to it for BLAS and to LAPACK_jll for LAPACK.

Perhaps we should do this in AppleAccelerate.jl and revive it.

6 Likes