Is there anything users can do to help move Apple Silicon support from Tier 3 to Tier 1?

We both have different perspectives on the problem (or different pain points):

  • He is a core developer with access to Apple Silicon hardware.
  • I am part of the ecosystem, with no possibility of testing on Apple Silicon.

There is free/cheap x86_64 testing for the Julia ecosystem, but I don’t know of a similarly cheap Apple Silicon testing facility.

It seems unnecessary to provide free cloud computing time on very expensive hardware that is Tier 3 and on which, as Mose says, most non-trivial workloads currently seem to segfault.

2 Likes

Setting aside the segfault issues, this seems like a catch-22:

  • “We don’t need CI for Apple Silicon because Apple Silicon is Tier 3.”
  • “Apple Silicon is Tier 3 because we don’t have CI.”

Well, I just wouldn’t agree with that point :slight_smile:

Would it be great to have free Apple Silicon CI available for the entire ecosystem? Sure.
Will we have it at some point? Maybe.
Is this the reason why Julia is Tier 3? Absolutely not.

The way I see it, the Tier levels have hardly anything to do with the (general) ecosystem. They make a statement about Julia itself.

1 Like

I think this discussion got a bit off track: when we talk about CI on M1, whether renting it in the cloud, buying some, etc., this can only apply to making that CI hardware available to the CI of the Julia project itself, and thus, in the end, to core devs.

I do not think the Julia project itself can afford to provide M1 CI instances to arbitrary packages. Do you expect them to create an alternative to GitHub Actions, Travis, AppVeyor, CircleCI, etc.? No: if you want arbitrary packages to have M1 CI, you’ll have to hope that one of those providers eventually offers it.

Thus, when it comes to making M1 Tier 2 or even Tier 1: while consistent CI for Julia itself is of course essentially a prerequisite, the already known serious problems ought to be resolved first; there is no point setting up CI for something that will just crash all the time. Once these fixes are in place, I am sure there will be a way to get M1 builders for the Julia CI. But providing this for the wider package ecosystem is not something the Julia team can take care of.

On the upside, waiting increases the chance that e.g. GitHub Actions will simply make something like this available. However, AFAIK GitHub Actions runs on Microsoft Azure, and I have no idea when, if ever, it will provide M1 support. On the other hand, AWS EC2 M1 Mac instances are in preview; once they are generally available, I am hopeful that some CI providers built on AWS will eventually offer M1 instances.

8 Likes

It’s not: to unblock the situation, the already known issues have to be fixed; CI is really a minor problem at this point. I had also already pointed out a side approach to make it more likely that packages will work on the M1: test them on aarch64 Linux. It’s clearly not the same thing, but it helps a lot with catching architecture-related issues, and I’ve personally caught some this way. It doesn’t help much with platform-specific issues (like the well-known segfault).
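For package authors without the hardware, the aarch64 Linux approach can also be tried locally. A minimal sketch, assuming Docker with QEMU/binfmt emulation is set up; the image tag and the `Example` package are placeholders, not part of any official workflow:

```julia
# Run a package's test suite on emulated aarch64 Linux via Docker/QEMU.
# Assumes Docker with binfmt emulation is installed; the image tag and
# the package name ("Example") are placeholders for illustration.
run(`docker run --rm --platform linux/arm64 julia:1.7 julia -e 'using Pkg; Pkg.add("Example"); Pkg.test("Example")'`)
```

Emulation is slow, but it exercises the aarch64 code paths without any Apple hardware.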

4 Likes

I have found an alternative for testing on M1 (for package developers):

Scaleway: Apple silicon M1 as-a-Service (Cloud Mac | Scaleway)

  • minimum 1-day cost = €2.40 (8 GB RAM, M1)
    • availability zone: Paris (EU)
    • macOS Monterey 12
    • €0.10/hour; “As required by Apple License, you must keep this instance for at least 24 hours. You will be able to delete it only after 24 hours.”
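As a quick sanity check of the quoted pricing (using integer cents to avoid floating-point noise; the rate and minimum are the ones quoted above):

```julia
# Scaleway's quoted rate over the 24-hour minimum holding period.
rate_cents_per_hour = 10      # €0.10/hour
minimum_hours = 24
minimum_cost_cents = rate_cents_per_hour * minimum_hours
println("minimum cost: €", minimum_cost_cents / 100)   # prints: minimum cost: €2.4
```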

For ad-hoc testing (1–2 days/month) it is perfect for me.

It is also perfect for benchmarking special Julia code on Apple Silicon (before investing in the hardware).

4 Likes

Doesn’t Apple have its own linear algebra libraries?

They do, but getting them to work does not seem to be a trivial job. See

https://github.com/JuliaLang/julia/issues/42312

FYI:

Some good news is that Apple hardware is now Tier 2.

5 Likes

Doesn’t that predate the M1s? Googling around turns up threads stating that NumPy supports Accelerate again now, and that building against it does give a performance boost.

Can you post some links?

I just googled “numpy accelerate m1”. A bunch of threads pop up on Reddit and Stack Overflow, going back to November, about building NumPy successfully against Accelerate/vecLib. (Note: the ones I skimmed were about NumPy, so perhaps it still won’t work for SciPy.) Here is one example: https://www.reddit.com/r/Python/comments/qog8x3/if_you_are_using_apples_m1_macs_compiling_numpy/

1 Like

What about using something like Octavian.jl (https://github.com/JuliaLinearAlgebra/Octavian.jl), a multi-threaded BLAS-like library providing pure-Julia matrix multiplication, for BLAS on M1 Macs? Has anyone tried it?

We should be able to switch to calling Accelerate through libblastrampoline. A little bit of work needs to be done on building LAPACK in Yggdrasil, appropriately patched for ILP64 and all that.

The main reason is that Accelerate ships an ancient LAPACK, and we use functions from recent versions; so when we use LBT to switch to Accelerate’s BLAS, we don’t want to use its LAPACK, and instead provide our own.
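As a concrete sketch of the LBT switch described above (the framework path is the standard macOS location for Accelerate; this is an illustration of the mechanism, not the final Yggdrasil-based setup):

```julia
using LinearAlgebra

# Forward BLAS calls to Accelerate via libblastrampoline (LBT).
# `clear=false` keeps the existing forwards, so LAPACK symbols that
# Accelerate's old LAPACK lacks stay with the default provider.
if Sys.isapple()
    BLAS.lbt_forward(
        "/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate";
        clear=false)
end
BLAS.get_config()   # lists the libraries LBT currently forwards to
```

On non-macOS systems this leaves the default BLAS untouched; `BLAS.get_config()` is a handy way to confirm which backend is actually active.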

One downside is that OpenBLAS patches LAPACK to provide multi-threaded versions of common LAPACK functions, which one would lose in the configuration I describe above.

Eventually, we hope to have native Julia solutions like Octavian for clean, high performance multi-threaded linear algebra kernels.

8 Likes

This question may be above your pay grade (and that of anyone else who does not work for Apple). I looked at the very limited documentation for Accelerate and saw no evidence that it supports Float16. Have you seen any such support? The reading I’ve done suggests that Apple has not done anything to update Accelerate in many years.

The line in your post, “A little bit of work needs to be done”, sounds encouraging for Float64 and Float32 work anyhow.

I’ve done a little bit of Float64/Float32 testing (lu, …) on an M1 MacBook, comparing OpenBLAS with @Elrod’s AppleAccelerate.jl package. Accelerate seems to be as fast as or faster than threaded OpenBLAS, without explicit threading. Of course, nobody outside Apple knows what Accelerate really does, so it may use threads (or not).
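A minimal way to reproduce this kind of comparison using only the standard library (loading AppleAccelerate.jl first, which I assume rewires BLAS via LBT on macOS, would change which backend is being measured):

```julia
using LinearAlgebra

# Time an LU factorization under whichever BLAS backend LBT currently
# forwards to (OpenBLAS by default; Accelerate after loading a package
# such as AppleAccelerate.jl on macOS).
A = randn(2000, 2000)
lu(A)                          # warm-up: compile and fault in pages
t = @elapsed lu(A)
println("lu(2000×2000): ", round(t; digits=3), " s")
```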

For gemm it looks like the CPU (and GPU) cores are not used at all (see the performance monitor) and the CPU temperature stays super low (<50 °C, without a fan). The so-called AMX co-processor (see “The Secret Apple M1 Coprocessor” by Erik Engheim on Medium) is supposed to be in use…

One could try to saturate the tensor unit with a constant AI load while evaluating Accelerate’s gemm, to see whether AMX is related to the tensor unit.

I just registered LAPACK_jll with LAPACK 3.10 and the right 64_ suffixes for 64-bit systems. So the path to making Accelerate as easy to use as MKL.jl is to have LBT point to it for BLAS and to LAPACK_jll for LAPACK.

We should perhaps do this in AppleAccelerate.jl and revive it.

6 Likes