Matrix inverses aren’t excluded, because, just like matrix multiplications, they’re computed through BLAS/LAPACK.
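As a rough illustration (a sketch assuming a dense `Float64` matrix and Julia ≥ 1.6; the matrix size is arbitrary), you can reproduce a dense `inv` by hand with the two LAPACK routines it relies on: an LU factorization (`getrf`) followed by `getri`. Both go through the same multithreaded BLAS as a matmul.

```julia
using LinearAlgebra

A = rand(500, 500)  # arbitrary, almost surely invertible test matrix

# LU factorization; for a dense Float64 matrix this calls LAPACK getrf
F = lu(A)

# Compute the inverse from the packed LU factors with LAPACK getri
Ainv = LAPACK.getri!(copy(F.factors), F.ipiv)

println(Ainv ≈ inv(A))
```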
Yes, IIRC Julia caps the number of BLAS threads at 32. This changed in Julia 1.8, where, I believe, you can get up to 512 BLAS threads. See BLAS performance testing for Julia 1.8.
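If you want to check what you actually get, here is a minimal sketch using the `LinearAlgebra.BLAS` helpers (assuming Julia ≥ 1.6 with the default OpenBLAS; the count of 8 is just a hypothetical example value). As far as I know, requests above the limit OpenBLAS was built with are capped rather than rejected, so query the value again after setting it instead of assuming it took effect.

```julia
using LinearAlgebra

# How many threads BLAS is using right now
println("BLAS threads: ", BLAS.get_num_threads())

# Request a specific count (8 is a hypothetical example value);
# the backend may cap this at its compile-time maximum
BLAS.set_num_threads(8)

# Re-query to see what the backend actually accepted
println("BLAS threads after set: ", BLAS.get_num_threads())
```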