JuliaPro 1.0.1.1 is available, but no MKL?

How do we ask them to build the MKL version?


There was a table showing the free and commercial versions, with the latter including MKL. Or so I seem to remember; I can no longer find it.

There seems to be only one version, without MKL (as of JuliaPro 1.0). I don't know for sure why [EDIT: it may not be needed, at least with the latest OpenBLAS; see my third post in this thread for why, and for how to use it]. Maybe it's no longer considered needed now that Julia/OpenBLAS is faster? Or possibly it's just easy to get MKL separately? Before, you needed a non-GPL version of Julia, and now it's close to that or already there. Possibly even if Julia is not yet GPL-free, JuliaPro is (as it's strictly speaking proprietary software built on open source)? In either case, you're always allowed to add proprietary software, e.g. MKL, to your own installation, even alongside GPL code; you're just not allowed to distribute the combined whole afterwards.

You can also download older versions of JuliaPro; possibly one of the MKL builds is still available online, if that works for you temporarily.

The discussion ends like this (and I believe SuiteSparse is no longer a dependency of Julia):

If MKL and OpenBLAS are ABI-compatible, then LD_PRELOAD should do the trick. If they are not ABI-compatible, then one can […] As MKL ships an FFTW-compatible interface, it sounds like SuiteSparse is your only mandatory GPL dependency interfering with MKL binary redistribution.
[…]

JuliaPro-0.6.2.2 – MKL (for Windows) - (762.17M)
JuliaPro-0.6.2.2 – MKL (for Linux) - (1.02G)
JuliaPro-0.6.2.2 – MKL (for Mac) - (2.47G)
JuliaPro-0.6.2.2 – MKL (for Linux) – ASC - (490.00B)

[…]
Sorry to bring up an old thread. I am curious what changed in order for Julia to be able to distribute both an MKL and a non-MKL version?

[…]

Please discuss this on discourse.

I am locking this in order to not discuss this further here, and redirecting folks to discourse.
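
On the LD_PRELOAD suggestion in that quote: a minimal sketch of how you might try it, assuming MKL's single-library interface at its usual Linux path (the path, and whether the ABIs really match, are my assumptions, not claims from the quote):

# Relaunch Julia with MKL preloaded over OpenBLAS; this only works if the
# two libraries are ABI-compatible, as the quote cautions.
withenv("LD_PRELOAD" => "/opt/intel/mkl/lib/intel64/libmkl_rt.so") do
    run(`julia -e 'using LinearAlgebra; @show peakflops()'`)
end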

Possibly some of the googleable solutions are helpful (what applies to 0.7 should also apply to Julia 1.0, and to JuliaPro I think).

OpenBLAS is still incredibly slow compared to MKL on any processor with AVX-512, so it's not that it has improved (although OpenBLAS is actively adding kernels).

I have the impression, however, that MKL is no longer supported. Arpack.jl depends on OpenBLAS even if you use MKL, and many packages depend on Arpack, including Distributions.jl and LightGraphs.jl. Both of these are supported by JuliaPro.


Besides the lack of an MKL build, I encountered problems calling Pkg.add() in JuliaPro 1.0.1.1 (something like "Authentication required") …

It makes me go back to the standard Julia 1.0.1.

I believe there's a workaround, but since you get essentially the same thing that way (and no MKL either way, it seems; you'd need to add it yourself), going back to standard Julia and adding Juno seems like a plan:

https://juliacomputing.com/blog/2018/10/16/juliapro.html

The new JuliaPro releases (based on Julia 1.0) therefore do not bundle packages any more. The downloadable distributions contain only the compiler, the standard library, and the Juno IDE.

Even though the packages are not bundled, JuliaPro users still benefit from a curated set of packages. This is provided through the JuliaPro package registry hosted by Julia Computing. Incidentally, this registry is also used to provide the same supported packages on JuliaBox.

The JuliaPro registry contains a subset of packages from Julia's General registry, but with an additional layer of testing and curation. The list of packages supported by the JuliaPro registry is displayed on the JuliaPro product page. Users can change to the General registry through a manual process.
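
That "manual process" isn't spelled out in the post. A rough sketch of what I believe it amounts to on Julia 1.0 (the JuliaPro registry directory name is my guess; the General registry URL is the real one):

# Swap the curated JuliaPro registry for the General registry by hand.
# DEPOT_PATH[1] is normally ~/.julia; registries live underneath it.
registries = joinpath(DEPOT_PATH[1], "registries")
rm(joinpath(registries, "JuliaPro"); recursive=true, force=true)
run(`git clone https://github.com/JuliaRegistries/General $(joinpath(registries, "General"))`)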

OpenBLAS got AVX-512 support in its latest version, 0.3.3, from August 2018.

It's not bundled with the latest stable Julia 1.0.1, but support was merged 8 days ago, so I expect it in Julia 1.0.2; at the least it should be included in:

https://julialang.org/downloads/nightlies.html

I also think you can use any OpenBLAS you have (and dynamically link it), but I may be wrong about that, or about how easy it is (is OpenBLAS statically linked by default?).
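
You can at least check what your build actually links, e.g. (the filter is just an ad-hoc way to spot the BLAS among the loaded libraries):

using LinearAlgebra, Libdl

LinearAlgebra.BLAS.vendor()  # :openblas64 on a stock binary build

# Loaded shared libraries that look like a BLAS; libopenblas64_ showing up
# here means OpenBLAS is dynamically linked rather than statically.
filter(l -> occursin("blas", lowercase(basename(l))), Libdl.dllist())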

Besides, this was already being discussed a long time ago:

http://www.tomshardware.co.uk/answers/id-3685153/threadripper-support-avx-512-perform-7900x.html

Looking at the Julia source code for AVX-512, I found "HasAVX512" (though strictly speaking in a patch for LLVM, i.e. not directly related to OpenBLAS, so I'm curious what the support amounts to).
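
Unrelated to that LLVM flag, a quick ad-hoc way to check whether your own CPU advertises AVX-512 at all (Linux only, since it just greps the kernel's CPU feature flags):

# True if any core lists an avx512* feature flag in /proc/cpuinfo.
has_avx512() = any(occursin("avx512", line) for line in eachline("/proc/cpuinfo"))
has_avx512()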


Yes, you can build OpenBLAS as a system BLAS and link it. I described how to do it in the opening post, although I've since just let Julia's build system handle all that.

The i9-7900X + MKL is about 4x faster for matrix multiplication than the Threadripper 1950X.
If you want number-crunching power on the CPU and you use optimized libraries or compile your own numerical code for the CPU (probably most folks using Julia), AVX-512 is the way to go.

julia> versioninfo()
Julia Version 1.1.0-DEV.631
Commit 0fde275eff (2018-11-06 16:09 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libimf
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

julia> using BenchmarkTools, LinearAlgebra, StaticArrays

julia> W = @SMatrix randn(8,8);

julia> X = @SMatrix randn(8,8);

julia> @benchmark $W * $X
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.950 ns (0.00% GC)
  median time:      13.664 ns (0.00% GC)
  mean time:        13.603 ns (0.00% GC)
  maximum time:     46.453 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> C, A, B = randn(5000,5000), randn(5000,5000), randn(5000,5000);

julia> @benchmark mul!($C, $A, $B)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     281.320 ms (0.00% GC)
  median time:      281.859 ms (0.00% GC)
  mean time:        282.021 ms (0.00% GC)
  maximum time:     284.623 ms (0.00% GC)
  --------------
  samples:          18
  evals/sample:     1

vs.
This comparison is unfortunately unfair: I have 10 processes running at 100% on the Threadripper that I will not kill, and I'm likely to start more once they finish.

julia> versioninfo()
Julia Version 1.1.0-DEV.631
Commit 0fde275eff (2018-11-06 16:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, znver1)

julia> using BenchmarkTools, LinearAlgebra, StaticArrays

julia> W = @SMatrix randn(8,8);

julia> X = @SMatrix randn(8,8);

julia> @benchmark $W * $X
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     46.758 ns (0.00% GC)
  median time:      48.259 ns (0.00% GC)
  mean time:        48.567 ns (0.00% GC)
  maximum time:     89.399 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     988

julia> C, A, B = randn(5000,5000), randn(5000,5000), randn(5000,5000);

julia> @benchmark mul!($C, $A, $B)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.009 s (0.00% GC)
  median time:      2.039 s (0.00% GC)
  mean time:        2.062 s (0.00% GC)
  maximum time:     2.137 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1

Unburdened, I think it is closer to 1.2 seconds. If I remember, I'll update it the next time I'm not running other processes.

I say "probably most folks" because:

  1. There's increasing interest in Julia as a generic language.
  2. While AVX-512 greatly increases your CPU's throughput/$, it's not on the same level as a GPU. A Vega 64 can multiply 5000x5000 (single precision) matrices in around 20 ms, vs about 150 and 600 ms (single precision) for the 7900X and 1950X. If you can offload your vectorizable number crunching to your GPU… (see the sketch after this list)
  3. Some code doesn't actually optimize well. Even what should be highly vectorizable Stan models seem similarly fast per core on both CPUs, rather than 2-4x faster on the 7900X. In a less-than-perfect world, most of your time is probably spent running poorly optimized / vectorized code. In that case, more cores are better.
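
As mentioned in point 2, a hypothetical sketch of such an offload, using the 2018-era CuArrays.jl (an NVIDIA-only stack, so not what produced the Vega 64 number; the package choice and sizes are my assumptions):

using CuArrays, LinearAlgebra

# Multiply two 5000x5000 single-precision matrices on the GPU.
A = CuArray(randn(Float32, 5000, 5000))
B = CuArray(randn(Float32, 5000, 5000))
C = similar(A)

mul!(C, A, B)                      # warm-up (compilation + CUBLAS init)
@time (mul!(C, A, B); collect(C))  # copying back forces the GPU to finish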

One day I'll learn how to write optimized code for the GPU! But not yet. Maybe after Julia 1.x starts supporting AMD graphics cards.
