v?Mul in MKL

In trying to speed up .*, I was wondering if much progress had been made toward making v?Mul accessible (maybe as part of MKL.jl). Mostly I'm just wondering if it is worth upgrading to 1.7 to make the install work. Currently the fastest .* I can find is just:

function vMul_test!(A, B, C)
    n = length(A)
    # straight SIMD loop: A .= B .* C with bounds checks elided
    @simd for i = 1:n
        @inbounds A[i] = B[i] * C[i]
    end
end

Have you tried ccalling into MKL and benchmarked the performance of the MKL functions? I’d be curious.
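For reference, here is a minimal sketch of what such a ccall might look like. It assumes MKL_jll is installed (it provides libmkl_rt) and that the default LP64 interface is in use, so MKL_INT is a 32-bit Cint; the wrapper name mkl_mul! is just for illustration. vdMul is MKL's documented elementwise double-precision multiply, y[i] = a[i] * b[i]:

using MKL_jll  # provides the libmkl_rt library

function mkl_mul!(y::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    n = length(y)
    @assert length(a) == n && length(b) == n
    ccall((:vdMul, libmkl_rt), Cvoid,
          (Cint, Ptr{Float64}, Ptr{Float64}, Ptr{Float64}),
          n, a, b, y)
    return y
end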

I would assume that the MKL variants are multithreaded, so you should probably also use threads in your Julia implementation. Unfortunately, we don't have multithreaded broadcasting (yet). Maybe there exist implementations in packages?
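For example, a hand-rolled threaded variant of the loop above might look like this (a sketch; it assumes Julia was started with multiple threads, e.g. julia -t auto):

function vMul_threads!(A, B, C)
    # eachindex(A, B, C) also checks that the three arrays have matching axes
    Threads.@threads for i in eachindex(A, B, C)
        @inbounds A[i] = B[i] * C[i]
    end
end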

Oh, and using MKL.jl with Julia 1.7 is a dream. Definitely worth the upgrade! 🙂


FastBroadcast.jl has multithreaded broadcast for such a case.
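For instance, a minimal sketch of FastBroadcast.jl's @.. macro, whose thread=true option enables multithreading on the fused broadcast:

using FastBroadcast
@.. thread=true A = B * C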


You can access MKL’s v?Mul via IntelVectorMath.jl.
Previous benchmarks showed that the performance of v?Mul is equivalent to Base's or slower, so the official release didn't add the related routines.
You can add

def_binary_op(Float64, Float64, :multiply, :multiply!, :Mul, false)
def_binary_op(Float32, Float32, :multiply, :multiply!, :Mul, false)
def_binary_op(ComplexF64, ComplexF64, :multiply, :multiply!, :Mul, false)
def_binary_op(ComplexF32, ComplexF32, :multiply, :multiply!, :Mul, false)

to src/IntelVectorMath.jl, and call IVM.multiply / IVM.multiply! for usage.
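After those definitions are added, usage would look something like this (a sketch; IVM is the alias the package exports for IntelVectorMath):

using IntelVectorMath

B = rand(1000); C = rand(1000); A = similar(B)
IVM.multiply!(A, B, C)   # in-place: A .= B .* C via MKL's vdMul
D = IVM.multiply(B, C)   # allocating variant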


Thanks for pointing out how to add v?Mul to IntelVectorMath.jl (which is a really nice package!) and for providing context as to why it was not added to the official release. I will check out some of the other multithreading suggestions above to attempt a speedup.

For context, here are timings on my machine with some of the easiest multithreading solutions.

using Einsum, BenchmarkTools, LoopVectorization

A = rand(1000,1000)
B = rand(1000,1000)
C = rand(1000,1000);

function f1!(A, B, C)   # single-threaded @einsum (Einsum.jl)
    @einsum A[i,j] = B[i,j] * C[i,j]
end

function f2!(A, B, C)   # plain @simd loop, single-threaded
    n = length(A)
    @simd for i = 1:n
        @inbounds A[i] = B[i] * C[i]
    end
end

function f3!(A, B, C)   # @avxt: multithreaded LoopVectorization.jl loop
    n = length(A)
    @avxt for i = 1:n
        @inbounds A[i] = B[i] * C[i]
    end
end

function f4!(A, B, C)   # @vielsum: multithreaded @einsum (Einsum.jl)
    @vielsum A[i,j] = B[i,j] * C[i,j]
end

On a single thread I find ~1 ms for all of the methods, with f2! being optimal (since it has no overhead from checking whether more threads are available). When using 2 threads, f3! seems optimal; timings below.

julia> @btime f1!($A,$B,$C);
  1.174 ms (0 allocations: 0 bytes)

julia> @btime f2!($A,$B,$C);
  1.029 ms (0 allocations: 0 bytes)

julia> @btime f3!($A,$B,$C);
  457.723 μs (0 allocations: 0 bytes)

julia> @btime f4!($A,$B,$C);
  567.861 μs (11 allocations: 1.55 KiB)

I wouldn’t expect MKL to be faster than Julia on plain broadcasted multiplication. Is there any reason to expect it would be?

Are you using an old version of LoopVectorization.jl? Or are @avx/@avxt still available alongside the new @turbo/@tturbo?


Whoops, I was looking at old docs but using a current version (v0.12.66). It appears both are still available, but thanks for the comment; I will switch to @tturbo.

And I guess I don’t have good enough intuition to know whether to expect the MKL solution to be faster, which is why I wanted to just do the experiment.
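For completeness, the same kernel with the current macro name would look like this (a sketch; @tturbo is the threaded counterpart of @turbo, and it handles bounds-check elision itself, so no explicit @inbounds is needed):

using LoopVectorization

function f5!(A, B, C)
    @tturbo for i in eachindex(A)
        A[i] = B[i] * C[i]
    end
end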