Optimization Based on Intel MKL Matrix Multiplication Batch Mode



Intel MKL has a batch mode for Matrix Multiplication (See Introducing Batch GEMM Operations).

It seems to have great potential for speeding up small matrix multiplication operations in at least two ways (that I can see, probably more):

  1. Broadcasting
    When multiplying a 2D array with a 3D array, broadcasting the matrix multiplication operation along the 3rd dimension.
  2. Lazy Evaluation
    When many small matrix multiplications are called in a loop, they could be accumulated and sent as one batch job to MKL's batch mode.
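To make the first pattern concrete, here is a minimal Julia sketch of what "broadcasting" a matrix multiplication over the 3rd dimension looks like today: a loop of small GEMM calls, exactly the shape of workload batch GEMM could fuse into one call. The sizes here are illustrative only.

```julia
using LinearAlgebra

# A 2×2 matrix multiplied against every "page" of a 2×2×2 array.
A = [1.0 2.0; 3.0 4.0]
B = cat([1.0 0.0; 0.0 1.0], [2.0 0.0; 0.0 2.0]; dims = 3)

# Today: one small GEMM per page; batch mode could submit all at once.
C = similar(B)
for k in axes(B, 3)
    mul!(view(C, :, :, k), A, view(B, :, :, k))
end
```

Each iteration is an independent multiplication, which is what makes the pattern a natural fit for a batched API.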

Is this implemented?
If not, could it be added in 1.x (of course not 1.0, but during its optimization phase once it is released)?


This seems to be basically a thin wrapper around threading. It would be better served by good support for easy threading in Julia than by special-casing it at the MKL level.
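For illustration, here is a minimal sketch of what "batching at the Julia level" might look like: threading over many independent small multiplications with `Threads.@threads` instead of calling a C batch API. The sizes and batch count are made up for the example.

```julia
using LinearAlgebra

# Avoid oversubscription: let Julia's threads do the parallelism,
# keeping each individual BLAS call single-threaded.
BLAS.set_num_threads(1)

n, batch = 8, 64
As = [rand(n, n) for _ in 1:batch]
Bs = [rand(n, n) for _ in 1:batch]
Cs = [zeros(n, n) for _ in 1:batch]

Threads.@threads for i in 1:batch
    mul!(Cs[i], As[i], Bs[i])   # each task runs one small GEMM
end
```

Whether this matches MKL's batch mode in practice would need benchmarking; the point is only that the parallel structure is expressible in plain Julia.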


Indeed, if the whole magic is just multithreading, it should be done at the Julia level.
Though calling one C function instead of many might have less overhead, no?

Unless small matrices end up handled by JuliaBLAS, in which case an engine that multithreads those operations as described above would be the best choice.


I don’t think the overhead of calling C is significant. And if you’re doing very small matrices, you’re probably better off with StaticArrays anyway, which bypasses BLAS.
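To show what bypassing BLAS looks like, here is a small StaticArrays example (illustrative values; assumes StaticArrays.jl is installed). With sizes known at compile time, the multiplication is unrolled and stack-allocated, so no BLAS call happens at all.

```julia
using StaticArrays

# Fixed 2×2 sizes: the product is computed with unrolled, inlined code.
A = @SMatrix [1.0 2.0; 3.0 4.0]
B = @SMatrix [5.0 6.0; 7.0 8.0]
C = A * B   # no BLAS, no heap allocation
```

For matrices this small, the dispatch and call overhead of any BLAS (batched or not) tends to dominate, which is the usual argument for StaticArrays here.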


TensorOperations.jl maps some tensor operations to BLAS calls already, so you can take advantage of MKL.
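A minimal example of the kind of contraction TensorOperations.jl lowers to BLAS (assuming TensorOperations.jl is installed; when Julia is linked against MKL, the GEMM runs on MKL):

```julia
using TensorOperations

A = rand(4, 5)
B = rand(5, 6)

# Contraction over the shared index j is dispatched to a GEMM call.
@tensor C[i, k] := A[i, j] * B[j, k]
```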