I’m hoping someone can explain the performance degradation that I’m seeing with Eigen matrix multiplication (and other methods in the Eigen C++ library).
I have a simple example here:
The C++ file has a simple example of a function that multiplies two 3000x3000 Eigen matrices together (with the BLAS preprocessor directive EIGEN_USE_BLAS), which is compiled into a library libccall.so as well as an executable ccall_ex. The library surfaces the function (matmulcpp) for use by Python and Julia.
void matmulcpp() {
auto t0 = std::chrono::high_resolution_clock::now();
const int N = 3000;
Eigen::MatrixXd A = Eigen::MatrixXd::Random(N, N);
Eigen::MatrixXd B = Eigen::MatrixXd::Random(N, N);
Eigen::MatrixXd C = A * B;
auto t1 = std::chrono::high_resolution_clock::now();
double elapsed_ms = duration_cast<std::chrono::microseconds>(t1 - t0).count() / 1000.0;
std::cout << "Elapsed: " << elapsed_ms << " ms\n";
}
Running the C++ executable outputs the timings (e.g.):
> ./build/ccall_ex
Elapsed: 604.718 ms
Elapsed: 573.624 ms
Elapsed: 585.415 ms
Elapsed: 559.289 ms
Elapsed: 576.903 ms
I get similar performance calling the function from Python using ctypes (e.g.):
> python3 ccall.py
Elapsed: 588.029 ms
Elapsed: 600.39 ms
Elapsed: 580.682 ms
Elapsed: 615.011 ms
Elapsed: 585.849 ms
But in Julia, using ccall, I get significantly worse performance (e.g.):
> julia ccall.jl
Elapsed: 1437.66 ms
Elapsed: 1405.58 ms
Elapsed: 1389.82 ms
Elapsed: 1394.7 ms
Elapsed: 1395.29 ms
Can anyone explain why I’m seeing this performance degradation? Thanks for the help!