I’m coming from Python and am trying to port my code to Julia for better performance.
I want to compute a lot of log-likelihoods in a loop, so I need a fast function that computes the log-likelihood. The computation is essentially `sum(log.(p * C))`, where `C` is a constant matrix and `p` is a row vector.
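Written out (in my own notation), with `p` a 1×Np row vector and `C` an Np×Nx matrix, that's

$$
\mathcal{L}(p) = \sum_{j=1}^{N_x} \log\left( \sum_{i=1}^{N_p} p_i \, C_{ij} \right),
$$

i.e. one matrix-vector product followed by Nx logarithms and a sum.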
Here’s my Python code:
import numpy as np
def log_likelihood(p, C):
    return np.log(p @ C).sum()
Nx = 5500
Np = 200
p = np.linspace(0, 1000, Np)
C = np.random.rand(Np, Nx)
print(f"Python with Nx={Nx}, Np={Np}")
%timeit log_likelihood(p, C)
And here’s the Julia code:
using BenchmarkTools
using LinearAlgebra
log_likelihood_types(p::Array{Float64, 2}, C::Array{Float64, 2})::Float64 = sum(log.(p * C))
log_likelihood_no_types(p, C) = sum(log.(p * C))
Nx = 5500
Np = 200
p = range(0, 1000, length=Np)' |> collect
C = rand(Np, Nx)
# Compile
log_likelihood_types(p, C)
log_likelihood_no_types(p, C)
println("Julia with Nx=$Nx, Np=$Np; BLAS threads: $(BLAS.get_num_threads())")
@btime log_likelihood_types($p, $C)
@btime log_likelihood_no_types($p, $C)
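For reference, here’s an allocation-free variant I sketched (the name `log_likelihood_prealloc!` is my own): it writes `p * C` into a preallocated buffer with `mul!` and uses the two-argument `sum(log, x)` so no temporaries are created. I’d expect the matmul to dominate either way, but this rules the allocations out:

```julia
# Sketch: same computation, but with no per-call allocations.
function log_likelihood_prealloc!(buf::Matrix{Float64}, p::Matrix{Float64}, C::Matrix{Float64})
    mul!(buf, p, C)       # buf = p * C, computed in place
    return sum(log, buf)  # two-argument sum: no log.(...) temporary
end

buf = similar(p, 1, size(C, 2))
log_likelihood_prealloc!(buf, p, C)  # compile
@btime log_likelihood_prealloc!($buf, $p, $C)
```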
Test results
forcebru ~/t/julia_numpy_bench> ipython row_mul.ipy
Python with Nx=5500, Np=200
349 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
forcebru ~/t/julia_numpy_bench> julia-1.6 row_mul.jl
Julia with Nx=5500, Np=200; BLAS threads: 4
1.240 ms (4 allocations: 86.16 KiB)
1.238 ms (4 allocations: 86.16 KiB)
That’s 349 µs with NumPy versus 1238 µs with Julia, so Julia is 3.5 times (!) slower. See also this benchmark (https://gist.github.com/ForceBru/2da87f80725d62eb8f62454d53e00118), which shows similar results.
For `Nx == 500` I’m getting:

- Julia: 101.575 µs
- NumPy: 36.9 µs ± 152 ns

Here Julia is 2.7 times slower!
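I haven’t broken down where the time goes yet; one way would be to benchmark the matrix product and the log/sum separately on the Julia side (just a sketch; `x` is my name for the precomputed product):

```julia
# Sketch: time the two halves of sum(log.(p * C)) in isolation.
x = p * C            # precompute so the second timing excludes the matmul
@btime $p * $C       # BLAS matrix product only
@btime sum(log, $x)  # logs + reduction only
```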
Versions
- Julia: 1.6.0-beta1
- NumPy: 1.19.5 (`np.__version__ == '1.19.5'`)
- Python: 3.7.1
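One more thing I’m not sure about: the BLAS backends themselves may differ. Julia 1.6 ships OpenBLAS, while NumPy may be linked against something else (e.g. MKL from conda); `np.__config__.show()` reports the NumPy side. On the Julia side it can be checked like this:

```julia
using LinearAlgebra

# Print which BLAS library Julia is linked against (OpenBLAS by default).
println(BLAS.vendor())
```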
Questions
Am I doing something wrong? Am I measuring something incorrectly? Why is there a 3x speed difference? Is there a way to “fix” it?