I’m coming from Python and trying to port my Python code to Julia to gain more performance.

I want to compute a lot of log-likelihoods in a loop, so I need a fast function that computes the log-likelihood. The computation is essentially `sum(log.(p * C))`

, where `C`

is a constant matrix and `p`

is a row vector.

Here’s my Python code:

```
import numpy as np
def log_likelihood(p, C):
return np.log(p @ C).sum()
Nx = 5500
Np = 200
p = np.linspace(0, 1000, Np)
C = np.random.rand(Np, Nx)
print(f"Python with Nx={Nx}, Np={Np}")
%timeit log_likelihood(p, C)
```

And Julia code:

```
using BenchmarkTools
using LinearAlgebra
log_likelihood_types(p::Array{Float64, 2}, C::Array{Float64, 2})::Float64 = sum(log.(p * C))
log_likelihood_no_types(p, C) = sum(log.(p * C))
Nx = 5500
Np = 200
p = range(0, 1000, length=Np)' |> collect
C = rand(Np, Nx)
# Compile
log_likelihood_types(p, C)
log_likelihood_no_types(p, C)
println("Julia with Nx=$Nx, Np=$Np; BLAS threads: $(BLAS.get_num_threads())")
@btime log_likelihood_types($p, $C)
@btime log_likelihood_no_types($p, $C)
```

## Test results

```
forcebru ~/t/julia_numpy_bench> ipython row_mul.ipy
Python with Nx=5500, Np=200
349 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
forcebru ~/t/julia_numpy_bench> julia-1.6 row_mul.jl
Julia with Nx=5500, Np=200; BLAS threads: 4
1.240 ms (4 allocations: 86.16 KiB)
1.238 ms (4 allocations: 86.16 KiB)
```

So, 349 µs with NumPy and 1238 µs with Julia, so Julia is 3.5 times (!) slower. Also see this benchmark (https://gist.github.com/ForceBru/2da87f80725d62eb8f62454d53e00118) which shows similar results.

For `Nx == 500`

I’m getting:

- Julia: 101.575 μs
- NumPy: 36.9 µs ± 152 ns

Here Julia is 2.7 times slower!

## Versions

- julia version 1.6.0-beta1
`np.__version__ == '1.19.5'`

- Python 3.7.1

## Questions

Am I doing something wrong? Am I maybe measuring something incorrectly? Why the 3x speed difference? Is there a way to “fix” it?