Dot function

Going off topic:

I’m trying to do a reality check since the last epic discussion about “is numba good enough”…

My test result for this example shows Julia being 2-3x slower. Maybe the code wasn’t translated in a good way? (data set: UCI Machine Learning Repository: Wine Quality Data Set)

Python/Numba:

In [64]: @numba.jit
    ...: def logistic_regression3(Y, X, w, iterations):
    ...:     for i in range(iterations):
    ...:         w -= np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
    ...:     return w
    ...: 

In [65]: %timeit logistic_regression3(Y2, X2, w, 1000)
22.3 ms ± 464 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [66]: Y.shape
Out[66]: (1599,)

In [67]: X.shape
Out[67]: (1599, 2)

In [68]: w.shape
Out[68]: (2,)

Julia:

using CSV, DataFrames
df = CSV.read("/Users/tomkwong/Downloads/winequality-red.csv", 
        delim=";", types=Dict(6=>Float64, 7=>Float64))
X = convert(Array, df[[Symbol("fixed acidity"), Symbol("volatile acidity")]])
Y = df[:quality]
w = ones(size(X)[2])

function logistic_regression(Y, X, w, iterations)
    for i in 1:iterations
        w -= (((1.0 ./ (1.0 .+ e .^ (-Y .* (X * w))) - 1.0) .* Y)' * X)'
    end
    w
end

using BenchmarkTools
@benchmark logistic_regression($Y, $X, $w, 1000)

Julia results:

julia> @benchmark logistic_regression($Y, $X, $w, 1000)
BenchmarkTools.Trial: 
  memory estimate:  86.49 MiB
  allocs estimate:  9000
  --------------
  minimum time:     60.454 ms (9.32% GC)
  median time:      68.327 ms (14.48% GC)
  mean time:        68.327 ms (14.23% GC)
  maximum time:     80.246 ms (13.99% GC)
  --------------
  samples:          74
  evals/sample:     1