Going off topic:
I’m trying to do a reality check after the last epic discussion about “is Numba good enough”…
My test of this example shows Julia running 2-3x slower than Python/Numba. Maybe the code wasn’t translated well? (Data set: UCI Machine Learning Repository: Wine Quality Data Set)
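Both snippets below implement the same update, which is worth writing out once to check the translation:

$$w \leftarrow w - X^\top\big[(\sigma(Y \circ Xw) - \mathbf{1}) \circ Y\big], \qquad \sigma(z) = \frac{1}{1+e^{-z}},$$

where ∘ is elementwise multiplication (it’s the gradient-descent step for logistic regression with ±1-style labels, applied here to the quality column as-is).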
Python/Numba:
In [64]: @numba.jit
    ...: def logistic_regression3(Y, X, w, iterations):
    ...:     for i in range(iterations):
    ...:         w -= np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
    ...:     return w
    ...:
In [65]: %timeit logistic_regression3(Y2, X2, w, 1000)
22.3 ms ± 464 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [66]: Y.shape
Out[66]: (1599,)
In [67]: X.shape
Out[67]: (1599, 2)
In [68]: w.shape
Out[68]: (2,)
Julia:
using CSV, DataFrames
df = CSV.read("/Users/tomkwong/Downloads/winequality-red.csv",
              delim=";", types=Dict(6=>Float64, 7=>Float64))
X = convert(Array, df[[Symbol("fixed acidity"), Symbol("volatile acidity")]])
Y = df[:quality]
w = ones(size(X)[2])
function logistic_regression(Y, X, w, iterations)
    for i in 1:iterations
        w -= (((1.0 ./ (1.0 .+ e .^ (-Y .* (X * w))) - 1.0) .* Y)' * X)'
    end
    w
end
using BenchmarkTools
@benchmark logistic_regression($Y, $X, $w, 1000)
Julia results:
julia> @benchmark logistic_regression($Y, $X, $w, 1000)
BenchmarkTools.Trial:
  memory estimate:  86.49 MiB
  allocs estimate:  9000
  --------------
  minimum time:     60.454 ms (9.32% GC)
  median time:      68.327 ms (14.48% GC)
  mean time:        68.327 ms (14.23% GC)
  maximum time:     80.246 ms (13.99% GC)
  --------------
  samples:          74
  evals/sample:     1
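One likely culprit: the line inside the loop allocates a fresh array for every broadcast, subtraction, and transpose, which is where the 86 MiB / 9000 allocations (and the ~14% GC time) come from, while Numba fuses the whole expression into one compiled loop. A closer translation would preallocate the intermediates and fuse the broadcasts. A minimal sketch, assuming Julia 1.x (where mul! lives in LinearAlgebra and the bare global e is gone, hence exp); logistic_regression2 and the buffer names are just illustrative, and I haven’t benchmarked this on the same machine:

using LinearAlgebra

function logistic_regression2(Y, X, w, iterations)
    t = similar(Y, Float64)   # reused buffer: holds X*w, then the elementwise factor
    g = similar(w)            # reused buffer: holds the gradient X' * t
    for i in 1:iterations
        mul!(t, X, w)                                  # t = X * w, no allocation
        @. t = (1.0 / (1.0 + exp(-Y * t)) - 1.0) * Y   # fused broadcast, in place
        mul!(g, X', t)                                 # g = X' * t, no allocation
        w .-= g                                        # update w in place
    end
    w
end

Unlike the version above, this mutates w in place, which matches what the NumPy version’s w -= ... does to its argument.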