Dot function

There are other things I don’t recognize about the algorithm. Logistic regression is usually applied to a binary response coded as 0/1, but the values of Y here are integers between 3 and 8:

julia> extrema(Y)
(3, 8)

The algorithm came straight from section 1.9.3 of the Numba documentation page “1.9. Automatic parallelization with @jit” (Numba 0.36.1). I didn’t really try to understand what it does :slight_smile:

Funny thing is, I didn’t see any speedup when parallel is set to true for Numba.

BTW, I also picked the test data set randomly from the UCI archive.

It is not stated explicitly, but I am pretty sure that the algorithm will only be meaningful if Y is binary and coded as 0/1. Even then I don’t quite understand the algorithm, and it doesn’t seem to converge to estimates that I would recognize.
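
For example, a quick check along these lines (just a sketch; is_binary_01 is only an illustrative name, not a function from any package) would flag the 3–8 coding immediately:

# true only if every response value is exactly 0 or 1
is_binary_01(y) = all(v -> v == 0 || v == 1, y)

is_binary_01(Y)    # false for the Y shown above, where extrema(Y) == (3, 8)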

Consider the artificial data available at http://ucanalytics.com/blogs/wp-content/uploads/2017/09/Data-L-Reg-Gradient-Descent.csv. A simple model fit using a Formula and a DataFrame can be performed as

using BenchmarkTools, CSV, DataFrames, GLM

dat = CSV.read("/home/bates/Downloads/Data-L-Reg-Gradient-Descent.csv")

glmmodel = fit(GeneralizedLinearModel, @formula(y ~ 1 + x1 + x2), dat, Bernoulli(), verbose=true)

yielding

1: 357.0332597753779, 0.02273100323606281
2: 356.7380859889383, 0.000827424371079909
3: 356.7375363233818, 1.5408122233397183e-6
4: 356.73753632128995, 5.863800696862342e-12
StatsModels.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Bernoulli{Float64},GLM.LogitLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
             Estimate Std.Error  z value Pr(>|z|)
(Intercept)  -15.4233   1.59076 -9.69553   <1e-21
x1           0.108959 0.0151936  7.17136   <1e-12
x2           0.109673 0.0117273  9.35196   <1e-20

The IRLS algorithm converges in 4 iterations and is reasonably fast:

julia> @btime(fit(GeneralizedLinearModel, @formula(y ~ 1 + x1 + x2), dat, Bernoulli()));
    425.407 μs (705 allocations: 97.69 KiB)

Of course, most of the time is taken up in forming the model matrix and the various structures for the model. If we generate those values ahead of time, we have

X = hcat(ones(size(dat, 1)), Array(dat[[:x1, :x2]]));
Y = convert(Vector{eltype(X)}, dat[:y]);
julia> @btime fit(GeneralizedLinearModel, X, Y, Bernoulli())
  179.251 μs (56 allocations: 37.39 KiB)

The result is the same.
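
One quick way to check that is to compare the coefficient vectors of the two fits directly:

coef(glmmodel) ≈ coef(fit(GeneralizedLinearModel, X, Y, Bernoulli()))   # should evaluate to true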

I’m not sure what the calculations described in “1.9. Automatic parallelization with @jit” in the Numba documentation actually do. I thought they might be a gradient descent algorithm but, if so, there are many hidden assumptions. In the form shown there, they converge quickly, but to

julia>  logistic_regression(y, X, zeros(size(X, 2)), 10)
3-element Array{Float64,1}:
  100.0
 7449.62
 7815.87

Does anyone recognize what that calculation may be?

I went back to http://ucanalytics.com/blogs/gradient-descent-logistic-regression-simplified-step-step-visual-guide/ to refresh my memory on gradient descent for logistic regression.

I still don’t recognize the formula in the Numba docs. Generally there is a step factor or “learning rate”, and a normalization by the number of observations, incorporated into the update of the parameter vector. Also, for stability, it is best to center and scale the numeric covariates. Below I give some code showing one way to define the “cost” for a logistic model and to perform the gradient descent using the learning rate α.

A statistician like me would describe the iterative process as

  1. Evaluate the linear predictor, η = X * w
  2. Evaluate the mean, μ = logistic.(η)
  3. Evaluate the negative residual, r = μ - y
  4. The gradient for the cost is X'r ./ length(r)
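
In plain Julia, one update of the parameter vector with learning rate α then looks something like this (a readable sketch only; gd_step is just an illustrative name, and logistic is the scalar function defined in the script below):

# one gradient-descent update for the logistic cost, following steps 1-4 above
function gd_step(w, X, y, α)
    η = X * w                       # 1. linear predictor
    μ = logistic.(η)                # 2. mean
    r = μ - y                       # 3. negative residual
    w - α .* (X'r ./ length(r))     # 4. step along the negative gradient of the cost
end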

The expression in the Numba documentation seems to have a superfluous multiplication by y inside the exponential, and it subtracts 1 and then multiplies by y instead of just subtracting y from μ.
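
As far as I can tell, that corresponds to something like the following in Julia (a sketch of my reading of the expression, not a line-for-line copy of the Numba example; numba_style_update is just a name for illustration):

# Y enters inside the exponential, then 1 is subtracted and the result is
# multiplied by Y again; there is no step factor and no 1/n scaling
function numba_style_update(w, X, Y)
    w - X' * ((logistic.(Y .* (X * w)) .- 1) .* Y)
end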

The version of logistic_regression in the script below converges to the same parameter estimates as the functions in GLM, but much more slowly. After 1,000 iterations it still needs more refinement; the page referenced above uses 50,000 iterations. The IRLS algorithm converges in 4 iterations.

using BenchmarkTools, CSV, DataFrames, GLM, StatsBase

dat = CSV.read("/home/bates/Downloads/Data-L-Reg-Gradient-Descent.csv")

glmmodel = fit(GeneralizedLinearModel, @formula(y ~ 1 + x1 + x2), dat, Bernoulli(), verbose=true)
@btime(fit(GeneralizedLinearModel, @formula(y ~ 1 + x1 + x2), dat, Bernoulli()));

const X = hcat(ones(size(dat, 1)), Array(dat[[:x1, :x2]]));
const Y = convert(Vector{eltype(X)}, dat[:y]);
fit(GeneralizedLinearModel, X, Y, Bernoulli())
@btime fit(GeneralizedLinearModel, X, Y, Bernoulli());

const Xz = hcat(ones(size(dat, 1)), zscore(dat[:x1]), zscore(dat[:x2]));

println(coef(fit(GeneralizedLinearModel, Xz, Y, Bernoulli(), verbose=true)))
@btime fit(GeneralizedLinearModel, Xz, Y, Bernoulli());

@inline logistic(x::Real) = inv(one(x) + exp(-x))

function logistic_regression(Y::Vector{T}, X::Matrix{T}, α = 0.1, iterations = 1000) where {T<:AbstractFloat}
    w = zeros(T, size(X, 2))        # parameter vector, initialized at zero
    r = similar(Y)                  # workspace reused for η, μ, and the residual
    stepfactor = -α / length(Y)     # learning rate scaled by the number of observations
    for i in 1:iterations
        # r = μ - Y, where μ = logistic.(X * w), computed in place in r
        broadcast!(-, r, broadcast!(logistic, r, A_mul_B!(r, X, w)), Y)
        # w -= (α / n) * X'r, via an in-place BLAS gemv
        BLAS.gemv!('T', stepfactor, X, r, one(T), w)
    end
    w
end

println(logistic_regression(Y, Xz))
@btime logistic_regression(Y,Xz);

@inline isone(x) = x == one(x)

function cost(μ::Vector{T}, y::Vector{T}) where {T <: AbstractFloat}
    (n = length(μ)) == length(y) || throw(ArgumentError("μ and y are not the same length"))
    s = zero(T)
    for i in 1:n
        yi = y[i]
        s += iszero(yi) ? log1p(-μ[i]) : (isone(yi) ? log(μ[i]) : throw(ArgumentError("y must contain only zeros and ones")))
    end
    -s / n
end

For me the results of running the script are

1: 357.033259775378, 0.02273100323606233
2: 356.73808598893817, 0.0008274243710803873
3: 356.73753632338185, 1.5408122228616909e-6
4: 356.73753632128995, 5.863960039272583e-12
  465.975 μs (704 allocations: 97.67 KiB)
  208.097 μs (55 allocations: 37.38 KiB)
1: 357.0332597753777, 0.022731003236056296
2: 356.73808598893817, 0.0008274243710795906
3: 356.73753632338185, 1.5408122228616909e-6
4: 356.73753632128995, 5.863960039272583e-12
[-0.0314084, 1.12834, 1.72304]
  208.669 μs (55 allocations: 37.38 KiB)
[-0.0312968, 1.12711, 1.72119]
  6.317 ms (2 allocations: 3.36 KiB)

These results are from Julia v0.6.2 with MKL BLAS.
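
The cost function isn’t actually called in the script; a possible use, purely for illustration, is to evaluate the objective at the gradient-descent estimate:

# logistic cost (negative mean log-likelihood) at the gradient-descent solution
w = logistic_regression(Y, Xz)
cost(logistic.(Xz * w), Y)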


There’s multinomial logistic regression and ordinal logistic regression. Maybe it’s doing one of those?

I’m so late to the party, and probably no one cares now that Julia 1.0 is out, but anyway…

I’m surprised that no one mentioned datashader (though maybe you have noticed it by now). It uses Numba. The reason you didn’t find it on PyPI is probably that, IIRC, they wanted it to be installed via conda, since getting the Numba dependency right was (still is?) hard back then. Well, as you pointed out, you can use that as evidence of Julia doing a better job on binary distribution (via BinaryProvider.jl).
