Area under a ROC Curve after a logistic regression

I am trying to use the ROCAnalysis.jl package to compute the AUC value after a logistic regression. When I run

auc(roc(dep, pred)),

where dep is the vector with the original binary dependent variable and pred is the vector of predicted values from the logistic regression,

I get a value very different from the AUC value I get from Stata. I am wondering whether there is good documentation on how to use these functions.

The function expects scores for targets and scores for non-targets, not truths and predictions. You can find other implementations in different packages. (A general caution for beginners such as myself: it's crucial to pay attention to how mature a package is; the rougher and more abandoned it looks, the more you need to understand its source code before you trust it.)

I’ve just had a nice needless romp through the finer details of ML implementations in Lux.jl because of my misunderstanding of ROCAnalysis.jl… Turns out I should read about things before using them.

There are some important points about this implementation of the ROC curve:

  1. The ROC curve is computed over false-negative and false-positive rates (miss rate vs. false-alarm rate), instead of the usual true-positive vs. false-positive rates. So your curve will look upside-down, and a lower AUC is better.
  2. It takes target and non-target arrays as input. So, if you have y_hat as a vector of scores (logits or probabilities) and y_true as the 1/0 labels, then you'd use:
target = y_hat[y_true .== 1]
nontarget = y_hat[y_true .== 0]
roc(target, nontarget)

If you reverse the arguments, you’ll get a ROC curve that looks normal, but mirrored. Probably not ideal.
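You can convince yourself of the argument-order point with plain Julia and no packages (a toy sketch; the scores below are made up). By the rank-based definition, AUC is the probability that a randomly chosen target scores higher than a randomly chosen non-target, so swapping the two arrays gives exactly 1 − AUC:

```julia
# Rank-based (Mann–Whitney) AUC: P(random target score > random non-target score),
# counting ties as 1/2.
function pairwise_auc(target, nontarget)
    wins = sum((t > n) + 0.5 * (t == n) for t in target, n in nontarget)
    wins / (length(target) * length(nontarget))
end

target    = [0.9, 0.8, 0.4]   # scores given to true positives
nontarget = [0.7, 0.3, 0.2]   # scores given to true negatives

pairwise_auc(target, nontarget)   # ≈ 0.889 (8 of 9 pairs ranked correctly)
pairwise_auc(nontarget, target)   # ≈ 0.111, i.e. 1 - AUC
```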

Anyways, after I figured all that out, I gave up on trying more packages and just wrote this:

# AUC via the trapezoidal rule over the ROC curve.
# y_true: 0/1 labels; y_pred: predicted scores (probabilities or logits).
function rocauc(y_true, y_pred)
    sorted_indices = sortperm(y_pred, rev=true)
    sorted_true = y_true[sorted_indices]
    total_positive = Int(sum(y_true))
    total_negative = length(y_true) - total_positive

    tpr, fpr = Float64[], Float64[]
    tp, fp = 0, 0
    for label in sorted_true
        isone(label) ? (tp += 1) : (fp += 1)
        push!(tpr, tp / total_positive)   # true-positive rate so far
        push!(fpr, fp / total_negative)   # false-positive rate so far
    end

    # Trapezoidal integration of TPR over FPR.
    auc = sum((fpr[i] - fpr[i-1]) * (tpr[i] + tpr[i-1]) / 2 for i in 2:length(fpr))

    auc, fpr, tpr
end

This gives you the normal AUC, with y_true as described above and y_pred as the associated probabilities. If anyone wants to use it, you can print the AUC and plot the ROC curve with:

using Plots
auc, fpr, tpr = rocauc(y_true, y_pred)
println("AUC: ", auc)
plot(fpr, tpr, 
     label="ROC w/ AUC = $(round(auc, digits=4))", 
     xlabel="False Positive Rate", 
     ylabel="True Positive Rate",
     title="ROC Curve"
)
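A quick sanity check of the rocauc function above, on toy numbers I made up: a classifier that ranks every positive above every negative should give AUC = 1.0, and one that ranks them exactly backwards should give 0.0:

```julia
y_true = [1, 1, 0, 0]
y_good = [0.9, 0.8, 0.3, 0.1]   # positives scored above negatives
y_bad  = [0.1, 0.2, 0.7, 0.9]   # exactly backwards

rocauc(y_true, y_good)[1]   # → 1.0
rocauc(y_true, y_bad)[1]    # → 0.0
```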

If you want to compute the AUC after a logistic regression, you can also try the LogisticROC package: its lroc() function gives you both the plot and the AUC value, and it takes the fitted model returned by GLM.
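For anyone landing here from search: the y_pred fed into any of the above is just the fitted probabilities from a GLM.jl logistic regression, roughly like this (a sketch; the toy data frame and column names are my own):

```julia
using GLM, DataFrames

# Toy data: a made-up predictor and a binary outcome.
df = DataFrame(x = randn(200), y = rand(0:1, 200))

# Logistic regression: Binomial family with a logit link.
model = glm(@formula(y ~ x), df, Binomial(), LogitLink())

y_pred = predict(model)   # fitted probabilities in (0, 1)
y_true = df.y             # the original 0/1 labels
```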