A couple of notes here:
- Your `map` is actually only computing a single element :) That's why it's significantly faster. `map` and `broadcast` have different behavior when the arguments are of different shapes: `map` I believe falls back to the smallest common size, which in this case is 1, so it will only call the function once, while `broadcast` will do what you want (see the small sketch after this list).
- When using BenchmarkTools.jl, you need to use interpolation to ensure that you're not measuring a slowdown caused by non-constant globals and the like (an example follows the list as well).
- If-statements within the function you're broadcasting over really hamper performance. In the case of `StatsFuns.normlogpdf` there are two relevant methods: (1) takes the mean, the std, and the value, from which it computes the z-value and then finally the logpdf, and (2) takes the z-value directly and computes the logpdf of the standard normal. In (1) there are some if-statements while in (2) there are none, so (2) will be much faster when broadcast/vectorized (see the sketch below the list).
- Tullio.jl can't derive the gradients automatically when the RHS includes "complex" function calls, e.g. `StatsFuns.normlogpdf`, so in this particular case you probably shouldn't use it.
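To illustrate the `map` vs `broadcast` point, here's a small sketch (not from the original example) of the difference when one argument has length 1:

# `map` stops as soon as the shortest argument is exhausted, so pairing a
# long vector with a length-1 argument only computes one element:
map(+, [1, 2, 3], [10])        # 1-element Vector: [11]

# `broadcast` (or dot syntax) expands the length-1 argument instead:
broadcast(+, [1, 2, 3], [10])  # [11, 12, 13]
[1, 2, 3] .+ 10                # [11, 12, 13]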
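And a minimal example of the BenchmarkTools.jl interpolation point (the names here are just for illustration):

using BenchmarkTools

x = rand(1000)      # non-constant global

@benchmark sum(x)   # `x` is looked up as an untyped global inside the benchmark
@benchmark sum($x)  # interpolated: the benchmark measures `sum` on a concrete value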
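As for the two `StatsFuns.normlogpdf` methods, the relationship is roughly the following (a sketch assuming `sig > 0`; the `- log(sig)` term is why you can drop down to the one-argument method when `sig == 1`):

using StatsFuns

mu, sig, y = 0.3, 1.0, 0.7
z = (y - mu) / sig

# (1) mean/std/value method: computes the z-value internally and branches on
#     edge cases such as `sig == 0`.
StatsFuns.normlogpdf(mu, sig, y)

# (2) z-value method: logpdf of the standard normal, no branches.
StatsFuns.normlogpdf(z) - log(sig)   # same value as (1) for sig > 0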
All in all, your example should be something like:
using Tullio, CUDA, StatsFuns, BenchmarkTools
X = rand(960*950);
Y = rand(960*950);
sig = 1.0
f(X, Y, sig) = sum(StatsFuns.normlogpdf.(X, sig, Y))
g(X, Y, sig) = @tullio lp = StatsFuns.normlogpdf(X[i], sig, Y[i])
@benchmark $f($X, $Y, $sig)
# BenchmarkTools.Trial:
# memory estimate: 6.96 MiB
# allocs estimate: 2
# --------------
# minimum time: 6.359 ms (0.00% GC)
# median time: 6.662 ms (0.00% GC)
# mean time: 6.847 ms (1.98% GC)
# maximum time: 11.060 ms (23.53% GC)
# --------------
# samples: 730
# evals/sample: 1
@benchmark $g($X, $Y, $sig)
# BenchmarkTools.Trial:
# memory estimate: 128 bytes
# allocs estimate: 5
# --------------
# minimum time: 6.571 ms (0.00% GC)
# median time: 6.799 ms (0.00% GC)
# mean time: 6.821 ms (0.00% GC)
# maximum time: 8.835 ms (0.00% GC)
# --------------
# samples: 732
# evals/sample: 1
# Circumvent the if-statements in `StatsFuns.normlogpdf(::Number, ::Number, ::Number)`
# by passing the z-value directly (with `sig = 1.0` the `log(sig)` correction vanishes).
h(X, Y, sig) = sum(StatsFuns.normlogpdf.((Y - X) / sig))
i(X, Y, sig) = @tullio (+) lp = StatsFuns.normlogpdf((Y[i] - X[i]) / sig)
@benchmark $h($X, $Y, $sig)
# BenchmarkTools.Trial:
# memory estimate: 20.87 MiB
# allocs estimate: 6
# --------------
# minimum time: 2.947 ms (0.00% GC)
# median time: 3.497 ms (0.00% GC)
# mean time: 4.436 ms (12.28% GC)
# maximum time: 13.297 ms (25.54% GC)
# --------------
# samples: 1123
# evals/sample: 1
@benchmark $i($X, $Y, $sig)
# BenchmarkTools.Trial:
# memory estimate: 128 bytes
# allocs estimate: 5
# --------------
# minimum time: 476.989 μs (0.00% GC)
# median time: 556.951 μs (0.00% GC)
# mean time: 562.737 μs (0.00% GC)
# maximum time: 999.113 μs (0.00% GC)
# --------------
# samples: 8765
# evals/sample: 1
But as mentioned above, you want gradients through it, so your best bet here is probably just to use `h` from the example above. EDIT: Actually, it might work very nicely with ForwardDiff.jl? I just know that for reverse-mode rules you can't use "complex" function calls on the RHS (rough sketch below).
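For completeness, a rough sketch of what the ForwardDiff.jl route could look like (untested, and the smaller arrays are just for illustration):

using ForwardDiff, StatsFuns

h(X, Y, sig) = sum(StatsFuns.normlogpdf.((Y - X) / sig))

X = rand(100);
Y = rand(100);
sig = 1.0

# Forward-mode gradient of the summed logpdf with respect to X.
# Note that ForwardDiff's cost grows with the number of inputs, so for very
# large X a reverse-mode backend may end up being preferable.
ForwardDiff.gradient(x -> h(x, Y, sig), X)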