Hi all,

I’ve benchmarked a simple implementation of regression with a horseshoe prior over the coefficients in both numpyro and Turing. I found numpyro to over an order of magnitude faster, which I found quite surprising. Here’s the Turing implementation - would be grateful for feedback on anything obviously incorrect here:

```
using Distributions
using Turing
# using TuringBenchmarking
using Statistics
using LinearAlgebra
using Enzyme
using ReverseDiff
function get_data()
N = 100
P = 1000
D = 5
X = rand(Normal(0, 1), N, P)
W = zeros(P)
W[1:D] .= [0.2, -0.3, 0.35, -0.4, 0.45]
Y = X * W + rand(Normal(0, sqrt(1 - .615)), N)
return X, Y, W
end
@model function model(X; N::Int64, P::Int64)
λ ~ filldist(truncated(Cauchy(0.0, 1.0); lower=0), P)
τ ~ truncated(Cauchy(0.0, 1.0); lower=0)
W ~ MvNormal(zeros(P), I * λ.^2 * τ^2)
μ = X * W
σ ~ truncated(Cauchy(0.0, 1.0); lower=0)
Y ~ MvNormal(μ, I * σ^2)
end
function inference()
@time X, Y, W = get_data()
@time m = model(X; N=size(X, 1), P=size(X, 2)) | (; Y=Y)
# turing_suite = make_turing_suite(m)
# run(turing_suite)
# @time chain = sample(m, NUTS(0.85; adtype=Turing.Essential.AutoEnzyme()), MCMCThreads(), 500, 2)
@time chain = sample(m, NUTS(0.85; adtype=AutoReverseDiff(true)), MCMCThreads(), 500, 2)
return chain
end
# inference()
```

I’ve installing the Turing branch with Enzyme AD functionality, though not clear to me whether that is considered ready for use yet. The numpyro implementation assumes the exact same data generating process, etc, and I think the max depth in NUTS is the same.

Thanks!