I am trying to get performant gradient computations for a function that defines a conditional distribution over a discrete space. More specifically, I want the gradient of the following function:
import Flux
import Zygote
using Distributions
using BenchmarkTools
function logprob(W, x, a)
    probs = Flux.softmax(W' * x)  # vector of class probabilities
    d = Categorical(probs)
    return logpdf(d, a)
end
x = collect(Float64, 1:100)
w = zeros(Float64, (100,2))
@btime Zygote.gradient(w -> logprob(w, x, 1), w)
# running stats: 150.143 μs (608 allocations: 29.55 KiB)
If I remove the Categorical distribution and index into the probability vector directly, the computation time drops by roughly two orders of magnitude:
function logprob2(W, x, a)
    probs = Flux.softmax(W' * x)
    return log(probs[a])
end
@btime Zygote.gradient(w -> logprob2(w, x, 1), w)
# running stats: 1.031 μs (26 allocations: 4.53 KiB)
My questions are: what is causing the slowdown, and is it possible to speed things up while still leveraging the Distributions.jl functions?
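For reference, here is a variant I am considering. It is only a sketch based on the assumption that part of the overhead comes from the argument validation in the Categorical constructor (and from Zygote differentiating through it), and it assumes the check_args keyword is available for Categorical in the Distributions.jl version I am using. I have not profiled it carefully:

function logprob3(W, x, a)
    probs = Flux.softmax(W' * x)
    # check_args=false skips the constructor's validation that `probs`
    # is a probability vector; softmax should already guarantee that
    d = Categorical(probs; check_args=false)
    return logpdf(d, a)
end

@btime Zygote.gradient(w -> logprob3(w, x, 1), w)

If the remaining gap is due to Zygote pulling back through the Distributions.jl internals rather than the argument checks, this would presumably not close it, which is part of what I am hoping to understand.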