ForwardDiff - Multiple gradient evaluations at once?


I'm fairly new to Julia, coming from Python.

I was playing around with the ForwardDiff package and stumbled upon the following ‘problem’:

A 2-D Gaussian from the Distributions package can take a matrix of size = (2, N) and outputs N probabilities.

Now I wanted to evaluate the gradient at multiple data points at once, similar to PyTorch, but apparently the gradient can only be evaluated for a single data vector and not for a data matrix.

Here is a MWE of my ‘problem’:

using Distributions
using Plots
using ForwardDiff

μ = [ 1. ; 2.]
Σ = [ 1. 0 ; 0 1]
dist = MvNormal(μ, Σ)

# samples = rand(dist, 2000)
# scatter(samples[1,:], samples[2,:]) for visualization purposes

matrix = [ 1 1 ; 2. 2]
vec = [ 1. ; 2.]

prob_vec = pdf(dist, vec) # evaluates to a scalar value
prob_matrix = pdf(dist, matrix) # evaluates to an array with two values, as it should
grad_vec = ForwardDiff.gradient(x -> pdf(dist, x), vec) # evaluates to a gradient of two zeros, as it should, because vec is the mean
grad_matrix = ForwardDiff.gradient(x -> pdf(dist, x), matrix) # doesn't work

So is there any way to evaluate the gradient at multiple data points, or do I really have to write a for loop?
Or can I vectorize it with the dot operator? I haven’t figured out how to do that.

Thank you in advance for your time and effort! =)

PS: This was the error message
ERROR: MethodError: no method matching extract_gradient!(::Type{ForwardDiff.Tag{getfield(Main, Symbol("##31#32")),Float64}}, ::Array{Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##31#32")),Float64},Float64,4},1},2}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Main, Symbol("##31#32")),Float64},Float64,4},1})

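As far as I can tell, you are asking for the pdf of a vector distribution at a matrix value, which doesn’t make sense mathematically: the pdf is defined on single vectors. Distributions treats the matrix as a batch of columns and returns one value per column, so the function is no longer scalar-valued, and ForwardDiff.gradient requires a function that returns a scalar.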

So yes, you either have to write a loop or broadcast over the columns with eachcol(matrix).
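Both options can be sketched as follows. This is a minimal sketch, not the only way to do it: the names X, f, grads_loop, and grads_bcast are made up for illustration, the second data point differs from the one in the question so that its gradient is nonzero, and eachcol assumes Julia 1.1 or later.

```julia
using Distributions
using ForwardDiff

dist = MvNormal([1.0, 2.0], [1.0 0.0; 0.0 1.0])
X = [1.0 1.0; 2.0 1.0]          # each column is one data point (illustrative values)

f(x) = pdf(dist, x)             # scalar-valued when x is a single vector

# Option 1: explicit loop (here as a comprehension) over the column indices
grads_loop = [ForwardDiff.gradient(f, X[:, j]) for j in axes(X, 2)]

# Option 2: broadcast the gradient call over the columns
grads_bcast = ForwardDiff.gradient.(f, eachcol(X))

# Optional: stack the per-point gradients into a 2×N matrix
G = reduce(hcat, grads_loop)
```

Both options return one gradient vector per column. For a Gaussian the result can be checked against the closed form -pdf(x) * Σ⁻¹(x - μ), which is zero at the mean.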


Mathematically you’re absolutely right.

Yet I was hoping for some kind of batched operator for evaluating the gradient?

Thanks for your answer. =)

There is generally no need to “vectorize” code like this in Julia. If you want to do something multiple times, just use a loop. Loops are fast.

You can probably broadcast over eachcol(matrix) as suggested above, but the only reason to do so would be if it makes your code easier to understand or if you can let broadcast fusion combine multiple operations. Otherwise, just use a loop.


I just found a solution (although it’s a bit clunky) by using an array of vectors and the dot (broadcast) operator:

matrix = [ [ 1. ; 2.], [ 2. ; 2.]]
grad = ForwardDiff.gradient.(x -> pdf(dist, x), matrix)

I know that for loops in Julia are not as bad as in Python (where they are horribly slow) because of the JIT compiler.
In terms of performance, can I readily use for loops in Julia and rely on the JIT compiler to optimize them well?

Yes, although batching can still be useful in some cases, e.g. when it sets up the memory layout more favourably.
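As one illustration of the layout point: Julia arrays are column-major, so storing each data point as a column keeps its entries contiguous in memory. A rough sketch of a loop that reads columns and writes the gradients into a preallocated matrix is shown below. The helper name batch_gradient is hypothetical, not part of ForwardDiff, and the sketch assumes X has a floating-point element type.

```julia
using Distributions
using ForwardDiff

dist = MvNormal([1.0, 2.0], [1.0 0.0; 0.0 1.0])
f(x) = pdf(dist, x)

# Hypothetical helper: one gradient per column of X, stored column by column.
function batch_gradient(f, X::AbstractMatrix)
    grads = similar(X)                       # same size and element type as X
    for j in axes(X, 2)
        # view avoids copying the column before differentiating
        grads[:, j] = ForwardDiff.gradient(f, view(X, :, j))
    end
    return grads
end

X = [1.0 1.0; 2.0 1.0]                       # illustrative data, one point per column
G = batch_gradient(f, X)
```

Column j of G holds the gradient at column j of X. Each ForwardDiff.gradient call still allocates its own result here, so whether this is faster than the array-of-vectors version would need to be measured.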

Cool, thanks a lot folks

Loops in Julia are as fast as loops in C, C++, fortran, etc.

Moreover, broadcasting (including dot-calls like gradient.(x)) is internally implemented as loops, just like all the “vectorized” operations in tools like NumPy.
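A small sketch of that equivalence, using a made-up function square_plus_one: the dot-call and the hand-written loop produce the same array.

```julia
square_plus_one(x) = x^2 + 1     # illustrative scalar function

xs = [1.0, 2.0, 3.0]

# Dot-call: applies the function to every element of xs
ys_broadcast = square_plus_one.(xs)

# The same computation written as an explicit loop
ys_loop = similar(xs)
for i in eachindex(xs)
    ys_loop[i] = square_plus_one(xs[i])
end

ys_broadcast == ys_loop          # true
```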


Those batched operations are really just for loops under the hood. In Python or MATLAB they just call out to for loops written in a fast language. Julia is such a fast language.