Gluing symbolic manipulations from Calculus.jl to write a julia code for a gradient and a Jacobian



Lately I have had some deep procrastinating thoughts coming from putting some crazy function into a minimizer. The issue is that I would prefer not to write out an analytic Jacobian as that introduces a lot of human errors. In julia ecosystem I found a JuliaDiff.jl which seems cool but I don’t feel quite confident of using something like that. I would rather like to generate a functions for a gradient and a Jacobian from existing code. As Mathematica can do that I thought that julia is also capable of that with Calculus.jl package. But I can not find a macros which could glue Calculus.jl together with an actual code.

The first ingredient for which I am looking for is how one can transform already defined function to an expression when symbolic arguments are passed. For example in Mathematica one could do something like:

    f[x_,y_,z_] := x^2 + y + 2 z
    f[1,2,3] (* this is a function *)
    f[x,y,z] (* this is equivalent to an expression *)

In julia I would expect to be able to get the same behavior with some “@turntoexpr” macro as follows:

    f(x,y,z) = x^2 + y + 2 z
    f(1,2,3) # an evaluated function
    @turntoexpr f(:x,:y,:z) # generates an expression

Does anyone already have such code which could simulate this behavior?

When I would have gotten expression for f (let’s name it fexpr) I would use it to generate an gradient expression:

    ∇fexpr = [differentiate(fexpr,si) for si in [:x,:y,:z]]

and similarly for a Jacobian. And now I actually want to define a functions using these expressions. For example a macros which would work as follows would be great:

    @turntofunc ∇f(:x,:y,:z) = ∇fexpr
    @turntofunc! ∇f!(:x,:y,:z) = ∇fexpr ### Which would create a function ∇f!(storage,x,y,z)

Had anyone already written such macros? Are there a design shortcomings why that could not work or would be inconvenient for a real problems? How could one proceed to implement such macros? I am dedicated to learn about julia macros, but I have fragmented view of how to develop them. What resources could be useful for learning about macro programming in julia and especially relating to this problem? Feedback appreciated.


Before you go too far down the metaprogramming rabbit hole, are you sure that forward-mode automatic differentiation isn’t good enough for your case? ForwardDiff.jl is amazingly easy to use and should be very efficient for functions with less than 100 input dimensions:

julia> f(x) = x[1]^2 + x[2] + x[3]
f (generic function with 1 method)

julia> using ForwardDiff

julia> ForwardDiff.gradient(f, [1.0, 2.0, 3.0])
3-element Array{Float64,1}:


ForwardDiff is too slow for me and I will have some order of 100 parameters. For 27 parameters I have a gradient function which is about 20 times faster (and I see huge speed benefits I could get putting in a bit more work) than ForwardDiff.jl. Also I don’t feel confident enough of using it. I have a large factorials in my minimizing function. Are there possible cases when calculated gradient with ForwardDiff.jl is not the one you would get by patiently putting in the derivatives yourself?

PS. Macro rabbit hole sounds fun :slight_smile:


Sorry to ask, but have you tried? I am using ForwardDiff with 100-200 variables and it is doing fine. Make sure you read the manual on chunk size, preallocation, etc.

Yes. 1% of those are bugs. 99% of those are me making an algebra mistake.


Gradients are easy:


using XGrad

xdiff(:(x^2 + (y + 2z)); x=1.0, y=1.0, z=1.0)
# quote
#     tmp671 = 2
#     tmp673 = 2
#     tmp679 = 1
#     tmp680 = tmp671 - tmp679
#     tmp681 = x ^ tmp680
#     tmp672 = x ^ tmp671
#     tmp674 = tmp673 * z
#     dtmp676!dy = 1.0
#     tmp675 = y + tmp674
#     tmp676 = tmp672 + tmp675
#     dtmp676!dtmp674 = 1.0
#     dtmp676!dz = tmp673 * dtmp676!dtmp674
#     dtmp676!dtmp672 = 1.0
#     dtmp676!dx = tmp671 * tmp681 * dtmp676!dtmp672
#     tmp683 = (tmp676, dtmp676!dx, dtmp676!dy, dtmp676!dz)
# end

where x=1.0, y=1.0, z=1.0 are “example values” - we need them to distinguish between scalars and tensors since diff rules are very different for them.

The resulting expression is not very well optimized (e.g. many repeating constants) because XGrad is primarily designed to work with tensors, but you can either simplify expression manually or open an issue so that I added simplifications for this case too.

You can also generate derivative function directly:

f(x,y,z) = x^2 + (y + 2z)
df = xdiff(f; x=1.0, y=1.0, z=1.0)
df(1, 2, 3)
# (9, 2.0, 1.0, 2.0)

Although in practice instead of xdiff I usually use xgrad which compiles derivative dynamically and caches it (though, be aware of resulting overhead - negligible for tensors but may be large for simple scalar expressions like this).

There’s also optional mem=Dict() argument that may be used to keep intermediate buffers and make nearly zero allocating code, but again it’s useless for scalars.

Jacobians are harder. Basically, Jacobians are useful when you have a function y = f(x) where both x and y are tensors, e.g.:

y = 2x

If x (and thus y) is a vector, Jacobian is a diagonal matrix:

dy[i] / dx[j] = 2 if i == j
dy[i] / dx[j] = 0 otherwise

vectorized code for this case is possible, but is hard to make efficient. However, if x is a matrix, dy/dx is a 4D tensor that simply can’t be expressed symbolically (note, you can tell that some variable U is a 4D tensor, but you can’t express it via 1D and 2D input tensors).

So you either have to drop symbolic manipulations and go with something like ReverseDiffSparse.jl, or use element-wise notation. For example, I used to use Einstein notation for this purpose in XDiff.jl, and I believe it’s useful as an alternative to manual inference, but it’s really hard to generate efficient code from it.

There is a few more tricks for Jacobians, but I need to know more about your use case to propose them.


I normally cannot get FirwardDiff to get within a factor 30 of hand-coded gradients, when I have O(100) variables, even after sitting down with jrevels to optimise. Same with ReverseDiff.jl.


Ok. Seems I made an error in benchmarking. The ForwardDiff is only 3 times slower than my hand written version. Here is the code

const N = 10 ### Quick and dirty

import Base.factorial
factorial(x::Int) = x>=0 ? gamma(x+1) : 0.

B(α,β) = mod(α,2)==0 && mod(β,2)==0 ? 2*beta((α+1)/2,(β+1)/2) : 0.

Dx2(n_,m_,n,m) = n*(n-1)*factorial(n_+n-2+m_+m+1)*B(n_+n-2,m_+m) + factorial(n_+n+m_+m+1)/4*B(n_+n+2,m_+m) - (2*n+1)/2*factorial(n_+n+m_+m)*B(n_+n,m_+m) + factorial(n_+n+m_+m)/2*B(n_+n+2,m_+m)

Dy2(n_,m_,n,m) = m*(m-1)*factorial(n_+n-2+m_+m+1)*B(n_+n,m_+m-2) + factorial(n_+n+m_+m+1)/4*B(n_+n,m_+m+2) - (2*m+1)/2*factorial(n_+n+m_+m)*B(n_+n,m_+m) + factorial(n_+n+m_+m)/2*B(n_+n,m_+m+2)

Columb(n_,m_,n,m) = factorial(n_+n+m_+m)*B(n_+n,m_+m)

### This is the function I want to minimize with respect to P,Γ,λ
function F(P,Γ,λ,mx,my)
    N = size(P,2)
    s = 0.
    for n_ in 0:N-1
        for m_ in 0:N-1
            for n in 0:N-1
                for m in 0:N-1
                    s += -λ*P[n_+1,m_+1]*P[n+1,m+1]*factorial(n_+n+m_+m+1)*B(n_+n,m_+m)*Γ^2
                    s += -P[n_+1,m_+1]*P[n+1,m+1]*(Dx2(n_,m_,n,m)/2/mx + Dy2(n_,m_,n,m)/2/my + Columb(n_,m_,n,m)*Γ)
    return s + λ

function DFλ(P,Γ,λ,mx,my)
    N = size(P,2)
    s = 0.
    for n_ in 0:N-1
        for m_ in 0:N-1
            for n in 0:N-1
                for m in 0:N-1
                    s += -P[n_+1,m_+1]*P[n+1,m+1]*factorial(n_+n+m_+m+1)*B(n_+n,m_+m)*Γ^2
    return s + 1

function DFΓ(P,Γ,λ,mx,my)
    N = size(P,2)
    s = 0.
    for n_ in 0:N-1
        for m_ in 0:N-1
            for n in 0:N-1
                for m in 0:N-1
                    s += -λ*P[n_+1,m_+1]*P[n+1,m+1]*factorial(n_+n+m_+m+1)*B(n_+n,m_+m)*2*Γ
                    s += -P[n_+1,m_+1]*P[n+1,m+1]*Columb(n_,m_,n,m)
    return s

function DFPnm(P,n,m,Γ,λ,mx,my)
    N = size(P,2)
    s = 0.
    for n_ in 0:N-1
        for m_ in 0:N-1
            s += - λ * P[n_+1,m_+1]*factorial(n_+n+m_+m+1)*B(n_+n,m_+m)*Γ^2
            s += -P[n_+1,m_+1]*(Dx2(n_,m_,n,m)/2/mx + Dy2(n_,m_,n,m)/2/my + Columb(n_,m_,n,m)*Γ)
    return 2*s ### because there are two terms with the same indicies

function ∇F!(storage,x,mx,my)
    P,Γ,λ = reshape(x[3:end],N,N),x[1],x[2]
    storage[1] = DFΓ(P,Γ,λ,mx,my)
    storage[2] = DFλ(P,Γ,λ,mx,my)
    for n in 0:N-1
        for m in 0:N-1
            ### Need to think about this one
            storage[2+n+m*N+1] = DFPnm(P,n,m,Γ,λ,mx,my)

∇f_(x,mx,my) = (a=zeros(x);∇F!(a,x,mx,my);a)

using ForwardDiff
f(x,mx,my) = F(reshape(x[3:end],N,N),x[1],x[2],mx,my)
∇f(x,mx,my) = ForwardDiff.gradient(x->f(x,mx,my),x)

########### Testing ######### 

P = zeros(N,N)
P[1,1] = 1.2
P[2,1] = 0.01
x0 = [1/4,-4,P[:]...]

@time ∇f_(x0,1,1);
@time ∇f(x0,1,1);


I usually find that I can speed things up considerably by calling ForwardDiff.GradientConfig once, then reusing it. YMMV.

Also, are you compiling those functions before timing them? I generally try to use


I generally execute whole file twice in the REPL. So that should be fine. In the solver (nlsolve from NLsolve) it is only 4 times slower than my handcoded version which seems reasonable. But I still want to provide a Hessian (Jacobian of gradient) for the solver as sometimes it does not converge.

Calculation of Hessian takes a lot of time with ForwardDiff.hesian where I think hand coded version would run as fast as calculation of gradient since loops go away. My guess is that putting in analytic derivatives is a way one needs to go for getting reasonable performance in my case. It is just very hard to type.


Sorry, I missed the fact that you were calculating Hessians. Certainly, if you have a convenient closed-form expression that can exploit some structure, you may do better than AD. But generally, apart from simple cases, I find manually coding these things so costly that I am willing to put up with a factor of 5–10 just to automate it. If coding the Hessians works for your use case, you can still use AD for unit testing.


You can build something like ParameterizedFunctions.jl

That uses SymEngine.jl in a macro with a DSL and spits out Jacobians, inverted Jacobian, etc. functions for differential equations. It would be pretty easy to make that split out a gradient function as well using the same techniques. Just check out the internal code.


Yup, for unit testing ForwardDiff.jl is perfect and manually coding gradients, gradients sucks. I actually gave up julia today and started to code in Mathematica for this particular problem as these things comes for free. Nevertheless I will keep an eye on how can one use metaprograming to solve issues like these in julia :wink:


There’s probably a way to do this as a macro, but I can’t seem to get it to work. So, here’s a function that use eval instead.

julia> using SymEngine

julia> function generate_jacobian(expr, var_names...)
        l = length(var_names)
        v = []
        ex = Basic(expr)
        for var_name in var_names
           push!(v, SymEngine.walk_expression(diff(ex, var_name)))
        func = Expr(:function, Expr(:call, gensym(), map(Symbol,var_names)...), Expr(:tuple, v...))
        return eval(func)
generate_jacobian (generic function with 1 method)

julia> jac = generate_jacobian("x^2 - y^2 + 2 * x * y", "x", "y")
##707 (generic function with 1 method)

julia> jac(1, 2)
(6, -2)


Here’s the macro version

julia> macro generate_jacobian(expr, var_names...)
           l = length(var_names)
           v = []
           ex = Basic(expr)
           for var_name in var_names
              push!(v, SymEngine.walk_expression(diff(ex, var_name)))
           func = Expr(:function, Expr(:call, gensym(), map(Symbol,var_names)...), Expr(:tuple, v...))
           return Expr(:escape, func)
@generate_jacobian (macro with 1 method)

julia> jac = @generate_jacobian x^2-y^2 x y
##735 (generic function with 1 method)

julia> jac(1, 2)
(2, -4)


Not to put too fine a point on it, but your implementation of f is just not very fast (it allocates! The horror!). This will in turn make forward diff slow as hell.

After tracking down the spurious allocations, I found them in the special functions, where beta calls lgamma_r, which in turn creates a Ref for a ccall to libm. This allocation is optimized out on 0.7.

Sidenote: Language feature I’d like to see: A StackRef type for ccalls, that is guaranteed to never allocate; instead throw during codegen if it cannot be made to happen (UnsafeStackRef: create undefined behaviour if a reference escapes). The existing code on 0.61 for beta is almost an anti-feature, i.e. an evil trap for unsuspecting users, and should imho refuse to compile (so that automated tests catch all regressions on this). If you need fast beta-functions, you currently have to wrap libm yourself. Rant over.

The allocations of beta, however, dont matter. Your coefficients (factorial/beta/Dx/Coulomb/etc) want to be pre-computed. Your current code gives

@btime F(P, 0.25, -4.0, 1, 1)
 15.792 ms (75001 allocations: 1.14 MiB)

and almost all of it is spent evaluating your constant coefficients.

Unfortunately, annotating @Base.pure to your coefficient functions and passing N as Val{N}() appear to be not enough to get this removed, so you will need to build your coefficient table by hand.


Thank you everyone for your suggestions. Although I switched to Mathematica for a particular problem I had, I knew that with meta-programming I could get desired result of not having to write derivatives myself. Thus I learned some meta-programming and got desired result in 50 lines of code, but not with the same design I had started my question.

As stated by @foobar_lv2, for optimal performance I actually need to precalculate the coefficients and that is why I need to start my code from writing down expression of my function. When that is done I need to make manipulations of my expressions to define my functions. One way is to use macros like @isuruf had suggested with macros but that was not simple enough. I wanted something more general.

Instead of writing my own macro as @turntoexpr I found that generated functions with @generated macro is a way to go. I just needed to preprocess expression to take derivatives, convert to a single expression, substitute variables for making desired function signature. Also I needed to patch Calculus.jl for not using eval for simplify operations which was quite easy. The result is as simple as one could do that in Mathematica :slight_smile: :

# Go here to see my workaround for caculus package:

using Calculus
using MacroTools: postwalk

Formula = Union{Expr,Symbol,Number}

function substitute{T<:Formula}(expr::T,pre,post)
    d = Dict([i for i in zip(pre,post)])
    postwalk(x -> x isa Symbol && Base.isidentifier(x) && haskey(d,x) ? d[x] : x, expr)
substitute{T<:Formula}(expr::Array{T},pre,post) = map(x->substitute(x,pre,post),expr)

import Base.convert
convert{T<:Formula}(::Type{Expr},arrexpr::T) = arrexpr ### Would it work if it would be an integer?
convert{T<:Formula}(::Type{Expr},arrexpr::Array{T}) = Expr(:call,:reshape,Expr(:vect,arrexpr...),:($(size(arrexpr))))

∇{T<:Formula}(expr::T,xs::Array{Symbol,1}) = Formula[differentiate(expr,xi) for xi in xs]
function ∇{T<:Formula}(expr::Array{T},xs::Array{Symbol,1})
    exprarr = Formula[]
    for expri in expr

### Now it is easy to define functions with @generated macro

expr = :((x^2 + y^2 + z^4 + y*z^2 + y*sin(x))*λ^8)
xs = [:x,:y,:z]

@generated f(x,λ) = substitute(convert(Expr,expr),xs,[:(x[$i]) for i in 1:length(xs)])
@generated ∇f(x,λ) = substitute(convert(Expr,∇(expr,xs)),xs,[:(x[$i]) for i in 1:length(xs)])
@generated ∇∇f(x,λ) = substitute(convert(Expr,∇(∇(expr,xs),xs)),xs,[:(x[$i]) for i in 1:length(xs)])
@generated ∇∇∇f(x,λ) = substitute(convert(Expr,∇(∇(∇(expr,xs),xs),xs)),xs,[:(x[$i]) for i in 1:length(xs)])

### Testing

using ForwardDiff

x0 = [1,2,3]
λ0 = 1

@assert ∇f(x0,λ0)==ForwardDiff.gradient(x->f(x,λ0),x0)
@assert ∇∇f(x0,λ0)==ForwardDiff.hessian(x->f(x,λ0),x0)