ANN: XDiff.jl - an expression differentiation package

I was going to ask whether XDiff will be compatible with CUDA kernels. It would be really nice to have a common set of primitive operations that can be used by XDiff, Knet, or ReverseDiff, because then these libraries could be used interchangeably, which I think would be interesting. It is something that Mike Innes has envisioned in his JuliaCon talk.

Regarding the segmented operations, TensorFlow supports unsorted_segment_sum, which is implemented on the GPU and takes the total number of segments as an argument, which might make things easier; the rest is written in Python.
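
To make the semantics concrete, here is a minimal CPU sketch of that kind of operation in Julia (the name segment_sum and the signature are mine, for illustration only, not TensorFlow's code): every column of x is assigned to a segment by ids, and the total number of segments is passed explicitly so the output can be preallocated.

# minimal CPU sketch for illustration; not TensorFlow's implementation
function segment_sum(x::AbstractMatrix, ids::AbstractVector{Int}, nsegments::Int)
    out = zeros(eltype(x), size(x, 1), nsegments)
    for j in 1:size(x, 2), i in 1:size(x, 1)
        out[i, ids[j]] += x[i, j]   # accumulate column j into its segment
    end
    return out
end

# e.g. segment_sum(randn(3, 5), [1, 1, 2, 3, 3], 3) returns a 3×3 matrix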

I have tried to find the implementation in the TensorFlow repository on GitHub, but I have failed. It is mentioned here:
/tensorflow/python/ops/math_ops.py
but I have not found a definition, and I have to confess that I am not a TensorFlow guru.

In my library, each operation (layer) has a preallocated buffer, and at the end of the operation it returns a view into this buffer. This is how I have dealt with efficiency.
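
Roughly, each layer follows this pattern (a simplified sketch, not the actual code from my library; Julia 0.6 syntax, matching the rest of this thread):

# simplified sketch of the buffer-reuse pattern; not the actual library code
mutable struct DenseLayer
    W::Matrix{Float64}
    buffer::Matrix{Float64}        # preallocated output buffer
end

DenseLayer(W::Matrix{Float64}, maxbatch::Int) =
    DenseLayer(W, zeros(size(W, 1), maxbatch))

function forward!(l::DenseLayer, x::AbstractMatrix)
    n = size(x, 2)
    out = view(l.buffer, :, 1:n)   # view into the preallocated buffer
    A_mul_B!(out, l.W, x)          # in-place mat-mul, no allocation per call
    return out
end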

I see 4 types of operations:

  1. Broadcasting. XDiff already fuses all such sequential operations into a single expression (even if they are not adjacent in the source code), and GPUArrays can already handle it.
  2. Matrix multiplication. GPUArrays already uses cuBLAS to handle it.
  3. Convolutions. cuDNN should cover it, though AFAIK there’s no actively developed wrapper for it.
  4. All other functions (like segmented_sum). I believe the most efficient way would be to implement such functions separately for CPU and GPU (CUDAnative should be helpful here).

I haven’t tested it yet, but XDiff’s generated code should work out of the box for (1) and (2) with GPUArrays, (3) should be easy to add given a good wrapper for cuDNN, and (4) depends on the primitives like segmented_sum, but should be doable in general.
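
To make (1) concrete: in Julia, a chain of dotted operations fuses into a single broadcast call, so an expression like the hand-written one below (just an illustration, not actual XDiff output) becomes one elementwise kernel, while the matrix multiplication in (2) dispatches to BLAS on the CPU or cuBLAS via GPUArrays:

# hand-written illustration of (1) and (2), not actual XDiff output:
# the dotted part fuses into a single broadcast (one elementwise kernel),
# while W * x goes to BLAS / cuBLAS
sigmoid(z) = 1 / (1 + exp(-z))

W = randn(5, 11); b = randn(5); x = randn(11, 1000)
y = sigmoid.(W * x .+ b)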

As far as I understand, Knet uses AutoGrad.jl, which is a port of Python’s AutoGrad, and both seem to support only the CPU. I don’t know much about the internals of these libraries, but I guess this is because of the wide range of operations that pure automatic differentiation can support - not all of these operations are easy (or even possible) to translate to the GPU.

On the other hand, symbolic differentiation libraries like Theano, TensorFlow or XDiff produce code that is easy to translate to GPU, but lack some of the AD features (e.g. loops and conditions). So I’m not sure it’s possible to make these libraries completely interchangeable.

But what we definitely can do is provide a set of common functions (e.g. relu, conv2d, etc.) with efficient CPU and GPU implementations that other packages can use.
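
For example, such a package could expose generic fallbacks that work for any array type, with specialized GPU methods added on top (a hypothetical sketch; the module and names are made up for illustration):

# hypothetical sketch of a shared-primitives package; module and names are
# made up for illustration
module NNPrimitives

export relu

# generic fallback: works for any array type that supports broadcasting,
# including GPU arrays
relu(x::AbstractArray) = max.(x, zero(eltype(x)))

# packages could then add faster specialized methods, e.g. a hand-written
# CUDA kernel via CUDAnative for their own GPU array type

end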

But in your case bags isn’t a single number, right? Can you provide a definition of the bags type?

I have tried it, but I failed.

May I ask what I am doing wrong?

import XDiff

x, y = randn(11,1000), randn(1,1000)
θ = [randn(1,size(x,1)), rand()];
predict(θ,x) = θ[1] * x .+ θ[2];
loss(θ,x,y) = sum(abs2.(y - predict(θ,x))) / size(y,2);
df = XDiff.xdiff(loss; θ = θ, x = x, y = y)

gives me this error

ERROR: MethodError: no method matching parse!(::Espresso.ExGraph, ::GlobalRef)
Closest candidates are:
parse!(::Espresso.ExGraph, ::Espresso.ExH{:tuple}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:214
parse!(::Espresso.ExGraph, ::Espresso.ExH{:'}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:200
parse!(::Espresso.ExGraph, ::Espresso.ExH{:.}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:188

Stacktrace:
[1] collect(::Base.Generator{Array{Any,1},Espresso.##80#81{Espresso.ExGraph}}) at ./array.jl:441
[2] parse!(::Espresso.ExGraph, ::Espresso.ExH{:call}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:180
[3] parse!(::Espresso.ExGraph, ::Expr) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:146
[4] collect(::Base.Generator{Array{Any,1},Espresso.##80#81{Espresso.ExGraph}}) at ./array.jl:441
[5] parse!(::Espresso.ExGraph, ::Espresso.ExH{:call}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:180
[6] parse!(::Espresso.ExGraph, ::Expr) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:146
[7] collect(::Base.Generator{Array{Any,1},Espresso.##80#81{Espresso.ExGraph}}) at ./array.jl:441
[8] parse!(::Espresso.ExGraph, ::Espresso.ExH{:call}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:180
[9] parse!(::Espresso.ExGraph, ::Expr) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:146
[10] collect(::Base.Generator{Array{Any,1},Espresso.##84#85{Espresso.ExGraph}}) at ./array.jl:441
[11] parse!(::Espresso.ExGraph, ::Espresso.ExH{:body}) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:208
[12] #ExGraph#74(::Bool, ::Dict{Any,Any}, ::Array{Any,1}, ::Type{T} where T, ::Expr) at /Users/tpevny/.julia/v0.6/Espresso/src/exgraph.jl:26
[13] (::Core.#kw#Type)(::Array{Any,1}, ::Type{Espresso.ExGraph}, ::Expr) at ./:0
[14] #to_einstein#147(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Expr) at /Users/tpevny/.julia/v0.6/Espresso/src/to_einstein.jl:21
[15] (::Espresso.#kw##to_einstein)(::Array{Any,1}, ::Espresso.#to_einstein, ::Expr) at ./:0
[16] #xdiff#53(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Expr) at /Users/tpevny/.julia/v0.6/XDiff/src/rdiff.jl:319
[17] (::XDiff.#kw##xdiff)(::Array{Any,1}, ::XDiff.#xdiff, ::Expr) at ./:0
[18] #xdiff#58(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Function) at /Users/tpevny/.julia/v0.6/XDiff/src/rdiff.jl:346
[19] (::XDiff.#kw##xdiff)(::Array{Any,1}, ::XDiff.#xdiff, ::Function) at ./:0

Thanks for the help.

I would write it as follows:

# funcs.jl
loss(W, b, x, y) = sum(abs2.(y - (W * x .+ b)))

# main.jl
using XDiff

include("funcs.jl")

x, y = randn(11,1000), randn(1,1000)
W = randn(1,size(x,1))
b = rand()
df = xdiff(loss; W=W, b=b, x=x, y=y)

result = df(W, b, x, y)
# or 
mem = Dict()  # buffers
result = df(W, b, x, y; mem=mem) 

Changes:

  1. I moved the functions to a separate file (needed only if you run the code from the REPL). xdiff tries to extract the source code of the function using Sugar.jl. Sugar.jl in turn first tries to find the source file for a function and, if that fails, parses its AST. If the function is defined in the REPL, it doesn’t have a source file, and the code recovered from the AST is somewhat hard for XDiff to understand. Thus loading functions from a file is preferable (this should be partially fixed in future releases).
  2. “Inlined” predict into loss. For the same reason as above, I’ve switched off recursive parsing of nested functions for now.
  3. Split θ into W and b because XDiff doesn’t support indexing in vector notation. The primary reason is that separate arrays are much easier to process on GPU than indexing/slicing of a single array.
  4. Removed size - I’m not sure it’s differentiable, and it isn’t really useful here assuming you are going to minimize the loss (see the sketch below).
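
With θ split into W and b, minimizing the loss with plain gradient descent on top of df would look roughly like this (a sketch only; it assumes df returns the loss value followed by the gradients with respect to each argument in order, and the learning rate is arbitrary):

# sketch of a training loop on top of df
# (assumed return order: loss, dW, db, dx, dy)
lr = 0.01
for i in 1:100
    l, dW, db, dx, dy = df(W, b, x, y)
    W -= lr * dW     # update weights
    b -= lr * db     # update bias
end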

Is it a suitable replacement for you?

Yes, thanks a lot.
This clarifies a lot.

It’s now possible to define derivatives for “special” functions - the ones that can’t be expressed in indexing notation. Example:

# in test_funcs.jl

foo(x) = sum(x, 2)
foo_grad(dzdy, x) = repmat(dzdy, 1, size(x, 2))

# define derivative for foo(); allowed variable names are `x`, `x1`-`x4`, `y`
@specialdiff (y = foo(x)) (dz!dx = foo_grad(dz!dy, x))

function test_special(u, v)
    x = u .+ v
    y = foo(x)
    z = sum(1.0 .* y) # note: don't remove 1.0
end


# in main.jl or REPL

Pkg.checkout("Espresso")
Pkg.checkout("XDiff")

using XDiff

include("test_funcs.jl")

dtest_special = xdiff(test_special; u=rand(3,3), v=rand(3,3))
dtest_special(rand(3,3), rand(3,3))
# ==> (11.113605420231774, [1.0 1.0 1.0; 1.0 1.0 1.0; 1.0 1.0 1.0], [1.0 1.0 1.0; 1.0 1.0 1.0; 1.0 1.0 1.0]) 

I can’t test segmented_sum because I still don’t know what the bags argument is there, but it should be something like:

@specialdiff (y = segmented_sum(x1, x2)) (dz!dx1 = grad_segmented_sum(dz!dy, x1, x2))

There are still some issues, but as a proof of concept it certainly works. Special functions will make it into XDiff v0.2.