Announcement: Knet-1.1.0 and AutoGrad-1.1.0 have been released with callable object support


#1

I added support for the Flux style of defining models/layers as callable objects.

```julia
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b
```
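
The same pattern extends to other layer types. As a minimal sketch in plain Julia (no Knet needed), here is a hypothetical Dense layer that stores an activation function next to its weights. The names Dense and relu are illustrative assumptions defined locally here, not Knet's own definitions.

```julia
# Hypothetical layer (not Knet's API): an affine map followed by an activation f.
struct Dense; w; b; f; end
(d::Dense)(x) = d.f.(d.w * x .+ d.b)   # calling the object runs the layer

relu(x) = max(zero(x), x)             # local elementwise ReLU for this sketch

d = Dense([1.0 -1.0; 0.5 0.5], [0.0, -2.0], relu)
y = d([2.0, 1.0])                     # w*x .+ b == [1.0, -0.5], so y == [1.0, 0.0]
```

Since the activation is just another field, swapping it only requires constructing the struct with a different function.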

This way a model/layer acts as a (predict) function as well as a collection of parameters:

```julia
x = randn(784)                        # an example input vector
m = Linear(randn(10,784), zeros(10))  # define model
y = m(x)                              # gives the prediction
for p in (m.w, m.b)                   # iterate over parameters
    println(size(p))
end
```
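
Because a model is just a struct, its parameters can be found by walking its fields. The sketch below is a hypothetical helper in plain Julia, similar in spirit to (but not the implementation of) AutoGrad's params: it collects every numeric array inside a model, descending into struct fields, tuples and arrays of layers. The names collect_arrays and collect_arrays! are assumptions made for this illustration.

```julia
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

# Hypothetical helper (not part of Knet/AutoGrad): gather all numeric arrays
# reachable from x, recursing into containers and struct fields.
function collect_arrays!(out, x)
    if x isa AbstractArray{<:Number}
        push!(out, x)                                 # a weight array: keep it
    elseif x isa Union{Tuple,AbstractArray}
        foreach(el -> collect_arrays!(out, el), x)    # e.g. a list of layers
    else
        for name in fieldnames(typeof(x))             # plain numbers have no fields
            collect_arrays!(out, getfield(x, name))
        end
    end
    return out
end
collect_arrays(x) = collect_arrays!(Any[], x)

m = Linear(randn(10,784), zeros(10))
ps = collect_arrays(m)    # the arrays m.w and m.b themselves, not copies
```

Returning the arrays themselves (not copies) matters: an optimizer that updates them in place changes the model directly.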

For training, the parameters should be marked as AutoGrad.Param objects, which makes it
possible to use the four interface functions @diff, grad, params and value:

```julia
m = Linear(Param(randn(10,784)), Param(zeros(10)))
for p in params(m)          # iterates over parameters
    println(size(value(p)))
end
y = m(x)                    # outside @diff, Params act like plain arrays (test mode)
J = @diff sum(abs2, m(x))   # object with both value and grad info; the value must be a scalar
value(J)                    # gives the (scalar) value
grad(J, m.w)                # gives the gradient of value(J) wrt m.w
```
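
To see how the four functions fit together in training, here is a minimal sketch assuming AutoGrad 1.1 is installed. The mean squared-error loss, the step size lr and the helper sgd_step! are illustrative assumptions, not part of Knet's API. Note that the expression passed to @diff returns a scalar, since the gradients are those of a single number.

```julia
using AutoGrad, Random

struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

# Hypothetical mean squared-error loss. @diff differentiates a scalar,
# so the vector prediction m(x) is reduced to a single number.
loss(m, x, ygold) = sum(abs2, m(x) .- ygold) / size(x, 2)

# Hypothetical helper: one step of plain gradient descent.
function sgd_step!(m, x, ygold; lr=0.01)
    J = @diff loss(m, x, ygold)      # value plus gradient info for every Param used
    for p in params(m)               # Params found inside the model
        g = grad(J, p)               # gradient of value(J) wrt p
        g === nothing && continue    # p was not used in the loss
        w = value(p)                 # the plain array wrapped by the Param
        w .-= lr .* g                # update in place, so m sees the new weights
    end
    return value(J)                  # loss measured before this update
end

Random.seed!(1)
x, ygold = randn(3, 20), randn(2, 20)
m = Linear(Param(randn(2, 3)), Param(zeros(2)))
losses = [sgd_step!(m, x, ygold) for _ in 1:50]
```

The hand-written update only illustrates the interface; in practice one of Knet's own optimizers would normally take its place.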

This interface is not mandatory: everything should be backward compatible, and old Knet
code should continue to work (as long as it is upgraded to Julia 1.0). However, the new
interface should allow people to easily define their own layer/model collections, and it
thus addresses Knet issues #144, #147 and #341.
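
As an illustration of such a model collection, here is a hypothetical Chain type in plain Julia (its name and design are assumptions, not Knet's API). It is itself a callable object whose fields are other callable layers, so a whole model composes exactly like a single layer.

```julia
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

# Hypothetical container (not Knet's API): a model made of layers,
# applied left to right. It is itself a callable object.
struct Chain
    layers::Tuple
    Chain(layers...) = new(layers)   # Chain(l1, l2, ...) stores the layers as a tuple
end
(c::Chain)(x) = foldl((h, layer) -> layer(h), c.layers; init=x)

l1 = Linear([1.0 2.0; 3.0 4.0], [0.0, 1.0])
l2 = Linear([1.0 -1.0], [0.5])
model = Chain(l1, l2)
y = model([1.0, 1.0])   # l1 gives [3.0, 8.0], then l2 gives [-4.5]
```

Because Chain only requires each element to be callable, plain functions such as x -> max.(0, x) can be mixed in with layer objects.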

For more in-depth examples, check out the new tutorial.


#2

This is great!
Thank you for your efforts.

Any chance of having updated benchmarks?


#3

Yes, it is on the to-do list (but it is a long list :slight_smile:)


#4

Long list indeed.
By the way, could you keep some examples using the old style of Knet.jl?


#5

Are there plans to merge Flux and Knet into a single project at some point?


#6

Currently most of the examples in Knet/examples use the old style. I was planning to update them gradually, since I thought keeping both styles might confuse newcomers. Do you see any advantage to the old style? I consider keeping the predict function separate from the weights, and having to fit all parameters into a single variable, to be shortcomings of the old style.
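
For readers comparing the two styles, here is a small sketch in plain Julia. The old-style predict(w, x), with all weights packed into one container w, follows the pattern described in the post above; the exact form used in individual Knet examples may differ.

```julia
# Old style: a free-standing predict function plus one variable that holds
# every parameter; the function must know which slot holds which weight.
predict(w, x) = w[1] * x .+ w[2]

# New style: the model stores its weights and is itself callable.
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

W, B = randn(10, 784), zeros(10)
w_old = Any[W, B]        # old style: all parameters in a single container
m_new = Linear(W, B)     # new style: parameters live inside the model
x = randn(784)
y_old = predict(w_old, x)
y_new = m_new(x)         # same computation on the same arrays
```

Both versions perform the same arithmetic; the difference is only where the weights live and how the prediction code finds them.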


#7

Short answer: yes, and we will call it Klux (just kidding :))

Long answer: As developers of related packages, we have given this some thought, and here is where I currently stand. On the one hand, newcomers would appreciate the “one true way” to do deep learning in Julia. On the other hand, having multiple packages drives innovation: see Zygote and Capstan for AD, CuArrays and CUDAnative for GPU, etc.

To do deep learning you need three things: (1) automatic differentiation, (2) GPU arrays, and (3) a library with training utilities, predefined models, layers, etc. One short-term solution could be to design common interfaces, so that a user can pick any AD package and any GPU array package and run the same code. Model/layer libraries could then be implemented on top of these interfaces. It turns out this is not too hard for GPU arrays (see the AbstractArray interface) but not as trivial for AD (we are still debating it). When it comes to a model/layer library, I think we will always have multiple options (look at Python).

In summary, I think packages will be maintained as long as they contribute new ideas in performance, stability, coverage, APIs, etc. These ideas will cross-pollinate, and we will eventually see some convergence.


#8

Thanks for the reply. The future of the Julia ecosystem probably lies in this kind of tight integration, where the different parts can simply be swapped out. In this respect Julia is like Lego bricks, which is really cool, though it can be quite confusing for newcomers to find their way around and choose the right parts.


#9

I just tried this out, and it might be very useful for me, thanks.

I have a question about custom gradients. Suppose f acts on two vectors and returns a number, and I have two functions that compute its gradients. Then I believe this is the syntax:

```julia
@primitive f(x1,x2),dy,y  dy .* fgrad1(value(x1),value(x2))  dy .* fgrad2(value(x1),value(x2))
```

But if it is easier/quicker to compute both gradients at once, with fgrad12(x1,x2) returning (g1, g2), is there a way to give this function to @primitive?


#10

Unfortunately not in the current interface (unless you do some caching, memoization, etc.). Each argument gets its own method for the gradient with respect to that argument.
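
One way to do the caching mentioned above is sketched below, assuming AutoGrad 1.1 is installed. The function f, the joint gradient fgrad12 and the one-entry memo fgrad12_memo are hypothetical examples: both per-argument gradient expressions call the memo, so the expensive joint computation should run once per backward pass. The cache compares stored copies of the inputs by value, so in-place parameter updates between steps invalidate it; it is also not thread-safe.

```julia
using AutoGrad

# Hypothetical scalar function of two vectors.
f(x1, x2) = sum(x1 .* x2)

# Hypothetical joint gradient: returns (df/dx1, df/dx2) in one pass.
# NCALLS counts how often this (supposedly expensive) function runs.
const NCALLS = Ref(0)
function fgrad12(x1, x2)
    NCALLS[] += 1
    return (copy(x2), copy(x1))
end

# One-entry memo keyed on the input values (copies are stored, so mutating
# the inputs in place, e.g. by an optimizer, forces a recomputation).
const LAST = Ref{Any}(nothing)
function fgrad12_memo(x1, x2)
    c = LAST[]
    if c !== nothing && c[1] == x1 && c[2] == x2
        return c[3]                       # cache hit: reuse both gradients
    end
    g = fgrad12(x1, x2)
    LAST[] = (copy(x1), copy(x2), g)
    return g
end

# Each argument still gets its own gradient expression, but both share the memo.
@primitive f(x1,x2),dy,y  (dy .* fgrad12_memo(value(x1),value(x2))[1])  (dy .* fgrad12_memo(value(x1),value(x2))[2])

x1 = Param([1.0, 2.0]); x2 = Param([3.0, 4.0])
J = @diff f(x1, x2)
g1, g2 = grad(J, x1), grad(J, x2)
```

A single cached entry is enough here because the two gradient methods for one call of f run with the same inputs during the same backward pass.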