Announcement: Knet-1.1.0 and AutoGrad-1.1.0 have been released with callable object support

denizyuret · September 13, 2018, 2:58pm

I added support for the Flux style of defining models/layers as callable objects.

struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

This way a model/layer acts as a (predict) function as well as a collection of parameters:

x = randn(784)                        # a single 784-dimensional input
m = Linear(randn(10,784), zeros(10))  # define model
y = m(x)                              # gives the prediction
for p in (m.w, m.b)                   # iterate over parameters
    println(size(p))
end

For training, the parameters should be wrapped in AutoGrad.Param objects, which makes it
possible to use the four interface functions @diff, grad, params and value:

using AutoGrad          # provides Param, @diff, grad, params, value
m = Linear(Param(randn(10,784)), Param(zeros(10)))
for p in params(m)      # iterates over parameters
    println(size(p))
end
y = m(x)                # returns the same y value as above (test mode)
J = @diff sum(m(x))     # gives an object with both value and grad info (the differentiated result should be a scalar)
value(J)                # gives the value of sum(m(x))
grad(J, m.w)            # gives the gradient of value(J) wrt m.w
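To make concrete what a call like `grad(J, m.w)` should return for this layer, here is a hand-derived check in plain Julia (no AutoGrad involved). It assumes a scalar loss J = sum(W*x .+ b); then dJ/dW[i,j] = x[j] for every row i, and dJ/db[i] = 1. The helper names `grad_w`, `grad_b` and `numgrad` are hypothetical, introduced only for this sketch, and the analytic result is compared against central finite differences.

```julia
# Same callable layer as in the post
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

# Hand-derived gradients of J = sum(m(x)) (hypothetical helpers, not AutoGrad API)
grad_w(m::Linear, x::AbstractVector) = ones(size(m.w, 1)) * x'  # outer product: row i is x'
grad_b(m::Linear, x::AbstractVector) = ones(length(m.b))

# Central finite differences, perturbing A in place and restoring it
function numgrad(f, A; h = 1e-6)
    G = similar(A, Float64)
    for i in eachindex(A)
        old = A[i]
        A[i] = old + h; fp = f()
        A[i] = old - h; fm = f()
        A[i] = old
        G[i] = (fp - fm) / (2h)
    end
    return G
end

m = Linear(randn(3, 4), zeros(3))
x = randn(4)
J() = sum(m(x))   # closure over the global model and input
@assert isapprox(grad_w(m, x), numgrad(J, m.w); atol = 1e-6)
```

Because the layer is linear in W and b, the finite-difference estimate matches the analytic gradient up to rounding error.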

This interface is not mandatory: everything should be backward compatible, and old Knet
code should continue to work (as long as it is upgraded to Julia 1.0). However, the new
interface should allow people to easily define their own layer/model collections and thus
address Knet issues #144, #147, #341.
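As an illustration of what a user-defined layer/model collection might look like, here is a pure-Julia sketch that composes callable layers into a model that is itself a callable struct. The names `Dense`, `Chain`, `layerparams` and `relu` are hypothetical and defined in the block (they are not taken from Knet), and parameters are collected by a hand-written walk rather than by AutoGrad's `params`.

```julia
# A dense layer: weights, bias and an elementwise activation (hypothetical names)
struct Dense; w; b; f; end
Dense(nin::Int, nout::Int, f = identity) = Dense(0.1 .* randn(nout, nin), zeros(nout), f)
(d::Dense)(x) = d.f.(d.w * x .+ d.b)

# A model is just a callable collection of layers
struct Chain
    layers::Tuple
    Chain(layers...) = new(layers)
end
# Feed the output of each layer into the next one
(c::Chain)(x) = foldl((h, layer) -> layer(h), c.layers; init = x)

# Hand-written parameter walk; with AutoGrad one would instead wrap the
# arrays in Param and iterate over params(model)
layerparams(d::Dense) = (d.w, d.b)
layerparams(c::Chain) = Tuple(p for l in c.layers for p in layerparams(l))

relu(x) = max(zero(x), x)
model = Chain(Dense(784, 64, relu), Dense(64, 10))
x = randn(784)
y = model(x)      # 10-element prediction
```

The design mirrors the single `Linear` layer: each struct carries its own parameters and is called like a function, so nesting layers inside a model needs no extra bookkeeping.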

For more in-depth examples, check out the new tutorial.

RoyiAvital · September 13, 2018, 6:11pm

This is great!
Thank you for your efforts.

Any chance of having updated benchmarks?

denizyuret · September 13, 2018, 8:56pm

Yes, it is on the to-do list (but it is a long list).

RoyiAvital · September 14, 2018, 7:17am

Long list indeed.
By the way, could you keep some examples using the old style of Knet.jl?

gdkrmr · September 14, 2018, 7:18am

Are there plans to merge Flux and Knet into a single project at some point?

denizyuret · September 14, 2018, 11:42am

Currently most of the Knet/examples use the old style. I was planning to update them slowly; I thought keeping both styles might be confusing to newcomers. Do you see any advantage of the old style? I see two shortcomings in it: the predict function is kept separate from the weights, and all parameters have to be fit into one variable.
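For readers comparing the two styles, here is a minimal pure-Julia sketch. The "old style" half is modeled loosely on the older Knet examples (a separate predict function taking all weights packed into one `Any` vector); `predict_old` is a hypothetical name. The "new style" half is the callable `Linear` struct from the announcement, sharing the same arrays, so both give identical predictions.

```julia
# Old style (sketch): predict function separate from the weights,
# and all parameters packed into a single variable w
predict_old(w, x) = w[1] * x .+ w[2]
w = Any[randn(10, 784), zeros(10)]

# New style: the struct holds its own parameters and is itself callable
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b
m = Linear(w[1], w[2])   # reuse the same arrays for comparison

x = randn(784)
@assert predict_old(w, x) == m(x)
```

In the old style the caller has to remember which index of `w` belongs to which layer; in the new style that information lives in the field names.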

denizyuret · September 14, 2018, 1:09pm

Short answer: yes, and we will call it Klux (just kidding :))

Long answer: As developers of related packages, we have given this some thought, and here is where I currently stand. On the one hand, newcomers would appreciate the “one true way” to do deep learning in Julia. On the other hand, having multiple packages drives innovation: see Zygote and Capstan for AD, CuArrays and CUDAnative for GPU, etc. To do deep learning you need three things: (1) automatic differentiation, (2) GPU arrays, and (3) a library with training utilities, predefined models, layers, etc. One short-term solution could be to design common interfaces, so that a user can pick any AD package and any GPU array package and run the same code. Model/layer libraries could then be implemented on top of these interfaces. It turns out this is not too hard for GPU arrays (see the AbstractArray interface) but not as trivial for AD (still debating it). When it comes to a model/layer library, I think we will always have multiple options (look at Python). In summary, I think packages will be maintained as long as they contribute new ideas in performance, stability, coverage, APIs, etc. These ideas will cross-pollinate, and we will eventually see some convergence.
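A small sketch of why the AbstractArray interface makes array backends swappable: the `Linear` layer only uses generic operations (`*`, `.+`), so the same code runs on any array type that implements them. Here two CPU array types (`Float32` arrays and a `SubArray` view) stand in for a GPU array package; no GPU package is used, and whether a particular GPU package supports every operation is not shown by this sketch.

```julia
# Layer written only against generic array operations
struct Linear; w; b; end
(m::Linear)(x) = m.w * x .+ m.b

w, b, x = randn(10, 784), zeros(10), randn(784)

y64 = Linear(w, b)(x)                                 # Array{Float64}
y32 = Linear(Float32.(w), Float32.(b))(Float32.(x))   # Array{Float32}
xv  = view(vcat(x, x), 1:784)                         # SubArray, not an Array
yv  = Linear(w, b)(xv)                                # same layer code, different input type
```

The layer never names a concrete array type, which is the property a common interface relies on.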

gdkrmr · September 14, 2018, 2:59pm

Thanks for the reply. The future of the Julia ecosystem probably lies in such a tightly integrated ecosystem, where the different parts can simply be swapped out. Julia is like Lego bricks in this respect, which is really cool, though it can be quite confusing for newcomers to find their way around and to choose the right parts.

improbable22 · September 14, 2018, 7:01pm

I just tried this out, and it might be very useful for me, thanks.

I have a question about custom gradients. Suppose f acts on two vectors and returns a number, and I have two functions which work out its gradients. Then I believe this is the syntax:

@primitive f(x1,x2),dy,y dy .* fgrad1(value(x1),value(x2)) dy .* fgrad2(value(x1),value(x2))

But if it would be easier / quicker to work out both gradients at once, fgrad12(x1,x2) = g1, g2, is there a way to give this function to @primitive?

denizyuret · September 14, 2018, 7:50pm

Unfortunately not in the current interface (unless you do some caching, memoization etc.). Each arg gets a different method for gradient wrt that arg.
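The caching workaround mentioned above can be sketched in plain Julia. Assumptions: `fgrad12` is a hypothetical joint gradient function (here for the toy f(x1, x2) = sum(x1 .* x2), so g1 = x2 and g2 = x1), and both backward methods generated by `@primitive` are called with the same array objects. A one-entry cache then lets `fgrad1` and `fgrad2` share a single call to `fgrad12`.

```julia
# Hypothetical joint gradient for f(x1, x2) = sum(x1 .* x2)
const fcalls = Ref(0)          # counts how often the joint computation runs
function fgrad12(x1, x2)
    fcalls[] += 1
    return copy(x2), copy(x1)  # (df/dx1, df/dx2)
end

# One-entry cache keyed on argument identity (===), not contents
const _last_args  = Ref{Any}(nothing)
const _last_grads = Ref{Any}(nothing)

function cached_grads(x1, x2)
    a = _last_args[]
    if a === nothing || a[1] !== x1 || a[2] !== x2
        _last_args[]  = (x1, x2)
        _last_grads[] = fgrad12(x1, x2)   # recompute only on a cache miss
    end
    return _last_grads[]
end

# Per-argument gradient functions that share the cached result
fgrad1(x1, x2) = cached_grads(x1, x2)[1]
fgrad2(x1, x2) = cached_grads(x1, x2)[2]
```

These `fgrad1`/`fgrad2` could then be used in the `@primitive` line from the question unchanged. Because the cache compares identity rather than contents, it would return stale gradients if `x1` or `x2` were mutated in place between the two calls, and it is not thread-safe.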
