ANN: Knet 1.4.0: accelerating CuArrays

Interesting, I did not even know MLJ could be used this way. It seems a bit overkill for many purposes, though, so I like the idea of having Knet expose modules compatible with Flux (serendipitously, the last Knet refactoring seems to allow exactly that).

How do Knet and Flux compare on CPU? I admit that we cannot run things on GPU, as our data are big and IO is killing us (read: we are not doing that right).

Here’s my 2019 post regarding this CuArray allocation issue: How often is it a problem? - #4 by Marc.Cox; there may have been improvements on this front since then.

Does it make sense to extract these high-level layers into something like NNlib (i.e. “neutral territory”), if only so that one doesn’t have to load both Flux and Knet to use them? Indeed, how in the loop are the most active NNlib maintainers about all this? Conversations like https://github.com/JuliaGPU/CUDA.jl/issues/343 seem to indicate there’s not a ton of coordination at the moment, but I don’t want to misrepresent things.
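
For context on what extracting layers into NNlib would mean: NNlib today mostly exposes array-level primitives (convolution, pooling, activations, softmax) rather than layer objects, so using it directly looks roughly like the sketch below (my own illustration, not code from either framework):

```julia
using NNlib

x = randn(Float32, 28, 28, 1, 8)   # WHCN batch of 8 single-channel images
w = randn(Float32, 5, 5, 1, 16)    # 5x5 filters mapping 1 -> 16 channels

y = conv(x, w)                           # bare convolution, no Conv layer object
a = relu.(y)                             # elementwise activation
p = softmax(reshape(a, :, 8); dims = 1)  # per-sample class probabilities
```

Layer structs that carry their own weights (Dense, Conv, Chain, …) currently live in each framework rather than here, which is what the question above is about.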

That’s quite a decent idea.

The ‘neutral territory’ is CUDA.jl; those Knet.jl-optimized kernels just need to be reimplemented in Julia (that’s where I think most of the performance difference comes from).
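
To make that concrete, here is a minimal sketch of what “reimplemented in Julia” can look like: a hypothetical fused ReLU-gradient kernel written directly against CUDA.jl’s `@cuda` machinery (not one of Knet’s actual kernels):

```julia
using CUDA

# One thread per element: compute the ReLU gradient dx = dy .* (y .> 0) in a single pass.
function relu_grad_kernel!(dx, dy, y)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds dx[i] = ifelse(y[i] > 0f0, dy[i], 0f0)
    end
    return nothing
end

y  = CUDA.rand(Float32, 2^20)
dy = CUDA.rand(Float32, 2^20)
dx = similar(dy)

threads = 256
blocks  = cld(length(y), threads)
@cuda threads=threads blocks=blocks relu_grad_kernel!(dx, dy, y)
```

Because this compiles to native GPU code from Julia, hand-tuned kernels no longer have to live in a framework-specific CUDA C library.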

The CUDA.jl integration work is very exciting! I was primarily responding to @denizyuret’s comment above:

Presumably CUDA.jl wants to stay at the level of implementing the NNlib APIs rather than hosting Dense, Chain, Conv, etc.? This seems to call for an NNlib-style package of high-level layer definitions, so that Flux and Knet don’t both have to reimplement the same few callable structs on top of base NNlib. I found the very promising Split off the CPU implementation · Issue #224 · FluxML/NNlib.jl · GitHub, but it seems to stop short of these higher-level APIs.
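
For illustration, the “same few callable structs” are thin wrappers around NNlib calls plus stored parameters. A hypothetical sketch of what such a shared layer package might host (names mirror Flux’s API; everything here is illustrative, not an existing package):

```julia
using NNlib

struct Dense{M,B,F}
    weight::M
    bias::B
    σ::F
end

Dense(in::Integer, out::Integer, σ = identity) =
    Dense(0.01f0 .* randn(Float32, out, in), zeros(Float32, out), σ)

# Calling the struct runs the forward pass.
(d::Dense)(x) = d.σ.(d.weight * x .+ d.bias)

# A Chain is just left-to-right composition over a tuple of layers.
struct Chain{T<:Tuple}
    layers::T
end
Chain(layers...) = Chain(layers)
(c::Chain)(x) = foldl((h, layer) -> layer(h), c.layers; init = x)

model = Chain(Dense(784, 128, relu), Dense(128, 10))
ŷ = softmax(model(randn(Float32, 784, 32)); dims = 1)
```

This is roughly the code that ends up duplicated across frameworks and user projects; hosting it once, on top of base NNlib, is what the shared-package idea amounts to.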