ANN: Knet 1.4.0: accelerating CuArrays

I just released Knet 1.4.0, a major refactoring of the code into a number of submodules to enable future improvements, with (hopefully) no effect on the current API.

My first goal for this release was to fully support CuArrays without any performance penalty. With generous help from @maleadt this was mostly achieved:

  • This table gives Array/KnetArray/CuArray benchmarks for two dozen operators commonly used in deep learning (defined in the Knet.Ops20 module).
  • This list shows the notebooks and examples tested, and marks the (shrinking) set that still lags in performance with CuArrays.
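
For reference, here is a minimal sketch of the kind of comparison behind the table; the operator (relu) and the array sizes are illustrative, not the actual benchmark setup:

    using Knet, CUDA, BenchmarkTools

    x  = randn(Float32, 1000, 1000)
    kx = KnetArray(x)   # Knet's GPU array type
    cx = CuArray(x)     # CUDA.jl's array type

    # CUDA.@sync makes sure the GPU kernel finishes before timing stops.
    @btime relu.($x)
    @btime CUDA.@sync relu.($kx)
    @btime CUDA.@sync relu.($cx)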

To get this far, Tim tweaked some array code in CUDA.jl, and I handled the rest by binding CuArray functions to Knet kernels in the Knet.CuArrays module. Hopefully, with further tweaks to CUDA.jl and the CUDNN performance work by @gartangh, the Knet kernels will eventually become unnecessary.
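
To give a flavor of the binding pattern (a schematic sketch, not actual Knet source; the kernel name and C signature are assumptions for illustration), a broadcasted call on a CuArray can be rerouted to a hand-written kernel in libknet8 like this:

    using CUDA
    import Base.Broadcast: broadcasted

    # Hypothetical binding: make tanh.(x) on a CuArray{Float32} call a
    # precompiled libknet8 kernel instead of the generic broadcast machinery.
    function broadcasted(::typeof(tanh), x::CuArray{Float32})
        y = similar(x)
        ccall((:tanh_32, "libknet8"), Nothing,
              (Cint, CuPtr{Float32}, CuPtr{Float32}),
              length(x), x, y)
        return y
    end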

To ease installation, all dependencies on GPU libraries outside of CUDA.jl were eliminated, and libknet8.so, which contains the Knet kernels, is downloaded automatically as an artifact. If you have a GPU driver and a functioning CUDA.jl, Knet should work out of the box (no CUDA compiler/toolkit installation needed).
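
To check that the out-of-the-box path works on your machine, something like this should suffice (a minimal sketch; I am assuming Knet.gpu() keeps its long-standing meaning of returning the active device id, or -1 if there is none):

    using CUDA, Knet

    CUDA.functional()   # true if the driver and CUDA.jl runtime are usable
    Knet.gpu()          # active GPU device id, or -1 on a CPU-only machine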

My second goal for this release was to lay the groundwork for supporting multiple operator/layer/model sets with backward compatibility. I want the user to be able to switch easily between different versions of Knet vs NNlib vs Torch operators, or ONNX vs Keras layers, and to have a standard interface for loading, saving, training, and running state-of-the-art models like Yolo, BERT, ResNet, GPT, etc. I have a vague idea of how to do this with submodules, but we’ll see how it goes. For now I have split everything in Knet into these semi-independent submodules:

  • Knet.LibKnet8: library of hand-written CUDA kernels.
  • Knet.KnetArrays: the KnetArray type and its Base function implementations.
  • Knet.CuArrays: performant versions of some CuArray Base functions.
  • Knet.AutoGrad_gpu: AutoGrad support for KnetArrays and CuArrays.
  • Knet.FileIO_gpu: FileIO functions for KnetArrays (CuArray support still needed).
  • Knet.Ops20: a sample operator set of about 25 operators in terms of which all current Knet models are written: conv4, pool, batchnorm, dropout, relu, nll… This module provides documentation, generic implementations, and gradients (see the short example after this list).
  • Knet.Ops20_gpu: KnetArray and CuArray implementations for Knet.Ops20.
  • Knet.Train20: Model training and data manipulation functions: minibatch, adam, etc.
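
Here is the short example promised above: a toy forward pass and loss written with Ops20 operators (the sizes and random values are illustrative):

    using Knet

    x = randn(Float32, 28, 28, 1, 8)         # W x H x C x N input batch
    w = 0.1f0 * randn(Float32, 5, 5, 1, 20)  # 5x5 filters, 1 -> 20 channels
    y = pool(relu.(conv4(w, x)))             # conv -> relu -> pool: 12x12x20x8
    scores = randn(Float32, 10, 8)           # stand-in class scores for 8 instances
    loss = nll(scores, rand(1:10, 8))        # negative log likelihood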

The idea is for users to import the specific operator/layer/model submodules their application needs, and for me to avoid breaking those applications by adding new submodules rather than changing existing ones. For now, v1.4.0 exports exactly the same functions as v1.3.9, so hopefully no existing code will break. Starting with v2.0.0, users will be required to import the specific submodules they use.
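
For example, the v2.0-style usage might look like this (assuming the submodule and function names above stay as they are):

    # Pull in only what the application needs, instead of `using Knet`.
    using Knet.Ops20: conv4, pool, relu, nll
    using Knet.Train20: minibatch, adam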
