ANN: Knet 1.4.0: accelerating CuArrays

I just released Knet 1.4.0, a major refactoring of the code into a number of submodules to enable future improvements, with (hopefully) no effect on the current API.

My first goal for this release was to fully support CuArrays without any performance penalty. With generous help from @maleadt this was mostly achieved:

  • This table gives Array/KnetArray/CuArray benchmarks for two dozen operators commonly used in deep learning (defined in the Knet.Ops20 module).
  • This list shows the notebooks and examples tested, and marks the (shrinking) set that still lags in performance with CuArrays.
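
For reference, here is a minimal sketch of the kind of comparison behind the table; the operator (relu) and the array sizes are illustrative, not the actual benchmark setup:

    using Knet, CUDA, BenchmarkTools

    x  = randn(Float32, 1000, 1000)
    kx = KnetArray(x)   # Knet's GPU array type
    cx = CuArray(x)     # CUDA.jl's array type

    # CUDA.@sync makes sure the GPU kernel finishes before timing stops.
    @btime relu.($x)
    @btime CUDA.@sync relu.($kx)
    @btime CUDA.@sync relu.($cx)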

To get this far, Tim tweaked some array code in CUDA.jl, and I handled the rest by binding CuArray functions to Knet kernels in the Knet.CuArrays module. Hopefully, with further tweaks to CUDA.jl and the CUDNN performance work by @gartangh, the Knet kernels will eventually become unnecessary.
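
To give a flavor of the binding pattern (a schematic sketch, not actual Knet source; the kernel name and C signature are assumptions for illustration), a broadcasted call on a CuArray can be rerouted to a hand-written kernel in libknet8 like this:

    using CUDA
    import Base.Broadcast: broadcasted

    # Hypothetical binding: make tanh.(x) on a CuArray{Float32} call a
    # precompiled libknet8 kernel instead of the generic broadcast machinery.
    function broadcasted(::typeof(tanh), x::CuArray{Float32})
        y = similar(x)
        ccall((:tanh_32, "libknet8"), Nothing,
              (Cint, CuPtr{Float32}, CuPtr{Float32}),
              length(x), x, y)
        return y
    end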

To ease installation, all dependencies on GPU libraries outside of CUDA.jl were eliminated, and libknet8.so, which contains the Knet kernels, is downloaded automatically as an artifact. If you have a GPU driver and a functioning CUDA.jl, Knet should work out of the box (no CUDA compiler/toolkit installation needed).
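
To check that the out-of-the-box path works on your machine, something like this should suffice (a minimal sketch; I am assuming Knet.gpu() keeps its long-standing meaning of returning the active device id, or -1 if there is none):

    using CUDA, Knet

    CUDA.functional()   # true if the driver and CUDA.jl runtime are usable
    Knet.gpu()          # active GPU device id, or -1 on a CPU-only machine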

My second goal for this release was to lay the groundwork for supporting multiple operator/layer/model sets with backward compatibility. I want the user to be able to switch easily between different versions of Knet vs NNlib vs Torch operators, or ONNX vs Keras layers, and to have a standard interface for loading, saving, training, and running state-of-the-art models like Yolo, BERT, ResNet, GPT, etc. I have a vague idea of how to do this with submodules, but we’ll see how it goes. For now I have split everything in Knet into these semi-independent submodules:

  • Knet.LibKnet8: library of hand-written CUDA kernels.
  • Knet.KnetArrays: the KnetArray type and its Base function implementations.
  • Knet.CuArrays: performant versions of some CuArray Base functions.
  • Knet.AutoGrad_gpu: AutoGrad support for KnetArrays and CuArrays.
  • Knet.FileIO_gpu: FileIO functions for KnetArrays (CuArray support still needed).
  • Knet.Ops20: a sample operator set of about 25 operators in terms of which all current Knet models are written: conv4, pool, batchnorm, dropout, relu, nll… This module provides documentation, generic implementations, and gradients (see the short example after this list).
  • Knet.Ops20_gpu: KnetArray and CuArray implementations for Knet.Ops20.
  • Knet.Train20: Model training and data manipulation functions: minibatch, adam, etc.
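
Here is the short example promised above: a toy forward pass and loss written with Ops20 operators (the sizes and random values are illustrative):

    using Knet

    x = randn(Float32, 28, 28, 1, 8)         # W x H x C x N input batch
    w = 0.1f0 * randn(Float32, 5, 5, 1, 20)  # 5x5 filters, 1 -> 20 channels
    y = pool(relu.(conv4(w, x)))             # conv -> relu -> pool: 12x12x20x8
    scores = randn(Float32, 10, 8)           # stand-in class scores for 8 instances
    loss = nll(scores, rand(1:10, 8))        # negative log likelihood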

The idea is for users to import the specific operator/layer/model submodules their application needs, and for me to avoid breaking those applications by adding new submodules rather than changing existing ones. For now, v1.4.0 exports exactly the same functions as v1.3.9, so hopefully no existing code will break. Starting with v2.0.0, users will be required to import the specific submodules they use.
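
For example, the v2.0-style usage might look like this (assuming the submodule and function names above stay as they are):

    # Pull in only what the application needs, instead of `using Knet`.
    using Knet.Ops20: conv4, pool, relu, nll
    using Knet.Train20: minibatch, adam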
