[ANN] GradValley.jl: A new package for Deep Learning with Julia

I am really happy to announce my first Julia package today!

GradValley.jl

GradValley is a new, lightweight package for Deep Learning written in 100% Julia. Currently, GradValley focuses on computer vision tasks. It runs on CPUs and Nvidia GPUs.
GradValley offers a high-level interface for flexible model building and training.

To get started, see Installation and Getting Started. After that, you could look at the Tutorials and Examples section in the documentation. Or directly start using a pre-trained model, for example a pre-trained ResNet.

GradValley is independent of other machine learning packages like Flux, Knet, NNlib or NNPACK (see dependencies).

Simple Example

This is the Getting Started paragraph from the Readme.md.

using GradValley
using GradValley.Layers # The "Layers" module provides all the building blocks for creating a model.
using GradValley.Optimization # The "Optimization" module provides different loss functions and optimizers.

# Definition of a LeNet-like model consisting of a feature extractor and a classifier
feature_extractor = SequentialContainer([ # a convolution layer with 1 in channel, 6 out channels, a 5*5 kernel and a relu activation
                                         Conv(1, 6, (5, 5), activation_function="relu"),
                                         # an average pooling layer with a 2*2 filter (when not specified, stride is automatically set to kernel size)
                                         AvgPool((2, 2)),
                                         Conv(6, 16, (5, 5), activation_function="relu"),
                                         AvgPool((2, 2))])
flatten = Reshape((256, )) # flattens the 16 feature maps of size 4*4 into a 256-element vector per image
classifier = SequentialContainer([ # a fully connected layer (also known as dense or linear) with 256 in features, 120 out features and a relu activation
                                  Fc(256, 120, activation_function="relu"),
                                  Fc(120, 84, activation_function="relu"),
                                  Fc(84, 10),
                                  # a softmax activation layer, the softmax will be calculated along the first dimension (the features dimension)
                                  Softmax(dims=1)])
# The final model consists of three different submodules, 
# which shows that a SequentialContainer can contain not only layers, but also other SequentialContainers
model = SequentialContainer([feature_extractor, flatten, classifier])
                                  
# feeding the network with some random data
# After a model is initialized, its parameters are Float32 arrays by default. The input to the model must always be of the same element type as its parameters!
# You can change the device (CPU/GPU) and element type of the model's parameters with the function module_to_eltype_device!
input = rand(Float32, 28, 28, 1, 32) # a batch of 32 images with one channel and a size of 28*28 pixels
prediction = model(input) # layers and containers are callable; alternatively, you can call the forward function directly: forward(model, input)

# choosing an optimizer for training
learning_rate = 0.05
optimizer = MSGD(model, learning_rate, momentum=0.5) # momentum stochastic gradient descent with a momentum of 0.5

# generating some random target data for a training step
target = rand(Float32, size(prediction)...) # remember to specify the correct element type here as well
# backpropagation
zero_gradients(model)
loss, derivative_loss = mse_loss(prediction, target) # mean squared error
backward(model, derivative_loss) # computing gradients
step!(optimizer) # making an optimization step with the calculated gradients and the optimizer
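
To see how these calls fit together over multiple batches, here is a minimal sketch of a training loop using only the functions shown above (train_data stands for a hypothetical iterable of (input, target) batches and is not part of the example):

# minimal training loop sketch; train_data is a hypothetical iterable of (input, target) batches
for epoch in 1:5
    for (input, target) in train_data
        zero_gradients(model) # reset the gradients from the previous step
        prediction = model(input) # forward pass
        loss, derivative_loss = mse_loss(prediction, target) # loss and its derivative
        backward(model, derivative_loss) # backward pass, computes the gradients
        step!(optimizer) # update the parameters
    end
end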

Why GradValley.jl

This is a slightly shortened version of the Why GradValley.jl paragraph from the documentation.

  • Intuitive Model Building: Model building is normally done using Containers. With Containers, large models can be broken down into smaller components (e.g. a ResNet into ResBlocks), which in turn can then be easily combined into one large model. See the ResNets example in the Tutorials and Examples section.
  • Flexible: Containers behave like layers, so you can use containers in containers in containers… (arbitrary nesting allowed). GraphContainer’s automatic differentiation allows you to define your own computational graph in a function, which can then be differentiated automatically during the backward pass (using reverse-mode AD, a.k.a. backpropagation); see the sketch after this list.
  • Switching from Python to Julia: Model building is very similar to other frameworks and the behavior of the layers is strongly oriented towards PyTorch, e.g. the algorithm behind adaptive pooling.
  • 100% Julia: Julia’s biggest advantage compared to Python is speed. This allows you to easily extend existing Julia packages yourself. Extending Python packages is much more difficult, at least when they use e.g. C code in the backend.
  • Well documented: The documentation aims to provide detailed information about all of GradValley’s functionalities. For example, the documentation of each layer contains e.g. a description, an argument list, a mathematical definition and extensive examples.
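
As a small illustration of the flexibility point, here is a sketch of a ResNet-style residual block built from a GraphContainer. The constructor signatures follow the Getting Started example above and the AD example further down in this thread; treat the details (e.g. using + directly between a container output and its input) as a sketch rather than copy-paste-ready code.

using GradValley
using GradValley.Layers

# sketch of a residual block: two 1*1 convolutions plus a skip connection
conv_path = SequentialContainer([Conv(16, 16, (1, 1), activation_function="relu"),
                                 Conv(16, 16, (1, 1))])
# the forward pass is an ordinary Julia function; layers[1] is the container above
# and the + x is the residual (skip) connection
residual(layers, x) = layers[1](x) + x
res_block = GraphContainer(residual, [conv_path])
# res_block behaves like any other layer and can itself be placed inside a SequentialContainer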

GPU Support

For further information about using the GPU with GradValley, see, for example, the GPU Support section in the documentation.
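
Roughly, moving the model from the Getting Started example to the GPU could look something like the sketch below. The keyword names of module_to_eltype_device! are an assumption here; the GPU Support section has the exact call.

# illustrative sketch, assuming CUDA.jl is installed and an Nvidia GPU is available;
# the keyword names of module_to_eltype_device! are assumptions, see the GPU Support docs
using CUDA
module_to_eltype_device!(model, element_type=Float32, device="gpu") # hypothetical keyword names
input = CUDA.rand(Float32, 28, 28, 1, 32) # inputs must live on the same device as the parameters
prediction = model(input)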

Explanation of the name “GradValley”

When optimizing the weights of a machine learning model, an attempt is always made to find the best possible error minimum. The derivatives, i.e. the gradients, of the error function in relation to the weights are required for this. So the goal is to find the “valley” of the error using the gradients (“grad” stands for gradient). That’s why it’s called GradValley.

Why reinvent the wheel? Flux is already there!

GradValley is NOT intended to be a successor to existing Julia Deep Learning packages (such as Flux).
I mainly developed GradValley for fun; it can be seen as a new - but not necessarily better - option in the world of Julia Deep Learning packages.

I would be very happy if you give GradValley a try, build and train models with it or even contribute to it!

Questions, bugs, etc.

If you have any questions about this software package, please let me know. For example, use the discussion section of the GitHub repository, or just ask here on Discourse.
If you find a bug, please open an issue. I work on GradValley only in my free time, so unfortunately I will not always be able to answer questions immediately.


@Jonas208 how do you compute the gradients in GradValley? I see no dependency on Zygote, Enzyme or Yota.


Hi @mariusd
You’re right, GradValley doesn’t depend on any particular frontend automatic differentiation (AD) package. Instead, GradValley has its own small, rudimentary, function-overload-based automatic differentiation system built on ChainRules.jl.

For most neural networks, it’s enough to support the differentiation of simple operations like addition (for residual connections) or concatenation (for Dense Blocks). GradValley’s AD might not be fast for small problems; however, for neural networks with at least thousands of parameters, most of the performance comes from the rrules implementing the gradients of common Deep Learning operations (such as convolution or matrix multiplication). The performance of the AD that ties these computationally intensive components together is not that important in these cases. GradValley intentionally keeps its AD relatively simple and short. This way, it’s possible to have one dependency less.
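
To illustrate the general mechanism (this is not GradValley’s actual source, just a sketch of the ChainRules.jl-style rrule idea it builds on): an rrule returns the primal result together with a pullback that maps the output cotangent to cotangents for the inputs, and the AD calls the recorded pullbacks in reverse order.

using ChainRulesCore

# a toy operation: what a fully connected layer computes before bias and activation
dense(W, x) = W * x

# the rrule supplies the forward result and a pullback that turns the output
# cotangent ȳ into cotangents for the arguments (NoTangent() for dense itself)
function ChainRulesCore.rrule(::typeof(dense), W, x)
    y = dense(W, x)
    dense_pullback(ȳ) = (NoTangent(), ȳ * x', W' * ȳ)
    return y, dense_pullback
end

# during the backward pass, each recorded pullback is called with the gradient
# flowing in from the node above (the seed for the very last node)
W, x = rand(3, 4), rand(4)
y, pullback = ChainRulesCore.rrule(dense, W, x)
_, W̄, x̄ = pullback(ones(3))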

You can access GradValley’s AD through the GradValley.Layers.GraphContainer.
This is a simple example from the documentation:

julia> using GradValley
julia> using GradValley.Layers
# a simple example with a polynomial, just to show that it is possible to use the GraphContainer like an automatic differentiation (AD) tool 
julia> f(layers, x) = 0.5x^3 - 2x^2 + 10
julia> df(x) = 1.5x^2 - 4x # checking the result of the AD against this manually written derivative
julia> m = GraphContainer(f, [])
julia> y = forward(m, 3)
julia> dydx = backward(m, 1) # in this case, no loss function was used, so we have no gradient information, therefore we use 1 as the so-called seed
1-element Vector{Float64}:
 1.5
julia> manual_dydx = df(3)
1.5
julia> isapprox(dydx[1], manual_dydx)
true

More information about GradValley’s AD can be found in the documentation of the GradValley.Layers.GraphContainer. Or take a look at the source.


Wow! Super nice.
Thanks for sharing. Looking forward to testing it.
