I am really happy to announce my first Julia package today!
GradValley.jl
GradValley is a new lightweight package for Deep Learning written in 100% Julia. It currently focuses on computer vision tasks and runs on CPUs and Nvidia GPUs.
GradValley offers a high level interface for flexible model building and training.
To get started, see Installation and Getting Started. After that, you can look at the Tutorials and Examples section in the documentation, or directly start using a pre-trained model, for example a pre-trained ResNet.
GradValley is independent of other machine learning packages like Flux, Knet, NNlib or NNPACK (see dependencies).
Simple Example
This is the Getting Started paragraph from the Readme.md.
using GradValley
using GradValley.Layers # The "Layers" module provides all the building blocks for creating a model.
using GradValley.Optimization # The "Optimization" module provides different loss functions and optimizers.
# Definition of a LeNet-like model consisting of a feature extractor and a classifier
feature_extractor = SequentialContainer([
    # a convolution layer with 1 in channel, 6 out channels, a 5*5 kernel and a relu activation
    Conv(1, 6, (5, 5), activation_function="relu"),
    # an average pooling layer with a 2*2 filter (when not specified, stride is automatically set to kernel size)
    AvgPool((2, 2)),
    Conv(6, 16, (5, 5), activation_function="relu"),
    AvgPool((2, 2))])
# flatten the 4*4*16 feature maps produced by the feature extractor into a vector of 256 features per image
flatten = Reshape((256, ))
classifier = SequentialContainer([
    # a fully connected layer (also known as dense or linear) with 256 in features, 120 out features and a relu activation
    Fc(256, 120, activation_function="relu"),
    Fc(120, 84, activation_function="relu"),
    Fc(84, 10),
    # a softmax activation layer, the softmax will be calculated along the first dimension (the features dimension)
    Softmax(dims=1)])
# The final model consists of three different submodules,
# which shows that a SequentialContainer can contain not only layers, but also other SequentialContainers
model = SequentialContainer([feature_extractor, flatten, classifier])
# feeding the network with some random data
# After a model is initialized, its parameters are Float32 arrays by default. The input to the model must always be of the same element type as its parameters!
# You can change the device (CPU/GPU) and element type of the model's parameters with the function module_to_eltype_device!
input = rand(Float32, 28, 28, 1, 32) # a batch of 32 images with one channel and a size of 28*28 pixels
prediction = model(input) # layers and containers are callable, alternatively, you can call the forward function directly: forward(model, input)
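# the prediction has shape (10, 32): 10 class probabilities for each of the 32 images in the batch
println(size(prediction)) # -> (10, 32)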
# choosing an optimizer for training
learning_rate = 0.05
optimizer = MSGD(model, learning_rate, momentum=0.5) # momentum stochastic gradient descent with a momentum of 0.5
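# (roughly: velocity = momentum * velocity + gradient, weight = weight - learning_rate * velocity;
# this sketch of the update rule is an assumption based on common momentum SGD implementations,
# see the MSGD documentation for the exact rule)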
# generating some random target data for a training step
target = rand(Float32, size(prediction)...) # remember to specify the correct element type here as well
# backpropagation
zero_gradients(model)
loss, derivative_loss = mse_loss(prediction, target) # mean squared error
backward(model, derivative_loss) # computing gradients
step!(optimizer) # making an optimization step with the computed gradients and the optimizer
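Putting these pieces together, a complete training loop just repeats the steps above over batches. Here is a minimal sketch using only the functions shown above; the random batches and the epoch/batch counts are placeholders for a real dataset, not part of the package:

for epoch in 1:5
    epoch_loss = 0.0
    for batch in 1:10
        input = rand(Float32, 28, 28, 1, 32) # a random input batch (placeholder for real data)
        target = rand(Float32, 10, 32)       # a random target batch (placeholder for real labels)
        zero_gradients(model)                # reset the gradients from the previous step
        prediction = forward(model, input)   # forward pass
        loss, derivative_loss = mse_loss(prediction, target)
        backward(model, derivative_loss)     # backward pass: compute the gradients
        step!(optimizer)                     # update the parameters
        epoch_loss += loss
    end
    println("epoch $epoch, mean loss: $(epoch_loss / 10)")
end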
Why GradValley.jl
This is a slightly shortened version of the Why GradValley.jl paragraph from the documentation.
- Intuitive Model Building: Model building is normally done using Containers. With Containers, large models can be broken down into smaller components (e.g. ResNets into ResBlocks), which in turn can be easily combined into one large model. See the ResNets example in the Tutorials and Examples section.
- Flexible: Containers behave like layers, so you can use containers in containers in containers… (arbitrary nesting allowed). GraphContainer’s automatic differentiation allows defining your own computational graph in a function, which can then be automatically differentiated during the backward pass (using reverse-mode AD, aka backpropagation); see the sketch after this list.
- Switching from Python to Julia: Model building is very similar to other frameworks, and the behavior of the layers closely follows PyTorch’s, e.g. the algorithm behind adaptive pooling.
- 100% Julia: Julia’s biggest advantage over Python is speed. Because GradValley is written entirely in Julia, you can easily read and extend it yourself. Extending Python packages is much more difficult, at least when they rely on e.g. C code in the backend.
- Well documented: The documentation aims to provide detailed information about all of GradValley’s functionalities. For example, the documentation of each layer contains a description, an argument list, a mathematical definition and extensive examples.
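To illustrate the GraphContainer point above, here is a sketch of a small residual-style block. The GraphContainer(forward_function, modules) constructor convention and the (modules, input) signature of the forward function are assumptions based on the documentation’s pattern; check the GraphContainer documentation for the exact API.

using GradValley
using GradValley.Layers

layers = [Fc(16, 16, activation_function="relu"), Fc(16, 16)]
# assumption: the forward function receives the vector of modules and the input
function residual_forward(modules, input)
    fc1, fc2 = modules
    output = fc2(fc1(input))
    return output + input # the skip connection makes the graph non-sequential
end
res_block = GraphContainer(residual_forward, layers) # behaves like a layer, can be nested in containers
prediction = res_block(rand(Float32, 16, 4)) # a batch of 4 feature vectors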
GPU Support
For further information about using the GPU with GradValley, see the GPU Support section in the documentation.
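As a quick, hedged illustration, moving the example model from above to the GPU might look like the following sketch. The keyword names of module_to_eltype_device! are an assumption here (the function itself is mentioned in the Getting Started example); the documentation has the authoritative signature.

using CUDA # GradValley's GPU support builds on Nvidia GPUs via CUDA.jl
# assumption: element_type and device are keyword arguments, see the GPU Support docs
module_to_eltype_device!(model, element_type=Float32, device="gpu")
input = CUDA.rand(Float32, 28, 28, 1, 32) # the input must live on the same device as the parameters
prediction = model(input)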
Explanation of the name “GradValley”
When optimizing the weights of a machine learning model, the goal is always to find the best possible minimum of the error function. This requires the derivatives of the error with respect to the weights, i.e. the gradients. So the weights are guided into the “valley” of the error using the gradients (“grad” stands for gradient). That’s why it’s called GradValley.
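To make that concrete: one step of gradient descent moves the weights a small step downhill, w ← w − η · ∇L(w). Here is a self-contained toy sketch (plain Julia, not using GradValley) that descends into the valley of a one-dimensional error function:

# a toy error function with its "valley" at w = 3
loss(w) = (w - 3)^2
grad(w) = 2 * (w - 3) # its derivative with respect to w

function descend(w, learning_rate, steps)
    for _ in 1:steps
        w -= learning_rate * grad(w) # take a small step downhill along the gradient
    end
    return w
end

println(descend(0.0, 0.1, 50)) # ≈ 3.0: we found the valley of the error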
Why reinvent the wheel? Flux is already there!
GradValley is NOT intended to be a successor to existing Julia Deep Learning packages (such as Flux).
I mainly developed GradValley for fun; it can be seen as a new, but not necessarily better, option in the world of Julia Deep Learning packages.
I would be very happy if you give GradValley a try, build and train models with it or even contribute to it!
Questions, bugs, etc.
If you have any questions about this package, please let me know, for example in the discussion section of the GitHub repository or right here on Discourse.
If you find a bug, please open an issue. I work on GradValley only in my free time, so unfortunately I will not always be able to answer questions immediately.