[ANN] FluxTraining.jl

I am happy to announce the release of FluxTraining.jl, a deep learning training package for Flux models.

It has an extensible callback system inspired by fastai’s and comes with features like metrics, hyperparameter scheduling, TensorBoard logging and model checkpointing. It also makes it easy to customize the training loop.

You can find the documentation on all those features here.

Let me know if you find it useful or if something is unclear. Otherwise, happy training!

This is what the TensorBoard integration looks like, by the way!

Can this also be used for deep reinforcement learning? I ask because the training data is generated continuously: at the beginning there isn’t any data, and the data I want to train on changes every n seconds. I haven’t looked into the code, but I could probably find a relatively easy “workaround” for my use case :slight_smile:
I hope so, because your package looks very promising!

Ohh, do you think it’s possible to use a different logger instead of TensorBoard, like Weights&Biases?

Thank you in advance for your help and your whole work on this package! :slight_smile:

Hey Peter!

It’s definitely possible to use this with reinforcement learning. You can implement custom training logic by creating a new Phase and then implementing fitepochphase! and/or fitbatchphase! for it. The default implementation should be a good starting point. As you can see, it simply loops over the data iterator, but you can override that. If you only want to change the epoch (i.e. data iteration) logic, you can make your phase a subtype of AbstractTrainingPhase; that way it will use the regular fitbatchphase! definition. To make it work with the callbacks, you should also throw the necessary events, as is done in the default implementation.

Then you would simply call fit!(learner, ReinforcementPhase()) (or what you called the phase).
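
To illustrate the dispatch pattern described above, here is a minimal, self-contained sketch. It does not use FluxTraining.jl at all: the types `Phase`, `AbstractTrainingPhase` and the functions `fitepochphase!` and `fitbatchphase!` are stand-ins that only borrow the names from this post, while `ToyLearner`, `emit!` and the fields of `ReinforcementPhase` are hypothetical. The real signatures and event types may differ, so check the default implementation before copying anything.

```julia
# Stand-in types: they mimic the names used in this thread but are NOT
# FluxTraining.jl's definitions.
abstract type Phase end
abstract type AbstractTrainingPhase <: Phase end

# Hypothetical phase that produces fresh data every epoch instead of
# iterating over a fixed data iterator.
struct ReinforcementPhase <: AbstractTrainingPhase
    nbatches::Int
end

# Toy learner: records which events were emitted and how many batches ran.
mutable struct ToyLearner
    data::Vector{Int}
    events::Vector{Symbol}
    batchesseen::Int
end
ToyLearner(data) = ToyLearner(data, Symbol[], 0)

# Stand-in for notifying callbacks about an event.
emit!(learner::ToyLearner, event::Symbol) = push!(learner.events, event)

# Batch logic shared by every subtype of AbstractTrainingPhase.
function fitbatchphase!(learner::ToyLearner, batch, ::AbstractTrainingPhase)
    emit!(learner, :BatchBegin)
    learner.batchesseen += 1  # a real implementation would take a gradient step here
    emit!(learner, :BatchEnd)
    return learner
end

# Default epoch logic: loop over the learner's data.
function fitepochphase!(learner::ToyLearner, phase::AbstractTrainingPhase)
    emit!(learner, :EpochBegin)
    for batch in learner.data
        fitbatchphase!(learner, batch, phase)
    end
    emit!(learner, :EpochEnd)
    return learner
end

# Override only the epoch logic for the new phase; the batch logic is reused.
function fitepochphase!(learner::ToyLearner, phase::ReinforcementPhase)
    emit!(learner, :EpochBegin)
    for _ in 1:phase.nbatches
        batch = rand(1:10)  # placeholder for "collect new experience"
        fitbatchphase!(learner, batch, phase)
    end
    emit!(learner, :EpochEnd)
    return learner
end

learner = ToyLearner([1, 2, 3])
fitepochphase!(learner, ReinforcementPhase(5))
```

Because the more specific `ReinforcementPhase` method wins over the generic one, only the data-generation part changes, and the events around each batch and epoch are still emitted.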

Making a tutorial on this for the documentation is on my to-do list.


Adding Weights&Biases support should be even easier, as there is an interface specifically for creating new logger “backends” that can be used with the logging callbacks. The implementation of TensorBoardBackend should give enough info.

The implementation boils down to implementing log_to methods for the various types that can be logged. Since there is no native Julia client, the easiest way to connect this to W&B would be to use PyCall.jl to wrap the Python client.
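
As a rough sketch of the "one `log_to` method per loggable type" idea, the following self-contained example uses made-up types only. `LoggerBackend`, `ScalarValue`, `TextValue`, `InMemoryBackend` and the argument order of `log_to` are all assumptions for illustration; the actual interface is best read off the TensorBoardBackend implementation.

```julia
# Hypothetical stand-ins; none of these types come from FluxTraining.jl.
abstract type LoggerBackend end

# Two kinds of values that can be logged.
struct ScalarValue
    data::Float64
end
struct TextValue
    data::String
end

# A backend that keeps every record in memory. A W&B backend would
# instead hold a handle to the wrapped Python client.
struct InMemoryBackend <: LoggerBackend
    records::Vector{Tuple{String,Int,Any}}
end
InMemoryBackend() = InMemoryBackend(Tuple{String,Int,Any}[])

# One `log_to` method per loggable type; dispatch picks the right one.
function log_to(backend::InMemoryBackend, value::ScalarValue, name::String, step::Int)
    push!(backend.records, (name, step, value.data))
    return backend
end

function log_to(backend::InMemoryBackend, value::TextValue, name::String, step::Int)
    push!(backend.records, (name, step, value.data))
    return backend
end

backend = InMemoryBackend()
log_to(backend, ScalarValue(0.42), "Loss", 1)
log_to(backend, TextValue("epoch done"), "Notes", 1)
```

A new backend then only needs its own struct plus methods for the value types it can handle.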

Let me know if you try your hand at either of these, and feel free to ask for more information :slight_smile:

Great premise. Because I’m selfish I’ll cue in some of the things I really rely on when I use Flux.

I very often end up writing my own optimizers and training loops. Would this package be the place to make that less painful (i.e., importing things not exported in Flux, overloading, etc.)?

Would it be possible to put a “stop” button in? Or bake in some convenience functions for caching models as training progresses (at the user’s discretion, of course)? VS Code lacks an “interrupt” button right now, and it costs me a fair deal of time when I use things like Flux, Turing, etc., but maybe there’s a way to write a simple hook via TensorBoard? Maybe that’s a bad hack for a specific user (me) :P. Just some passing thoughts.

Writing custom training loops is possible, see what I wrote above. FluxTraining.jl works with any optimizer that works with Flux.jl if that is what you mean.

Regarding the “stop” button: I assume you’re using Alt+Enter to run things in VSCode, and that can’t be interrupted, but you can just paste the long-running code (e.g. fit!(learner, 10)) in the terminal manually and then it is possible to interrupt it using Ctrl+C. Would that work for you?
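
If you also want to do something when the run is interrupted (for example, save the model), a plain-Julia sketch of catching Ctrl+C in the REPL could look like this. The helper `run_interruptible` and its `step!` argument are hypothetical names, not part of Flux or FluxTraining.jl; in practice `step!` might wrap a single-epoch call such as fit!(learner, 1), and the catch branch is where a checkpoint could be written.

```julia
# Run `step!(i)` for i = 1, ..., nsteps and stop cleanly on Ctrl+C.
# Returns the number of steps that completed.
function run_interruptible(step!, nsteps::Integer)
    completed = 0
    try
        for i in 1:nsteps
            step!(i)
            completed = i
        end
    catch err
        # Ctrl+C in the REPL raises an InterruptException; rethrow anything else.
        err isa InterruptException || rethrow()
        @info "Interrupted, stopping early" completed
    end
    return completed
end

# Simulate a Ctrl+C arriving during the third step.
ndone = run_interruptible(10) do i
    i == 3 && throw(InterruptException())
end
```

Here the third step never finishes, so `ndone` counts only the two steps before it.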

RE optimizers specifically, https://github.com/FluxML/Optimisers.jl is the WIP to create a nice, Flux-compatible interface.

Support for proper interruption in VS Code is also merged, so it might be in the insiders channel already?

There’s always Flux.stop and Flux.skip

The training pipeline of reinforcement learning is kind of special here.

You may want to take a look at ReinforcementLearningCore.jl, and also at its callbacks and stop_conditions.

I think the core idea behind it is very close to FluxTraining.jl.

First of all, thank you for a nice package.

Is there an example of how to use TensorBoard with FluxTraining?

traindata, valdata = splitobs((data_X, data_Y))
trainiter, valiter = DataLoader(traindata, 128), DataLoader(valdata, 256);

learner = Learner(model, (trainiter, valiter), optim, lossfn, ToGPU(), Metrics(accuracy))

N_epochs = 500

FluxTraining.fit!(learner, N_epochs)

Basically, I don’t know where the values of the loss function and metrics are logged.

You can use the LogMetrics callback with a TensorBoard backend:

logcb = LogMetrics(TensorBoardBackend("tblogs"))
learner = Learner(model, (trainiter, valiter), optim, lossfn, ToGPU(), Metrics(accuracy), logcb)

That way, they’ll be stored in the folder “tblogs”.

If you want to access the raw metrics themselves, they are stored in learner.cbstate.metricsstep and learner.cbstate.metricsepoch as dictionaries of MVHistorys (see JuliaML/ValueHistories.jl on GitHub).
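
For reference, this is roughly how reading values out of an MVHistory works with ValueHistories.jl on its own. The history here is built by hand, and the key `:Loss` and the numbers are made up for illustration; the keys that FluxTraining.jl actually stores may be named differently.

```julia
using ValueHistories

# Build a small history by hand: one (iteration, value) pair per epoch.
history = MVHistory()
for (epoch, loss) in enumerate([0.9, 0.5, 0.3])
    push!(history, :Loss, epoch, loss)
end

# `get` returns the recorded iterations and the matching values.
epochs, losses = get(history, :Loss)
```

The two returned vectors can be passed straight to a plotting function.
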
Hope that helps!
