I ported an algorithm from the book Grokking Deep Reinforcement Learning (which I recommend if you’re a beginner to RL) from PyTorch to Julia. The algorithm, “neural fitted Q-iteration”, is described in chapter 8 of the book and is meant to be a very basic RL algorithm which later chapters build on.
The PyTorch code can be found here: https://github.com/mimoralea/gdrl/blob/master/notebooks/chapter_08/chapter-08.ipynb
My Julia / Flux implementation can be found here: https://github.com/DevJac/gdrl-with-flux/blob/5611b4216ef941f2d89636e8783c56236d70da41/Ch8.jl
I’ve run that notebook on my computer and it solves the RL environment in about 3000 episodes or less, which takes about 5 minutes. The Julia version takes about an hour to run 3000 episodes, with almost all of the time spent on line 57: `qs = q.network(all_s)`. Both implementations were run on the CPU; I do not have a GPU.

Again, profiling showed all the time being spent on line 57: `qs = q.network(all_s)`. Here `network` is a Flux model of 3 `Dense` layers. This is the only forward pass through the model that tracks the gradients, which I use a few lines later to update the model parameters. Other forward passes through the model are much quicker, which is to be expected because they aren’t tracking the gradients.
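To make the pattern concrete, here is a simplified sketch of what that section of my code does (layer sizes, batch size, and the loss are placeholders rather than my exact values; written against Zygote-based Flux):

```julia
using Flux

# Illustrative only: a Q network of 3 Dense layers, like the one in my code.
# The layer sizes here are placeholders.
network = Chain(
    Dense(4, 512, relu),
    Dense(512, 128, relu),
    Dense(128, 2))

# A batch of states (`all_s` in my code): a Float32 matrix, one state per column.
all_s = rand(Float32, 4, 1024)

# Fast: a plain forward pass, no gradient tracking.
qs = network(all_s)

# Slow: the same forward pass inside the gradient call that I later use to
# update the parameters (this is the "line 57" hotspot).
ps = Flux.params(network)
target = rand(Float32, 2, 1024)  # placeholder target, just to form a loss
grads = Flux.gradient(ps) do
    sum(abs2, network(all_s) .- target)  # placeholder MSE-style loss
end
```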
I believe my algorithm is exactly the same as the linked PyTorch implementation, unless there is some difference in similar library concepts; for example, maybe the RMSProp optimizers behave slightly differently in PyTorch vs Flux? I’ve left out some logging and skipped a few “design patterns”, but every other detail is the same as best I can tell.
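If the optimizer defaults turn out to matter, one way to rule that out would be to construct RMSProp with explicit hyperparameters on the Flux side rather than relying on defaults. A sketch (the values here are placeholders, not necessarily what the notebook uses):

```julia
using Flux

# Pin the RMSProp hyperparameters explicitly so both implementations are
# known to match. Placeholder values: η (learning rate) and ρ (decay).
opt = Flux.Optimise.RMSProp(0.0005, 0.9)
```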
Any ideas why this code is so much slower in Julia / Flux?