Turns out I was comparing apples to oranges… the correct code for Flux should be:
```julia
using Flux  # CUDA.jl must also be loaded for `gpu` to actually move to the device

function julia_nn(X, Y, epochs=100)
    lstm = Chain(
        LSTM(100 => 128),
        LSTM(128 => 128),
        Dense(128 => 1)
    ) |> gpu
    opt = ADAM()
    θ = Flux.params(lstm)
    for epoch ∈ 1:epochs
        Flux.reset!(lstm)  # clear the hidden state between epochs
        ∇ = gradient(θ) do
            [lstm(x) for x ∈ X[1:end-1]]           # warm up: run the sequence through
            Flux.Losses.mse(lstm(X[end]), Y[end])  # MSE on the last item only
        end
        Flux.update!(opt, θ, ∇)
    end
end
```
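For reference, a call would look roughly like this, with each element of `X` being a features × batch matrix (the sequence length, batch size, and random data below are placeholder assumptions, not my benchmark data):

```julia
# Hypothetical input: a 50-step sequence, 100 features, batch of 32
# (all sizes here are assumptions for illustration only).
X = [gpu(rand(Float32, 100, 32)) for _ in 1:50]
Y = [gpu(rand(Float32, 1, 32)) for _ in 1:50]
julia_nn(X, Y)
```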
The issue was that I had been computing the MSE over the full sequence instead of on the last item only. Fixing that brings the performance much closer to PyTorch, at approximately 630 ms average runtime.
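For contrast, the loss I was effectively benchmarking before looked roughly like this (a sketch reconstructed from the description above, not my exact original code); it differentiates through every timestep's output, which is the extra work the fixed version avoids:

```julia
using Flux

# Assumed reconstruction of the earlier full-sequence loss: every
# timestep contributes to the MSE, so the backward pass runs through
# all of the sequence's outputs instead of just the last one.
function full_sequence_loss(lstm, X, Y)
    sum(Flux.Losses.mse(lstm(x), y) for (x, y) ∈ zip(X, Y))
end
```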