LSTM on a GPU

Hello,

I am trying to run the Flux model zoo LSTM example on a GPU. It turns out there is no performance improvement; on the contrary, it gets worse. The example uses ~500 MB of training data, and each epoch runs for ~25 sec on the CPU and ~55 sec on the GPU. I am using the CUDAnative workflow from Flux (not CuLSTM); a sketch of the kind of setup I mean is below. A few questions:
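
For reference, here is a minimal sketch of the kind of setup I'm describing. The layer sizes, the data names `Xs`/`Ys`, and the loss are placeholders, not the exact model zoo code:

```julia
using Flux, CuArrays               # CuArrays provides the GPU array backend
using Flux: crossentropy, reset!

# Placeholder model, roughly the shape of the model zoo char-RNN
# (the sizes here are illustrative, not my exact setup)
m = Chain(
    LSTM(128, 256),
    Dense(256, 128),
    softmax) |> gpu                # move the parameters to the GPU

# Xs and Ys stand in for my sequences of one-hot batches;
# each batch is moved to the GPU inside the loss
function loss(xs, ys)
    l = sum(crossentropy.(m.(gpu.(xs)), gpu.(ys)))
    reset!(m)                      # reset the hidden state between sequences
    return l
end

opt = ADAM()
# each epoch then looks like:
# Flux.train!(loss, params(m), zip(Xs, Ys), opt)
```

Note that in this pattern every batch is copied to the GPU inside the loss, which is part of why I'm asking question 2 below.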

  1. Should I expect better GPU performance on a problem of this size?
  2. I suspect one of the reasons for the poor performance could be unnecessary copying of data between the CPU and the GPU. Are there tools to debug this?
  3. I noticed there is another set of models in Flux that use NVIDIA's cuDNN library, such as the CuLSTM model. Is it better to use those? Are there any examples?
  4. Should I be using an entirely different ML package?

Thanks in advance.