Significantly Higher VRAM Usage and Slower Training on Flux Compared to PyTorch

JoshuaBillson · May 18, 2026, 5:01am

Also see here, I’m working on a timm port for Lux.jl: [ANN] Jimm.jl: Lux ports of timm image backbones, with HuggingFace pretrained weights

That’s almost exactly what I’m working on (my working name was even Jimm). I’ll take a look and see if I can contribute. So far, I’ve implemented all variants of timm’s VisionTransformer, ConvNeXt (both v1 and v2), and Eva (basically ViT with rotary positional embeddings, used by SAM3). I also have implementations for Swin, PVT, and Twins, but I haven’t gotten around to adding pre-trained weights for those yet. Converting these implementations from Flux to Lux should be relatively straightforward.
