[ANN] Luximm.jl: Lux ports of timm image backbones, with HuggingFace pretrained weights

I’m happy to share a first public look at Luximm.jl, a Julia package that ports image-classification backbones from Ross Wightman’s timm (PyTorch Image Models) to Lux.jl. Pretrained weights load directly from the HuggingFace Hub in .safetensors format, sharing the same on-disk cache timm and huggingface_hub use.

Repo: csvance/Luximm.jl (Lux.jl Image Models) on GitHub. See the README for install, quickstart, the full variant table, and the porting workflow; this post is just the elevator pitch.

Why this exists

The motivation was concrete: I needed Julia’s SciML ecosystem together with modern pretrained vision backbones, and Python doesn’t have a peer for SciML. The original stack was vision encoders feeding torchdiffeq in PyTorch, which works but leaves much to be desired. Moving the DiffEQ side to Julia meant the vision side had to come too. Jimm started as a one-off port of a single backbone for that internal use case and snowballed from there. If your work also lives at that intersection of pretrained vision encoders and the rest of the SciML stack, the hope is that Jimm makes Julia a more complete option for that workload.

What you get today

A small but useful set of modern CNN backbones for research and practitioner use, with pretrained weights: BiT ResNetV2 (15 variants, Apache 2.0), ConvNeXt v1 (19 Facebook AI variants from the original 2022 paper, Apache 2.0, plus 4 DINOv3 encoders under Meta’s DINOv3 License), and ConvNeXt V2 (26 FCMAE variants, CC BY-NC 4.0). 64 checkpoints in total. Pretrained weights load by passing the variant key to a one-line load_<family>_pretrained call. ViT, EfficientNet, Swin, and the rest of the timm catalog are open targets for contribution.

What Jimm is, and isn’t

It is a strict port: same architectures, same hyperparameters, same weight init, same state_dict key layout, so any timm/<variant> checkpoint on HuggingFace loads without manual rewiring and the forward pass matches timm to within float32 round-off. It is not a Julia-native reimagining, a general CV toolkit, or a training framework, and it is not at 1:1 parity with the full timm catalog, nor is it likely to ever be. timm has hundreds of architectures and thousands of checkpoints; Jimm tracks the subset its contributors actually use. Backbones land via PR.

Correctness gate

Every registered variant has a parity test that downloads the real .safetensors from HuggingFace, loads them through the Lux model, runs the forward pass, and asserts max-abs-diff against timm’s output on the same input is under 1e-3 (most variants land closer to 1e-4). That single test covers both the architecture port and the weight loader: if the safetensors loader misroutes a tensor or applies the wrong axis permutation, the forward output diverges and the test fails. CI is still a work in progress (the full sweep is expensive and we are figuring out how to run it with the resources we have), but contributors can scope runs to a single variant via JIMM_TEST_VARIANTS.
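The comparison at the heart of that gate is small enough to sketch. This is a minimal illustration, assuming the Lux output and the timm reference are already in memory as same-shaped arrays; `max_abs_diff`, `passes_parity`, and the toy vectors are hypothetical names for this post, not the package's actual test code.

```julia
# Largest elementwise deviation between two same-shaped arrays.
max_abs_diff(y, y_ref) = maximum(abs, y .- y_ref)

# Absolute-tolerance gate; 1e-3 is the threshold quoted above.
passes_parity(y, y_ref; atol = 1e-3) = max_abs_diff(y, y_ref) < atol

y_ref = Float32[0.10, -2.50, 3.75]            # stand-in for timm's output
y_lux = y_ref .+ Float32[1e-5, -2e-5, 3e-5]   # stand-in for the Lux output

ok_close    = passes_parity(y_lux, y_ref)           # small round-off passes
ok_shuffled = passes_parity(reverse(y_lux), y_ref)  # misrouted values fail
```

The reversed vector stands in for a loader bug: the numbers are all still there, but in the wrong places, so the output no longer lines up with the reference.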

How this code was produced (and a caveat)

Most of Jimm was written by AI agents driving the porting workflow encoded in .claude/skills/timm-to-lux/, with human review at each phase and the parity tests as the correctness backstop. The code is already being used in real projects, so it works, but expect bugs and rough edges, especially around features that the parity tests do not exercise (anything past forward inference with the released weights). File issues; we will fix them.

Porting new backbones with Claude Code

If you have Claude Code and tokens to spare, the practical path to a new backbone is: open the repo, ask Claude to port timm/<your_model>, and follow the skill at .claude/skills/timm-to-lux/. For most timm architectures it produces a working PR in a single session, taking into account the differences between the two frameworks.

Contributions are welcome and encouraged, with or without the skill. See the README’s “Contributing a new backbone” section for the acceptance criteria. Bug reports, PRs for new variants of registered families, and PRs for entirely new families are all in scope.

Thanks to Ross Wightman for timm, to the Julia ML ecosystem maintainers, and to my employer, Medical Metrics Inc.

A few updates:

  • First external contribution to the project, from PaioPaio: classic ResNet (Deep Residual Learning for Image Recognition)
  • There is now a working CI system that tests every single backbone. I wasn’t able to use GitHub Actions for this due to the size of the weights / parity fixture cache, so I built a custom interface with Tachikoma.jl that runs everything inside a dedicated virtual machine. CI is triggered manually to prevent bitcoin mining :sweat_smile:
  • Documenter.jl is now set up for the repo, and I spent a decent amount of time refining and organizing everything, including the main README.
  • I had to update how we handle the tolerance check for features. The new formula is max-abs-diff / max-abs(timm ref). Absolute tolerance on its own was proving rather difficult to work with across a wide range of models with different activation scales and depths. We should be able to use 1e-5 rtol as the new threshold using this formula and have it work across all backbones. Open to any suggestions here.
  • The main thing left before registering the package is making sure that all of our interfaces are sound / we are not doing anything that causes type instability, etc.
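To make the tolerance change above concrete, here is a minimal sketch of the relative formula, max-abs-diff / max-abs(timm ref). The function name `rel_err` and the toy arrays are illustrative assumptions, not the package's test code.

```julia
# Relative error: max-abs-diff divided by the largest magnitude in the reference.
rel_err(y, y_ref) = maximum(abs, y .- y_ref) / maximum(abs, y_ref)

y_ref_small = Float64[0.5, -1.0, 0.25]   # low activation scale
y_ref_large = 1000 .* y_ref_small        # same pattern at 1000x the scale

# Apply the same relative perturbation at both scales.
y_small = y_ref_small .* (1 + 1e-6)
y_large = y_ref_large .* (1 + 1e-6)

abs_small = maximum(abs, y_small .- y_ref_small)  # roughly 1e-6
abs_large = maximum(abs, y_large .- y_ref_large)  # roughly 1e-3
rel_small = rel_err(y_small, y_ref_small)         # roughly 1e-6
rel_large = rel_err(y_large, y_ref_large)         # roughly 1e-6
```

The absolute difference grows with the activation scale while the relative figure stays put, which is why a single threshold can work across backbones. One caveat with this sketch: the denominator is zero for an all-zero reference, so a real test would need to guard that case.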

So so cool!!!

Can’t believe nobody other than me has posted a response. So let me repeat what I said:

This is actually really cool! Rest assured I’ll give it a go. High five!

Congratz on the package :rocket:

This sounds like a solid package to develop the ecosystem, very nice indeed.

v0.1.0 has been published to the General registry! The package has been renamed Luximm.jl to satisfy naming guidelines.

This release breaks the pre-release interface, but it streamlines everything, making the package much easier to use. It’s much more timm-like, even if it’s not fully possible to capture the exact same semantics given how Lux.jl handles parameters and state.

```julia
using Luximm, Lux, Random

# ResNet50 with the trained 1000-class ImageNet head.
# `create_pretrained` is family-agnostic; the symbol selects the family.
# It returns the model and a closure that loads the released weights
# into `(ps, st)` once you've run `Lux.setup`.
model, load = create_pretrained(:resnet50_a1_in1k)
ps, st = Lux.setup(Xoshiro(0), model)
ps, st = load(ps, st)
st = Lux.testmode(st)                     # BatchNorm/Dropout in eval mode

x = randn(Float32, 224, 224, 3, 1)
logits, _ = model(x, ps, st)              # (1000, 1)
top1 = argmax(vec(logits))                # ImageNet class index
```

QoL Improvements

  • You no longer need to remember the exact names for the model creation / weight loading functions for each model family. We now have a standard interface across all families: create_pretrained and create_model.
  • No more passing the same arguments to both the model creation and weight loading functions. create_pretrained returns a function capturing the arguments needed to initialize the weights.
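For anyone curious about the closure pattern in the second point, here is a toy sketch in plain Julia with no dependencies. Everything in it (`ToyModel`, `toy_create_pretrained`, the constant fill) is a hypothetical stand-in to show the shape of the idea; it is not Luximm's implementation and reads no real weights.

```julia
# Toy stand-in for a model definition; not a Lux layer.
struct ToyModel
    nfeatures::Int
    nclasses::Int
end

function toy_create_pretrained(variant::Symbol; nclasses::Int = 1000)
    model = ToyModel(8, nclasses)
    # The closure captures `variant`, `nclasses`, and `model`, so the caller
    # never has to pass the same arguments a second time.
    function load(ps, st)
        # A real loader would read released weights here; this toy fills a constant.
        weight = fill(0.5f0, nclasses, model.nfeatures)
        new_ps = merge(ps, (; weight))
        new_st = merge(st, (; loaded_from = variant))
        return new_ps, new_st
    end
    return model, load
end

model, load = toy_create_pretrained(:toy_variant; nclasses = 10)
ps = (; weight = zeros(Float32, 10, 8))   # placeholder "initialized" parameters
st = (; loaded_from = :none)              # placeholder state
ps, st = load(ps, st)
```

Returning new `(ps, st)` values rather than mutating in place mirrors how Lux keeps parameters and state outside the model.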

I won’t rule out changes to the interface for v1.0.0 someday, but the interface should remain stable for the foreseeable future.