[ANN] Luximm.jl: Lux ports of timm image backbones, with HuggingFace pretrained weights

I’m happy to share a first public look at Luximm.jl, a Julia package that ports image-classification backbones from Ross Wightman’s timm (PyTorch Image Models) to Lux.jl. Pretrained weights load directly from the HuggingFace Hub in .safetensors format, using the same on-disk cache as timm and huggingface_hub.

Repo: csvance/Luximm.jl on GitHub (Lux.jl Image Models). See the README for install, quickstart, the full variant table, and the porting workflow; this post is just the elevator pitch.

Why this exists

The motivation was concrete: I needed Julia’s SciML ecosystem together with modern pretrained vision backbones, and Python doesn’t have a peer for SciML. The original stack was vision encoders feeding torchdiffeq in PyTorch, which works but leaves much to be desired. Moving the DiffEq side to Julia meant the vision side had to come too. Jimm (the package’s original name) started as a one-off port of a single backbone for that internal use case and snowballed from there. If your work also lives at the intersection of pretrained vision encoders and the rest of the SciML stack, the hope is that Jimm makes Julia a more complete option for that workload.

What you get today

A small but useful set of modern CNN backbones for research and practitioner use, with pretrained weights: BiT ResNetV2 (15 variants, Apache 2.0), ConvNeXt v1 (19 Facebook AI variants from the original 2022 paper, Apache 2.0, plus 4 DINOv3 encoders under Meta’s DINOv3 License), and ConvNeXt V2 (26 FCMAE variants, CC BY-NC 4.0). 64 checkpoints in total. Pretrained weights load by passing the variant key to a one-line load_<family>_pretrained call. ViT, EfficientNet, Swin, and the rest of the timm catalog are open targets for contribution.

What Jimm is, and isn’t

It is a strict port: same architectures, same hyperparameters, same weight init, same state_dict key layout, so any timm/<variant> checkpoint on HuggingFace loads without manual rewiring and the forward pass matches timm to within float32 round-off. It is not a Julia-native reimagining, a general CV toolkit, or a training framework, and it is not at 1:1 parity with the full timm catalog, nor is it likely to ever be. timm has hundreds of architectures and thousands of checkpoints; Jimm tracks the subset its contributors actually use. Backbones land via PR.

Correctness gate

Every registered variant has a parity test that downloads the real .safetensors from HuggingFace, loads them through the Lux model, runs the forward pass, and asserts max-abs-diff against timm’s output on the same input is under 1e-3 (most variants land closer to 1e-4). That single test covers both the architecture port and the weight loader: if the safetensors loader misroutes a tensor or applies the wrong axis permutation, the forward output diverges and the test fails. CI is still a work in progress (the full sweep is expensive and we are figuring out how to run it with the resources we have), but contributors can scope runs to a single variant via JIMM_TEST_VARIANTS.
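To make the gate concrete, here is a minimal Julia sketch of a max-abs-diff check. `max_abs_diff` and `parity_ok` are hypothetical helpers, not the package’s actual test code, and a toy 2×2 weight matrix stands in for a real layer. It shows how a transposed (wrongly permuted) weight makes the output diverge so the check fails, while float32-level noise passes.

```julia
# Largest elementwise deviation between the port's output and the reference output.
max_abs_diff(y, y_ref) = maximum(abs.(y .- y_ref))

# Hypothetical helper mirroring the described gate: shapes must match and the
# max-abs-diff must stay under an absolute tolerance (1e-3 in the post).
function parity_ok(y, y_ref; atol = 1e-3)
    size(y) == size(y_ref) || return false
    return max_abs_diff(y, y_ref) < atol
end

# Toy "layer": a weight matrix applied to an input vector.
W = Float32[1 2; 3 4]
x = Float32[1, 0]
y_ref = W * x                      # reference output: [1, 3]

y_good = W * x .+ 1f-4             # small numerical noise
y_bad  = permutedims(W) * x        # wrong axis permutation: [1, 2]

println(parity_ok(y_good, y_ref))  # true
println(parity_ok(y_bad, y_ref))   # false
```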

How this code was produced (and a caveat)

Most of Jimm was written by AI agents driving the porting workflow encoded in .claude/skills/timm-to-lux/, with human review at each phase and the parity tests as the correctness backstop. The code is already being used in real projects, so it works, but expect bugs and rough edges, especially around features that the parity tests do not exercise (anything past forward inference with the released weights). File issues; we will fix them.

Porting new backbones with Claude Code

If you have Claude Code and tokens to spare, the practical path to a new backbone is: open the repo, ask Claude to port timm/<your_model>, and follow the skill at .claude/skills/timm-to-lux/. It produces a working PR for most timm architectures in a single session, taking into account the differences between the two frameworks.

Contributions are welcome and encouraged, with or without the skill. See the README’s “Contributing a new backbone” section for the acceptance criteria. Bug reports, PRs for new variants of registered families, and PRs for entirely new families are all in scope.

Thanks to Ross Wightman for timm, to the Julia ML ecosystem maintainers, and to my employer, Medical Metrics Inc.

A few updates:

  • First external contribution to the project, from PaioPaio: classic ResNet (“Deep Residual Learning for Image Recognition”).
  • There is now a working CI system that tests every single backbone. I wasn’t able to use GitHub Actions for this due to the size of the weights / parity fixture cache, so I built a custom interface with Tachikoma.jl that runs everything inside a dedicated virtual machine. CI is triggered manually to prevent bitcoin mining 😅
  • Documenter.jl is now set up for the repo, and I spent a decent amount of time refining and organizing everything, including the main README.
  • I had to update how we handle the tolerance check for features. The new formula is max-abs-diff / max-abs(timm ref), i.e. a relative error. Absolute tolerance on its own was proving difficult to work with across a wide range of models with different activation scales and depths. With this formula we should be able to use an rtol of 1e-5 as the new threshold across all backbones. Open to any suggestions here.
  • The main thing left before registering the package is making sure all of our interfaces are sound and that we are not introducing type instability, etc.
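A minimal sketch of the relative metric from the tolerance update, rel = max|y − y_ref| / max|y_ref|, using hypothetical helper names (not the package’s test code). It applies the same 1e-6 relative perturbation to a small-scale and a large-scale reference: the absolute diff changes by three orders of magnitude, while the relative error stays put.

```julia
# Relative metric: max-abs-diff normalized by the largest magnitude
# in the reference (timm) output.
max_abs_diff(y, y_ref) = maximum(abs.(y .- y_ref))
rel_err(y, y_ref) = max_abs_diff(y, y_ref) / maximum(abs.(y_ref))

# Hypothetical gate using the 1e-5 threshold mentioned above.
parity_ok(y, y_ref; rtol = 1e-5) = rel_err(y, y_ref) < rtol

# Same relative perturbation applied to references at two very different scales.
y_small = [0.5, -1.0, 0.25]
y_large = 1000 .* y_small
perturb(y) = y .* (1 + 1e-6)

# Absolute diffs: roughly 1e-6 vs roughly 1e-3, so a fixed 1e-5 atol
# would pass the first and fail the second.
println(max_abs_diff(perturb(y_small), y_small))
println(max_abs_diff(perturb(y_large), y_large))

# Relative error is about 1e-6 in both cases, so both pass.
println(parity_ok(perturb(y_small), y_small))
println(parity_ok(perturb(y_large), y_large))
```

One caveat with this design: a reference that is identically zero makes the denominator zero, so such outputs would need a separate absolute check.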

So so cool!!!

Can’t believe nobody else has posted a response besides me. So let me repeat what I said:

This is actually really cool! Rest assured I’ll give it a go. High five!

Congrats on the package 🚀

This sounds like a solid package to develop the ecosystem, very nice indeed.

v0.1.0 has been published to the General registry! The package has been renamed Luximm.jl to satisfy the registry’s naming guidelines.

This is a breaking release in terms of interface relative to the pre-release version, but it streamlines everything, making the package much easier to use. It’s much more timm-like, even if it’s not fully possible to capture the exact same semantics given how Lux.jl handles parameters and state.

```julia
using Luximm, Lux, Random

# ResNet50 with the trained 1000-class ImageNet head.
# `create_pretrained` is family-agnostic; the symbol selects the family.
# It returns the model and a closure that loads the released weights
# into `(ps, st)` once you've run `Lux.setup`.
model, load = create_pretrained(:resnet50_a1_in1k)
ps, st = Lux.setup(Xoshiro(0), model)
ps, st = load(ps, st)
st = Lux.testmode(st)                     # BatchNorm/Dropout in eval mode

x = randn(Float32, 224, 224, 3, 1)
logits, _ = model(x, ps, st)              # (1000, 1)
top1 = argmax(vec(logits))                # 1-based ImageNet class index
```

QoL Improvements

  • You no longer need to remember the exact names for the model creation / weight loading functions for each model family. We now have a standard interface across all families: create_pretrained and create_model.
  • No more passing the same arguments to both the model creation and weight loading functions: create_pretrained returns a closure that captures the arguments needed to load the pretrained weights.
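The closure pattern in the second bullet can be sketched in plain Julia. Everything here is hypothetical (`toy_create_pretrained`, the `WEIGHTS` dictionary, the NamedTuple standing in for a model); it only illustrates why returning a loader that captures the creation arguments removes the need to pass them twice, not how Luximm implements it.

```julia
# Hypothetical stand-ins: the "model" is a NamedTuple of settings and the
# "weights store" is a Dict keyed by variant name. Luximm's internals differ.
const WEIGHTS = Dict(:toy_a => (w = [1.0, 2.0, 3.0],))

function toy_create_pretrained(name::Symbol; num_classes::Int = 3)
    model = (name = name, num_classes = num_classes)
    # The closure captures `name` and `num_classes`, so the caller never
    # repeats them when loading weights.
    load = function (ps, st)
        w = WEIGHTS[name].w
        length(w) == num_classes || error("head size mismatch")
        return merge(ps, (w = copy(w),)), st
    end
    return model, load
end

model, load = toy_create_pretrained(:toy_a)
ps, st = (w = zeros(3),), NamedTuple()   # stands in for Lux.setup(rng, model)
ps, st = load(ps, st)                    # no need to pass :toy_a again
```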

I won’t rule out interface changes for v1.0.0 someday, but the interface should remain stable for the foreseeable future.

What’s the main difference from Metalhead.jl? (1) Lux instead of Flux, obviously. (2) The ability to load pretrained weights from HF? (3) Non-overlapping coverage of CNN architectures: I guess Metalhead has broader coverage, while Luximm has more up-to-date models?

TLDR: The goal of Luximm is different: it aspires to be a Julia counterpart to timm. Luximm aims to eventually cover most of the numerous backbones timm offers, all with pretrained weights. The recency bias in v0.1.0 simply reflects the backbones I work with day-to-day. The focus for this release was building the foundation to add many more architectures easily. I have another ~5 planned for the next month, and I intend to keep pushing them out on a regular basis after that.

Pretrained weights are central to the value proposition for me. As a machine learning engineer who works with these backbones regularly, training from scratch isn’t usually practical: pretraining consumes a large amount of resources, and generalization tends to suffer significantly without it. So every architecture in Luximm ships with pretrained weights where they are available.

EDIT: I’m also open to the idea of trying to split off a common core of functionality that could be used by both Lux.jl and Flux.jl to load these models. I don’t know exactly what that looks like, perhaps a TimmCommon.jl package that handles most of the work of interacting with Hugging Face and provides a general way to map the loaded weights into your framework of choice. Feel free to reach out if you are interested in something like this!

Hi, very nice package! Have you tested whether it’s compatible with Reactant and the surrounding MLIR tooling?

It absolutely works with Reactant! At the end of the day it’s just standard Lux.jl and NNlib. It’s not something we actively test currently as part of CI, but my current project is using one of the backbones with Reactant.
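A minimal sketch of compiling one of the backbones with Reactant, assuming the standard Lux + Reactant workflow from Lux’s documentation (`reactant_device` and Reactant’s `@compile`). It reuses the `create_pretrained(:resnet50_a1_in1k)` call from the v0.1.0 example; as noted above, this path is not exercised by the package’s CI.

```julia
using Luximm, Lux, Reactant, Random

# Build the backbone and load the released weights on the CPU.
model, load = create_pretrained(:resnet50_a1_in1k)
ps, st = Lux.setup(Xoshiro(0), model)
ps, st = load(ps, st)
st = Lux.testmode(st)

# Move parameters, state, and input onto the Reactant device.
dev = reactant_device()
ps_ra, st_ra = (ps, st) |> dev
x_ra = randn(Float32, 224, 224, 3, 1) |> dev

# Trace and compile the forward pass once, then call the compiled function.
forward = @compile model(x_ra, ps_ra, st_ra)
logits, _ = forward(x_ra, ps_ra, st_ra)

logits_cpu = Array(logits)       # copy back to a regular Julia array
top1 = argmax(vec(logits_cpu))   # 1-based ImageNet class index
```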

v0.1.1 has been released, and it builds the foundation for ViT-based models. Future releases will target many of the common ones used by researchers and practitioners (e.g., CLIP and DINO). In addition, I included a few more classic architectures:

  • VGG16/19
  • SE-ResNet
  • CoAtNet

I’m aiming to push out new sets of weights / architectures around twice a month until I feel we have coverage of the most essential backbones.