Is that what you mean by “interoperability with existing DL frameworks”? You may want to spell it out in the docs. And since “frameworks” is plural, which others? ONNX? And/or PyTorch Lightning? I’m not up to speed on the latter, or on whether some Julia package corresponds to it. In general, see also:
Since you link to the [vision] transformer, would your package be well suited to replicating BERT models, GPT-3, or Google’s even larger Switch Transformer? Since the latter is sparse, would that be a hindrance?