Pickle.Defer when loading PyTorch tensors in Julia

Hi everyone! I’m working on implementing the ZINC molecular dataset for MLDatasets.jl and ran into something I wanted to get thoughts on.
The raw ZINC data is distributed as Python pickle files where the tensors are stored as PyTorch tensors (torch.Tensor). When I load them with Pickle.jl, instead of actual arrays I get Pickle.Defer objects (which makes sense since Pickle.jl has no idea what a PyTorch tensor is).
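As I understand it, the Defer happens because a pickle stream doesn't store custom objects by value — it stores a *named reference* to the class/constructor (for torch tensors, torch's rebuild helpers), which only an unpickler that can resolve those names can reconstruct. A stdlib-only sketch of the mechanism (FakeTensor here is a hypothetical stand-in for torch.Tensor, not anything from torch):

```python
# Stdlib-only sketch: pickle stores custom objects as named references
# to their class, so an unpickler without access to that class (like
# Pickle.jl without torch) can't rebuild them and must defer.
import io
import pickle
import pickletools

class FakeTensor:          # hypothetical stand-in for torch.Tensor
    def __init__(self, data):
        self.data = data

payload = pickle.dumps([FakeTensor([6, 7, 8, 6])])

# Disassemble the stream: STACK_GLOBAL looks up "FakeTensor" by
# module/name at load time — that lookup is what a non-Python
# unpickler has no way to satisfy.
out = io.StringIO()
pickletools.dis(payload, out)
print(out.getvalue())
```

Running this shows the `STACK_GLOBAL` opcode pulling in the class by name, which is exactly the part Pickle.jl has to punt on.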

My current workaround is a one-time Python conversion script that unpacks the tensors via .numpy() and saves everything as flat NPZ files, which NPZ.jl reads perfectly. Then the Julia loader just reads the NPZ.
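For reference, the conversion step is roughly the following (a hedged sketch, not my exact script — the file names and the flat key scheme like "0_atom_type" are made up for illustration):

```python
# Sketch of the one-time Python conversion: unpack torch tensors via
# .numpy() and save flat NPZ arrays that NPZ.jl can read directly.
# Assumes torch and numpy are installed; names are illustrative.
import pickle
import numpy as np
import torch

# Make a small torch pickle like create_test.py below would.
data = [{"atom_type": torch.tensor([6, 7, 8, 6]),
         "bond_type": torch.zeros(4, 4).long(),
         "logP_SA_cycle_normalized": torch.tensor(-0.5)}]
with open("test.pickle", "wb") as f:
    pickle.dump(data, f)

# Convert: one NPZ entry per molecule/field, keyed "<index>_<field>".
with open("test.pickle", "rb") as f:
    mols = pickle.load(f)

arrays = {f"{i}_{k}": v.numpy()
          for i, mol in enumerate(mols)
          for k, v in mol.items()}
np.savez("test.npz", **arrays)

# Plain arrays come back out — this is what NPZ.jl sees on the Julia side.
back = np.load("test.npz")
print(back["0_atom_type"])  # → [6 7 8 6]
```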

Is there a cleaner pure-Julia approach? Has anyone dealt with PyTorch pickle files in Julia before? Is Pickle.Defer something that can be handled/extended, or is the Python preprocessing step just the accepted pattern for datasets like this?

You can reproduce this yourself without needing the ZINC dataset at all:

create_test.py

import torch, pickle

data = [
    {"atom_type": torch.tensor([6, 7, 8, 6]),  # atom types
     "bond_type": torch.zeros(4, 4).long(),    # bond adjacency matrix
     "logP_SA_cycle_normalized": torch.tensor(-0.5)}
]

with open("test.pickle", "wb") as f:
    pickle.dump(data, f)

print("test.pickle created!")

test_load.jl

using Pickle

data = Pickle.load("test.pickle")
mol = data[1]

@show typeof(mol["atom_type"])                # → Pickle.Defer (expected Array!)
@show typeof(mol["logP_SA_cycle_normalized"]) # → Pickle.Defer

atom_type = Int.(mol["atom_type"])  # MethodError: no method matching length(::Pickle.Defer)