New Artifact System, Data, and OneDrive

Yes, one of the things DataDeps lets you do that Artifacts intentionally doesn’t is customize your transport mechanism, either by setting the fetch method or by having a path-like type that overloads Base.download.

It’s relatively easy to add support for downloading via some other means.
You just set the fetch_method in the registration block.

Shelling out to the Linux secure copy tool, scp, looks something like:

```julia
fetch_method = (remote_path, local_dir) -> run(`scp myserver:$remote_path $local_dir`)
```
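
To show where that goes, here is a minimal sketch of a full registration block. The dataset name, server, and remote path are all made up; the key points are that DataDeps hands fetch_method the remote path and the local directory, and expects back the path of the file it created.

```julia
using DataDeps

register(DataDep(
    "MyServerData",                        # hypothetical dataset name
    "Data copied from myserver over scp",  # message shown before the first download
    "/data/remote/records.csv";            # hypothetical remote path, handed to fetch_method
    # No checksum given here, so DataDeps will warn and print one you can paste in later.
    fetch_method = function (remote_path, local_dir)
        run(`scp myserver:$remote_path $local_dir`)
        return joinpath(local_dir, basename(remote_path))  # return the local file path
    end,
))
```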

Or you use a path-like type that overloads Base.download, like AWSS3.jl, which works with DataDeps and with AWS’s auth system out of the box.
If you’ve used AWS S3 before, that would be my go-to choice; it gets used all the time at Invenia.
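
For example (a rough sketch with made-up bucket and dataset names, assuming your AWS credentials are already configured), registering S3-hosted data just means passing an S3Path where you would otherwise pass a URL string, and DataDeps calls download on it:

```julia
using DataDeps, AWSS3

register(DataDep(
    "MyS3Data",                                 # hypothetical dataset name
    "Data stored in a private S3 bucket",
    S3Path("s3://my-bucket/path/to/data.csv"),  # path-like type; DataDeps will call `download` on it
))
```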

There is a more proof-of-concept example in
PyDrive.jl, showing how to use PyCall to access Google Drive, with its auth system, from DataDeps.jl.
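
The gist of that approach, as a hedged sketch (not the actual PyDrive.jl code), assuming the Python pydrive package is installed and configured, and that the DataDep’s remote path is a Google Drive file id:

```julia
using DataDeps, PyCall

# Hypothetical fetch_method that pulls a file from Google Drive via pydrive.
function fetch_gdrive(file_id, local_dir)
    pydrive_auth  = pyimport("pydrive.auth")
    pydrive_drive = pyimport("pydrive.drive")

    gauth = pydrive_auth.GoogleAuth()   # needs pydrive's client_secrets.json set up
    gauth.LocalWebserverAuth()          # runs Google's OAuth flow in a browser
    drive = pydrive_drive.GoogleDrive(gauth)

    f = drive.CreateFile(Dict("id" => file_id))
    local_path = joinpath(local_dir, "data.csv")  # made-up filename for this sketch
    f.GetContentFile(local_path)                  # download the file's contents
    return local_path
end

register(DataDep(
    "MyGDriveData",               # hypothetical dataset name
    "Data stored on Google Drive",
    "1AbCdEfGhIjKlMnOpQ";         # hypothetical Drive file id, passed through as remote_path
    fetch_method = fetch_gdrive,
))
```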


Artifacts has some lessons learned from DataDeps, so there are many overlaps.
DataDeps is more flexible: it also allows custom post-processing, so you can use any file format, not just tarballs (and you don’t have to have all your data inside tarballs).
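
For instance (a small sketch with a made-up URL), the post-processing hook is just a function DataDeps runs on the downloaded file, with the working directory set to the datadep’s folder; unpack is the helper DataDeps provides for extracting archives, but you can pass any function you like:

```julia
using DataDeps

register(DataDep(
    "MyUnpackedData",                        # hypothetical dataset name
    "An archive that gets extracted after download",
    "https://example.com/data.tar.gz";       # hypothetical URL
    post_fetch_method = unpack,              # extract the archive; any function of the file path works here
))
```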

Artifacts uses tree-hashes of the post-unpacking file structure, which means it can check that artifacts decompressed correctly,
whereas DataDeps only uses a hash of the downloaded file, so it can only check that the download worked.
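
Roughly, the difference looks like this (a sketch; I believe Pkg.GitTools.tree_hash is what Pkg uses internally for artifact tree hashes, and SHA-256 of the file is the DataDeps default):

```julia
using SHA  # DataDeps-style: hash the downloaded file, which only proves the download was intact
file_hash = bytes2hex(open(sha2_256, "archive.tar.gz"))

using Pkg  # Artifacts-style: git tree-hash of the unpacked directory, which also proves it extracted correctly
tree_hash = bytes2hex(Pkg.GitTools.tree_hash("unpacked_dir/"))
```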

Artifacts uses content addressing, so it can never run into a name clash.


I also would like to know if anyone has used Artifacts for data in the wild.
