Is there a package that abstracts the storage location so that it's cloud and local path agnostic?

Say I want to save a file somewhere on the hard drive; conceptually that's not very different from saving it to an S3 bucket.

E.g. save(location, object) should work regardless of whether the location is S3 or a local path. I wonder if such a package exists already. E.g. R’s pins is like that, I think.

Not as general as you are looking for, but my package OpenScienceFramework.jl (Alexander Plavin, on GitLab) defines methods for Base filesystem functions to operate (r/w) on remote files stored at https://osf.io.
SquashFS.jl (Alexander Plavin, on GitLab) does the same (read-only) for squashfs archives.
So, it’s clearly possible to provide an almost-uniform interface to access files on different backends.

You can have S3-compatible storage locally.
For instance, if you configure a Ceph filesystem. Personally, I would say configuring Ceph on your laptop would be overkill and send you prematurely grey, but hey.
From a quick search, you would be better off setting up MinIO, which is relatively easy.

What are you trying to achieve here? Have you looked at HDF5 type storage?

I want to write code that abstracts away the storage mechanism, so I can develop offline and then move it to AWS or the like without changing one line of code.

By the way, exposing my ignorance here, saving to a local filesystem and saving to S3 on a web service are different. (Let's assume you are not running MinIO locally.)
Your local filesystem is a POSIX filesystem and you make POSIX standard calls to manipulate files.

S3 is more recent and uses HTTP semantics to PUT and GET objects. You cannot change an S3 object in place: you can fetch it, change it and write it again, but it is then a different object (I may not be entirely correct here).
There was a great presentation on storage in the Dell HPC series yesterday. When it is available I will post a link (assuming I remember).
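
To make the POSIX-versus-object-store difference above concrete, here is a minimal sketch in Python using only the standard library. The ObjectStore class is a hypothetical in-memory stand-in for a bucket, not a real S3 client, and the names append_local and append_object are made up for illustration. It only shows the shape of the two access patterns: appending in place to a local file versus reading, modifying and re-uploading a whole object.

```python
import os
import tempfile


class ObjectStore:
    """Hypothetical in-memory stand-in for an S3-style bucket (not a real client)."""

    def __init__(self):
        self._objects = {}  # key -> complete object contents as bytes

    def put(self, key, data):
        # A PUT always stores a complete object, replacing whatever was under the key.
        self._objects[key] = bytes(data)

    def get(self, key):
        # A GET returns the complete stored object.
        return self._objects[key]


def append_local(path, data):
    # POSIX style: open the existing file and append to it in place.
    with open(path, "ab") as f:
        f.write(data)


def append_object(store, key, data):
    # Object-store style: there is no in-place edit, so fetch the whole
    # object, build the new contents, and write the whole thing back.
    store.put(key, store.get(key) + data)


# Local file: create it, then append in place.
local_path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(local_path, "wb") as f:
    f.write(b"hello")
append_local(local_path, b" world")
with open(local_path, "rb") as f:
    local_result = f.read()

# Object store: the same logical change is a read-modify-write of the object.
store = ObjectStore()
store.put("log.txt", b"hello")
append_object(store, "log.txt", b" world")
object_result = store.get("log.txt")
```

Both paths end with the same bytes, but the second one rewrites the entire object for a small change, which is one of the subtle differences an abstraction layer has to hide or expose.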

FilePathsBase.jl tries to make this abstraction, and e.g. AWSS3.jl exposes a FilePathsBase-based S3Path that supports write, read, open, etc. S3 and local filesystems are different enough in subtle ways to make this a bit annoying and occasionally error-prone, but it’s largely workable.

Edit: for another approach, see Datasets.jl (JuliaCon talk)


Eh… that goes without saying, right? Also, save(path_or_s3, object) and load(path_or_s3, object) are totally abstractable, right?

Datasets.jl looks amazing. I worked for several years running an HPC system used by CFD engineers where we used meaningful directory names in a tree structure. That concept is just so outdated. You are using directory names as metadata - I know this is quite common.

FilePathsBase might work better as I don’t want an extra toml file for the workflow I want.