Is there a package that abstracts the storage location so that it's cloud and local path agnostic?

xiaodai · September 9, 2021, 9:59am

Say I want to save a file somewhere on the hard drive; it's actually not that different from saving it to an S3 bucket.

E.g. save(location, object) should work regardless of whether the location is S3 or a local path. I wonder if such a package exists already. E.g. R's pins is like that, I think.

aplavin · September 9, 2021, 10:37am

Not as general as you are looking for, but my package OpenScienceFramework.jl (on GitLab) defines methods for Base filesystem functions to operate (r/w) on remote files stored at https://osf.io.
SquashFS.jl (also on GitLab) does the same (read-only) for squashfs archives.
So, it's clearly possible to provide an almost-uniform interface to access files on different backends.

johnh · September 9, 2021, 12:03pm

You can have S3-compatible storage locally.
For instance, if you configure a CEPH filesystem. Personally, I would say configuring CEPH on your laptop would be overkill and send you prematurely grey, but hey.
Doing a quick search, you would be better off setting up Minio, which is relatively easy.

What are you trying to achieve here? Have you looked at HDF5-type storage?

xiaodai · September 9, 2021, 12:04pm

Write code that abstracts away the storage mechanism, so I can develop offline and then move it to AWS or the like without changing one line of code.

johnh · September 9, 2021, 12:07pm

By the way, exposing my ignorance here, saving to a local filesystem and saving to S3 on a web service are different. (Let's assume you are not running Minio locally.)
Your local filesystem is a POSIX filesystem and you make POSIX standard calls to manipulate files.

S3 is more recent and uses HTTP semantics to PUT and GET objects. You cannot change an S3 object in place: you can modify a copy and write it again, but it is then a different object (I may not be entirely correct here).
There was a great presentation on storage in the Dell HPC series yesterday. When it is available I will post a link (assuming I remember).
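
To make the difference described above concrete, here is a minimal sketch using only Julia Base. The dictionary `store` and the helpers `put_object!` and `get_object` are hypothetical stand-ins for an object store, not the API of S3 or of any package. It assumes the simplified model from this post: a local file can be appended to in place, while an object can only be fetched and replaced as a whole.

```julia
# Local POSIX-style file: open in append mode and modify it in place.
path = tempname()
write(path, "header\n")
open(path, "a") do io
    write(io, "row 1\n")
end

# Hypothetical stand-in for an object store: whole-object PUT and GET only.
store = Dict{String,Vector{UInt8}}()
put_object!(store, key, bytes) = (store[key] = copy(bytes); nothing)
get_object(store, key) = copy(store[key])

put_object!(store, "log.txt", Vector{UInt8}("header\n"))

# "Appending" means: GET the whole object, change the local copy,
# then PUT it back, which replaces the stored object entirely.
bytes = get_object(store, "log.txt")
append!(bytes, Vector{UInt8}("row 1\n"))
put_object!(store, "log.txt", bytes)

# Both backends now hold the same content, reached by different routes.
@assert read(path, String) == String(get_object(store, "log.txt"))
```

An abstraction layer has to hide this kind of difference, which is one reason operations such as appending can be awkward to offer uniformly.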

ericphanson · September 9, 2021, 12:20pm

FilePathsBase.jl tries to make this abstraction, and e.g. AWSS3.jl exposes a FilePathsBase-based S3Path that supports write, read, open, etc. S3 and local filesystems are different enough in subtle ways to make this a bit annoying and occasionally error-prone, but it's largely workable.

Edit: for another approach, see Datasets.jl (JuliaCon talk)

xiaodai · September 9, 2021, 12:25pm

eh… that goes without saying, right? Also, save(path_or_s3, object) and load(path_or_s3, object) are totally abstract-able, right?
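
One way the save/load idea could look, as a rough sketch using multiple dispatch and only Julia's standard library. All type and function names here (`StorageLocation`, `LocalLocation`, `ObjectStoreLocation`, `put_bytes`, `get_bytes`) are hypothetical and do not come from FilePathsBase.jl, AWSS3.jl or any other package; the in-memory dictionary merely stands in for a remote bucket, and `Serialization` is just one possible encoding. In this sketch `load` takes only the location.

```julia
using Serialization  # standard library; chosen here only as an example encoding

# Hypothetical types for illustration; not taken from any real package.
abstract type StorageLocation end

struct LocalLocation <: StorageLocation
    path::String
end

# Stand-in for a remote object store: a dictionary from key to bytes.
struct ObjectStoreLocation <: StorageLocation
    bucket::Dict{String,Vector{UInt8}}
    key::String
end

# Encode an object to bytes and decode it again.
function to_bytes(object)
    io = IOBuffer()
    serialize(io, object)
    return take!(io)
end
from_bytes(bytes::Vector{UInt8}) = deserialize(IOBuffer(bytes))

# Byte transport is the only backend-specific part.
put_bytes(loc::LocalLocation, bytes) = write(loc.path, bytes)
put_bytes(loc::ObjectStoreLocation, bytes) = (loc.bucket[loc.key] = bytes)
get_bytes(loc::LocalLocation) = read(loc.path)
get_bytes(loc::ObjectStoreLocation) = loc.bucket[loc.key]

# Backend-agnostic API: callers never branch on the storage type.
save(loc::StorageLocation, object) = (put_bytes(loc, to_bytes(object)); nothing)
load(loc::StorageLocation) = from_bytes(get_bytes(loc))

# Usage: the same calls work for both backends.
bucket = Dict{String,Vector{UInt8}}()
for loc in (LocalLocation(tempname()), ObjectStoreLocation(bucket, "results/run1"))
    save(loc, Dict("a" => 1))
    @assert load(loc) == Dict("a" => 1)
end
```

A real backend would replace the dictionary methods with network calls, and would also have to deal with credentials, latency and failures that the local case never sees.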

johnh · September 9, 2021, 1:20pm

Datasets.jl looks amazing. I worked for several years running an HPC system used by CFD engineers where we used meaningful directory names in a tree structure. That concept is just so outdated. You are using directory names as metadata - I know this is quite common.

xiaodai · September 9, 2021, 11:45pm

FilePathsBase might work better as I don’t want an extra toml file for the workflow I want.
