How to access an S3-compatible Ceph bucket?

Hi, is there a recommended way to access an S3-compatible Ceph bucket to upload and download files? Would AWS.jl be the best option, or are there other packages? Are there any examples besides the one in the AWS.jl README (https://github.com/JuliaCloud/AWS.jl) for the S3-compatible Minio? I searched but could not find any directly related topics on Julia Discourse, hence the question. I have an access key, a secret key and an endpoint, and so far I have been able to configure and use them with rclone. I would appreciate any advice.

There’s an example in the README showing how to create a new AWSConfig type to connect to MinIO: https://github.com/JuliaCloud/AWS.jl#modifying-functionality

I had spotted it as well, but I was not sure it was the right path. After your confirmation I got it working. Thanks!

using Minio, AWSS3

# Minio.jl config pointed at the Ceph endpoint (fill in the placeholders)
CephConfig = MinioConfig("https://<endpoint>:<port>"; region="<region>", username="<access_key_id>", password="<secret_key>")

# download an object to a local file
AWSS3.s3_get_file(CephConfig, "<bucket>", "<object_key>", "<local_filename>")
# upload bytes (e.g. the result of read(path)) as an object
AWSS3.s3_put(CephConfig, "<bucket>", "<object_key>", data)

I don't quite understand two functions from the AWSS3 package (https://juliacloud.github.io/AWSS3.jl/stable/api/#S3-Interaction). Could you comment on them?

The first one is:

s3_list_objects([::AbstractAWSConfig], bucket, [path_prefix]; delimiter="/", max_items=1000, kwargs...)

According to the docs, it should return an iterator of Dicts with the keys Key, LastModified, ETag, Size, Owner and StorageClass.

In my case, when I execute

list_objects = AWSS3.s3_list_objects(CephConfig, bucket_name; delimiter="/", max_items=1000)

it returns

Channel{OrderedCollections.LittleDict}(128) (empty)

Is this correct? There are objects in this bucket. How do I get the list of objects out of it? With Dict(value)?
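A minimal sketch of how such a Channel can be consumed. It assumes, as the docs describe, that s3_list_objects hands back a Channel whose items are dictionaries with a "Key" entry. To keep it runnable without a Ceph endpoint, a local Channel of plain Base Dicts stands in for the real LittleDict listing, and the keys and sizes are made up. The "(empty)" in the printed Channel most likely only describes what is buffered at that instant (a producer task fills the channel as it is read), so it does not by itself mean the bucket is empty. A Channel can be iterated only once, so collect it if you need the items more than once.

```julia
# Stand-in for the Channel returned by AWSS3.s3_list_objects: a producer task
# puts one Dict per object (hypothetical keys and sizes; Base Dict instead of LittleDict).
list_objects = Channel{Dict{String,Any}}(128) do ch
    put!(ch, Dict{String,Any}("Key" => "data.txt", "Size" => "1024"))
    put!(ch, Dict{String,Any}("Key" => "reports/2023.csv", "Size" => "2048"))
end

# collect iterates the Channel until the producer closes it and returns a Vector of Dicts.
objects = collect(list_objects)

# Each element is a Dict, so individual fields are read by key.
object_keys = [obj["Key"] for obj in objects]
foreach(println, object_keys)
```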

The second one is:

s3_list_keys([::AbstractAWSConfig], bucket, [path_prefix]; kwargs...)

which is similar to s3_list_objects and, according to the docs, should return the object keys as a Vector{String}.

When I try to execute

list_keys = AWSS3.s3_list_keys(CephConfig, bucket_name)

it returns:

Base.Generator{Channel{OrderedCollections.LittleDict}, AWSS3.var"#41#42"}(AWSS3.var"#41#42"(), Channel{OrderedCollections.LittleDict}(128))

Is this correct? How do I use it?

I am also wondering whether AWSS3.s3_put can upload a file without first reading it into memory as a vector of bytes. Currently I do it as shown below, but in some cases this allocates quite a lot:

dir_source = "/path/to/data" # source directory of the file to upload
readdir(dir_source) # list the directory (only to check that the file is there)
file_name = "data.txt" # name of the file to upload
path_file_upload = joinpath(dir_source, file_name) # full path to the file
data_to_upload = read(path_file_upload) # reads the whole file into memory as a Vector{UInt8}
bucket_name = "public" # bucket name
AWSS3.s3_put(CephConfig, bucket_name, file_name, data_to_upload) # upload under the key file_name
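One option that may reduce the up-front allocation, offered as an untested suggestion rather than a confirmed AWSS3 feature: memory-map the file with the Mmap standard library instead of calling read. Mmap.mmap returns a Vector{UInt8} backed by the file, so the operating system pages the bytes in as they are accessed. Whether s3_put and the HTTP layer beneath it copy the body again has not been checked here, so it is worth measuring before relying on it. The sketch writes a throwaway temporary file so it runs anywhere; the s3_put line is commented out because it needs the CephConfig from earlier in the thread.

```julia
using Mmap

# Throwaway stand-in file so the sketch runs without the real data directory.
path_file_upload = tempname()
write(path_file_upload, "hello from Julia\n")

# Memory-map the file instead of calling read: the result is a Vector{UInt8}
# backed by the file rather than a separately allocated heap copy.
data_to_upload = Mmap.mmap(path_file_upload)
println(length(data_to_upload), " bytes mapped")

# Assumption (not verified): s3_put accepts this Vector{UInt8} as-is and the
# request layer does not copy it again.
# AWSS3.s3_put(CephConfig, "public", basename(path_file_upload), data_to_upload)
```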

I would appreciate any additional help.

It looks like AWSS3.s3_list_keys returns a Base.Generator. This is somewhat like an array, but its elements are computed lazily, one at a time as you iterate over it, which avoids allocating an array. See the "Generator Expressions" section of the Julia manual page Multi-dimensional Arrays, or the Discourse thread "What is a Base.Generator?".

Try collect(AWSS3.s3_list_keys(CephConfig, bucket_name)) to evaluate the items and store them in an array.
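A short sketch of what collect does here. It assumes s3_list_keys behaves like a generator expression over the listing Channel, which is consistent with the printed Base.Generator{Channel{...}} type above; a local Channel stands in for the bucket listing, and the key names are invented. Because the generator reads from a Channel, it can only be consumed once.

```julia
# Stand-in for the listing Channel (hypothetical keys).
listing = Channel{Dict{String,Any}}(128) do ch
    for key in ("data.txt", "reports/2023.csv")
        put!(ch, Dict{String,Any}("Key" => key))
    end
end

# A generator expression: nothing is computed yet, it only describes how to
# turn each listing entry into its key.
list_keys = (obj["Key"] for obj in listing)

# collect runs the generator to completion and stores the results in a Vector.
keys_vec = collect(list_keys)
println(keys_vec)

# The underlying Channel is now drained, so iterating the generator again
# yields nothing; keep keys_vec if you need the keys more than once.
println(isempty(collect(list_keys)))
```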

Thank you.

I also found these useful:
Julia manual, Dictionaries: https://docs.julialang.org/en/v1/base/collections/#Dictionaries

OrderedCollections.jl: https://github.com/JuliaCollections/OrderedCollections.jl