Hi, is there a recommended way to access an S3-compatible Ceph bucket and send and receive a file? What would be the best way to do it: AWS.jl, or are there other packages? Are there any examples, apart from the one provided in AWS.jl (https://github.com/JuliaCloud/AWS.jl), for S3-compatible MinIO? I did some searching and was not able to find any directly related topics on the Julia Discourse, hence the question. I have an access key, a secret key and an endpoint, and so far I have been able to configure and use them with rclone. I would appreciate any advice.
There’s an example in the README for creating a new AWSConfig type to connect to MinIO: https://github.com/JuliaCloud/AWS.jl#modifying-functionality
I spotted it as well. I was not sure if this was the right path; however, after your confirmation I got it working. Thanks!
using Minio
CephConfig = MinioConfig("https://<endpoint::String>:<port::String>"; region="<region::String>", username="<access_key_id::String>", password="<secret_key::String>")
using AWSS3
AWSS3.s3_get_file([::AbstractAWSConfig], bucket, path, filename; [version=], kwargs...) # to download object
AWSS3.s3_put([::AbstractAWSConfig], bucket, path, data, data_type="", encoding=""; <keyword arguments>) # to upload file
I cannot quite understand two functions from the AWSS3 package (https://juliacloud.github.io/AWSS3.jl/stable/api/#S3-Interaction). Would you have any comments?
The first one is:
s3_list_objects([::AbstractAWSConfig], bucket, [path_prefix]; delimiter="/", max_items=1000, kwargs...)
This should return an iterator of Dicts with keys Key, LastModified, ETag, Size, Owner, StorageClass.
And in my case when I execute
list_objects = AWSS3.s3_list_objects(CephConfig, bucket_name; delimiter="/", max_items=1000)
it returns
Channel{OrderedCollections.LittleDict}(128) (empty)
Is this correct? (There are objects in this bucket.) How do I work with it to get the list of objects? Dict(value)?
The second one is:
s3_list_keys([::AbstractAWSConfig], bucket, [path_prefix]; kwargs...)
This is similar to s3_list_objects and should return the object keys as a Vector{String}.
And when I try to execute
list_keys = AWSS3.s3_list_keys(CephConfig, bucket_name)
it returns:
Base.Generator{Channel{OrderedCollections.LittleDict}, AWSS3.var"#41#42"}(AWSS3.var"#41#42"(), Channel{OrderedCollections.LittleDict}(128))
Is this correct? How do I use it?
Also, regarding AWSS3.s3_put: is it possible to upload a file without first reading it into memory as a vector of bytes? At the moment I do it as shown below; however, in some cases it seems to allocate quite a lot:
dir_source = "/path/to/data" # source directory of the file to upload
readdir(dir_source) # read source directory
file_name = "data.txt" # name of the file to upload
path_file_upload = joinpath(dir_source, file_name) # path to the file
data_to_upload = Base.read(path_file_upload) # reading the file into memory
bucket_name = "public" # bucket name
AWSS3.s3_put(CephConfig, bucket_name, file_name, data_to_upload)
I would appreciate any additional help.
It looks like AWSS3.s3_list_keys returns a Base.Generator. This is kind of like an array, but the elements are only evaluated individually, usually to be processed one at a time, which avoids allocating an array. See "Multi-dimensional Arrays · The Julia Language" or "What is a Base.Generator?".
Try collect(AWSS3.s3_list_keys(CephConfig, bucket_name)) to evaluate the items and store them in an array.
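To make the Channel and Generator behaviour more concrete, here is a minimal sketch that does not touch S3 at all. The function fake_list_objects is a hypothetical stand-in I made up for AWSS3.s3_list_objects (it just returns a Channel of dictionaries with invented keys and sizes), so treat it as an illustration of how to consume these types rather than of what AWSS3 does internally.

```julia
# Hypothetical stand-in for AWSS3.s3_list_objects: a Channel that is
# filled by a producer task, holding one dictionary per object.
function fake_list_objects()
    return Channel{Dict{String,Any}}(128) do ch
        put!(ch, Dict{String,Any}("Key" => "data.txt", "Size" => "42"))
        put!(ch, Dict{String,Any}("Key" => "logs/app.log", "Size" => "7"))
    end
end

# A Generator over the Channel, analogous to the value s3_list_keys returned above.
key_generator = (obj["Key"] for obj in fake_list_objects())

# collect drains the generator (and the Channel behind it) into a Vector.
key_list = collect(key_generator)

# Alternatively, process the objects one at a time without building an array.
sizes = Dict{String,String}()
for obj in fake_list_objects()
    sizes[obj["Key"]] = obj["Size"]
end
```

Note that a Channel can only be consumed once: after collect or a for loop has drained it, you need to call the listing function again to get a fresh one.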
Thank you.
I also found these useful:
https://docs.julialang.org/en/v1/base/collections/#Dictionaries
OrderedCollections.jl: https://github.com/JuliaCollections/OrderedCollections.jl