For my package, I want to download some data upon the user’s request and store them locally for future usage. A bit of research seems to indicate that DataDeps.jl by @oxinabox is the package to use, but I’m not sure if it can be used to solve my specific problem described below.
There is a website at https://website.com, and multiple data files are stored in subdirectories beneath it (subA/A1.csv, subA/A2.csv, subB/B1.csv, subB/B2.csv, etc.). The shared portion of all these paths is https://website.com/data. I would like to register this part as a DataDep and let users download and store the individual files in the individual subdirectories.
I hoped that something like
register(DataDep("Website.COM Data", "Data published in website.com", "https://website.com/data"))
and subsequent calls of
data_A1 = read(datadep"Website.COM Data" * "/subA/A1.csv")
data_A2 = read(datadep"Website.COM Data" * "/subA/A2.csv")
data_B1 = read(datadep"Website.COM Data" * "/subB/B1.csv")
data_B2 = read(datadep"Website.COM Data" * "/subB/B2.csv")
would download the files into
~/.julia/datadeps/Website.COM Data/, but it didn’t work.
Is there a way to achieve the goal described above using DataDeps or any other packages?
Correct, DataDeps needs a full list of files to fetch.
You need to list them all in the registration (DataDepsGenerators.jl can help with this, some of the time).
RemoteFiles might be a suitable alternative that works better for this use case.
I am not sure; I haven’t tried it.
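To make the "full list of files" point concrete, here is a minimal sketch of building that list from the shared base URL and the relative paths in the question. The names `BASE_URL`, `RELATIVE_PATHS`, and `remote_urls` are made up for illustration, and the `register` call is left as a comment because it assumes DataDeps.jl is installed and the URLs actually exist.

```julia
# Hypothetical base URL and relative paths, taken from the question above.
const BASE_URL = "https://website.com/data"
const RELATIVE_PATHS = ["subA/A1.csv", "subA/A2.csv", "subB/B1.csv", "subB/B2.csv"]

# Build the explicit list of URLs by joining the base and each relative path.
# (URLs always use "/", so joinpath is deliberately not used here.)
remote_urls(base, relpaths) = [string(rstrip(base, '/'), "/", p) for p in relpaths]

urls = remote_urls(BASE_URL, RELATIVE_PATHS)

# With DataDeps.jl loaded, the list would go in as the third argument:
# using DataDeps
# register(DataDep("Website.COM Data", "Data published in website.com", urls))
```

Note that this only gives DataDeps the list of files; by itself it does not recreate the subA/subB layout locally.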
@oxinabox, thanks for your answer!
Could you explain how to register a DataDep such that subfolders are created, as mentioned in this documentation? In the documentation, I was not able to find an example describing how to create subfolders. If I know how to create subfolders, I might be able to devise a method to achieve what I want.
I also looked at RemoteFiles, but it doesn’t seem to support creation of subfolders in the default location (the root directory of the package using RemoteFiles).
Here is an example
post_fetch_method = [
    # 1st function applies to the 1st file, i.e. 10.txt
    filename -> mv(filename, joinpath(mkpath("ten"), basename(filename))),
    # 2nd function applies to the 2nd listed file, i.e. 100.txt
    filename -> mv(filename, joinpath(mkpath("hundred"), basename(filename))),
    # 3rd function applies to everything in the 3rd entry (the inner vector),
    # i.e. 1000.txt, 10000.txt, and 100000.txt.
    # Alternatively, a vector of 3 functions here would treat those files differently.
    filename -> mv(filename, joinpath(mkpath("lots"), basename(filename))),
]
The result at the end is 10.txt in ten/, 100.txt in hundred/, and 1000.txt, 10000.txt, and 100000.txt in lots/.
In post_fetch_method you can run whatever code you like to derive the subfolder name from the filename. But the filename won’t have the subfolder embedded in it; blame RFC 6266, I guess.
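Since the downloaded filename does not carry the subfolder, one way around it is to generate one function per listed file, each remembering the subfolder it should move its file into. The following is only a sketch: `RELPATHS` and `make_mover` are hypothetical names, and it assumes (as in the example above) that the functions run with the DataDep's folder as the working directory and that the vector lines up one-to-one with the list of remote paths.

```julia
# Hypothetical relative paths, in the same order as the registered URLs.
const RELPATHS = ["subA/A1.csv", "subA/A2.csv", "subB/B1.csv", "subB/B2.csv"]

# Return a function that moves a fetched file into the directory part of
# `relpath`, created relative to the current working directory.
function make_mover(relpath)
    subdir = dirname(relpath)              # e.g. "subA" for "subA/A1.csv"
    isempty(subdir) && return identity     # no subfolder: leave the file in place
    return filename -> mv(filename, joinpath(mkpath(subdir), basename(filename)))
end

# One mover per file; this vector is what would be passed as
# `post_fetch_method = movers` when registering the DataDep.
movers = [make_mover(p) for p in RELPATHS]
```

With this, the subfolder layout is derived from the list you already had to write out, rather than typed by hand for each file.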
@oxinabox, thanks! This is closer to what I am trying to do.
I have one more question. When you have many files listed in one DataDep, it seems that reading one file from the DataDep downloads all the listed files. Is there a way to make it download only one file if that is the only file the user requests?
Thank you for the confirmation! I think I can live with the situation.
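For anyone who cannot live with it: since a DataDep is fetched as a whole, a possible workaround is to register one DataDep per file, so that each file is downloaded only when its own DataDep is first used. This is an untested sketch; the naming scheme in `datadep_name` is made up, and the DataDeps calls are left as comments because they assume the package is installed and the URLs exist.

```julia
# Hypothetical base URL and relative paths, taken from the question above.
const DATA_URL = "https://website.com/data"
const DATA_FILES = ["subA/A1.csv", "subA/A2.csv", "subB/B1.csv", "subB/B2.csv"]

# Made-up naming scheme: one DataDep name per file, with "/" flattened to "_".
datadep_name(relpath) = "Website.COM Data " * replace(relpath, "/" => "_")

# (name, url) pairs, one per file.
specs = [(datadep_name(p), string(DATA_URL, "/", p)) for p in DATA_FILES]

# With DataDeps.jl loaded, each pair would become its own DataDep:
# using DataDeps
# for (name, url) in specs
#     register(DataDep(name, "Data published in website.com", url))
# end
# data_A1 = read(joinpath(datadep"Website.COM Data subA_A1.csv", "A1.csv"))
```

The trade-off is that each file then lives in its own DataDep folder instead of one shared folder with subdirectories.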