For my package, I want to download some data upon the user’s request and store it locally for future use. A bit of research seems to indicate that DataDeps.jl by @oxinabox is the package to use, but I’m not sure if it can be used to solve my specific problem described below.
There is a website at https://website.com, and multiple data files are stored at
https://website.com/data/subA/A1.csv
https://website.com/data/subA/A2.csv
…
and
https://website.com/data/subB/B1.csv
https://website.com/data/subB/B2.csv
…
etc. The shared portion of all these paths is https://website.com/data. I would like to register this part as a DataDep and let users download the individual files and store them in the corresponding subdirectories.
I hoped that something like the following would work:
register(DataDep("Website.COM Data", "Data published in website.com", "https://website.com/data"))
Correct, DataDeps needs a full list of files to fetch.
You need to list them all in the register block.
(DataDepsGenerators.jl can help with this, some of the time.)
RemoteFiles.jl might be a suitable alternative that works better for this use case.
I am not sure; I haven’t tried it.
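Since every file has to be listed explicitly, one way to avoid typing each URL by hand is to build the list from a small table. The following is only a sketch: `BASE_URL`, `FILES` and `remote_paths` are hypothetical names, and the file names are just the placeholders from the question, so a real table would have to enumerate every file.

```julia
# Hypothetical layout: subfolder name => files in that subfolder.
# These names are placeholders taken from the question, not a complete list.
const BASE_URL = "https://website.com/data"
const FILES = [
    "subA" => ["A1.csv", "A2.csv"],
    "subB" => ["B1.csv", "B2.csv"],
]

# Expand the table into a flat vector holding one full URL per file.
remote_paths = [
    join((BASE_URL, sub, file), "/")
    for (sub, files) in FILES
    for file in files
]
```

The resulting `remote_paths` vector could then be passed as the third argument of `DataDep(...)` in place of a hand-written list of URLs.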
Could you explain how to register a DataDep such that subfolders are created, as mentioned in this documentation? I was not able to find an example in the documentation that shows how to create subfolders. If I knew how to create subfolders, I might be able to devise a method to achieve what I want.
I tried RemoteFiles, but it doesn’t seem to support creation of subfolders in the default location (the root directory of the package using RemoteFiles).
using DataDeps
using SHA  # provides sha2_256

register(DataDep(
    "Pi3",
    "Some message",
    [
        "https://www.angio.net/pi/digits/10.txt",
        "https://www.angio.net/pi/digits/100.txt",
        [
            "https://www.angio.net/pi/digits/1000.txt",
            "https://www.angio.net/pi/digits/10000.txt",
            "https://www.angio.net/pi/digits/100000.txt"
        ]
    ],
    sha2_256,
    post_fetch_method = [
        # 1st applies to the 1st listed file, i.e. 10.txt
        filename -> mv(filename, joinpath(mkpath("ten"), basename(filename))),
        # 2nd applies to the 2nd listed file, i.e. 100.txt
        filename -> mv(filename, joinpath(mkpath("hundred"), basename(filename))),
        # 3rd applies to everything in the 3rd entry (the inner vector),
        # i.e. 1000.txt, 10000.txt, and 100000.txt.
        # Alternatively, a vector of 3 functions here would treat those files differently.
        filename -> mv(filename, joinpath(mkpath("lots"), basename(filename))),
    ]
))

readdir(datadep"Pi3")
readdir(datadep"Pi3/ten")
readdir(datadep"Pi3/lots")
In post_fetch_method you can run whatever code you like to derive the subfolder name from the filename. But the filename won’t have the subfolder embedded in it – blame RFC 6266 I guess.
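As a rough sketch of deriving the subfolder from the filename, the helper below assumes a purely hypothetical naming rule in which the first letter of the file name picks the folder ("A1.csv" goes to "subA", "B2.csv" to "subB"). `subfolder_for` and `move_to_subfolder` are made-up names, not part of DataDeps, and the rule would need to be adapted to the real file names.

```julia
# Hypothetical rule: the first character of the file name selects the subfolder.
subfolder_for(filename) = "sub" * first(basename(filename), 1)

# Move a fetched file into its derived subfolder, creating the folder if needed.
# The subfolder is created next to the fetched file, so this works whether
# `filename` is a bare name or a full path.
function move_to_subfolder(filename)
    dir = mkpath(joinpath(dirname(filename), subfolder_for(filename)))
    return mv(filename, joinpath(dir, basename(filename)))
end
```

As far as I understand the DataDeps documentation, a single function given as `post_fetch_method = move_to_subfolder` is applied to every fetched file, so one helper like this may be enough for all of them.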
@oxinabox, thanks! This is closer to what I am trying to do.
I have one more question. When you have many files listed in one DataDep, it seems that reading one file from the DataDep downloads all the listed files. Is there a way to make it download only one file if that is the only file the user requests?