Using DataDeps for multiple files in multiple subdirectories

For my package, I want to download some data at the user’s request and store it locally for future use. A bit of research suggests that DataDeps.jl by @oxinabox is the package to use, but I’m not sure whether it can solve my specific problem, described below.

There is a website at, and multiple data files are stored at




etc. The shared portion of all these paths is I would like to register this part as a DataDep and let users download and store the individual files in their respective subdirectories.

I hoped that something like

register(DataDep("Website.COM Data", "Data published in", ""))

and subsequent calls of

data_A1 = read(datadep"Website.COM Data" * "/subA/A1.csv")
data_A2 = read(datadep"Website.COM Data" * "/subA/A2.csv")
data_B1 = read(datadep"Website.COM Data" * "/subB/B1.csv")
data_B2 = read(datadep"Website.COM Data" * "/subB/B2.csv")

would download the files into ~/.julia/datadeps/Website.COM Data/, but it didn’t work.

Is there a way to achieve the goal described above using DataDeps or any other packages?

Correct, DataDeps needs a full list of files to fetch.
You need to list them all in the register block.
(DataDepsGenerators.jl can help with this, some of the time)

RemoteFiles.jl might be a suitable alternative that works better for this use case.
I am not sure; I haven’t tried it.
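
To illustrate the advice above about listing every file in the register block, here is a minimal sketch. The base URL and the message string are placeholders (the real ones are not shown in this thread), and the relative paths are taken from the question. With no `post_fetch_method`, the files should all land flat in the DataDep's folder, named after the last component of each URL.

```julia
using DataDeps

# Placeholder base URL (an assumption; substitute the real shared prefix).
const BASE_URL = "https://example.com/data"

# Every file has to be listed explicitly; DataDeps does not discover them.
const REL_PATHS = ["subA/A1.csv", "subA/A2.csv", "subB/B1.csv", "subB/B2.csv"]

# Full URL for each file.
urls = [BASE_URL * "/" * p for p in REL_PATHS]

# Registering only records the DataDep; nothing is downloaded until
# datadep"Website.COM Data" is first resolved.
register(DataDep(
    "Website.COM Data",
    "Placeholder message shown to the user before the download starts.",
    urls,
))
```

Inside a package, the `register` call would normally go in the module's `__init__` function. Since no checksum is given here, DataDeps is expected to warn about that on the first download.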

@oxinabox, thanks for your answer!

Could you explain how to register a DataDep such that subfolders are created, as mentioned in this documentation? I was not able to find an example in the documentation showing how to create subfolders. If I knew how to create them, I might be able to devise a way to achieve what I want.

I tried RemoteFiles, but it doesn’t seem to support creation of subfolders in the default location (the root directory of the package using RemoteFiles).

Here is an example

    "Some message",
    post_fetch_method = [
        # 1st applies to the 1st listed file, i.e. 10.txt
        filename -> mv(filename, joinpath(mkpath("ten"), basename(filename))),
        # 2nd applies to the 2nd listed file, i.e. 100.txt
        filename -> mv(filename, joinpath(mkpath("hundred"), basename(filename))),
        # 3rd applies to everything in the 3rd entry (the inner vector),
        # i.e. 1000.txt, 10000.txt, and 100000.txt.
        # Alternatively, a vector of 3 functions here would treat those files differently.
        filename -> mv(filename, joinpath(mkpath("lots"), basename(filename))),
    ]


The output at the end is

julia> readdir(datadep"Pi3")
3-element Vector{String}:

julia> readdir(datadep"Pi3/ten")
1-element Vector{String}:

julia> readdir(datadep"Pi3/lots")
3-element Vector{String}:

In post_fetch_method you can run whatever code you like to derive the subfolder name from the filename. But the filename won’t have the subfolder embedded in it – blame RFC 6266 I guess.
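
As a rough sketch of deriving the subfolder from the filename: the naming rule below (`A1.csv` goes to `subA`, `B2.csv` to `subB`) is an assumption based on the file names in the question, and `subfolder_for` and `move_to_subfolder` are hypothetical helper names, not part of DataDeps.

```julia
# Hypothetical rule (an assumption): the first character of the file name
# picks the subfolder, so "A1.csv" -> "subA" and "B2.csv" -> "subB".
subfolder_for(filename) = "sub" * first(basename(filename), 1)

# Move a fetched file into its derived subfolder, next to where it was
# downloaded. mkpath creates the folder if needed and returns its path;
# mv returns the destination path.
function move_to_subfolder(filename)
    dest_dir = mkpath(joinpath(dirname(filename), subfolder_for(filename)))
    return mv(filename, joinpath(dest_dir, basename(filename)))
end
```

Passing `post_fetch_method = move_to_subfolder` when constructing the `DataDep` should then apply the same rule to every listed file, in the same way the third function in the example above covers all files of the inner vector.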


@oxinabox, thanks! This is closer to what I am trying to do.

I have one more question. When you have many files listed in one DataDep, it seems that reading one file from the DataDep downloads all the listed files. Is there a way to make it download only one file if that is the only file the user requests?

There is not

Thank you for the confirmation! I think I can live with the situation.