is there a way to set it up to work with data that is not publicly accessible?
It is a good question.
I did a bit of thinking about that yesterday (I even drew a flow chart, but I haven’t released it).
The package is primarily concerned with handling the easy and fairly common case where the data is static and public.
If it is private, but not confidential, then I am thinking that a secret URL is probably fine. Things like Google Drive, Dropbox and (at least my) university’s data store offer those as a sharing option.
Slightly more secure than that would be putting the data on a local webserver that is firewalled to only allow local connections.
Beyond that, I believe it should be possible to modify the fetch_method (which you can already do on a per-datadep-registration level) to use something that does Auth (rather than
I believe Basic HTTP auth wouldn’t be hard to set up. Something more complex like OAuth probably would be.
I’d be interested in talking to anyone who is working in the “Gather data, run code, repeat, publish” kind of area and trying to get this going.
I’m in a “Use an external (standard) dataset, run code, repeat, publish” area, just adding more data sets, not more data.
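To make the Basic auth idea a bit more concrete, here is a rough sketch of what a custom fetch_method might look like. It is untested against a real server and makes several assumptions: the URL, the datadep name, the environment variable names (MYDATA_USER, MYDATA_PASSWORD) and the helper names are all made up for illustration, and it uses the Downloads standard library (Julia 1.6+) to send the header.

```julia
using DataDeps
using Base64: base64encode
using Downloads

# Hypothetical helper: builds the standard HTTP Basic auth header pair.
basic_auth_header(user, password) =
    "Authorization" => "Basic " * base64encode("$user:$password")

# A fetch_method takes (remote_path, local_dir) and must return the
# path of the downloaded file.
function fetch_with_basic_auth(remote_path, local_dir)
    # Assumption: credentials are supplied via these (made-up) environment
    # variables, so they never end up in the registration block itself.
    header = basic_auth_header(ENV["MYDATA_USER"], ENV["MYDATA_PASSWORD"])
    local_path = joinpath(local_dir, basename(remote_path))
    Downloads.download(remote_path, local_path; headers=[header])
    return local_path
end

register(DataDep(
    "MyPrivateData",  # hypothetical name
    "Private dataset; needs MYDATA_USER and MYDATA_PASSWORD to be set.",
    "https://data.example.org/private/dataset.csv";  # placeholder URL
    fetch_method = fetch_with_basic_auth,
))
```

Reading the credentials inside the fetch function means they are only needed when a download actually happens, not every time the registration block runs.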
So my notions could be off for those cases.
One thing for sure is that DataDeps.jl doesn’t know when you update your remote data-source.
It only attempts a download if it can’t find a local copy of the folder.
Of course the other thing to do is set up a ManualDataDep, and have a mounted networked filestore in your DATADEPS_LOAD_PATH.
Then that is all easy.
Then once you are about to publish, upload that to some service like FigShare, and change the registration block to a normal (automatic) DataDep.
That is probably a better workflow, thinking about it.
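Roughly, the ManualDataDep-then-publish workflow could look like the sketch below. The datadep name, the mount point and the URL are placeholders I made up, and I am assuming the shared filestore already contains a folder with the same name as the datadep.

```julia
using DataDeps

# While the data is still private.
# Assumption: DATADEPS_LOAD_PATH was set before starting Julia, e.g. in the shell:
#   export DATADEPS_LOAD_PATH=/mnt/labshare/datadeps   (hypothetical mount point)
# and that directory contains a folder called "MyLabData".
register(ManualDataDep(
    "MyLabData",  # hypothetical name
    """
    Private lab data. Mount the shared filestore and make sure
    DATADEPS_LOAD_PATH points at the directory containing MyLabData.
    """
))

# Once the data has been uploaded somewhere public, swap the block above for
# a normal (automatic) registration under the same name:
#
# register(DataDep(
#     "MyLabData",
#     "Lab data, as published alongside the paper.",
#     "https://example.org/placeholder/MyLabData.zip";  # placeholder URL
#     post_fetch_method = unpack,
# ))
```

Because the name stays the same, the rest of the code can keep referring to datadep"MyLabData" before and after the switch.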