I’m trying to access data on a Google Drive using the excellent DataDeps.jl.
The drive has been shared with me (so I don’t own it) and the data in it isn’t zipped (there are multiple folders in the shared parent folder). The data in it changes but very sporadically. While I could download the whole thing and work from there, it would be better if I could “access” it with DataDeps. I then could delete the local repository once in a while to keep it fresh.
Has anyone managed to get DataDeps to work with Google Drive?
I take it you don’t have a public URL for it?
It is private, authed to you?
I am sure this can be made to work
by replacing the fetch_method with something that does auth.
I am just not sure exactly how.
Of course you can always use a ManualDataDep and external syncing.
While that misses out on most of the advantages, it does leave you an easy path forward once we work out how to automate it, since only the registration block would change.
If you are on Linux, the easiest might be to make a function that wraps run(`grive2 -s $remote`),
plus some combination of cd and mv to make it work roughly like download(remote, local).
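A minimal sketch of such a wrapper, under several assumptions: the binary is on the PATH as `grive2`, `sync_root` is a directory that has already been authenticated with Grive, and `remote` is a top-level folder name on the Drive. The name `grive_fetch` and the `runner` keyword (there only so the shell call can be swapped out) are made up for illustration. It uses cp rather than mv, because Grive syncs in both directions and moving the folder away might be treated as a local deletion on the next sync.

```julia
# Hypothetical helper: sync one Drive folder with grive2, then copy it into
# `localdir`, so the call looks roughly like download(remote, local).
function grive_fetch(remote::AbstractString, localdir::AbstractString;
                     sync_root::AbstractString, runner=run)
    # Grive works relative to the current directory, so run it inside sync_root.
    cd(sync_root) do
        runner(`grive2 -s $remote`)
    end
    synced = joinpath(sync_root, remote)
    isdir(synced) || error("expected $synced to exist after syncing")
    mkpath(localdir)
    target = joinpath(localdir, basename(remote))
    # Copy (not move) so the sync directory stays intact for the next run.
    cp(synced, target; force=true)
    return target
end
```

Something like `(remote, localdir) -> grive_fetch(remote, localdir; sync_root="/path/to/authed/dir")` could then be tried as a fetch_method, where the path is a placeholder.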
So I did some poking.
It is actually really easy to download a file from Drive IF you can sort out auth.
You can get the file ID out of the webpage (inspect element shows it as a data field in the list of files, and you can get it a few other ways),
which basically gives you a URL.
See: Files: get | Drive API | Google Developers
The problem is setting up auth.
OAuth 2.0 is just a really annoying process to set up.
I mean it is as nice as it can be while still being secure, but that is not nice.
If we had a good OAuth 2.0 library, this would be doable.
I am not aware of one.
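To make the "ID gives you a URL" part concrete, here is a rough sketch that assumes you already have a valid OAuth 2.0 access token from somewhere; getting that token is exactly the unsolved part. The URL is the Drive API v3 files.get endpoint with `alt=media`; the function names `drive_file_url` and `drive_fetch` are made up for illustration.

```julia
using Downloads

# Drive API v3 "files.get" endpoint; alt=media asks for the file contents
# rather than the file metadata.
drive_file_url(file_id::AbstractString) =
    "https://www.googleapis.com/drive/v3/files/$(file_id)?alt=media"

# Hypothetical fetch function. `token` is assumed to be a valid OAuth 2.0
# access token; nothing here obtains or refreshes it.
function drive_fetch(file_id::AbstractString, localpath::AbstractString;
                     token::AbstractString)
    headers = ["Authorization" => "Bearer $token"]
    return Downloads.download(drive_file_url(file_id), localpath; headers=headers)
end
```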
Tried it. It most probably could be made to work, but:
googledrive has libcurl4-openssl-dev as a dependency, so the build dependencies need to include that.
There is no native solution for recursively downloading a folder (except this code snippet).
Because of the hell that is native Google file types, a folder can contain two files with identical names (google-drive-ocamlfuse solves this by slapping an extra file extension on the ambiguous ones, e.g. a file.csv that is a Google spreadsheet becomes file.csv.csv when downloaded). Trying to download such identically named files throws an error.
I guess that ideally we would have some Julian API to deal with Google Drive (and other such popular solutions), and once that API is solid DataDeps could use it to provide the same functionality it has for other repos. The API R has for Google Drive is lacking (e.g. we can’t even download a folder), so I’m not sure you want to build on and rely on that. But I guess it’s better than nothing at all.
Right, concept proved.
This works with DataDeps.jl
It wraps PyDrive.
It throws a lot of warnings, because DataDeps.jl has a kind of assumption that remotes are HTTP URLs represented as Strings, but its fallbacks kick in and so it does deal with it.
Though it seems those overly tight constraints are a minor bug.
Another thing is that while the download of the files is lazy,
the download of the file names in the registration block is eager.
I think the simple solution to that is to use some kind of LazyVector type.
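A rough sketch of what such a LazyVector could look like: it wraps a zero-argument function and only calls it the first time the contents are needed. All names here are made up for illustration, and whether DataDeps would actually leave such a vector untouched during registration is an assumption that has not been checked.

```julia
# Hypothetical LazyVector: holds a zero-argument function and only calls it
# the first time the contents are actually needed.
mutable struct LazyVector{T} <: AbstractVector{T}
    thunk::Function
    values::Union{Nothing, Vector{T}}
end

LazyVector{T}(thunk) where {T} = LazyVector{T}(thunk, nothing)

# Compute the contents on first use and cache them afterwards.
function force!(v::LazyVector{T}) where {T}
    if v.values === nothing
        v.values = v.thunk()
    end
    return v.values::Vector{T}
end

# Minimal AbstractVector interface; each method forces the contents.
Base.size(v::LazyVector) = size(force!(v))
Base.getindex(v::LazyVector, i::Int) = force!(v)[i]
Base.IndexStyle(::Type{<:LazyVector}) = IndexLinear()

# Usage: the listing function (a stand-in here) is not called at construction.
calls = Ref(0)
files = LazyVector{String}(() -> (calls[] += 1; ["a.csv", "b.csv"]))
```

Note that anything that iterates or prints the vector (including showing it at the REPL) will force it.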
Right now it is almost all in that notebook rather than in a proper Julia repo.
If you or someone else wants to take the PyDrive wrapping stuff and make it into a proper Julia repo, that would be cool.
Idk when (/if) I’d have time to work on this again.
Having the concept proven is pleasing to me.
DataDeps itself doesn’t deal with the idea of downloading a folder very well,
because most of the time when you want to download a folder that means downloading a tarball or a .zip, which keeps structure. Otherwise you are downloading a collection of files with no structure to where they end up (except as imposed by applying mv as a post-fetch method).
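As a sketch of imposing structure with mv after fetching: the helper below returns a function that moves a fetched file into a given subfolder. The name `move_into` is made up, and it assumes the function is called with the path of the fetched file while the current directory is the one the data should live in.

```julia
# Hypothetical helper: returns a function that moves a fetched file into
# `subdir` (relative to the current directory), creating it if needed.
function move_into(subdir::AbstractString)
    return function (fetched_path::AbstractString)
        mkpath(subdir)
        target = joinpath(subdir, basename(fetched_path))
        mv(fetched_path, target; force=true)
        return target
    end
end
```

Something like `move_into("raw")` could then be tried as the post-fetch function for a given file.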
I’m trying to understand how you mean for people to use this. Do you mean that people that want to use DataDeps with a Google Drive will need to include PyDrive.jl (after we clean it up and all) for it to work?
Kind of like this?
using PyDrive, DataDeps
register(DataDep("GoogleDriveDemo",
    "Demonstration of google drive",
    list_files_in_folder("Demo"),
    fetch_method = drive_download));