Pluto notebook HTML version: how to download data from repository?

I’m using Pluto to report a statistical analysis. Right now, the notebook loads data I have stored locally. However, I’d like to convert the notebook to an HTML file so that my collaborators can run it in the cloud, using Binder. Is it possible to store the data in, say, an OSF repository and then have it downloaded from there when the others run the notebook? I tried to do this without success, but maybe I’m doing something wrong.

What have you tried and what didn’t work? Can you not just upload a csv file somewhere and do

using Downloads, CSV
f = CSV.File(Downloads.download(url))

?

Converting to HTML will not allow them to run it. That option is for displaying. Notice that it is called “Static HTML” in the export menu.

I used the Resource function from PlutoUI. Maybe that was a mistake, although I’m now thinking that the problem may have been on the OSF side. At the moment, the data cannot be in a public repository, so maybe that alone means this is not going to work.

Perhaps this is a recent addition, but the HTML file now has a button that allows you to run it in Binder. I have used this before, just not with data that had to be downloaded.

I don’t think that’s what Resource is for, from the docstring:

A container for a URL-addressed resource that displays correctly in rich IDEs.

If I understand you correctly, you don’t want to display anything, but simply load data into a DataFrame (or similar) for the analysis in the notebook? If that’s right, I would re-suggest what I wrote above. You’re right, though, that if the data isn’t publicly available then that clearly won’t work (although I presume in that case nothing will work!?)

You can indeed upload your data to OSF, and get a secret link if the data is private. This link can simply be passed to the download() function, and it works in notebooks just fine.
Also see the OSF.jl (my) package if you want to programmatically work with OSF.

This sounds great. I knew about the anonymous link, of course, but not about the OSF.jl package. However, when I try this, I get an error. When I set

osf = OSF.Client(; token="[here goes the anonymous link]")

everything looks fine. But then running

proj = OSF.project(osf; title="MyTitle")

gives an error. There is a lot of output, but probably the most important part of the message is this: “User provided an invalid OAuth2 access token”. Am I missing something?

If you already have a download link, there is no need to use that package: just pass the link to the download() function. OSF.jl is for cases when you need something more, e.g. uploading the file and getting the link in the first place, or listing files in some project/directory, etc.

I see. Thank you. I’m not sure how this works, however. Using the download() function produces a string, which I supposed would be a directory from which I could then retrieve the data files. But that does not seem to be the case. Is it possible to obtain separate links to each of the files in an OSF repository, rather than just one link for the whole directory? I could imagine that that would make things easier.

download() does return a string, but it is the path of a temporary file holding the downloaded CSV, not a directory. You can pass that path straight to CSV.read. Here’s an example using some data from one of my packages:

julia> using CSV, DataFrames, Downloads

julia> CSV.read(Downloads.download("https://raw.githubusercontent.com/nilshg/SynthControl.jl/master/data/basque.csv"), DataFrame)
774×17 DataFrame
(...)
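
To make the return value explicit, here is a minimal sketch that uses only the standard library. The helper name `local_copy_lines` and its `fetch` keyword are made up for illustration; `fetch` defaults to `Downloads.download` and exists only so the helper can be tried offline with a stand-in function.

```julia
using Downloads

# Hypothetical helper (not part of any package): download `url` and
# return the lines of the downloaded file.
function local_copy_lines(url::AbstractString; fetch = Downloads.download)
    path = fetch(url)        # a String: the path of the file that was written
    return readlines(path)   # the contents are read from that path
end
```

With the default `fetch`, `local_copy_lines(url)` downloads the file to a temporary path and reads it from there, which is the same hand-off that CSV.read relies on in the example above.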

Thank you. This would work if the OSF were to give me a link for each file in the repository, but I only get a secret link for the repository as a whole. I was hoping that @aplavin’s OSF.jl package would help to download the repository, given the secret link, and would then help to access the files in the repository. But trying to set that up gave me the error message I reported earlier. I realize that I could put all files in a public GitHub repository and then use your solution. But at least for now I have to keep the data private.

@nilshg, just a small remark, it seems that download() is available in Base and using Downloads is not needed?

Just doing as I’m told :smiley:

help?> Base.download
  download(url::AbstractString, [path::AbstractString = tempname()]) -> path

  Download a file from the given url, saving it to the location path, or if not specified, a temporary path. Returns the path of the downloaded file.

  │ Note
  │
  │  Since Julia 1.6, this function is deprecated and is just a thin wrapper around Downloads.download. In new code, you should use that function directly instead of calling this.

You need to create an API token on the OSF website to use the OSF.jl package - the token is user-based, not project-based.

To obtain download links for each file, generate a so-called “view-only link” for that project on their website. Then there is a link for each file that can be used with the download() function.

Use the website if you need these links for a couple of files only, and the OSF.jl package if lots of links/automatic generation of these links is needed.

Thanks, this would solve the problem indeed. But although I have a ‘view-only link’, I cannot find any links for the separate files. Probably I’m missing something obvious. Can you please tell me where I should be looking?

Indeed, it looks like there is no obvious button on the OSF website to get a file download link. However, you can just append the view_only=... parameter from the project-wide view-only link to individual file download URLs.

Sorry, I’m not following. Say I have a file ‘data.csv’ in the OSF repository and I have my view_only=... link. What would I put into the download() function?
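
One way to read the suggestion above, as a rough sketch rather than a confirmed recipe: take the download URL of the individual file and add the view_only key as a query parameter. The helper `with_view_only` is made up for illustration, the `https://osf.io/<file id>/download` URL form is an assumption about how OSF addresses file downloads, and both the file id and the key below are placeholders.

```julia
# Hypothetical helper: append a view-only key to a file's download URL,
# using '?' or '&' depending on whether the URL already has a query string.
function with_view_only(download_url::AbstractString, key::AbstractString)
    sep = occursin("?", download_url) ? "&" : "?"
    return string(download_url, sep, "view_only=", key)
end

# Placeholders only: "abcde" stands for the id OSF assigns to data.csv,
# "0123456789abcdef" for the key at the end of the project's view-only link.
url = with_view_only("https://osf.io/abcde/download", "0123456789abcdef")
println(url)
```

The resulting string would then be passed to download() in the same way as the public GitHub URL earlier in the thread. Whether OSF accepts the key on this URL form is something to check against your own project.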

Have you managed to resolve this?
I see OSF.jl has a function for accessing data under view-only links:

struct ViewOnlyLink
    entity::API.Entity{:view_only_links}
end

function view_only_links(proj::Project)
    links = API.relationship(client(proj), proj.entity, :view_only_links)
    return map(ViewOnlyLink, links.data)
end

From src/highlevel.jl
I could follow most of the functions, but I still cannot work out how to list the files in a repository when you do not know their names. Maybe I could ask @aplavin for help? :sweat_smile: Thanks!

I didn’t resolve it but I remember that I found a workaround by putting the data I needed in a GitHub repository instead of in an OSF repository. Of course, it would be nice if one could download data directly from OSF.