[ANN] RemoteFiles.jl: Download files from the Internet and keep them up-to-date

The title says it all :wink:

Remote files are declared through the @RemoteFile macro:

using RemoteFiles

@RemoteFile(JULIA_BINARY, "https://status.julialang.org/download/win64",
    file="julia-nightly-x64.exe", updates=:daily)

# Download the file if it is out-of-date
download(JULIA_BINARY)

# Check whether the file has been downloaded
isfile(JULIA_BINARY)

# Get the path
path(JULIA_BINARY)

By default the file is downloaded to Pkg.dir(CURRENT_PACKAGE)/data.
This can be customized with the dir keyword argument to the @RemoteFile macro.

RemoteFiles can be grouped together in a RemoteFileSet:

@RemoteFileSet BINARIES "Julia Binaries" begin
    win = @RemoteFile "https://julialang-s3.julialang.org/bin/winnt/x64/0.6/julia-0.6.0-win64.exe"
    osx = @RemoteFile "https://julialang-s3.julialang.org/bin/osx/x64/0.6/julia-0.6.0-osx10.7+.dmg"
end

# Download all of them
download(BINARIES)

Have a look here.


Can you explain more about how updates=:daily works?

Before starting the download it checks this condition: (last, now) -> Date(now) > Date(last) where last is the time when the file was last modified and now is the current time.

These are the other possible values:

  • :never
  • :daily
  • :monthly
  • :yearly
  • :mondays/:weekly, :tuesdays, etc.
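To make the condition concrete, here is a minimal sketch of how such predicates could be expressed. The names `UPDATE_CHECKS` and `needs_update` are made up for illustration and are not taken from the package; only the `:daily` rule is quoted from above, and the `:monthly` and `:yearly` rules are assumed to follow the same pattern. The weekday variants are left out.

```julia
using Dates

# Hypothetical table of predicates, keyed by the `updates` value.
# `last` is the file's last modification time, `now` the current time.
const UPDATE_CHECKS = Dict{Symbol,Function}(
    :never   => (last, now) -> false,
    :daily   => (last, now) -> Date(now) > Date(last),  # rule quoted above
    # Assumed analogues of the daily rule:
    :monthly => (last, now) -> (year(now), month(now)) > (year(last), month(last)),
    :yearly  => (last, now) -> year(now) > year(last),
)

# Returns true if a file last modified at `last` is considered out-of-date at `now`.
needs_update(updates::Symbol, last::DateTime, now::DateTime) =
    UPDATE_CHECKS[updates](last, now)
```

Note that `:daily` compares calendar dates, not elapsed time, so a file modified at 23:00 counts as outdated at 01:00 the next day.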

Looking at the code, I can’t see that you’re actually checking if the file needs to be updated. Your isoutdated function only looks at update frequency - not the file modification date on the server.

You should check the remote date, or use the If-Modified-Since header (the -z flag with curl) to reduce strain on the server.
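A rough sketch of what that could look like when shelling out to curl. The helper name `conditional_curl_cmd` is hypothetical and not part of RemoteFiles.jl, and the sketch assumes curl is on the PATH; `-z <file>` tells curl to use that file's modification time for the If-Modified-Since condition.

```julia
# Hypothetical helper: build a curl command that only transfers the body
# if the server copy is newer than the local file at `path`.
function conditional_curl_cmd(url::AbstractString, path::AbstractString)
    # Without a local copy there is nothing to compare against, so the
    # time condition is only added when the file already exists.
    cond = isfile(path) ? ["-z", path] : String[]
    # -f: fail on HTTP errors, -L: follow redirects, -o: output file
    return `curl -f -L -o $path $cond $url`
end

# Usage (needs curl and network access):
# run(conditional_curl_cmd("https://example.com/data.bin", "data.bin"))
```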

PR? :wink:

Sorry, I don’t have time to do a PR right now, but I suspect all you have to do for the non-windows case is add -z after curl.

I think a proper solution is a bit more involved. Actually checking on the server whether the file was modified would make the updates parameter obsolete, would it not?

A cross-platform solution should probably also use LibCURL.jl. But it might take some time until I am able to tackle that.

No. The updates parameter would still determine how frequently the server is contacted. (Checking headers can also overload a server unless the rate is limited.) The If-Modified-Since header is to avoid downloading the exact same file twice. Both mechanisms are necessary.
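As a sketch of how the two mechanisms stack, assuming a `:daily` rate limit: the function below is hypothetical (`download_action` and `fetch_remote_mtime` are made-up names, not package API), and it assumes the time of the last server contact is tracked separately from the file's modification time.

```julia
using Dates

# Hypothetical two-stage decision for a single file.
# `fetch_remote_mtime` stands in for whatever asks the server for its
# Last-Modified time; it is only called if the rate limit allows a request.
function download_action(last_check::DateTime, now::DateTime,
                         local_mtime::DateTime, fetch_remote_mtime::Function)
    # Stage 1: rate limit. If no check is due, the server is not contacted at all.
    Date(now) > Date(last_check) || return :skip
    # Stage 2: conditional request. Transfer only if the server copy is newer.
    remote_mtime = fetch_remote_mtime()
    return remote_mtime > local_mtime ? :download : :not_modified
end
```

The first stage limits how often the server is asked; the second avoids transferring a file that has not changed.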

If you publish a package that downloads large files frequently and unnecessarily, and that package gets popular, then people who pay for server bandwidth will get annoyed with you. For example, if 10 000 people download a 100 megabyte file each day, even though the file has not changed, then that’s about 100 dollars a day in extra bandwidth cost.
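The estimate can be reproduced with a quick calculation. The price of 0.10 dollars per GB is an assumption inferred from the numbers above, not a quoted rate, and 1 GB is taken as 1000 MB.

```julia
# Back-of-the-envelope bandwidth cost; all inputs except the price come from the post.
users_per_day = 10_000
file_size_mb  = 100
price_per_gb  = 0.10   # assumed price in dollars, not a quoted rate

gb_per_day   = users_per_day * file_size_mb / 1000   # 1000 GB, i.e. about 1 TB per day
cost_per_day = gb_per_day * price_per_gb             # about 100 dollars per day
```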