Best practice for storing data in Packages

I would like to create a package for a model that ships with its own internal data. Basically, I’m looking for the Julia equivalent to R’s R/sysdata.rda.

I currently have something that looks like this:

module MyModule
using Serialization: deserialize

export myfunction

mydata = deserialize("data/mydata")

function myfunction(a, b)
  a .* mydata[:,1] .+ b .* mydata[:,2]
end

end

My questions are:

(1) How do I generalize the data path so it works anywhere? Right now this only works when I’m doing analysis inside the package directory. I would like to be able to install this package from GitHub (via Pkg.add) and use it in other projects, but when I try to do that, the package install fails because it can’t find the data directory (because the path is relative to the current directory).

(2) Is the fact that myfunction uses a variable from outside its scope inefficient (similar to how functions using global variables are inefficient)? Or is it fine because it’s in a module?

(3) Should mydata here be a const?

Welcome to the Julia community! See if this works:

project_path(parts...) = normpath(joinpath(@__DIR__, "..", parts...))

mydata = deserialize(project_path("data/mydata"))

I don’t remember who I stole the project_path function from but it was from another user on here :wink:
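
Putting that together with the module from the question, here is a minimal sketch that runs on its own. The temporary directory, the src/ and data/ layout, and the 2×2 matrix are made up for illustration; in a real package the module body would simply live in src/MyModule.jl. It also marks mydata as const, which is the usual advice for module-level globals (functions that read a non-const global cannot rely on its type), though nothing is benchmarked here.

```julia
using Serialization: serialize

# Throwaway package layout (hypothetical): <root>/src and <root>/data.
root = mktempdir()
mkpath(joinpath(root, "src"))
mkpath(joinpath(root, "data"))
serialize(joinpath(root, "data", "mydata"), [1.0 2.0; 3.0 4.0])

# What src/MyModule.jl could look like.
write(joinpath(root, "src", "MyModule.jl"), """
module MyModule
using Serialization: deserialize

export myfunction

# @__DIR__ is the directory of this source file, so the path no longer
# depends on the caller's working directory.
project_path(parts...) = normpath(joinpath(@__DIR__, "..", parts...))

# const fixes the binding, so myfunction can rely on the type of mydata.
const mydata = deserialize(project_path("data", "mydata"))

myfunction(a, b) = a .* mydata[:, 1] .+ b .* mydata[:, 2]

end
""")

include(joinpath(root, "src", "MyModule.jl"))

# Call it from an unrelated working directory to show the path still resolves.
result = cd(() -> MyModule.myfunction(2, 3), tempdir())
println(result)  # → [8.0, 18.0]
```

The only change that matters for question (1) is building the path from @__DIR__ instead of the current directory; the const is there for questions (2) and (3).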

Welcome!
Maybe this new function in v1.4 is useful for you (I have not tried it myself yet):

pkgdir(ModuleName) now provides a simpler way to return the package root directory of a module (or submodule) than the typically used dirname(dirname(pathof(ModuleName))) (#33128).
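
A rough sketch of how pkgdir could be used for the data path, assuming Julia 1.4 or later. The helper name package_file is made up for illustration and is not part of Base. pkgdir returns nothing for a module that was not loaded from a package, so the helper checks for that.

```julia
# Hypothetical helper (not part of Base): build a path inside a package's root.
function package_file(mod::Module, parts...)
    root = pkgdir(mod)  # `nothing` if `mod` was not declared in a package
    root === nothing && error("$(mod) was not loaded from a package")
    return joinpath(root, parts...)
end

# Inside MyModule, this could replace the relative path from the question:
#   const mydata = deserialize(package_file(@__MODULE__, "data", "mydata"))
```

Compared with the @__DIR__ approach, this does not depend on which file inside src/ the call is made from.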

An alternative to Artifacts is DataDeps.jl.
You can use either a normal DataDep, with the data stored externally (e.g. on FigShare or Zenodo),
or a ManualDataDep, which works for data stored in <project>/deps/data
(or for which you provide instructions on how to obtain the data manually).

DataDeps lets you avoid worrying about where the data is stored:
instead of writing things like ./../data/GoodData,
you write datadep"GoodData" and it resolves to a string that is the file path
(you can also write datadep"GoodData/myfile.csv", etc.).

IIRC you can do similar with artifacts using artifact"GoodData" but I am not 100% sure

There are a number of pros and cons between Artifacts and DataDeps.
One pro of DataDeps is that it works with Julia 1.0.x (the LTS), not just with 1.3+.
The others are around being more flexible for transport (it can use a secure download, e.g. AWSS3.jl or GoogleDrive.jl) and having post-fetch methods for unpacking arbitrary archives (not just tarballs).
The downside of DataDeps is that Artifacts use content addressing, which is really clever and means it is basically impossible to run into a name collision.
Artifacts also know how to clean themselves up when they are no longer needed.

Thanks! This is very useful. Technically, this is probably the (more) correct answer, but I marked the other one as the solution because it works with Julia 1.3 (which I happen to be using right now, though I’ll be sure to update to 1.4 soon!).