Including CSV Files in a Package I'm Developing?

I’m developing my first package and have a question about how to include some necessary data in it. Right now, I have four .csv files containing data/lookup values that package users will need. Is it okay to do it this way, or is there a better approach? For example, would it be better to convert them to Dicts and assign them to variables defined in a .jl file?

The current structure looks something like this:

src/
  data_dicts/
      csv1.csv
      csv2.csv
      csv3.csv
      csv4.csv
  MyPkg.jl

In this contrived example, MyPkg.jl reads from the .csv files. The files are small (about 6 KB each) and wouldn’t be much of a pain to convert to Dicts, but it’s more convenient for me to leave them in .csv format.

It depends on what you are doing, but it seems quite unlikely that CSV is the best approach, because parsing text files is rather slow (even though CSV.jl is excellent). You may want to consider a different serialization format, such as HDF5.

If the amount of data is small enough, writing a script that converts the CSVs to Julia source code can be a good approach. The significant advantage is that you then would not have to decide at what point in your program to read in the data and possibly memoize it.
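A minimal sketch of such a conversion script, assuming each CSV has a header row followed by plain `key,value` lines (no quoted fields or embedded commas). The function name `csv_to_jl`, the output file name and the constant name are hypothetical, and only Base is used so the script has no dependencies.

```julia
# Hypothetical build script: turns a simple two-column CSV into Julia source.
# Assumes a header row, then `key,value` lines without quoting or embedded commas.

"""
    csv_to_jl(csvpath, jlpath, varname)

Read a two-column CSV and write a Julia file defining
`const <varname> = Dict{String,String}(...)`. Returns `jlpath`.
"""
function csv_to_jl(csvpath::AbstractString, jlpath::AbstractString, varname::AbstractString)
    lines = filter(!isempty, readlines(csvpath))
    pairs = Pair{String,String}[]
    for line in lines[2:end]                       # skip the header row
        key, value = strip.(split(line, ','; limit = 2))
        push!(pairs, String(key) => String(value))
    end
    open(jlpath, "w") do io
        println(io, "# Auto-generated from ", basename(csvpath), "; do not edit by hand.")
        println(io, "const ", varname, " = Dict{String,String}(")
        for (k, v) in pairs
            # repr produces a valid, escaped Julia string literal
            println(io, "    ", repr(k), " => ", repr(v), ",")
        end
        println(io, ")")
    end
    return jlpath
end
```

You would rerun the script whenever a CSV changes and then `include("lookup1.jl")` from MyPkg.jl, so the tables are compiled into the package like any other constant.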

I personally would put such data in a separate data directory rather than in src, since it isn’t source code unless you convert it.

If you don’t care about any of that, of course there’s nothing stopping you from just using CSVs if that’s what you want to do.
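If you do keep the CSVs, one way to handle the "when do I read it" question is to load each file lazily on first use and cache the result. This is a sketch, assuming the same simple two-column layout as above; `TABLE_CACHE`, `read_two_column_csv` and `load_table` are hypothetical names, and Base parsing stands in for CSV.jl only to keep the example dependency-free.

```julia
# Lazy, memoized loading of small lookup CSVs (hypothetical helper names).
# Assumes a header row, then `key,value` lines without quoted fields.

# Parsed tables keyed by file path; filled on first request.
const TABLE_CACHE = Dict{String,Dict{String,String}}()

function read_two_column_csv(path::AbstractString)
    table = Dict{String,String}()
    for line in Iterators.drop(eachline(path), 1)   # skip the header row
        isempty(strip(line)) && continue
        key, value = split(line, ','; limit = 2)
        table[String(strip(key))] = String(strip(value))
    end
    return table
end

# Parse `path` the first time it is requested; later calls return the cached Dict.
# Not thread-safe: guard with a ReentrantLock if several tasks may call it at once.
load_table(path::AbstractString) =
    get!(() -> read_two_column_csv(path), TABLE_CACHE, String(path))
```

The trade-off is that a changed file is not picked up until the session restarts, which is usually fine for static lookup data.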


Thanks for the info. For now I’m going to go the HDF5 (or some other format) route, stored in a separate data folder, because the amount of data is likely to grow, so I like the idea of storing it in a more efficient format.


I think that saving small example datasets in a directory of your choice (e.g. data/) within a package is fine. And CSV is a great choice, since for small datasets, size and other considerations (like reading time, the ability to mmap, etc.) are not very significant compared to the advantages (simplicity, transparency, lightweight dependencies).

For certain kinds of data, the DelimitedFiles standard library (https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/) may be enough, too.

You can then define a path lookup function in your package.
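A minimal sketch of such a path lookup function, assuming the data lives in a top-level `data/` directory next to `src/` and that this code sits in `src/MyPkg.jl` (`MyPkg` is the question's placeholder name; `datapath` is a hypothetical helper). `@__DIR__` expands to the directory of the file containing it, so the path does not depend on the user's working directory.

```julia
module MyPkg  # placeholder package name from the question

export datapath

"""
    datapath(name)

Return the absolute path of `name` inside the package's top-level `data/`
directory. Assumes this file lives in `src/`, so `data/` is one level up.
"""
datapath(name::AbstractString) = normpath(joinpath(@__DIR__, "..", "data", name))

end # module
```

Users could then call something like `CSV.File(MyPkg.datapath("csv1.csv"))`. Base's `pkgdir(MyPkg)` is an alternative way to find the package root once the package is loaded normally.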


Thanks for the info @Tamas_Papp!