Including CSV Files in a Package I'm Developing..?

mthelm85 · November 26, 2019, 2:37pm

I’m attempting to develop my first package and I have a question about how to incorporate some necessary data into the package. Right now, I have four .csv files that include some data/lookup values that will be necessary for package users. Is it okay to do it this way or is there a better approach? For example, would it be better to simply convert them to Dicts and assign them to variables that are defined in a .jl file?

The current structure looks something like this:

src/
  data_dicts/
      csv1.csv
      csv2.csv
      csv3.csv
      csv4.csv
  MyPkg.jl

In this contrived example, MyPkg.jl reads from the .csv files. The files are small (6 KB each) and wouldn’t be too much of a pain to convert to Dicts, but it’s more convenient for me to leave them in .csv format.

ExpandingMan · November 26, 2019, 2:45pm

It depends on what you are doing, but CSV seems unlikely to be the best approach, since reading CSV files is rather slow (even though CSV.jl is excellent). You may want to consider a different serialization format such as HDF5.

If it’s a small enough amount of data, writing a script that converts the CSVs to Julia source code can be a good approach. It has the significant advantage that you would then not have to decide at what point in your program to read in the data, or whether to memoize it.
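A minimal sketch of such a conversion script, assuming a two-column CSV with a header row and no quoted fields or embedded commas. The function name `csv_to_julia_source` and the file names in the usage comment are hypothetical, not something from the package in question.

```julia
# Hypothetical helper: turn a two-column CSV (with a header row) into Julia
# source code that defines a constant Dict.
# Assumes simple fields: no quoting and no commas inside the first column.
function csv_to_julia_source(csv_path::AbstractString, const_name::AbstractString)
    rows = readlines(csv_path)[2:end]            # skip the header row
    io = IOBuffer()
    println(io, "const ", const_name, " = Dict(")
    for row in rows
        isempty(strip(row)) && continue          # ignore blank lines
        key, value = strip.(split(row, ','; limit = 2))
        # repr() emits a valid, escaped Julia string literal
        println(io, "    ", repr(String(key)), " => ", repr(String(value)), ",")
    end
    println(io, ")")
    return String(take!(io))
end

# Usage sketch (paths are placeholders): generate a file that MyPkg.jl can include().
# write("src/lookup_tables.jl", csv_to_julia_source("data/csv1.csv", "CSV1"))
```

The generated file is ordinary source code, so the lookup table is available as soon as the package loads, at the cost of rerunning the script whenever the CSV changes.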

I personally would put such data in a separate data directory rather than in src, since it isn’t source code unless you convert it.

If you don’t care about any of that, of course there’s nothing stopping you from just using CSVs if that’s what you want to do.
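If the CSVs stay, one way to handle the "when to read, and whether to memoize" question is to load lazily on first use and cache the result. This is only a sketch: `TABLE_CACHE`, `parse_rows`, and `cached_table` are hypothetical names, the parser assumes plain comma-separated fields with no quoting, and the cache is not thread-safe.

```julia
# Hypothetical cache: file path => parsed rows, filled on first request.
const TABLE_CACHE = Dict{String, Vector{Vector{String}}}()

# Parse a simple comma-separated file (no quoted fields assumed) into rows of strings.
parse_rows(path::AbstractString) =
    [String.(split(line, ',')) for line in eachline(path) if !isempty(strip(line))]

# Return the parsed table, reading the file only on the first call for each path.
function cached_table(path::AbstractString)
    get!(TABLE_CACHE, String(path)) do
        parse_rows(path)
    end
end
```

Later calls with the same path return the cached object without touching the disk.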

mthelm85 · November 26, 2019, 2:59pm

Thanks for the info. I’m going to go the HDF5 (or some other format) route, stored in a separate data folder, for now. The amount of data is likely to grow, so I like the idea of storing it in a more efficient format.

Tamas_Papp · November 26, 2019, 3:12pm

I think that saving small example datasets in a directory of your choice (e.g. data/) within a package is fine. And CSV is a great choice, since for small datasets size and other considerations (like reading time, ability to mmap, etc.) are not very significant compared to the advantages (simplicity, transparency, lightweight dependencies).

For certain kinds of data, the DelimitedFiles standard library (https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/) may be enough, too.

You can then define a path lookup function in your package.
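A rough sketch of what such a lookup function could look like, assuming the files live in a top-level data/ folder next to src/ and that this code sits in src/MyPkg.jl. The names `DATA_DIR`, `datapath`, and `load_table` are hypothetical; `readdlm` comes from DelimitedFiles, which would need to be listed in the package's dependencies.

```julia
module MyPkg

using DelimitedFiles  # provides readdlm

# Directory holding the bundled files. Assumes this file lives in `src/`
# and the data sits in a sibling `data/` folder.
const DATA_DIR = joinpath(@__DIR__, "..", "data")

# Hypothetical lookup function: return the full path to a bundled data file.
function datapath(name::AbstractString)
    path = normpath(joinpath(DATA_DIR, name))
    isfile(path) || throw(ArgumentError("no bundled data file named $name"))
    return path
end

# Read a bundled comma-delimited file; returns a (data, header) tuple.
load_table(name::AbstractString) = readdlm(datapath(name), ','; header = true)

end # module
```

A caller would then write something like `data, header = MyPkg.load_table("csv1.csv")` without needing to know where the package is installed.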

mthelm85 · November 26, 2019, 4:03pm

Thanks for the info @Tamas_Papp!
