Which should I use: `__init__` or `deps/build.jl`?

I’m writing a module that has a CSV file as data. The CSV should be expanded, processed into a DataFrame, and saved as a JLD2 file. However, because the generated JLD2 file is much larger than the original CSV file, I want to generate it when the module is going to be used.

I came up with two ways:

  • Write the process in the `__init__` function of the module
  • Write the process in `deps/build.jl`
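To make the two options concrete, here is a minimal sketch of the "generate only if missing" step that either one would run. It is not the actual module: `process`, `ensure_generated`, and the file names are hypothetical, and the stdlib `Serialization` module stands in for JLD2 so the sketch has no dependencies.

```julia
using Serialization  # stdlib stand-in for JLD2 (assumption, to avoid dependencies)

# Hypothetical placeholder for the real "expand and process" step.
process(csv_path) = [String.(split(line, ',')) for line in readlines(csv_path)]

# Build the large derived file only if it is not already there.
function ensure_generated(csv_path, out_path)
    isfile(out_path) || serialize(out_path, process(csv_path))
    return out_path
end

# Inside the module, `__init__` would call this on every load, e.g.
# __init__() = ensure_generated(CSV_PATH, GENERATED_PATH)  # hypothetical constants

# Demo in a throwaway directory:
dir = mktempdir()
csv = joinpath(dir, "data.csv")
write(csv, "a,b\n1,2\n")
out = ensure_generated(csv, joinpath(dir, "data.bin"))
rows = deserialize(out)
```

The same function could be called from `deps/build.jl` instead. The difference is when it runs: at build time (when the package is installed or `Pkg.build` is called) versus every time the module is loaded.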

Which should I choose, or is there another way?

Artifacts?


Also, is it important that the DataFrame be saved as JLD2? Arrow storage from Arrow.jl is often more compact and usually much faster to read than other formats. In MixedModels.jl we use an Artifact with .arrow files to store and access data for tests and examples. That has worked well for us.
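As a rough illustration of the Arrow route, here is a minimal round trip with Arrow.jl and DataFrames.jl. The data and file name are made up, and the artifact name in the comment is hypothetical; this is a sketch, not how MixedModels.jl is actually laid out.

```julia
using Arrow, DataFrames

# Made-up example data standing in for the processed CSV.
df = DataFrame(id = [1, 2, 3], name = ["a", "b", "c"])

# Write the table to an .arrow file in a temporary directory.
path = joinpath(mktempdir(), "data.arrow")
Arrow.write(path, df)

# With an artifact, the directory would instead come from the artifact macro:
# path = joinpath(artifact"mydata", "data.arrow")  # hypothetical artifact name

# Read the file back and wrap it in a DataFrame.
tbl = Arrow.Table(path)
df2 = DataFrame(tbl)
```

If the columns need to be modified after loading, taking `copy(df2)` first is a safe way to get ordinary Julia vectors.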
