How to use exactly a single Module with a single code?

I’ve already looked at several tutorials on code layout, but they are a bit long and I can’t apply them to my case because I can’t see my mistake. I am using a module.

Here’s the problem. I have code that contains a lot of data, which I would like to separate from the calculation code.
Let’s say my data are in Data.jl and my computation code is in Calc.jl.

Data.jl contains:

> module Data
>     # Here are data
> end

What exactly do I need to do to make Calc.jl aware of the data contained in the Data module? Where should the “include” and “using” instructions appear?

Thank you for your answers, and sorry for the disturbance: I’m sure this is a very basic question that has already been answered in extenso.

Thierry

Hi @aquarelleX332,

A few questions to understand your needs better:

  • Are Data and Calc in the same big repo? Do they belong together or do they make sense independently?
  • What do you mean by “code that contains data”?

Before we discuss modules, you need to know that include has nothing to do with modules (unlike Python’s import). Conceptually it just copy-pastes code from one file to another, which means it’s just a way to have smaller files but it does not influence the logical structure of your code. Thus you can reason as if your whole code were in one single file.
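To make the copy-paste picture concrete, here is a minimal sketch. The file name constants.jl, the module name Demo and everything defined in them are made up for illustration, and the file is written to a temporary directory only so that the snippet runs on its own.

```julia
module Demo

# Hypothetical file, created on the fly so the sketch is self-contained.
const dir = mktempdir()
write(joinpath(dir, "constants.jl"), """
const g = 9.81
defined_in() = @__MODULE__
""")

# include evaluates the file's contents right here, inside Demo,
# as if the two lines above had been typed at this spot.
include(joinpath(dir, "constants.jl"))

# Code after the include can use what the file defined.
fall_speed(t) = g * t

end # module Demo

println(Demo.defined_in() === Demo)  # true: the file did not create a namespace of its own
println(Demo.fall_speed(2.0))
```

The included file contains no module statement, so its definitions simply become part of whichever module the include call sits in.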


Hello gdalle,
Thanks for your interest.

Yes: files Data.jl and Calc.jl are in the same repository.
Calc.jl must have access to all the data contained in Data.jl.

In its original version, my code contains two parts:
a first part with a lot of physical data, and a second part that solves a differential equation and performs some other computations.

I would like to group the data together in a separate file (say, Data.jl) that the main code can access. That way I could change the data within Data.jl without modifying the computation part.

Perhaps I’m misunderstanding here, but I wouldn’t use a package to encode data — instead I’d use the raw/original data file(s) that make sense for your application, in whatever raw format you have them in (be it CSV or HDF5 or Parquet or whatever). And if your data are bigger than a handful of MBs or change frequently, then I wouldn’t commit them into a git repository at all.
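As a rough sketch of the raw-data-file idea, using only Base Julia: the file name parameters.csv, its name,value layout and the function load_parameters are all assumptions made for illustration, and the file is written to a temporary directory only so that the snippet runs on its own. For real CSV files a dedicated reader package would usually be a better fit.

```julia
# Hypothetical parameter file, created on the fly so the sketch is self-contained.
path = joinpath(mktempdir(), "parameters.csv")
write(path, """
name,value
mass,2.0
stiffness,8.0
""")

# Read "name,value" rows into a Dict, skipping the header line.
function load_parameters(path)
    params = Dict{String,Float64}()
    for line in Iterators.drop(eachline(path), 1)
        isempty(strip(line)) && continue      # ignore blank lines
        name, value = split(line, ',')
        params[strip(name)] = parse(Float64, value)
    end
    return params
end

params = load_parameters(path)

# The computation code only sees the loaded values, not the file format.
println(sqrt(params["stiffness"] / params["mass"]))
```

Changing a value then means editing the data file, with no change to the computation code.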

You could have a package that helps you read in data files or a submodule that helps you pre-process them, though! There are indeed several ways to arrange two modules together, and what makes the most sense would depend upon the particulars of your use-case.
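One of those arrangements, sketched with two sibling modules in a single file: the names Data and Calc come from the question, while the constants and the function natural_frequency are invented for illustration. If the modules lived in Data.jl and Calc.jl, the enclosing file would include both files, Data.jl first.

```julia
module Data

export mass, stiffness

# Hypothetical physical data.
const mass = 2.0
const stiffness = 8.0

end # module Data

module Calc

# The leading dots make this a relative path: "Data, as defined in my parent module".
using ..Data

natural_frequency() = sqrt(stiffness / mass)

end # module Calc

println(Calc.natural_frequency())
```

Only exported names are brought in by using; anything else in Data remains reachable with a qualified name such as Data.some_name.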


The simplest way would be two different files within the same module.
To create the right file structure, open a Julia REPL in Pkg mode (with ]) and then run

pkg> generate MyPackage

Then you can add two files data.jl and calc.jl to the src folder, before editing your main file src/MyPackage.jl like so:

module MyPackage

include("data.jl")
include("calc.jl")

end

As long as you include data.jl before calc.jl, the objects defined in the latter will have access to the objects defined in the former.
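Here is a small sketch of what the two files might contain and why the order matters. All the names inside the files are made up, and the files are written to a temporary directory only so that the snippet runs on its own; in a real package they would sit in src/ and be included by their relative names as shown above.

```julia
module MyPackage

# Stand-in for the package's src folder (assumption for this sketch only).
const src = mktempdir()

# Hypothetical contents of data.jl: nothing but data.
write(joinpath(src, "data.jl"), """
const mass = 2.0
const stiffness = 8.0
""")

# Hypothetical contents of calc.jl: computations that rely on the data.
write(joinpath(src, "calc.jl"), """
const omega = sqrt(stiffness / mass)   # evaluated while the file is being included
period() = 2pi / omega
""")

# data.jl must come first: omega is computed during the second include,
# so mass and stiffness have to exist by then.
include(joinpath(src, "data.jl"))
include(joinpath(src, "calc.jl"))

end # module MyPackage

println(MyPackage.omega)
println(MyPackage.period())
```

Swapping the two include lines would fail with an UndefVarError while calc.jl is being loaded, because omega would be computed before stiffness and mass exist.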

To use your brand new package, just stay in the Julia REPL, activate the environment corresponding to MyPackage and then you can do

julia> using MyPackage

to gain access to everything you defined.


I don’t want to derail the discussion, but I’d be interested to know what you would advise instead.
Personally, I like to keep even moderately sized data files in git lfs, because it is nice to be able to clone a repo that contains both the data and my evaluation scripts. But I admit that git commands can take quite a long time to run.

I used to keep the data files in a separate location from the git folder, but I didn’t like that so much because it is less reproducible, and I often work on several machines depending on which one is more convenient…

I usually use DataDeps.jl


These days I most frequently use JuliaHub’s DataSets (as you might expect :) ). When executing a batch job, it records the exact version of both the code and the data accessed.


Hello gdalle,

Sorry for the late reply. I was away from my computer yesterday.

This is indeed a very good solution for my problem. Thank you very much for your help.


Thanks also to everyone who answered my question. I read your links.

Hello mbauman,

The tutorial for basic processing of datasets is very interesting. Many thanks!
