Best practice: modules and scripts for publishing

I’m getting ready to publish an article using Julia-based code. I’m curious if anyone has thoughts about how such code should be arranged.

Right now, I have a bunch of modules to separate functions, and a top-level script to run them.

However, since these are modules rather than packages, they live in Main instead of in their own namespaces, and that has occasionally created problems when working with them.

Would it be better to keep everything at the top level, split across separate files to be included? Would one option be easier or harder for others to use?

Should I put everything into functions or is running a script file with calls for the working parts okay?

Curious if anyone has thoughts on this.

3 Likes

What I would do is organize things as if it were a “full-fledged” project:

  • dependencies listed in Project.toml and Manifest.toml
  • real source code in files under the src directory, possibly organized in submodules
  • top-level “glue” script in test/runtests.jl
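
Concretely, for a hypothetical project named MyPaper, that layout would look something like:

```
MyPaper/
├── Project.toml       # direct dependencies and package metadata
├── Manifest.toml      # exact resolved versions, for reproducibility
├── src/
│   └── MyPaper.jl     # real source code, possibly organized in submodules
└── test/
    └── runtests.jl    # top-level "glue" script that reproduces the results
```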

So that anyone wanting to reproduce your results only has to clone your repository (or otherwise get the files), then issue

using Pkg
Pkg.instantiate()
Pkg.test()

This may be overkill, but I would think that it is worth a little extra effort to make others’ lives easier.

For a more minimalistic option, you might consider only defining an environment (via the Project.toml and Manifest.toml files) alongside a one-file “script” which users would have to run.
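
With that minimal layout, a user with Julia installed would run something along these lines (assuming the script is named script.jl):

```
julia --project=. -e 'using Pkg; Pkg.instantiate()'
julia --project=. script.jl
```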

8 Likes

I don’t think it’s overkill. Organizing code for a paper as a package has a lot of advantages, starting with CI. We do this with coauthors (on GitLab), primarily to ensure that the code on master is always in a working state, so that when we throw it on the cluster to run with the large dataset, we don’t get failed batch jobs back (we generated mock data for testing and benchmarking).

The other advantage comes from committing the Manifest.toml: you get a reproducible environment. This is invaluable when tracking down issues.

5 Likes

It’s overkill in the sense that the vast majority of scientific code is nowhere near this level of organization and reproducibility. That said, most programming languages don’t make it this easy to provide a gift-wrapped reproducible environment. It would be a crime not to use it.

For my next paper I’ve got a module, and then all my analysis code written as docs + doctests.
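
For anyone unfamiliar with doctests: the idea is to embed runnable examples in docstrings, which Documenter.jl can check automatically. A minimal sketch, using a made-up function, looks like:

````julia
"""
    double(x)

Return twice `x`.

# Examples

```jldoctest
julia> double(21)
42
```
"""
double(x) = 2x
````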

6 Likes

I’m in flux a bit on this, but what I’m settling on so far:

  1. Make a full package, as recommended above.
  2. Name should be descriptive and include something unique, like initials, so that it won’t crowd the namespace (e.g. CoolNewScript_NCB). Unless the package is intended to be more general-purpose and long-lasting, then go ahead and take a creative and/or general name.
  3. The main script file sets up the module, loads dependencies, and includes subfiles.
  4. Functions and data structures split into separate files, grouped into folders if more than a few. No submodules or second modules.
  5. Any solid general purpose code should ideally be split into a separate repository from the custom/experimental stuff, for inclusion in the package registry.
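
As a sketch of point 3 (using the hypothetical package name from point 2 and made-up file names), the main file might look like:

```julia
module CoolNewScript_NCB

using DataFrames        # example dependency, listed in Project.toml

# functions and data structures, split into separate files
include("types.jl")
include("analysis.jl")

end # module
```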

Haven’t settled on how to run it though. I’m hesitant to use “runtests.jl” and Pkg.test() for anything other than unit testing. Haven’t gotten the hang of doctests yet.

Maybe I should be using something like a Jupyter notebook for the run script, but I haven’t gotten into those either. I may just stick with a “run.jl” file or similar.

Thanks for your help! And @kevbonham, I’ll be curious to see how you structured your code once it’s out.

It’s still WIP, but you can take a look here. src/ has a module, set up like a normal package. docs/ are analysis notebooks (currently using Weave, not Documenter, but will try to do it with Literate + Documenter later). There’s also a data/ folder that will eventually interface with DataDeps to allow downloads of relevant stuff that currently only lives on our private server, and bin/ contains some miscellaneous scripts.

3 Likes

I used to rely on Pkg.test(), which had the nice property of taking care of test-specific dependencies (or, in this case, use-case-specific dependencies, as opposed to the dependencies required by the code under src/). However, I wouldn’t recommend (ab)using Pkg.test() for running plain use cases (as opposed to unit tests) any more, especially since I recently realized that Pkg.test() does a lot more than I initially thought. In particular, it sets --check-bounds=yes, which is a very sensible thing to do for tests, but impacts the performance of “regular” use cases.
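
To illustrate: under --check-bounds=yes, the @inbounds annotation in a hot loop like the one below is ignored, so the bounds checks it was meant to elide still run:

```julia
function mysum(x::Vector{Float64})
    s = 0.0
    @inbounds for i in eachindex(x)  # @inbounds becomes a no-op under --check-bounds=yes
        s += x[i]
    end
    return s
end
```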

1 Like