How to write a program as a Julia package?

All the Julia packages I have encountered export functions to be used as building blocks for other code.
The main file looks something like this:

module Package

using Dependency1, Dependency2

export function1, function2

include("file1.jl")
include("file2.jl")

end # module

However, I just want my Julia package to read inputs (probably from a TOML file), process the data, and then dump a bunch of outputs like one big script. How can I write my package to make input → output as simple as possible for my users?

I was going to try a structure like this:

module Package

using Dependency1, Dependency2
using TOML
input = TOML.parsefile("./input.toml")

export process, readfunction1, readfunction2, etc

include(joinpath(@__DIR__, "readfunctions.jl"))
include(joinpath(@__DIR__, "buildfunctions.jl"))
include(joinpath(@__DIR__, "writefunctions.jl"))
include(joinpath(@__DIR__, "plotfunctions.jl"))
function process(input)
    include(joinpath(@__DIR__, "read.jl"))
    include(joinpath(@__DIR__, "build.jl"))
    include(joinpath(@__DIR__, "write.jl"))
    include(joinpath(@__DIR__, "plot.jl"))
end

end # module

My biggest issue with this approach is that you cannot tinker with and explore the output in Julia afterwards. I could return a struct from process but it would have like 12 dataframes, 6 vectors, 8 scalars, and 4 plots. I’m not sure that makes much sense to bundle together. I would also rather define the functions in readfunctions.jl at the top of read.jl instead, but then again I cannot tinker with them outside of process.

I would eventually like to create a GUI for locating input.toml and running process(input), but I don’t know how to do that yet.

Actual code here.

3 Likes

At the moment, your input.toml is parsed during precompilation rather than at run time. If you really do want to parse it at compile time, you should make input a const:

const input = TOML.parsefile("./input.toml")

If you want to execute this at runtime, then I would consider creating a main function and calling it from __init__().

function main()
    input = TOML.parsefile("./input.toml")
    return input
end

# runtime container for the parsed input
const input = Ref{Dict{String, Any}}()

function __init__()
    input[] = main()
end

This will then execute when you do using Package.
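For example, from the REPL (a rough sketch, assuming the module is called Package as above):

using Package        # __init__ runs on load and fills input[]
Package.input[]      # the parsed TOML, as a Dict{String, Any}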

1 Like

I’ve mentioned this before to you, but Infiltrator.jl can help you a lot here with working in specific scopes.

Also, this

function process(input)
    include(joinpath(@__DIR__, "read.jl"))
    include(joinpath(@__DIR__, "build.jl"))
    include(joinpath(@__DIR__, "write.jl"))
    include(joinpath(@__DIR__, "plot.jl"))
end

is a code smell. You should define smaller functions and build the larger function by calling those smaller functions, not by making one giant process function.
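For example, something along these lines (the function names here are just placeholders for whatever lives in your readfunctions.jl, buildfunctions.jl, and so on):

function process(input)
    data    = read_data(input)      # from readfunctions.jl
    results = build_results(data)   # from buildfunctions.jl
    write_results(results)          # from writefunctions.jl
    plot_results(results)           # from plotfunctions.jl
    return results
end

That way process returns something you can explore afterwards, and each step can also be called on its own at the REPL.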

1 Like

Also note that if you do julia --project=. src/Package.jl then your __init__() will be run.

Suppose you have src/FooBar.jl as follows:

module FooBar
    __init__() = println("Hello World from FooBar.__init__()")
end

This will work:

$ julia --project=. src/FooBar.jl
Hello World from FooBar.__init__()
1 Like

Just popping in here to say that I've had trouble in the past when combining include with joinpath: last time I checked, it seemed to block some of VSCode's introspection capabilities.
Besides, the path in include is interpreted relative to the file that contains it, so you can just do include("readfunctions.jl").

4 Likes

It is not working for me that way. It tries to read from the package directory rather than from the package/src directory if I do not include @__DIR__.

julia> process()
ERROR: SystemError: opening file "C:\\Users\\nboyer.AIP\\.julia\\dev\\ASME_Materials\\ReadTables.jl": No such file or directory

Thanks for the tips. I was getting confused about some of my code executing during precompilation.

The README for Infiltrator.jl says it is missing some features included in VSCode's debugger, so I am just going to continue using that built-in tool. Are you saying that I should use a debugger to explore the data rather than trying to return everything from the main process function?

I get stuck on how to do that. I make functions for everything I can simplify, but at some point I just need to call simple functions serially. Take my read.jl below for example. How would I return all those tables from a function nicely, and what benefit would that have over just including the file?

read.jl
# Read Tables
function readtable(filepath, sheetname)
    DataFrame(XLSX.readtable(filepath, sheetname, first_row = 2, infer_eltypes=true)...)
end
tableY = readtable(inputfilepath, "Table Y-1")
tableU = readtable(inputfilepath, "Table U")
tableTMkey = readtable(inputfilepath, "Table TM-1 - Key")
tablePRDkey = readtable(inputfilepath, "Table PRD - Key")
tableTEkey = readtable(inputfilepath, "Table TE-1 - Key")
tableTCDkey = readtable(inputfilepath, "Table TCD - Key")
tableTM = readtable(inputfilepath, "Table TM-1")
tablePRD = readtable(inputfilepath, "Table PRD")

# Find Chemical Composition
nomcomp = only(tableY[
                (tableY."Spec. No." .== specno) .&
                (tableY."Type/Grade" .== type_grade) .&
                (tableY."Class/Condition/Temper" .== class_condition_temper)
                , "Nominal Composition"])

# Find Groups
function findgroup(df, value)
    for group in names(df)
        if first(df[:, group]) === missing
            continue
        end
        if value in df[:, group]
            return group
        end
    end
end
TMgroup = findgroup(tableTMkey, nomcomp)
PRDgroup = findgroup(tablePRDkey, nomcomp)
TEgroup = findgroup(tableTEkey, nomcomp)
TCDgroup = findgroup(tableTCDkey, nomcomp)

# Read Key-Dependent Tables
tableTE = readtable(inputfilepath, "Table TE-1 - " * TEgroup)
tableTCD = readtable(inputfilepath, "Table TCD - " * TCDgroup)

I would organize code so that you have global dictionaries, and the functions modify the dictionaries.

global tables = Dict{String, Any}()

function make_tables(tables, inputpath)
    tables["TE"] = readtable(...)
    tables["TCD"] = readtable(...)
end
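Then, roughly (reusing the readtable and inputfilepath from your read.jl):

make_tables(tables, inputfilepath)
tables["TE"]    # inspect or plot any table afterwards from the REPL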

But also, there's no need to make everything a module in Julia. Packages are meant for reusable code. It looks like this is analysis, which doesn't need to be reused. So it doesn't really need to all be in one module.

As long as the performance-critical bits are put into functions, there's nothing wrong with just doing include with scripts and avoiding putting everything in a module.

Anything at the "top level" of your module will be executed during precompilation. That is, it will only run once when the source files change, and the resulting data will be saved to disk in your precompilation cache. When the module is loaded, __init__ will be called. Generally, everything should be in a function, called from some main function, which in turn is called from __init__.

I think in the OP's case, there is nothing wrong with skipping the call to main() from within __init__. That's probably over-complicating things for them. They can just do using Module; main().
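Something like this, roughly (just a sketch, with Package standing in for your module name and the body of main left for you to fill in):

module Package

using TOML

export main

function main(tomlpath)
    input = TOML.parsefile(tomlpath)
    # ... read, build, write, plot ...
    return input
end

end # module

and then the user does using Package; main("input.toml") from the REPL.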

1 Like

This is analysis that will need to be run repetitively on different input data and by other users. I am using a package for reproducibility and easy updating. It is my understanding that packages must have a module. (I am not using modules on purpose to segregate the namespace.) I am open to other options.

My background is in Matlab, where I just hit the play button on a script and all my data was available in the workspace. So far, I like the better code organization and version control of Julia packages.
However:

  1. Packaging all my data into progressively more nested containers for function IO makes accessing the data later a bigger pain; finaloutput.tables["TE"][:,"Coefficient of Thermal Expansion (°F^-1)"] is a lot of code to access a vector of data points for plotting.

  2. Modules and environments are tricky.

At the end of the day, I am asking for the Julian way to write and organize one big reproducible program that I can send to other people to run on their datasets. I will keep working on learning what I need to in order to make it work correctly.

1 Like

I’m confused. Is this input file something the user provides?

If so, that should be an input of a function of your package, not something to be executed or read during package loading.

Yeah, I messed that up as mkitti pointed out. Right now, I just have the input as constants at the top of the main file, but I want to transition to a TOML file that I would package with my code and have users edit. I am also keeping an eye on Best way to specify a filename inside a function - New to Julia - JuliaLang for other user input options.
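I'm thinking the input.toml would just mirror the constants currently at the top of my main file, roughly like this (field names and values here are only illustrative):

# where the user's spreadsheet lives
inputfilepath = "C:/Users/me/data/materials.xlsx"

# material to look up (matches the columns filtered in read.jl)
specno = "SA-723"
type_grade = "3"
class_condition_temper = "2"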

I would not package it with the code. You can provide it as an example input file, and read it as the input of a function.

Having the state of the package mixed with the user input is a likely source of issues.

3 Likes

I think I would need to keep the example input file in the repository in case the format needs to change when I update the package. In the past, I have added a run_this.jl file to the top level of my repository and told users in the README.md to copy, modify, and then run that file.

1 Like

Keeping it in the repository is fine, in the top folder, or in an “examples” folder, as you feel is more natural.

The thing is that this file should not be read during package initialization. Otherwise a user will not be able to naturally run two different setups, for example.

The most common pattern, I think, would be to provide this example file and then use it as the input of your first interface function, let's say, something like:

setup = SetupData("/home/user/myfolder/input.toml")

and then this sets up the setup data structure with whatever needed to run the other functions of your package.
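Just as a rough sketch (the struct and field names here are invented; adapt them to whatever your functions actually need):

using TOML

struct SetupData
    datafile::String               # path to the user's spreadsheet
    parameters::Dict{String, Any}  # everything else from the TOML file
end

# build the setup from the user's TOML file
function SetupData(tomlpath::AbstractString)
    config = TOML.parsefile(tomlpath)
    return SetupData(config["datafile"], config)
end

Then process(setup) and the other functions of your package only ever take that struct, and nothing gets read at package load time.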

I don’t think this is the right thinking.

Your package should provide an API that does not change for the user, i.e. a set of structs and functions for the user to work with.

Then the documentation should be clear about how the user should interact with that API.

But I don’t think you should be telling the user to modify some specific file.

1 Like

My users don't know how to code, so calling a series of functions to process data is out of the question. I'd rather they not have to modify files either, but I need some way for them to tell my code where their data is stored and what type of analysis they want to run.

I initially used a .jl file for input because that was easiest for me, and they could press "play" in Atom to run it, but it was hard for them to understand what to enter on each line with comments, definitions, and function calls all in the same file. I then tried a Pluto notebook for input, so I could write instructions in markdown for each code cell, but they couldn't open the notebooks in their browser. I haven't tried .toml or Preferences.jl yet.

I think I am going to try including a run_this.jl file that just contains:

using MyPackage
input = SetupData(joinpath(@__DIR__, "input.toml"))
process(input)

and tell them to copy run_this.jl and input.toml to the same directory as their data files.

Then my question is what MyPackage.jl should look like. Is this okay? (It still doesn't work for me without joinpath(@__DIR__, ...), by the way):

module MyPackage

using Dependency1, Dependency2

export process, SetupData

function process(input)
    include("read.jl")
    include("build.jl")
    include("write.jl")
    include("plot.jl")
    return "Done"
end

end # module

or is there a good reason to change include("build.jl") into complicated_build_struct = build(input, complicated_read_struct)?

1 Like

Build a GUI?

I understand you. I would suggest using the full path to the file in the example; otherwise, if they start Julia by double-clicking, you are in trouble again. (If they are Linux users, that is less of a problem.)

1 Like