Best Practices for Package Resource Location

Hello,

I’m creating a package that will perform table lookups on some standardized chemical tables one can find online, and I’d like to inquire as to best practices for where and how to look for these resources once downloaded.

The current workflow will be:

  1. User finds a table online (one for hydrogen, one for nitrogen, etc.) that they want to work with and downloads it to a location of their choice. Perhaps eventually I’ll migrate this step into the package itself.
  2. User imports my package
  3. User sets path to the library of various tables they have downloaded
  4. User calls function from my package, and it grabs the table requested from the path where the user is storing these tables

Example usage:

using MyPackage

libpath = "/home/user/chemtables/"

value = MyPackage.hf("H2O",298.15) 
println("The enthalpy of formation of water at 298.15K is: $value")

I’d rather not pass the path as an argument to every function that accesses a table. That seems to leave global variables as my only option: the package would always read the user-defined path from a reserved variable name (which I suppose would have to be set at the top of the script, perhaps with a default location as a fallback if it isn’t). Is there a better way to approach this?
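
To make the idea concrete, here is a rough sketch of the global-variable approach, assuming the package itself owns the setting rather than reading a variable from the user’s script (code inside a package does not see the script’s globals by name). The names `MyPackageSketch`, `DEFAULT_LIBPATH`, `LIBPATH`, `set_libpath!` and `tablefile`, the default directory, and the one-file-per-species `.csv` naming are all placeholders for illustration.

```julia
module MyPackageSketch

# Hypothetical default location, used until the user sets a path.
const DEFAULT_LIBPATH = joinpath(homedir(), "chemtables")

# Package-owned global: a typed Ref can be mutated while the binding stays `const`.
const LIBPATH = Ref{String}(DEFAULT_LIBPATH)

# Set the directory that holds the downloaded tables.
function set_libpath!(path::AbstractString)
    isdir(path) || throw(ArgumentError("not a directory: $path"))
    LIBPATH[] = String(path)
    return nothing
end

# Full path of the (hypothetically named) table file for `species`.
tablefile(species::AbstractString) = joinpath(LIBPATH[], species * ".csv")

end # module
```

With this, the script would call `MyPackageSketch.set_libpath!("/home/user/chemtables/")` once, and every later lookup would read `LIBPATH[]`.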

Edit: for those bothered by inputting a temperature for a formation enthalpy, forgive me, I was thinking sensible enthalpy at the time. :)

Using global variables would make it difficult to use multiple tables in the same caller, no?

I would tend to suggest having some sort of data structure that wraps a table, e.g.

ct = ChemTable("/home/user/chemtables/datafile")

and then define functions that act on this, e.g. hf(ct, "H2O", 298.15). Or maybe have ct["H2O"] return a data structure about the molecule (similar to https://github.com/JuliaPhysics/PeriodicTable.jl) that you can operate on:

H2O = ct["H2O"]
hf(H2O, 298.15)

or even use dot overloading: ct["H2O"].hf(298.15).
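
A minimal sketch of what these three call styles could look like follows. The file format (one `species,temperature,value` record per line, temperatures ascending), the `Molecule` type, and the linear interpolation between tabulated points are all assumptions made for illustration; your real tables will need their own parser and lookup rule.

```julia
# Per-molecule data. Field names and layout are assumptions for this sketch.
struct Molecule
    name::String
    T::Vector{Float64}   # tabulated temperatures in K, assumed ascending
    h::Vector{Float64}   # tabulated values at those temperatures
end

struct ChemTable
    path::String
    molecules::Dict{String,Molecule}
end

# Read the whole file once, when the table is constructed.
function ChemTable(path::AbstractString)
    molecules = Dict{String,Molecule}()
    for line in eachline(path)
        isempty(strip(line)) && continue
        name, T, h = strip.(split(line, ','))
        m = get!(molecules, String(name)) do
            Molecule(String(name), Float64[], Float64[])
        end
        push!(m.T, parse(Float64, T))
        push!(m.h, parse(Float64, h))
    end
    return ChemTable(String(path), molecules)
end

# ct["H2O"] returns the Molecule record.
Base.getindex(ct::ChemTable, name::AbstractString) = ct.molecules[name]

# Linear interpolation between neighbouring tabulated points (an assumption).
function hf(m::Molecule, T::Real)
    first(m.T) <= T <= last(m.T) || throw(DomainError(T, "outside tabulated range"))
    i = searchsortedlast(m.T, T)        # last tabulated temperature <= T
    i == length(m.T) && return m.h[end]
    w = (T - m.T[i]) / (m.T[i+1] - m.T[i])
    return (1 - w) * m.h[i] + w * m.h[i+1]
end

hf(ct::ChemTable, name::AbstractString, T::Real) = hf(ct[name], T)

# Dot overloading: m.hf(T) forwards to hf(m, T); real fields still work.
function Base.getproperty(m::Molecule, s::Symbol)
    s === :hf ? (T -> hf(m, T)) : getfield(m, s)
end
```

Because each `ChemTable` carries its own data, a caller can hold several tables at once, which a single global path would not allow.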

Partly it depends on what your data looks like. You may not want to hit the disk for every query — either load in all of the data when you construct ChemTable(...), or load it in lazily and cache it in a dictionary internally to ct.
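
The lazy variant could be sketched roughly as below, assuming a directory with one `<species>.csv` file per species and a `temperature,value` pair on each line. `TableLibrary` and `readtable` are hypothetical names, and the file naming and format are assumptions, not something your tables necessarily follow.

```julia
# A directory of table files; each file is read on first use and then cached.
struct TableLibrary
    dir::String
    cache::Dict{String,Vector{Tuple{Float64,Float64}}}
end

TableLibrary(dir::AbstractString) =
    TableLibrary(String(dir), Dict{String,Vector{Tuple{Float64,Float64}}}())

# Hypothetical parser: each non-empty line holds "temperature,value".
function readtable(path::AbstractString)
    rows = Tuple{Float64,Float64}[]
    for line in eachline(path)
        isempty(strip(line)) && continue
        T, v = split(line, ',')
        push!(rows, (parse(Float64, strip(T)), parse(Float64, strip(v))))
    end
    return rows
end

# lib["N2"] hits the disk only the first time; later calls return the cached rows.
function Base.getindex(lib::TableLibrary, species::AbstractString)
    get!(lib.cache, String(species)) do
        readtable(joinpath(lib.dir, species * ".csv"))
    end
end
```

If the tables are small, loading everything eagerly in the constructor is simpler; the cache mainly pays off when there are many files and only a few are used per run.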


Good point about the disk overhead on each call; I hadn’t considered that. I’ll look into a custom data structure that loads the tables once and keeps them in memory. The tables aren’t big, so that shouldn’t be an issue.