API‑design feedback for DataSplits.jl (dataset‑splitting package)

Hello everyone,

For the past month I’ve been working on an article on the effectiveness of dataset-splitting algorithms in cheminformatics. Since the project didn’t need any specific chemistry libraries, I took it as an opportunity to pick up Julia. Many of these algorithms apply to other fields as well, so I decided to move them out of my project-specific code and into a package. Some of them require chemistry-specific utilities that are not yet really available in the Julia ecosystem, so those will stay in my research code, but many others can move to the package.
I temporarily named it DataSplits.jl (GitHub: davide-grheco/DataSplits.jl, a Julia package implementing several data splitting algorithms), but I’m open to suggestions for a better name. I’ve been wrestling with how to design a clean, idiomatic public API and would greatly appreciate any advice or pointers to existing best‑practice guides.

What I’m wondering

  1. Custom types vs. standalone functions
    I currently define one struct …Split per algorithm and dispatch a single split(X, strategy) function on the type. Would it be better to skip the wrapper types and just provide algorithm‑named functions (e.g. kennardstone(X, frac))? What are the trade‑offs in discoverability, extensibility, and multiple dispatch? I noticed some libraries follow this approach, such as Distances.jl, while others do not.

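```julia
abstract type SplitStrategy end

# One strategy type per algorithm, holding that algorithm's parameters.
struct KennardStoneSplit <: SplitStrategy
    frac::Float64
end

# A single generic entry point dispatches on the strategy type.
split(X, KennardStoneSplit(0.8))
```
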
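
To make the comparison concrete, here is a minimal sketch of the two styles side by side. It is not the package's actual implementation: the module name `SplitStyles`, the assumption that rows of `X` are samples, the Euclidean distance, and the plain tuple return value are all choices made only for illustration, and it assumes at least two samples.

```julia
module SplitStyles

abstract type SplitStrategy end

struct KennardStoneSplit <: SplitStrategy
    frac::Float64
end

# Style A: one generic entry point that dispatches on the strategy type.
# Because it is defined inside a module, this is a new function and
# not a method added to Base.split.
function split(X::AbstractMatrix, s::KennardStoneSplit)
    n = size(X, 1)                                  # assumption: rows are samples
    ntrain = clamp(round(Int, s.frac * n), 2, n)
    # Pairwise Euclidean distances between samples.
    D = [sqrt(sum(abs2, X[i, :] .- X[j, :])) for i in 1:n, j in 1:n]
    # Start from the two most distant samples.
    i, j = Tuple(argmax(D))
    train = [i, j]
    while length(train) < ntrain
        rest = setdiff(1:n, train)
        # Add the sample whose nearest already-selected neighbour is farthest away.
        best = argmax(k -> minimum(D[k, train]), rest)
        push!(train, best)
    end
    return train, setdiff(1:n, train)
end

# Style B: an algorithm-named convenience function that forwards to style A.
kennardstone(X, frac) = split(X, KennardStoneSplit(frac))

end # module
```

In this sketch the two styles are not mutually exclusive: `SplitStyles.kennardstone(X, 0.8)` is just shorthand for `SplitStyles.split(X, SplitStyles.KennardStoneSplit(0.8))`, so part of my question is whether offering both is worth the extra API surface.
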
  2. Return types
    Right now each splitter returns a TrainTestSplit (or TrainValTestSplit, etc.). Would it be acceptable/better to return plain tuples of index vectors (train, test) and let callers destructure them? When is it worth introducing custom result types?

```julia
# Result type holding index vectors into the original dataset.
struct TrainTestSplit
    train::Vector{Int}
    test::Vector{Int}
end
```

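
One middle ground I have been considering, sketched here under the assumption that a two-way split is all that is needed, is to keep the result type but make it iterable so callers can still destructure it like a tuple. The `Base.iterate` and `Base.length` methods below are not in the package today; they only illustrate the idea.

```julia
struct TrainTestSplit
    train::Vector{Int}
    test::Vector{Int}
end

# Iterating yields `train`, then `test`, so the struct destructures like a tuple.
Base.iterate(s::TrainTestSplit, state=1) =
    state == 1 ? (s.train, 2) :
    state == 2 ? (s.test, 3) : nothing
Base.length(::TrainTestSplit) = 2

result = TrainTestSplit([1, 3, 5], [2, 4])

# Both access styles work on the same object.
train, test = result        # tuple-like destructuring
result.train                # named field access
```

This would keep named fields for readability while not forcing callers to learn a new type just to get two index vectors.
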
  3. Extensibility for custom user strategies
    If users want to write their own splitting strategy, should they define a new subtype of SplitStrategy and add a method to split, or is there a simpler plugin pattern?
  4. Error‐handling conventions
    I’ve added a few custom exception types (SplitInputError, SplitParameterError, etc.) to make failures catchable. Is that overkill, or is it recommended for library code?
  5. Learning resources
    I haven’t found a central guide to Julia library design. I have skimmed Hands-On Design Patterns and Best Practices with Julia and various guides available online, but did not find a complete explanation of the topic. Are there any templates, blog posts, or style guides you’d recommend?
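
For the extensibility and error-handling questions, this is roughly the pattern I have in mind, as a minimal sketch. The module name `SplitSketch`, the user strategy `FirstNSplit`, and the `msg` field on the exception are hypothetical names used only for illustration; only `SplitStrategy`, `split`, and `SplitParameterError` correspond to names mentioned above.

```julia
module SplitSketch

abstract type SplitStrategy end

# Custom exception so callers can catch parameter problems specifically.
# Assumption: it carries a single message string.
struct SplitParameterError <: Exception
    msg::String
end
Base.showerror(io::IO, e::SplitParameterError) =
    print(io, "SplitParameterError: ", e.msg)

# Declare the generic function without methods; strategies add their own.
# Inside this module it is a new function, not Base.split.
function split end

end # module

# --- user code, living outside the package ---

# Hypothetical user-defined strategy: the first `n` rows go to training.
struct FirstNSplit <: SplitSketch.SplitStrategy
    n::Int
end

# Extending the package means adding one method for the new type.
function SplitSketch.split(X::AbstractMatrix, s::FirstNSplit)
    nrows = size(X, 1)
    0 < s.n < nrows ||
        throw(SplitSketch.SplitParameterError("n must be in 1:$(nrows - 1), got $(s.n)"))
    return collect(1:s.n), collect(s.n+1:nrows)
end
```

A caller could then wrap `SplitSketch.split(X, FirstNSplit(3))` in `try`/`catch` and check `e isa SplitSketch.SplitParameterError` to handle bad parameters separately from other failures.
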

Thank you for any feedback or examples of how you’ve tackled similar design questions! I’m happy to share more context or code snippets if it helps.
