API‑design feedback for DataSplits.jl (dataset‑splitting package)

davide.crucitti · July 30, 2025, 9:54am

Hello everyone,

For the past month I’ve been working on an article about the effectiveness of dataset-splitting algorithms in cheminformatics. Since I didn’t need chemistry-specific libraries for this project, I decided to use it as an occasion to pick up Julia. Many of these algorithms are applicable to other fields, so I decided to extract them from my project-specific code into a package. Some of the algorithms require chemistry-specific utilities that are not yet really available in the Julia ecosystem, so I’ll keep those in my research code, but many others can be moved to this package.
I temporarily named it DataSplits.jl (GitHub: davide-grheco/DataSplits.jl, a Julia package implementing several data splitting algorithms), but I’m open to suggestions for a better name. I’ve been wrestling with how to design a clean, idiomatic public API and would greatly appreciate any advice or pointers to existing best‑practice guides.

What I’m wondering

Custom types vs. standalone functions
I currently define one struct …Split per algorithm and dispatch a single split(X, strategy) function on the type. Would it be better to skip the wrapper types and just provide algorithm‑named functions (e.g. kennardstone(X, frac))? What are the trade‑offs in discoverability, extensibility, and multiple dispatch? I noticed that some libraries, such as Distances.jl, follow this approach, while others do not.


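abstract type SplitStrategy end

# One strategy type per algorithm; `frac` is the fraction of samples selected.
struct KennardStoneSplit <: SplitStrategy
    frac::Float64
end

# A single entry point dispatches on the strategy type.
split(X, KennardStoneSplit(0.8))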
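To make the comparison concrete, here is a minimal sketch of both styles living side by side: the dispatch-based split plus a thin algorithm-named wrapper. It is not the package's actual code. The module name SplitSketch, the column-per-observation layout, and the naive O(n²) max-min Kennard-Stone selection are all assumptions made for illustration, and it expects at least two observations.

```julia
module SplitSketch  # hypothetical module, used only to avoid clashing with Base.split

abstract type SplitStrategy end

struct KennardStoneSplit <: SplitStrategy
    frac::Float64
end

# Style 1: one generic entry point, dispatching on the strategy type.
# Assumption: observations are stored as columns of X.
function split(X::AbstractMatrix, s::KennardStoneSplit)
    n = size(X, 2)
    ntrain = clamp(round(Int, s.frac * n), 2, n)
    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a sketch).
    D = [sqrt(sum(abs2, X[:, i] .- X[:, j])) for i in 1:n, j in 1:n]
    # Start from the two most distant points.
    i, j = Tuple(argmax(D))
    train = [i, j]
    # Minimum distance from every point to the current training set.
    mind = min.(D[:, i], D[:, j])
    while length(train) < ntrain
        mind[train] .= -Inf              # never reselect a chosen point
        k = argmax(mind)                 # farthest point from the training set
        push!(train, k)
        mind .= min.(mind, D[:, k])
    end
    test = setdiff(1:n, train)
    return train, test
end

# Style 2: a thin algorithm-named function layered on top of style 1.
kennardstone(X, frac) = split(X, KennardStoneSplit(frac))

end # module

X = [0.0 1.0 2.0 10.0;
     0.0 0.0 0.0  0.0]
train, test = SplitSketch.kennardstone(X, 0.75)
```

With this layering the two styles are not mutually exclusive: the strategy types carry the extensibility, and the named functions are one-line conveniences.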

Return types
Right now each splitter returns a TrainTestSplit (or TrainValTestSplit, etc.). Would it be acceptable/better to return plain tuples of index vectors (train, test) and let callers destructure them? When is it worth introducing custom result types?

struct TrainTestSplit
    train::Vector{Int}
    test::Vector{Int}
end
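One middle ground I can think of, sketched below, is to keep the result type but make it iterable so callers can still destructure it like a tuple. This is only an illustration of the idea; the two Base methods are additions that are not in the package today.

```julia
struct TrainTestSplit
    train::Vector{Int}
    test::Vector{Int}
end

# Iterating yields `train`, then `test`, so `train, test = result` works.
Base.iterate(s::TrainTestSplit, state = 1) =
    state == 1 ? (s.train, 2) :
    state == 2 ? (s.test, 3) : nothing
Base.length(::TrainTestSplit) = 2

res = TrainTestSplit([1, 3, 4], [2])

train, test = res        # tuple-style destructuring via iteration
(; train, test) = res    # property destructuring (requires Julia 1.7 or later)
```

The named fields stay available as res.train and res.test, and a TrainValTestSplit could follow the same pattern with three states.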

Extensibility for custom user strategies
If users want to write their own splitting strategy, should they define a new subtype of SplitStrategy and overload split, or is there a simpler plugin pattern?
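For reference, the subtype-and-overload pattern would look roughly like this from the user's side. SplitAPI is a hypothetical stand-in for the package, FirstNSplit is an invented toy strategy, and the column-per-observation layout is again an assumption.

```julia
module SplitAPI  # hypothetical stand-in for the package

abstract type SplitStrategy end

# Generic function with no methods yet; each strategy adds its own.
function split end

end # module

# --- user code ---

# Toy strategy: the first `n` observations become the training set.
struct FirstNSplit <: SplitAPI.SplitStrategy
    n::Int
end

# Extend the package's function by qualifying its name.
function SplitAPI.split(X::AbstractMatrix, s::FirstNSplit)
    nobs = size(X, 2)              # assumption: observations are columns
    ntrain = min(s.n, nobs)
    return collect(1:ntrain), collect(ntrain+1:nobs)
end

train, test = SplitAPI.split(zeros(2, 5), FirstNSplit(3))
```

The user needs one struct and one method, which seems about as small as a plugin interface can get in Julia.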
Error‐handling conventions
I’ve added a few custom exception types (SplitInputError, SplitParameterError, etc.) to make failures catchable. Is that overkill, or recommended for library code?
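A minimal sketch of what such exception types could look like follows. The common SplitError supertype and the check_frac helper are hypothetical and only there to show how callers would catch failures; the package's real definitions may differ.

```julia
# Hypothetical common supertype, so callers can catch every package error at once.
abstract type SplitError <: Exception end

struct SplitParameterError <: SplitError
    msg::String
end

# Controls how the error is printed in the REPL and in stack traces.
Base.showerror(io::IO, e::SplitParameterError) =
    print(io, "SplitParameterError: ", e.msg)

# Hypothetical validation helper.
function check_frac(frac)
    0 < frac < 1 || throw(SplitParameterError("frac must be in (0, 1), got $frac"))
    return frac
end

result = try
    check_frac(1.5)
catch e
    e isa SplitError || rethrow()   # let unrelated errors propagate
    :invalid
end
```

The alternative would be to throw Base's ArgumentError or DimensionMismatch, which callers already know but which cannot be distinguished from errors raised elsewhere.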
Learning resources
I haven’t found a central guide to Julia library design. I have skimmed Hands-On Design Patterns and Best Practices with Julia and various online guides, but did not find a complete treatment of the topic. Are there any templates, blog posts, or style guides you’d recommend?

Thank you for any feedback or examples of how you’ve tackled similar design questions! I’m happy to share more context or code snippets if it helps.
