Hi,
I’m excited to announce OnlineStatsChains.jl, a package that enables you to chain OnlineStats computations in a Directed Acyclic Graph (DAG) structure with automatic value propagation.
Important: This is a highly experimental package. It was created as an exploration of AI-assisted development and may not be suitable for production use without thorough testing and validation for your specific use case.
What it does
OnlineStatsChains.jl allows you to build computational pipelines where statistics are automatically updated as data flows through a graph. It targets streaming data processing, incremental analytics, and multi-stage statistical workflows.
This package addresses the feature request in OnlineStats.jl#272 for chaining OnlineStats computations.
Key Features
DAG Construction: Build computational graphs with automatic cycle detection
Three Evaluation Strategies:
- Eager (default): Immediate propagation when fit!() is called
- Lazy: Deferred computation until value() is requested
- Partial: Optimized propagation for affected subgraphs only
Multi-Input Nodes: Support for fan-in and fan-out patterns
Batch & Streaming: Process data element-by-element or in batches
Type-Safe: Works with any OnlineStat from OnlineStatsBase.jl
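To give a feel for what cycle detection during DAG construction involves, here is a minimal sketch in plain Julia. It is not the package's implementation: `ToyGraph`, `toy_add_node!`, `toy_connect!`, and `reachable` are hypothetical names made up for this illustration, and the only idea it relies on is that a new edge closes a cycle exactly when its source is already reachable from its target.

```julia
# Hypothetical toy graph (not part of OnlineStatsChains.jl):
# each node id maps to the list of its children.
struct ToyGraph
    children::Dict{Symbol,Vector{Symbol}}
end
ToyGraph() = ToyGraph(Dict{Symbol,Vector{Symbol}}())

# Register a node with no outgoing edges (no-op if it already exists).
toy_add_node!(g::ToyGraph, id::Symbol) = (get!(g.children, id, Symbol[]); g)

# Depth-first search: is `target` reachable from `start`?
function reachable(g::ToyGraph, start::Symbol, target::Symbol)
    stack = [start]
    seen = Set{Symbol}()
    while !isempty(stack)
        node = pop!(stack)
        node == target && return true
        node in seen && continue
        push!(seen, node)
        append!(stack, g.children[node])
    end
    return false
end

# Add the edge from => to only if the graph stays acyclic.
# Returns true when the edge was added, false when it was rejected.
function toy_connect!(g::ToyGraph, from::Symbol, to::Symbol)
    # The new edge closes a cycle exactly when `from` is reachable from `to`.
    reachable(g, to, from) && return false
    push!(g.children[from], to)
    return true
end

g = ToyGraph()
for id in (:a, :b, :c)
    toy_add_node!(g, id)
end
ok_ab = toy_connect!(g, :a, :b)   # accepted
ok_bc = toy_connect!(g, :b, :c)   # accepted
ok_ca = toy_connect!(g, :c, :a)   # rejected: would create a => b => c => a
```

This toy version returns `false` for a rejected edge. How the package itself reports a rejected connection is not something this sketch assumes.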
Quick Example
using OnlineStatsChains
using OnlineStatsBase
# Create a computational DAG
dag = StatDAG()
# Add nodes
add_node!(dag, :source, Mean())
add_node!(dag, :variance, Variance())
# Connect nodes
connect!(dag, :source, :variance)
# Fit data (propagates automatically)
fit!(dag, :source => [1.0, 2.0, 3.0, 4.0, 5.0])
# Get results
println("Mean: ", value(dag, :source)) # 3.0
println("Variance: ", value(dag, :variance))
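To make the idea of chaining concrete, here is a dependency-free sketch of a running mean feeding a running variance. It assumes that every new observation updates the upstream statistic and that the updated value is then passed downstream as a single observation. That propagation rule is an assumption made for illustration, and `RunningMean`, `RunningVariance`, `update!`, and `sample_variance` are hypothetical stand-ins, not the package's or OnlineStats' types.

```julia
# Hypothetical stand-in for an online mean.
mutable struct RunningMean
    n::Int
    mean::Float64
end
RunningMean() = RunningMean(0, 0.0)

# Incorporate one observation and return the updated mean.
function update!(m::RunningMean, x::Real)
    m.n += 1
    m.mean += (x - m.mean) / m.n
    return m.mean
end

# Hypothetical stand-in for an online variance (Welford's algorithm).
mutable struct RunningVariance
    n::Int
    mean::Float64
    m2::Float64   # running sum of squared deviations from the mean
end
RunningVariance() = RunningVariance(0, 0.0, 0.0)

function update!(v::RunningVariance, x::Real)
    v.n += 1
    delta = x - v.mean
    v.mean += delta / v.n
    v.m2 += delta * (x - v.mean)
    return v
end

# Unbiased sample variance; undefined (NaN) for fewer than two observations.
sample_variance(v::RunningVariance) = v.n > 1 ? v.m2 / (v.n - 1) : NaN

source = RunningMean()
downstream = RunningVariance()
for x in [1.0, 2.0, 3.0, 4.0, 5.0]
    current_mean = update!(source, x)   # upstream node is updated first
    update!(downstream, current_mean)   # its new value flows downstream
end

println("Mean: ", source.mean)
println("Variance of the running means: ", sample_variance(downstream))
```

Under this assumption the downstream node sees the running means 1.0, 1.5, 2.0, 2.5, 3.0, so it reports their variance rather than the variance of the raw data. What `value(dag, :variance)` returns in the package depends on its actual propagation semantics.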
AI-Generated Package Notice
This package was entirely generated using Claude Code (Anthropic’s AI coding tool).
I wanted to experiment with AI-assisted package development and see what could be achieved. The entire codebase, tests, and documentation were created through an iterative process with Claude Code.
Development Methodology
The development process leveraged:
- EARS (Easy Approach to Requirements Syntax): Requirements were written using structured natural language patterns to ensure clarity and completeness
- BDD (Behavior-Driven Development): Test scenarios were written in Given-When-Then format to describe expected behavior before implementation
- Iterative refinement: Claude Code generated code, tests, and documentation through multiple iterations based on these specifications
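For readers unfamiliar with these formats, here is a small illustration of how an EARS-style requirement and a Given-When-Then scenario can map onto a Julia test. The requirement text, the scenario, and the `ToyMean` type are all hypothetical examples written for this post. They are not taken from the package's specification or test suite.

```julia
using Test

# EARS-style requirement (hypothetical wording):
#   "When an observation is fitted, the statistic shall update its count and mean."

# Hypothetical stand-in for an online mean.
mutable struct ToyMean
    n::Int
    mean::Float64
end
ToyMean() = ToyMean(0, 0.0)

function observe!(m::ToyMean, x::Real)
    m.n += 1
    m.mean += (x - m.mean) / m.n
    return m
end

@testset "Scenario: mean updates as observations arrive" begin
    # Given an empty running mean
    m = ToyMean()

    # When three observations are fitted
    for x in (2.0, 4.0, 6.0)
        observe!(m, x)
    end

    # Then the count is 3 and the mean is their average
    @test m.n == 3
    @test m.mean ≈ 4.0
end
```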
Why share this?
- Transparency: Users should know the development process and experimental nature
- Experimentation: Testing the boundaries of AI-assisted package development in Julia
- Community feedback: Learning what works (and what doesn’t) in AI-generated code
- Research interest: Understanding how AI tools can contribute to the Julia ecosystem
What this means for users
This package should be considered experimental software. While it includes comprehensive tests and documentation, I strongly encourage you to:
- Review the code for your specific use case
- Add application-specific tests
- Report any issues found
- Verify behavior matches your requirements
- Do not use in production without extensive validation
- Expect potential breaking changes as the API stabilizes
This is part of exploring how AI tools can accelerate Julia package development while maintaining quality standards. I’m particularly interested in feedback on both the package functionality and this development approach. Consider this a proof-of-concept and learning experiment rather than a battle-tested library.
Installation
# Once registered in General
using Pkg
Pkg.add("OnlineStatsChains")
# Or from GitHub
Pkg.add(url="https://github.com/femtotrader/OnlineStatsChains.jl")
Use Cases
- Streaming analytics: Update multiple statistics as data arrives
- Pipeline processing: Chain transformations and aggregations
- Complex workflows: Build sophisticated statistical pipelines with dependencies
- Incremental computation: Efficiently recompute only affected parts of the graph
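The "recompute only affected parts" idea can be sketched without any dependencies: when one node receives new data, only that node and its descendants need to be revisited. The snippet below is an illustration under that assumption. The `affected_nodes` helper and the node names are hypothetical, and it says nothing about how the package implements its Partial strategy.

```julia
# Collect the updated node and everything downstream of it (breadth-first).
# `children` maps each node id to the ids of its direct children.
function affected_nodes(children::Dict{Symbol,Vector{Symbol}}, updated::Symbol)
    affected = Symbol[]
    seen = Set{Symbol}([updated])
    queue = [updated]
    while !isempty(queue)
        node = popfirst!(queue)
        push!(affected, node)
        for child in get(children, node, Symbol[])
            if !(child in seen)
                push!(seen, child)
                push!(queue, child)
            end
        end
    end
    return affected
end

# Hypothetical graph with two independent branches.
children = Dict(
    :prices  => [:mean, :extrema],
    :mean    => [:spread],
    :extrema => [:spread],
    :volumes => [:total],
)

touched = affected_nodes(children, :prices)
println(touched)   # the :volumes branch is not included
```

Breadth-first order is enough to find the affected set, but it is not a topological order in general. A real implementation would also have to make sure each node is updated only after all of its affected parents.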
Related Packages
While there are several excellent Julia packages for DAG-based computation (like Dagger.jl for parallel execution, ReactiveGraphs.jl for reactive programming, or DirectedAcyclicGraphs.jl for general DAG infrastructure), OnlineStatsChains.jl focuses specifically on:
- Incremental statistics: Built specifically for OnlineStats.jl’s streaming statistics
- Automatic propagation: Statistics update automatically as data flows through the graph
- Multiple evaluation strategies: Eager/Lazy/Partial modes optimized for statistical workflows
- Lightweight: Designed for statistical pipelines rather than general-purpose parallel computation
If you need heavy parallel computation across multiple machines, Dagger.jl is likely more appropriate. If you’re building reactive data processing applications, check out ReactiveGraphs.jl. OnlineStatsChains.jl is best suited for chaining statistical computations in an incremental manner.
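To illustrate the difference between eager and lazy evaluation in the abstract, here is a toy two-node chain in plain Julia: a source node and a downstream running total. `ToyChain`, `toy_fit!`, and `toy_value` are hypothetical names, and the deferral mechanism shown (a buffer of pending values) is just one possible design, not a description of the package's internals.

```julia
# Hypothetical two-node chain: a source holding the latest observation and a
# downstream node accumulating a running total of what it receives.
mutable struct ToyChain
    lazy::Bool
    latest::Float64            # value held by the source node
    total::Float64             # value held by the downstream node
    pending::Vector{Float64}   # observations not yet propagated (lazy mode only)
end
ToyChain(; lazy::Bool=false) = ToyChain(lazy, 0.0, 0.0, Float64[])

function toy_fit!(c::ToyChain, x::Real)
    c.latest = x
    if c.lazy
        push!(c.pending, x)    # lazy: remember the work, do it later
    else
        c.total += x           # eager: propagate immediately
    end
    return c
end

function toy_value(c::ToyChain)
    # Flush deferred updates (a no-op in eager mode, where nothing is pending).
    for x in c.pending
        c.total += x
    end
    empty!(c.pending)
    return c.total
end

eager = ToyChain()
lazy = ToyChain(lazy=true)
for x in (1.0, 2.0, 3.0)
    toy_fit!(eager, x)
    toy_fit!(lazy, x)
end

# At this point the eager chain is already up to date, while the lazy chain
# has done no downstream work yet.
work_deferred = length(lazy.pending)
```

Both chains report the same total once `toy_value` is called. The difference is only in when the downstream work happens.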
Resources
- Documentation: OnlineStatsChains.jl documentation home page
- Repository: https://github.com/femtotrader/OnlineStatsChains.jl
- Issues: the issue tracker of the GitHub repository
Acknowledgments
Built on top of the excellent OnlineStats.jl ecosystem by @joshday.
Looking forward to your feedback, suggestions, and hearing about how you might use this! Questions and contributions are very welcome.
Potential Extensions
The current implementation provides core DAG functionality for chaining OnlineStats. Possible future directions include parallel execution of independent branches, integration with streaming data sources, and more sophisticated graph manipulation.
However, I’m particularly interested in hearing what features would be most valuable to actual users. What would make this package more useful for your workflows?