[ANN]: MetidaFlows.jl - experimental workflow engine project for Julia

PharmCat · May 27, 2026, 4:11pm

Hi everyone!

I’d like to share an early-stage project called MetidaFlows.jl .

The package is currently in active development and should be considered experimental .
The goal right now is to explore workflow-engine design patterns in pure Julia and collect feedback from the community.

What is MetidaFlows.jl?

MetidaFlows.jl is a lightweight graph-based workflow engine prototype for building:

data-processing pipelines
DAG-based computations
agent/event-driven workflows
typed node graphs with validation

It is designed to stay minimal, explicit, and fully Julia-native (no external runtime).

Current functionality

The package already implements:

typed input/output ports
graph-based workflow model
connection validation with type checking
DAG topological scheduler (DAW )
queue-based scheduler (ABW , experimental)
execution invalidation propagation
node execution state tracking
validation hooks (settings / result / structure)
incremental graph modification
simple serialization helpers

Execution model

Nodes are defined via multiple dispatch:

function MetidaFlows.execute_unsafe!(node::DataNode{MyNode})
    ...
end

Workflow modes

DAW — Data Analysis Workflow

Deterministic execution using topological sorting of a DAG.

ABW — Agent-Based Workflow

Queue-based execution model for dynamic/reactive workflows (still experimental).

Real working example

This example is taken directly from the test suite and actually runs:

using MetidaFlows
using CSV, DataFrames

struct CSVNode <: AbstractNodeType end
struct DataFrameNode <: AbstractNodeType end

csv_spec = NodeSpec(
    "Load CSV",
    PortSpec[],
    [PortSpec("CSV File", CSV.File, :csv)],
    [:file]
)

df_spec = NodeSpec(
    "DataFrame",
    [PortSpec("CSV File", CSV.File, :csv)],
    [PortSpec("DataFrame", DataFrame, :dataframe)]
)

function MetidaFlows.execute_unsafe!(node::DataNode{CSVNode})
    csv = CSV.File(node.settings[:file])
    setdata!(node, :csv, csv)
    return [:csv]
end

function MetidaFlows.execute_unsafe!(node::DataNode{DataFrameNode})
    csv = getinputdata(node, :csv)
    setdata!(node, :dataframe, DataFrame(csv))
    return [:dataframe]
end

workflow = Workflow(0)

id1 = add_node!(workflow, DataNode(CSVNode, csv_spec))
id2 = add_node!(workflow, DataNode(DataFrameNode, df_spec))

add_connection!(workflow, id1, :csv, id2, :csv)

setsettings!(workflow, id1, Dict(:file => "data.csv"))

scheduler!(workflow)

df = getdata(workflow, id2, :dataframe)

Current status

This is not a production-ready package .

Expect breaking changes while the architecture evolves. The main focus right now is:

stabilizing execution semantics
improving scheduler correctness
refining invalidation model
improving test coverage

Roadmap ideas

caching & checkpointing
audit/logging system
better serialization/export formats

Feedback welcome

I’d especially appreciate feedback from people working with:

DAG systems
ETL pipelines
scientific computing workflows
node-based editors
agent-based systems

GitHub:
MetidaFlows.jl

Documentation:
Documentation link

tp2750 · May 30, 2026, 7:34pm

Looks cool! Thanks for sharing.
Are you planning dashboard functionality like Apache Airflow?

kahliburke · June 1, 2026, 8:06am

This looks on first glance a bit like Apache Hamilton?

sylvaticus · June 1, 2026, 2:02pm

..or like Snakemake in Python ??

PharmCat · June 15, 2026, 1:07am

Hi! Thank you all for the feedback and interest in the project. I’m really happy to see that it has sparked some discussion. The package was originally developed for my own work, but I thought it could also be useful to the Julia community.

At the moment, the package is still at a very early stage. A visual dashboard or UI is definitely an interesting direction, but it’s not a primary goal yet.

Yes, Apache Hamilton follows a similar idea of building data workflows from interconnected computational units.

Yes, there is some overlap in the overall goal of reproducible and structured data analysis.

The project is still evolving, so feedback are suggestions are very welcome.

davide.crucitti · June 15, 2026, 10:04am

I don’t know whether you know row, it is very useful in HPC since it does not depend on a service running: you submit it and it executes all available actions. This is better on an HPC as you can schedule a cron or something like that rather than trying to seup a long running job which may stop at any time.
Moreover, it does not touch the actual files, allowing a job to manage its own files autonomously. For example SnakeMake will delete results if the calculation did not complete, this is not always desirable. You can set a calculation to restart from a checkpoint and would not want to lose days of computation.

Overall, this is an approach I prefer when you have to conduct long running and compute intensive calculations. While other managers are better when handling many short running calculations.

Topic		Replies	Views
[RFC] Mr Phelps - a distributed workflow orchestrator Package Announcements	42	6145	August 4, 2021
Suggestions for best practice in scaling analysis Julia at Scale	3	479	June 16, 2024
Snakemake with julia (and similarities with Dr Watson or..) General Usage workflow , scientific-project , drwatson , reproducibility	9	796	February 23, 2025
Visual workflow tool for Julia - let's build one! Data	24	7060	April 15, 2021
Workfow for data analysis project: avoid re-running with many inputs, intermediate steps, and models New to Julia workflow	8	1035	October 8, 2020