[RFC] Mr Phelps - a distributed workflow orchestrator

From what I hear, everyone who uses CWL hates it.
But my sample size isn’t huge.
I have some exposure to Galaxy, which is a CWL-based workflow GUI thing used in bioinformatics and speech processing (turns out those have some interesting tooling overlap, especially w.r.t. their history of using bash to glue seriously complicated stuff together).


You might like to take a look at DataDepsPaths.jl

I have been told it’s kind of like Snakemake.

DataDepsPaths doesn’t currently work, and I probably won’t have time to work on it any time soon.
But it is the kind of design for DataDeps v2, which unifies the action of fetching (downloading) with post-fetch processing (e.g. unpacking) into a single action of resolving, which boils down to “run arbitrary code to create this file”. The idea is that whenever one tries to access a file (a data dep path), it tries to resolve it if it doesn’t exist (which could in turn trigger accessing a file …)
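
To make the "resolve on access" idea concrete, here is a minimal sketch in Python. It is not the DataDepsPaths.jl API (which, as noted, doesn't currently work); `register`, `access`, `RESOLVERS`, and the file names are all hypothetical names made up for illustration. It only shows the shape of the idea: accessing a missing path runs arbitrary code to create it, and that code may itself access other paths.

```python
import os
import tempfile

# Hypothetical registry: maps a path to the function that knows how to create it.
RESOLVERS = {}

def register(path, resolver):
    """Associate a path with the arbitrary code that creates it."""
    RESOLVERS[path] = resolver

def access(path):
    """Return the path, resolving (creating) it first if it does not exist."""
    if not os.path.exists(path):
        # Arbitrary code: it may call access() on other paths in turn.
        RESOLVERS[path](path)
    return path

workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.txt")
clean = os.path.join(workdir, "clean.txt")
calls = []  # records the order in which resolvers actually run

def make_raw(path):
    # Stands in for "fetching" (e.g. a download).
    calls.append("raw")
    with open(path, "w") as f:
        f.write("Hello World\n")

def make_clean(path):
    # Stands in for "post-fetch processing"; the dependency on raw.txt
    # is only discovered when this line actually executes.
    calls.append("clean")
    with open(access(raw)) as src:
        text = src.read().lower()
    with open(path, "w") as dst:
        dst.write(text)

register(raw, make_raw)
register(clean, make_clean)

with open(access(clean)) as f:  # triggers make_clean, which triggers make_raw
    result = f.read()

access(clean)  # the file exists now, so nothing is re-run
```

Note that nothing in this sketch declares that `clean.txt` depends on `raw.txt`; the dependency only shows up because `make_clean` happens to call `access(raw)` while it runs.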

It’s definitely not like what you are after, but I think it has some interesting ideas.
In particular, because its DAG of resolution is implicit in arbitrary code (dependencies are only discovered as that code runs), it may be too hard to parallelize.