I think this could be a version 2 thing. A complete tool without CWL integration is definitely possible, but I think it would be a nice expansion feature. I don’t think it’s necessary to worry about it from the beginning; it can be bolted on later.
I wouldn’t say a lot, but a fair amount, and I have access to a number of clusters that use it that I could use for testing. None of them can be plugged into CI, unfortunately, but I’ve been meaning to look for a solution there anyway. If we could provide guidance for including CI for workflows, I feel like that would be a huge advantage as well.
Snakemake’s approach to SLURM and other cluster managers is to make the user do most of the work: you can define default parameters (memory, cores, etc.) and rule-specific parameters in a config file, and then you have to provide the sbatch command directly, e.g.:
$ snakemake -s my_workflow.snakefile --configfile config.yaml --cluster-config cluster.yaml \
--cluster "sbatch -n {cluster.processors} -N {cluster.nodes} -t {cluster.time} --mem {cluster.memory} -o output/logs/{rule}-%j.out -e output/logs/{rule}-%j.err -p my_partition"
This is a bit annoying as a user, but easier for the developer, I think. Again, we could start this way and bolt on some convenience functions later.
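To make the defaults-plus-overrides idea concrete, here is a rough Python sketch of how a submit command like the one above could be assembled. It is not Snakemake’s actual implementation. The `__default__` key follows Snakemake’s cluster-config convention, but the rule name `align_reads`, the resource values, and the helper functions `resolve_resources` and `build_submit_command` are all made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical contents of cluster.yaml, already parsed into a dict.
# "__default__" holds the fallback values; other keys are rule names.
CLUSTER_CONFIG = {
    "__default__": {"processors": 1, "nodes": 1, "time": "01:00:00", "memory": "4G"},
    "align_reads": {"processors": 8, "memory": "32G"},  # hypothetical rule
}

# Same placeholder style as the --cluster string above (partition flag omitted).
SBATCH_TEMPLATE = (
    "sbatch -n {cluster.processors} -N {cluster.nodes} -t {cluster.time} "
    "--mem {cluster.memory} -o output/logs/{rule}-%j.out -e output/logs/{rule}-%j.err"
)


def resolve_resources(config, rule):
    """Merge rule-specific settings over the defaults."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule, {}))
    return merged


def build_submit_command(template, config, rule):
    """Fill the {cluster.*} and {rule} placeholders for one rule."""
    cluster = SimpleNamespace(**resolve_resources(config, rule))
    # str.format resolves {cluster.memory} via attribute lookup; %j is left
    # untouched so that sbatch can substitute the job ID itself.
    return template.format(cluster=cluster, rule=rule)


if __name__ == "__main__":
    # "count_kmers" has no entry of its own, so it falls back to the defaults.
    for rule in ("align_reads", "count_kmers"):
        print(build_submit_command(SBATCH_TEMPLATE, CLUSTER_CONFIG, rule))
```

A convenience layer bolted on later could generate the template itself from a named backend, so users would only need to supply the per-rule resources.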
I totally get it, and it’s clever. My primary objection here is the default male bias of things like this: “Oh look, all of the productivity and data science tools are named after male characters, I guess that field is for men.” I work at a women’s college and have done some work on the gender gap in my field. It’s easy to overlook stuff like this, and individually it may not be a huge deal (I doubt anyone explicitly has the thought above), but the cumulative effect can be quite detrimental.
I think it’s not that it can’t be done, it’s that some things would be a pain to do. That said, I’m guessing these pain points might be worth surfacing as issues in the core language, so having the intention of doing everything in Julia and then raising issues when things are hard would be worthwhile.
I believe this is how Snakemake does it, and that’s mostly fine, but their handling of things like temporary files and manual overrides isn’t great. Not sure if it’s because they haven’t bothered or because it’s hard, but for me that’s an important thing to get right.
I forgot to mention, more important to me than a GUI is self-documenting runs and good report generation.
The one my previous lab developed is an acronym for “Another Automated Data Analysis Management Application.”