I would like to try Julia for analyzing cancer genomics data, but I am not sure where to start. BioJulia seems to cover the basics, and I have contacted Julia Computing for information about JuliaRun and JuliaDB.
Genomics data are increasingly important for cancer research. However, most platforms assume healthy (germline) DNA and do not use Julia. One example is the Databricks “Unified Analytics Platform for Genomics”, which builds on ADAM, a genomics library for Apache Spark that can be used from Python through PySpark, the Python API for Spark. Ben Ward gave an inspiring talk at JuliaCon 2018 titled “BioJulia and Bioinformatics in Julia: Past, Present, Future”. What seems specific to cancer genomics data is what Ben and others call “variation”.
I identified GeneticVariation.jl as a possible Julia solution for the cancer variation problem. The package has this description:
GeneticVariation provides types and methods for working with datasets of genetic variation. It provides a VCF and BCF parser, as well as methods for working with variation in sequences such as evolutionary distance computation, and counting different mutation types.
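To make “counting different mutation types” concrete for tumor data: a common first summary of somatic single-nucleotide variants (SNVs) is the six-class substitution spectrum (C>A, C>G, C>T, T>A, T>C, T>G), where substitutions on a purine reference base are reported on the opposite strand. The sketch below is plain Julia with no packages. It assumes the SNVs have already been extracted as (REF, ALT) base pairs, for example from the REF and ALT fields of VCF records. `substitution_class` and `count_substitutions` are hypothetical helpers for illustration, not GeneticVariation.jl functions.

```julia
# Hypothetical helpers (not part of GeneticVariation.jl).
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A')

# Collapse one single-base substitution onto the pyrimidine (C/T) reference strand.
function substitution_class(ref::Char, alt::Char)
    ref = uppercase(ref)
    alt = uppercase(alt)
    (haskey(COMPLEMENT, ref) && haskey(COMPLEMENT, alt)) || error("not a DNA base")
    ref == alt && error("REF and ALT are identical")
    # Purine references (A, G) are reverse-complemented, e.g. G>A becomes C>T.
    if ref in ('A', 'G')
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    end
    return string(ref, '>', alt)
end

# Tally an iterable of (ref, alt) pairs into the six substitution classes.
function count_substitutions(snvs)
    counts = Dict(c => 0 for c in ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G"))
    for (ref, alt) in snvs
        counts[substitution_class(ref, alt)] += 1
    end
    return counts
end

# Toy input: G>A folds into C>T, and A>C folds into T>G.
snvs = [('C', 'T'), ('G', 'A'), ('T', 'G'), ('A', 'C')]
counts = count_substitutions(snvs)
```

Folding onto the pyrimidine strand halves the twelve possible substitutions to six, because a variant and its reverse complement describe the same event on double-stranded DNA.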
Can GeneticVariation be used for analyzing cancer genomics data? Is there a way to run a somatic variant caller such as GATK4 Mutect2 from Julia?
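The most direct route I can think of is to call the GATK command-line tool from Julia as an external program. A minimal sketch, assuming the `gatk` launcher is installed and on the PATH and that GATK 4.1 or later is used (where the normal sample is named with `--normal-sample`); the file names, the sample name, and the `mutect2_cmd` helper are hypothetical placeholders.

```julia
# Hypothetical wrapper: builds (but does not run) a GATK4 Mutect2 command
# for a tumor/normal pair. Julia's backtick syntax creates a Cmd object and
# interpolates each variable as a single argument, without invoking a shell.
function mutect2_cmd(reference, tumor_bam, normal_bam, normal_sample, out_vcf)
    return `gatk Mutect2 -R $reference -I $tumor_bam -I $normal_bam --normal-sample $normal_sample -O $out_vcf`
end

cmd = mutect2_cmd("ref.fasta", "tumor.bam", "normal.bam", "NORMAL_ID", "somatic.vcf.gz")

# On a machine with GATK installed, run(cmd) executes it and throws
# an error if the process cannot be started or exits with a non-zero status.
# run(cmd)
```

The raw Mutect2 calls would normally be filtered (GATK provides FilterMutectCalls for this) before the resulting VCF is read back into Julia for downstream analysis.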
Any recommendations on how to use Julia for cancer genomics data would be much appreciated.