[ANN] COBREXA.jl: COnstraint-Based Reconstruction and EXa-scale Analysis in Julia

Large-scale mechanistic biological models present unique difficulties in terms of system identification and parameterization. Constraint-based reconstruction and analysis (COBRA) methods are a relatively recent development aimed at reducing the measurement burden associated with fitting parameters, or even kinetic laws, to systems composed of hundreds of thousands of reactions. In short, genetic information is used to build a metabolic model and optimization techniques are used to find a reaction flux distribution that best describes the system under some physiological constraints.
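
For readers new to the field, the most common instance of this idea is flux balance analysis (FBA), which is usually written as a linear program. The symbols below are introduced here only for illustration and are not taken from the package: $S$ is the stoichiometric matrix of the model, $v$ the vector of reaction fluxes, $c$ the objective weights (for example, selecting a biomass reaction), and $l$, $u$ the lower and upper flux bounds that encode the physiological constraints.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S v = 0, \\
& l \le v \le u
\end{aligned}
```

The equality $S v = 0$ is the steady-state assumption: every metabolite is produced exactly as fast as it is consumed.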

We are pleased to announce the release of COBREXA.jl, a Julia package that implements COBRA methods in an environment suitable for very large-scale parallel analysis of constraint-based models. The main design features of COBREXA.jl may be summarized as follows:

  1. Easy integration with JuMP.jl and use of any JuMP-compatible solver

  2. Compatibility with many popular model types (SBML, Matlab, JSON models, etc.) together with extensible model representation

  3. Direct utilization and support of Julia’s distributed computing facilities by default, in all suitable methods

  4. All functions are designed for composability, which vastly simplifies construction of custom, complicated workflows by users

The use of Julia’s Distributed functionality in COBREXA.jl allows COBRA methods to be applied to very large systems that other software packages struggle to handle.

The documentation contains many tutorials and interactive Jupyter notebooks.

Using COBREXA.jl to analyze existing models is straightforward; for example, you can find a steady state solution of the metabolism of E. coli as follows:

using COBREXA, Tulip

download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")

model = load_model("e_coli_core.xml")

fluxes = flux_balance_analysis_dict(model, Tulip.Optimizer)

In return, you get a dictionary of reactions mapped to fluxes that happen in the organism:

Dict{String,Float64} with 95 entries:
"R_EX_fum_e" => 0.0
"R_ACONTb" => 6.00725
"R_TPI" => 7.47738
"R_SUCOAS" => -5.06438

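To give a feel for the kind of problem being solved here, the following is a minimal sketch of a flux balance problem written directly in JuMP for a made-up three-reaction chain. It is not how COBREXA.jl builds its models internally; the network, the bounds, and the helper name `toy_fba` are all assumptions made for illustration.

```julia
using JuMP, Tulip

# Hypothetical toy network (not taken from any real model):
#   v1:   -> A      (uptake, limited to 10 flux units)
#   v2: A -> B      (conversion)
#   v3: B ->        (export, used as the objective)
# Rows are metabolites A and B, columns are reactions v1..v3.
S = [1.0 -1.0  0.0;
     0.0  1.0 -1.0]
lb = [0.0, 0.0, 0.0]
ub = [10.0, 1000.0, 1000.0]

# Maximize the flux through reaction `objective_index`
# subject to the steady state S * v == 0 and the flux bounds.
function toy_fba(S, lb, ub, objective_index)
    n = size(S, 2)
    model = Model(Tulip.Optimizer)
    set_silent(model)
    @variable(model, lb[i] <= v[i = 1:n] <= ub[i])
    @constraint(model, S * v .== 0)
    @objective(model, Max, v[objective_index])
    optimize!(model)
    return value.(v)
end

fluxes = toy_fba(S, lb, ub, 3)
```

With these bounds the uptake limit is the bottleneck, so all three fluxes should come out at about 10 (Tulip is an interior-point solver, so the values are approximate).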
We hope that the metabolic modeling community will enjoy working with this package as much as we do. Feel free to use the GitHub issues to contact us with any ideas and suggestions; we welcome code and method contributions.

cc @MirekKratochvil @laurentheirendt


Nice! How large are the JuMP problems you solve?

I would say that “typical” constraint-based models have ~1500 variables and ~1000 linear constraints, and the problems are usually LPs or QPs. This is about the size of a single-organism metabolic model. However, the field is rapidly moving towards community models, where the size increases linearly with the number of single-organism models stitched together to form the complete community model. For reference, the current human gut microbiome community model comprises ~800 single-organism metabolic models, resulting in a system of about 1.2 million variables and 800,000 constraints. One of the issues in the field is that these models often need to be solved repeatedly with slightly different constraints or objectives, so it really pays to do it in parallel and as efficiently as possible. Additionally, with sequencing technology getting cheaper by the day, community model sizes will just continue to grow. This is what motivated COBREXA.jl: the desire to future-proof COBRA-based methods using Julia’s and JuMP’s state-of-the-art features.
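
As a rough illustration of the "solve many slightly different variants in parallel" pattern, here is a minimal sketch that uses only Julia's Distributed standard library. It does not use the COBREXA.jl API. The function `solve_chain` is a hypothetical stand-in for a real solver call: for a linear reaction chain at steady state with zero lower bounds, all fluxes must be equal, so the maximal flux is simply the smallest upper bound. The bounds and uptake limits are made up.

```julia
using Distributed

# Start two local worker processes if none are running yet.
nprocs() == 1 && addprocs(2)

@everywhere begin
    # Stand-in for an LP solve (assumption, see text): in a linear chain
    # at steady state every reaction carries the same flux, so the
    # optimum equals the tightest upper bound.
    solve_chain(upper_bounds) = minimum(upper_bounds)
end

# One base model and several variants that differ only in the uptake bound.
base_bounds = [10.0, 1000.0, 1000.0]
uptake_limits = [0.0, 2.5, 5.0, 10.0]
variants = [[u; base_bounds[2:end]] for u in uptake_limits]

# Each variant is independent, so pmap can hand them out to the workers.
objective_values = pmap(solve_chain, variants)
```

Because the variants do not depend on each other, this kind of workload scales with the number of workers; in a real setting each call would build and optimize a full model instead of taking a minimum.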