Today, I would like to announce the creation of novel JuliaHealth tooling to support observational health research capacity within the Julia ecosystem! For some background, I recently gave a lightning talk at the - awesome! - JuliaCon 2022 on an actual observational health research study I am conducting with Julia tooling to investigate disparities in mental healthcare. The study utilizes “real world data” in the OMOP CDM format. Link: Using Julia for Observational Health Research | Jacob Zelko | JuliaCon 2022 - YouTube
It brings me great joy to announce the creation of three packages I have developed through this work that are now happily housed within the JuliaHealth organization:
OMOPCDMCohortCreator.jl - a modular interface that lets one build incremental queries against an OMOP CDM database for observational health research and analytics. This package works on version 5.4 of the OMOP CDM and provides a functional approach to building studies.
HealthSampleData.jl - a package that provides sample health data sources for a variety of health formats and use cases. Uses the wonderful DataDeps.jl package to automatically download, verify, and manage the download for you.
OMOPCDMDatabaseConnector.jl - a utility package to connect to databases in the OMOP CDM format.
The most thorough and important package of this set is OMOPCDMCohortCreator.jl, as it enables fast creation of observational studies. It is built upon tooling provided by the fantastic package FunSQL.jl, for which I am ever grateful to @cce and @xitology! Furthermore, this work was directly inspired by the work done in OHDSI by folks such as Adam Black, Martijn Schuemie, Anthony Sena, and Gowtham Rao (amongst others).
Although the tutorial in OMOPCDMCohortCreator.jl for beginners should be enough to get started, here is an even smaller example tutorial on how this tooling works:
- You will need the following packages for this tutorial which you can install in package mode:
pkg> add OMOPCDMCohortCreator
pkg> add SQLite
pkg> add DataFrames
pkg> add HealthSampleData
- For this tutorial, we will work with synthetic patient data from Eunomia that is stored in a SQLite format.
To install the data on your machine, execute the following code block and follow the prompts - you will need a stable internet connection for the download to complete:
using HealthSampleData
eunomia = Eunomia()
- After you have finished your setup in Julia, we need to establish a connection to the Eunomia SQLite database that we will use for the rest of the tutorial:
using SQLite
using OMOPCDMCohortCreator
conn = SQLite.DB(eunomia)
GenerateDatabaseDetails(:sqlite, "main")
- Finally, we will generate internal representations of each table found within Eunomia for OMOPCDMCohortCreator to use:
- Now that all the tools are working properly, let’s do what is called a characterization study - a study that characterizes a group of patients with a certain condition (or conditions) across various attributes like race, age, and combinations thereof. We are going to do a miniature version of such a study looking at patients with strep throat. For this, we will use the condition_concept_id 28060 - you will need this exact ID to get correct results:
strep_patients = ConditionFilterPersonIDs(28060, conn)
- For the patients who have strep throat diagnoses, find their race:
strep_patients_race = GetPatientRace(strep_patients, conn)
- Characterize, anonymize, and aggregate patients with strep throat by their race:
using DataFrames
# Only one attribute table is used here, so no join is needed; dropping person_id anonymizes the rows
strep_patients_characterized = strep_patients_race[:, Not(:person_id)]
# Group by race and count the patients in each group
strep_patient_groups = groupby(strep_patients_characterized, [:race_concept_id])
strep_patient_groups = combine(strep_patient_groups, nrow => :counts)
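To make the anonymize-and-aggregate step concrete without a database, here is a minimal sketch in plain Julia (no packages needed). It assumes the race lookup returns one row per patient with a `person_id` and a `race_concept_id`. The rows below are made up purely for illustration and are not Eunomia results, and the `Dict`-based counting is only a stand-in for what `groupby` and `combine` do, not the package's own method:

```julia
# Made-up rows standing in for a race lookup result: one row per patient
rows = [
    (person_id = 1, race_concept_id = 8527),
    (person_id = 2, race_concept_id = 8516),
    (person_id = 3, race_concept_id = 8527),
    (person_id = 4, race_concept_id = 8515),
    (person_id = 5, race_concept_id = 8527),
]

# Anonymize: keep only the attribute and drop the person identifier
races = [row.race_concept_id for row in rows]

# Aggregate: count how many patients fall into each race group
counts = Dict{Int, Int}()
for race in races
    counts[race] = get(counts, race, 0) + 1
end

# Print the groups in a stable order
for race in sort(collect(keys(counts)))
    println("race_concept_id = ", race, ", counts = ", counts[race])
end
```

The result has one entry per group and no patient identifiers, which is the same shape as the `strep_patient_groups` table.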
From here, there is rich potential to calculate valuable metrics, such as disease prevalence across niche populations, and to do even more with the subpopulations one can generate (e.g. cross walking with survey panel data, area estimation, etc.). Furthermore, although I have not yet had an opportunity to write a tutorial on the matter, I have tested that these functionalities work with Distributed.jl to enable non-blocking asynchronous code execution (I am still working out the exact approach/functionalities). The goal is to fully utilize Julia’s performance when working with and transforming the large amounts of data often found in observational studies.
As I was recently funded another year to continue my health disparities research (yay!), I have great ambitions to build out observational health research capacity in JuliaHealth through avenues such as:
- Finalizing a submission to JuliaCon Proceedings (I need to finish writing tests across this portfolio and add documentation in the coming weeks)
- Partnering on potentially running studies within the Julia community using these tools
- Scoping out better support for survey and panel data
- Building additional tooling around observational health research (such as determining fairness metrics, easily deployable interactive results explorers, interop with OHDSI tools, etc.)
- Lowering the barrier to entry for researchers prototyping large-scale studies
In conclusion, these tools aren’t just an idea or an exercise - I am actively using them in my work to deliver important findings that are already making a difference. I am even launching another study with these tools as its research tooling backbone, and preliminary work from my previous study was recently accepted at the 2022 OHDSI Symposium. If you are interested in learning more about these tools, want to collaborate on a study or on building out tooling, or have feedback, feel free to reach out or comment below! My email is firstname.lastname@example.org.
With warmest thanks and regards to the Julia community,
P.S. Special thanks as well to @dilumaluthge for his incredible work in JuliaHealth!