Patient Level Prediction in Julia Blog Series

TheCedarPrince · May 6, 2025, 1:42pm

NOTE: JuliaHealth contributor, @kosuri-indu, wrote a great series on Patient Level Prediction Julia workflows! She can’t post links to Discourse so I am posting her work below!

Introduction

Hello everyone! I’m Kosuri L Indu, a student and open-source contributor with a strong interest in health data, machine learning, and the Julia programming language. Over the past few months, I have been working on building a patient-level prediction (PLP) pipeline using clinical data in the OMOP Common Data Model (CDM) format, and I have documented my journey in a three-part blog series.

Patient-level prediction (PLP) refers to using historical clinical data to predict individual patient outcomes, such as whether a patient with hypertension might develop diabetes. It’s a powerful tool for personalized medicine, and building these pipelines in Julia showcases how performant, flexible, and open Julia can be for real-world health data science.

Blog Posts

Through this series, I have tried to share the process in a simple, approachable way, from asking the right questions to building models and reflecting on the results.
Below are short summaries of each post, along with links to the full versions.

Part 1: From Research Question to Cohort Construction

This post walks through how to translate a clinical question, like predicting diabetes onset in hypertensive patients, into a structured cohort definition using the OMOP CDM. It explains how I used Julia tools to define and extract cohorts, and discusses the key concepts along the way.

Part 2: From Raw Clinical Data to Predictive Models

Here, I dive into how the raw clinical data was processed into a machine learning-ready format. It covers feature extraction, handling missing values, normalization, encoding, data splitting, and training ML models using the MLJ.jl ecosystem.
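
As a rough illustration of the preprocessing steps listed above (this is not the code from the blog post), here is a minimal sketch that uses only Julia's standard library. The toy records and every variable name are hypothetical, and it deliberately avoids MLJ.jl calls so as not to guess at how the series actually uses that ecosystem.

```julia
using Random
using Statistics

# Hypothetical toy records; these are NOT data from the blog series.
ages     = [54.0, missing, 61.0, 47.0, missing, 70.0]
sexes    = ["F", "M", "F", "M", "M", "F"]
outcomes = [1, 0, 1, 0, 0, 1]   # 1 = developed the outcome, 0 = did not

# 1. Handle missing values: replace each missing age with the observed mean.
mean_age    = mean(skipmissing(ages))
ages_filled = Float64[coalesce(a, mean_age) for a in ages]

# 2. Normalize: z-score the continuous feature.
ages_scaled = (ages_filled .- mean(ages_filled)) ./ std(ages_filled)

# 3. Encode: one-hot encode the categorical feature (one column per level).
levels     = sort(unique(sexes))
sex_onehot = [Float64(s == l) for s in sexes, l in levels]

# 4. Assemble the feature matrix (rows = patients, columns = features).
X = hcat(ages_scaled, sex_onehot)

# 5. Split: shuffle row indices with a fixed seed, then take roughly 2/3 for training.
rng     = MersenneTwister(42)
idx     = shuffle(rng, 1:length(outcomes))
n_train = round(Int, 0.67 * length(idx))
train, test = idx[1:n_train], idx[n_train+1:end]

X_train, y_train = X[train, :], outcomes[train]
X_test,  y_test  = X[test, :],  outcomes[test]
```

The sketch follows the order in which the steps are listed above. In a real pipeline, imputation and scaling statistics are usually computed on the training split only, so that information from the test set does not leak into training.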

Part 3: Lessons Learned, Key Challenges, and What Comes Next

In the final post, I reflect on the challenges I faced, like low model performance and data limitations, and outline what I learned. I also share ideas for how the pipeline can be improved and extended, including visualization and cohort quality tools.

Conclusion

A big shoutout to Jacob S Zelko (@TheCedarPrince) for being an incredible mentor and guide throughout this journey. Your support, feedback, and encouragement truly made all the difference.

And to everyone reading, if you are working with healthcare data or exploring patient-level prediction in Julia, I hope this series offers something helpful or sparks fresh ideas. I’m always happy to connect, so feel free to share your thoughts, feedback, or questions!

~ Kosuri L Indu
