NOTE: JuliaHealth contributor, @kosuri-indu, wrote a great series on Patient Level Prediction Julia workflows! She can’t post links to Discourse so I am posting her work below!
Introduction
Hello everyone! I’m Kosuri L Indu, a student and open-source contributor with a strong interest in health data, machine learning, and the Julia programming language. Over the past few months, I have been working on building a patient-level prediction (PLP) pipeline using clinical data in the OMOP Common Data Model (CDM) format, and I have documented my journey in a three-part blog series.
Patient-level prediction (PLP) refers to using historical clinical data to predict individual patient outcomes, such as whether a patient with hypertension might develop diabetes. It’s a powerful tool for personalized medicine, and building these pipelines in Julia showcases how performant, flexible, and open Julia can be for real-world health data science.
Blog Posts
Through this series, I have tried to share the process in a simple, approachable way, from asking the right questions to building models and reflecting on the results.
Below are short summaries of each post, along with links to the full versions.
Part 1: From Research Question to Cohort Construction
This post walks through how to translate a clinical question, such as predicting diabetes onset in hypertensive patients, into a structured cohort definition using the OMOP CDM. It explains how I used Julia tools to define and extract cohorts, and discusses the key concepts along the way.
Part 2: From Raw Clinical Data to Predictive Models
Here, I dive into how the raw clinical data was processed into a machine-learning-ready format. The post covers feature extraction, handling missing values, normalization, encoding, data splitting, and training ML models using the MLJ.jl ecosystem.
Part 3: Lessons Learned, Key Challenges, and What Comes Next
In the final post, I reflect on the challenges I faced, such as low model performance and data limitations, and outline what I learned. I also share ideas for how the pipeline can be improved and extended, including visualization and cohort quality tools.
Conclusion
A big shoutout to Jacob S Zelko (@TheCedarPrince) for being an incredible mentor and guide throughout this journey. Your support, feedback, and encouragement truly made all the difference.
And to everyone reading, if you are working with healthcare data or exploring patient-level prediction in Julia, I hope this series offers something helpful or sparks fresh ideas. I’m always happy to connect, so feel free to share your thoughts, feedback, or questions!