Two years of experience with Julia in biostatistics and data management for NCA

Hi! I work on bioequivalence studies and clinical trials, and for the past two years I have been trying to include Julia in my pipeline.

Now I want to describe the problems I had and the problems that still exist. I usually used R, IBM SPSS and Phoenix WinNonlin, and sometimes it was hard to build a seamless pipeline, even though IBM SPSS and Phoenix WinNonlin both integrate with R, and R can serve as the language for data management. A schema of the pipeline is shown in the image below.

from: 4 LECTURE: The Data Science Pipeline | Statistical Computing (Biostatistics 140.776)

When we talk about a statistical pipeline, we should define the role and coverage of each tool. On the one hand, we can talk about “stage-coverage”: the share of tasks within a stage that a tool can solve. For example, R and Julia can solve 100% of the tasks in data management, while IBM SPSS is hard to use for data management and Phoenix WinNonlin covers it only partially. On the other hand, we can talk about “project-coverage”: the share of tasks a tool can solve for a particular type of project. For example, Phoenix WinNonlin can handle everything in a bioequivalence project, including simple data management, the analytic part and the presentation part.

As in the image above, the pipeline has:

  • Processing / data management part;
  • Analytic / NCA / statistical part;
  • Presentation part:
      • Figures;
      • Tables;
      • Numerical summaries.

I started using Julia because it is one of the best choices for trial simulations. Then I tried Julia in the other parts of the statistical pipeline and decided to rewrite the pipeline in Julia.

Here is what I found:

  • The data management part can be transferred to Julia successfully; there are no problems here. But if you use Phoenix or IBM SPSS, you cannot call Julia directly from them, so if you already have a satisfactory pipeline with R, you have no reason to switch.

  • For the NCA / statistical part, Julia can solve common statistical tasks, but it still has no good ANOVA implementation (like those in SPSS, SAS or Stata). If your R code is slow, you may be interested in Julia because it has very powerful, high-performance tools such as MixedModels, Turing and others. Julia also lacks a good framework for descriptive statistics. For my projects, Julia does not have enough “stage-coverage” for the statistical part, even though I wrote an NCA package and a package for mixed-effects model analysis of repeated measures.

  • Presentation part: in R it is very simple to use one of the many packages to get formatted tables of descriptive statistics, and you cannot do this efficiently with Julia. Yes, I know that Julia has Weave and partial support for descriptive statistics in DataFrames, but it is really much simpler to get tables from R.
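For readers less familiar with the NCA step mentioned in the list above, here is a minimal sketch of its core calculation in plain Julia: Cmax, Tmax and AUC from the first to the last time point by the linear trapezoidal rule. It does not use any NCA package. The helper name `auc_linear` and the concentration-time values are hypothetical and chosen only for illustration.

```julia
# Hypothetical helper (not taken from any NCA package):
# linear trapezoidal AUC from the first to the last sampling time.
function auc_linear(t, c)
    length(t) == length(c) || throw(ArgumentError("time and concentration lengths differ"))
    auc = 0.0
    for i in 2:length(t)
        # Area of the trapezoid between two neighbouring time points
        auc += (t[i] - t[i-1]) * (c[i] + c[i-1]) / 2
    end
    return auc
end

# Made-up concentration-time profile for a single subject (illustrative only)
t_obs = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
c_obs = [0.0, 4.0, 6.0, 5.0, 3.0, 1.0]

cmax, imax = findmax(c_obs)          # maximum concentration and its index
tmax = t_obs[imax]                   # time at which the maximum was observed
auclast = auc_linear(t_obs, c_obs)   # AUC from first to last observation

println("Cmax = ", cmax, ", Tmax = ", tmax, ", AUClast = ", auclast)
```

A real analysis would also need the elimination rate constant, extrapolation to infinity, handling of values below the limit of quantification, and so on; the sketch only shows that the basic step needs nothing beyond Base Julia.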

The view of a biostatistician: “A good tool is one that solves your problem, not one whose problems you have to solve.” If you need to write tonnes of code for simple statistical tables, it is not an option. That is why Julia covers only a part of my statistical pipeline, mostly because I could move the NCA and data management tasks to it from R and Phoenix. I do most of the statistical tasks and the presentation part with IBM SPSS, because there is no simple framework for that in Julia. On the other hand, I hope to reach full coverage of bioequivalence projects with Julia once I finish the presentation part.
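To make the remark about table code concrete, here is a rough sketch of a hand-written grouped descriptive table that uses only the Statistics and Printf standard libraries. The helper `describe_by`, the column layout and the data are hypothetical; this is one possible way to do it, not the approach of any particular package.

```julia
using Statistics
using Printf

# Made-up Cmax values for test (T) and reference (R) formulations (illustrative only)
groups      = ["T", "T", "T", "R", "R", "R"]
cmax_values = [10.0, 12.0, 14.0, 9.0, 11.0, 13.0]

# Hypothetical helper: one row of summary statistics per group
function describe_by(groups, values)
    rows = NamedTuple[]
    for g in sort(unique(groups))
        v = values[groups .== g]   # values that belong to group g
        push!(rows, (group = g, n = length(v), mean = mean(v), sd = std(v),
                     median = median(v), min = minimum(v), max = maximum(v)))
    end
    return rows
end

summary_rows = describe_by(groups, cmax_values)

# Print a fixed-width text table
@printf("%-6s %3s %8s %8s %8s %8s %8s\n", "Group", "N", "Mean", "SD", "Median", "Min", "Max")
for r in summary_rows
    @printf("%-6s %3d %8.2f %8.2f %8.2f %8.2f %8.2f\n",
            r.group, r.n, r.mean, r.sd, r.median, r.min, r.max)
end
```

Even this small table takes a couple of dozen lines before any export to a report format, which is the kind of overhead described above.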