Two years of experience in biostatistics and data management for NCA using Julia

Hi! I work on bioequivalence and clinical trials, and for the past two years I have been trying to include Julia in my pipeline.

Now I want to describe the problems I had and the problems that still exist. I usually used R, IBM SPSS and Phoenix WinNonlin, and sometimes it was hard to build a seamless pipeline, even though IBM SPSS and Phoenix WinNonlin both integrate with R, and R can be used as the language for data management. The pipeline schema is shown in the image below.

from: 4 LECTURE: The Data Science Pipeline | Statistical Computing (Biostatistics 140.776)

When we talk about a statistical pipeline, we should define the role and coverage of each tool. On the one hand, we can talk about “stage-coverage”: the share of tasks inside each stage that the tool can solve. For example, R and Julia can solve 100% of data management tasks, whereas IBM SPSS is hard to use for data management and Phoenix WinNonlin covers it only partially. On the other hand, we can talk about “project-coverage”: the share of tasks the tool covers for a particular type of project. For example, Phoenix WinNonlin can solve all problems for bioequivalence projects, including simple data management, the analytic part and the presentation part.

As in the image above, we have:

  • Processing / data management part;
  • Analytic / NCA / statistical part;
  • Presentation part:
    • Figures;
    • Tables;
    • Numerical summaries.

I started using Julia because it is one of the best choices for trial simulations. Then I tried to use Julia in other parts of the statistical pipeline and decided to rewrite the pipeline in Julia.

Here is what I found:

  • The data management part can be successfully transferred to Julia. There is no problem with this part. But if you are using Phoenix or IBM SPSS, you cannot call Julia directly from SPSS or Phoenix, so if you have a satisfying pipeline with R, you have no reason to switch.

  • For the NCA / statistical part, Julia can solve common statistical tasks, but it still has no good ANOVA implementation (like those in SPSS, SAS or Stata). You may be interested in Julia if your R code is slow, because it has very powerful tools with very good performance, such as MixedModels, Turing and others. Julia also lacks a good framework for descriptive statistics. For my projects Julia does not have enough “stage-coverage” for the statistical part, even though I wrote an NCA package and a package for mixed-effects model analysis of repeated measures.

  • For the presentation part, it is very simple to use the many packages available in R and get formatted tables for descriptive statistics, and you can’t do this efficiently with Julia. Yes, I know that Julia has Weave and partial support for descriptive statistics in DataFrames. But it really is much simpler to get tables from R.
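To illustrate the kind of table I mean, here is a minimal sketch that uses only Julia's Statistics and Printf standard libraries. The data are made up, and the function names `describe_by` and `print_table` are hypothetical helpers written for this example, not part of any package.

```julia
using Statistics
using Printf

# Hypothetical data: Cmax values (ng/mL) by formulation; not real study results.
formulation = ["T", "R", "T", "R", "T", "R"]
cmax        = [105.2, 98.7, 110.4, 101.3, 99.8, 95.6]

# Collect n, mean, SD, geometric mean and CV% for each group level.
function describe_by(groups, vals)
    rows = NamedTuple[]
    for g in sort(unique(groups))
        v = vals[groups .== g]
        push!(rows, (group = g,
                     n = length(v),
                     mean = mean(v),
                     sd = std(v),
                     geomean = exp(mean(log.(v))),
                     cv = 100 * std(v) / mean(v)))
    end
    return rows
end

# Print a plain-text table; a real report would still need formatting work.
function print_table(rows)
    @printf("%-6s %3s %8s %8s %8s %6s\n", "Group", "N", "Mean", "SD", "GeoMean", "CV%")
    for r in rows
        @printf("%-6s %3d %8.2f %8.2f %8.2f %6.2f\n",
                r.group, r.n, r.mean, r.sd, r.geomean, r.cv)
    end
end

print_table(describe_by(formulation, cmax))
```

Even a small table like this needs hand-written grouping and formatting code, and exporting it to a report format would need more on top.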

The view of a biostatistician is: “A good tool is one that solves your problem, not one whose problems you have to solve.” If you need to write tonnes of code for simple statistical tables, it is not an option. That is why Julia takes only a part of my statistical pipeline, mostly the part where I could move NCA and data management tasks away from R and Phoenix. Most statistical tasks and the presentation part I do with IBM SPSS, because there is no simple framework in Julia for that. On the other hand, I hope to reach full coverage for bioequivalence projects with Julia once I finish the presentation part.


One year later. The above remains valid.

Two new packages improve the situation slightly:

ReadStatTables.jl is a package for reading and writing Stata, SAS and SPSS data files with Tables.jl-compatible tables.

WriteDocx.jl - A Julia package to create docx files for Microsoft Word from scratch.

You may be interested in the SummaryTables.jl package by the authors of WriteDocx.jl. This thread mentions many packages that may interest you.


Not sure exactly what descriptive tables you need, but in addition to SummaryTables.jl, RegressionTables.jl now has better support for exporting descriptive tables (see API · RegressionTables.jl (jmboehm.github.io)), and basically any Matrix. Someday I will also get junder873/RegressionTablesXLSX.jl registered.


Hi! This package was registered 3 weeks ago. It seems it can be very helpful!

Hi! The last update was very exciting.

Hm, is it possible to edit the first post to make a list of Julia packages for a statistician's pipeline?

Making tables is so weirdly difficult if you want something even slightly customized. My experience in R has been frustrating to say the least. I wish someone solved this well like I feel other parts of the workflow have been solved. I’m looking forward to checking some of the mentioned packages.

SummaryTables.jl is really close to being good, but it is not yet deeply customizable. I use my self-written MetidaStats.jl together with SummaryTables.jl to get good tables… but my favorite, “Custom Tables” in SPSS, still looks better at this time.

What are the things you most want to customize in SummaryTables that you can’t, currently? I’m probably adding some global theme functionality soon


Hi!

First, I want to say that SummaryTables.jl seems to be the best package for tables in Julia, and it really helps me move a part of my routines to Julia (group_totals looks great, thanks a lot!).

  • From my side, what I want most is something like in this SPSS Custom Tables video: you can do “column-grouping” as in table_one, you can do “row-grouping” as in summarytable, and you can combine them, with column-grouping by one factor and row-grouping by another. Besides this, you can choose whether the summary statistics are laid out in columns or in rows.

And some minor things:

  • I think the documentation is lacking, for example for postprocess_cel and postprocess_table.
  • Currently, tables with multiple columns require awkward syntax, as in #59.
  • skipmissing needs an additional workaround, as in #61.
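To make the first request more concrete, here is a rough sketch in plain Julia (Statistics standard library only, not the SummaryTables.jl API) of the layout I mean: row-grouping by one factor, column-grouping by another, with missing values skipped in each cell. The data and the function name `crossmean` are hypothetical.

```julia
using Statistics

# Hypothetical data; `missing` marks a lost sample.
sex    = ["F", "F", "M", "M", "F", "M"]
period = [1, 2, 1, 2, 1, 2]
auc    = [210.5, missing, 198.2, 205.1, 220.3, 190.4]

# Mean of `vals` for every (row level, column level) pair, skipping missings.
# A cell with no non-missing observations stays `missing`.
function crossmean(rowf, colf, vals)
    rlev = sort(unique(rowf))
    clev = sort(unique(colf))
    cells = Matrix{Union{Missing, Float64}}(missing, length(rlev), length(clev))
    for (i, r) in enumerate(rlev), (j, c) in enumerate(clev)
        v = collect(skipmissing(vals[(rowf .== r) .& (colf .== c)]))
        isempty(v) || (cells[i, j] = mean(v))
    end
    return rlev, clev, cells
end

rlev, clev, cells = crossmean(sex, period, auc)

# Rows are levels of `sex`, columns are levels of `period`.
for (i, r) in enumerate(rlev)
    println(r, ": ", join(cells[i, :], "  "))
end
```

Here the statistic sits in the cells; the other layout I mentioned would put several statistics (mean, SD, N) as extra rows or extra columns inside each group.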