Two years' experience in biostatistics and data management for NCA using Julia

PharmCat · March 20, 2023, 2:45pm

Hi! I work on bioequivalence and clinical trials, and for the past two years I have been trying to include Julia in my pipeline.

Now I want to describe the problems I had and the problems that still exist. I usually used R, IBM SPSS and Phoenix WinNonlin, and sometimes it was hard to build a seamless pipeline, even though IBM SPSS and Phoenix WinNonlin both integrate with R and R can be used as the language for data management. A schema of the pipeline is shown in the image below.

from: 4 LECTURE: The Data Science Pipeline | Statistical Computing (Biostatistics 140.776)

When we talk about a statistical pipeline, we should define the role and coverage of each instrument. On the one hand, we can talk about “stage-coverage”: the share of tasks inside each stage that the instrument can solve. For example, R and Julia can solve 100% of the tasks in data management, whereas IBM SPSS is hard to use for data management and Phoenix WinNonlin can cover it only partially. On the other hand, we can talk about “project-coverage”: the share of tasks the instrument covers for a specific type of project. For example, Phoenix WinNonlin can solve all the problems of a bioequivalence project, including simple data management, the analytic part and the presentation part.

As in the image above, we have:

Processing / data management part;
Analytic / NCA / statistical part;
Presentation part:
Figures;
Tables;
Numerical summaries.

So, I started using Julia because it is one of the best choices for trial simulations. Then I tried using Julia in other parts of the statistical pipeline and decided to rewrite the pipeline in Julia.

Here is what I found:

Data management part - this can be successfully transferred to Julia. There is no problem with this part. But if you are using Phoenix or IBM SPSS, you have no way to call Julia directly from SPSS or Phoenix, so if you already have a satisfactory pipeline with R, you have no real reason to switch.
NCA / statistical part - Julia can solve common statistical tasks, but it still has no good ANOVA implementation (like those in SPSS, SAS, STATA). You may be interested in Julia if your R code is slow, because it has very powerful instruments with very good performance, such as MixedModels, Turing and others. Julia also lacks a good framework for descriptive statistics. For my projects, Julia does not have enough “stage-coverage” for the statistical part, even though I wrote an NCA package and a package for mixed-effects model analysis of repeated measures.
Presentation part - with the many packages available in R, it is very simple to get formatted tables for descriptive statistics, and you can’t do this efficiently with Julia. Yes, I know that Julia has Weave and partial support for descriptive statistics in DataFrames. But it really is much simpler to get tables from R.

The biostatistician's view is: “A good instrument is one that solves your problem, not one whose problems you have to solve.” If you need to write tons of code for simple statistical tables, it is not an option. That is why Julia takes up only a part of my statistical pipeline, mostly because I was able to move the NCA and data management tasks over from R and Phoenix. I do most of the statistical tasks and the presentation part with IBM SPSS, because there is no simple framework in Julia for that. On the other hand, I hope to reach full coverage of a bioequivalence project with Julia once I finish the presentation part.

PharmCat · April 19, 2024, 12:09pm

One year later, the above remains valid.

Two new packages improve the situation slightly:

ReadStatTables.jl is a package for reading and writing Stata, SAS and SPSS data files with Tables.jl-compatible tables.

WriteDocx.jl - A Julia package to create docx files for Microsoft Word from scratch.

George9000 · April 19, 2024, 2:13pm

You may be interested in the SummaryTables.jl package by the authors of WriteDocx.jl. This thread mentions many packages that may interest you.

junder873 · April 19, 2024, 3:50pm

Not sure exactly what descriptive tables you need, but in addition to SummaryTables.jl, RegressionTables.jl now has better support for exporting descriptive tables (see API · RegressionTables.jl (jmboehm.github.io)), and basically any Matrix. Someday I will also get junder873/RegressionTablesXLSX.jl registered.

PharmCat · April 19, 2024, 8:40pm

Hi! This package was registered 3 weeks ago. It seems it can be very helpful!

PharmCat · September 28, 2024, 12:57pm

Hi! The last update was very exciting.

Hm, is it possible to edit the first post to make a list of Julia packages for a statistician's pipeline?

eteppo · February 11, 2025, 6:20pm

Making tables is so weirdly difficult if you want something even slightly customized. My experience in R has been frustrating to say the least. I wish someone solved this well like I feel other parts of the workflow have been solved. I’m looking forward to checking some of the mentioned packages.

PharmCat · February 13, 2025, 12:56am

SummaryTables.jl is really close to being good, but it is not yet deeply customizable. I use my self-written MetidaStats.jl together with SummaryTables.jl to get good tables… but my favorite, “custom tables” in SPSS, still looks better at this time.

jules · February 13, 2025, 4:41am

What are the things you most want to customize in SummaryTables that you can’t, currently? I’m probably adding some global theme functionality soon

PharmCat · February 13, 2025, 1:57pm

Hi!

First, I want to say that SummaryTables.jl seems to be the best package for tables in Julia, and it really helps me move part of my routines to Julia (group_totals looks great - thanks a lot!).

From my side, what I want most is something like in this SPSS custom tables video: you can do “column-grouping” as in table_one, you can do “row-grouping” as in summarytable, and you can combine the two, column-grouping by one factor and row-grouping by another. Besides this, you can choose whether the summary statistics are placed in columns or in rows.
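
To make the combined layout concrete, here is a rough, language-neutral sketch in plain Python. It is not SummaryTables.jl or SPSS code: the data are made up, and the names `cross_summary` and `render` are hypothetical helpers used only for illustration. One factor (sequence) groups the rows, another (formulation) groups the columns, and the summary statistics are stacked as rows inside each row group.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format records: (sequence, formulation, AUC value).
records = [
    ("TR", "Test", 102.0), ("TR", "Test", 98.0),
    ("TR", "Reference", 110.0), ("TR", "Reference", 90.0),
    ("RT", "Test", 95.0), ("RT", "Test", 105.0),
    ("RT", "Reference", 100.0), ("RT", "Reference", 104.0),
]


def cross_summary(rows):
    """Collect values per (row factor, column factor) cell and summarise them."""
    cells = defaultdict(list)
    for row_key, col_key, value in rows:
        cells[(row_key, col_key)].append(value)
    return {
        key: {
            "n": len(values),
            "mean": mean(values),
            # The sample SD needs at least two observations.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
        for key, values in cells.items()
    }


def fmt(x):
    """Print counts as integers and other statistics with two decimals."""
    return str(x) if isinstance(x, int) else f"{x:.2f}"


def render(summary, stats=("n", "mean", "sd")):
    """Row factor down the side, column factor across, statistics as rows."""
    row_keys = sorted({r for r, _ in summary})
    col_keys = sorted({c for _, c in summary})
    lines = ["\t".join(["group", "statistic", *col_keys])]
    for r in row_keys:
        for s in stats:
            cells = [fmt(summary[(r, c)][s]) if (r, c) in summary else ""
                     for c in col_keys]
            lines.append("\t".join([r, s, *cells]))
    return "\n".join(lines)


summary = cross_summary(records)
table = render(summary)
print(table)
```

Moving the statistics from rows to columns would only change `render`; the grouping step stays the same, which is the kind of flexibility asked for here.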

And some minor things:

I think the documentation is lacking, for example for postprocess_cell and postprocess_table.
Currently, a table for multiple columns has difficult syntax, as in #59.
skipmissing needs an additional workaround, as in #61.
