Configurable system for biomedical time series processing

I am working on an application that analyzes long-term multichannel biomedical signals coming from different sensors. The core of such an application consists of:

I. Data
Array buffers and storage for streaming data input and output (chunk- and/or point-wise).
Data can be:

  1. time series of base frequency,
  2. decimated time series,
  3. irregular series - time events, parametrized time events, time intervals (e.g. constructed from “begin” and “end” events), etc.

Such data types represent the result of different analysis steps (filtering, event detection, parametrization, segmentation, identification of physiological states and diagnostic phenomena, etc.).
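To make the three kinds of data concrete, here is a minimal sketch, written in Python only because it is compact and runnable. Every name in it (`RegularSeries`, `Event`, `decimate`, `intervals_from_events`) is hypothetical and not taken from the existing application, and the decimation is deliberately naive (no anti-alias filter).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegularSeries:
    """Uniformly sampled series; a decimated series is the same type with a lower fs."""
    fs: float             # sampling rate, Hz
    t0: float             # time of the first sample, seconds
    samples: List[float]

    def time_of(self, i: int) -> float:
        # Time stamp of sample i, derived from the rate instead of being stored.
        return self.t0 + i / self.fs


def decimate(series: RegularSeries, factor: int) -> RegularSeries:
    # Naive decimation: keeps every factor-th sample, no anti-alias filtering.
    return RegularSeries(series.fs / factor, series.t0, series.samples[::factor])


@dataclass(frozen=True)
class Event:
    time: float   # seconds from the start of the recording
    kind: str     # "begin" or "end"


def intervals_from_events(events: List[Event]) -> List[Tuple[float, float]]:
    """Pair each "begin" with the next "end"; unmatched events are dropped."""
    intervals = []
    start = None
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "begin" and start is None:
            start = ev.time
        elif ev.kind == "end" and start is not None:
            intervals.append((start, ev.time))
            start = None
    return intervals
```

The point of the sketch is that regular and decimated series can share one type and differ only in `fs`, while irregular data (events, intervals) carries explicit time stamps.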

II. Algorithms
Data-processing algorithms are connected to buffers for data input and output. (The simplest case: a stateful FIR filter.)
Algorithms differ from simple functions:

  • they take input data from buffers, and push to output buffers,
  • have their own states and output data delays or offsets,
  • require old buffered data of some length, or have inner data buffers,
  • can be saved and loaded to resume processing.
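The simplest case mentioned above, a stateful FIR filter, could look roughly like the sketch below (Python used only as a compact runnable notation). The class name `StreamingFIR` and its methods are hypothetical. It illustrates the listed properties: chunk-wise input, internal history of old samples, and a state that can be saved and restored to resume processing.

```python
from collections import deque
from typing import Dict, Iterable, List


class StreamingFIR:
    """FIR filter that keeps its history between chunks (hypothetical sketch)."""

    def __init__(self, taps: Iterable[float]):
        self.taps = list(taps)
        n_hist = len(self.taps) - 1
        # The last n_hist input samples, oldest first, zero-initialised.
        self.history = deque([0.0] * n_hist, maxlen=n_hist)

    def process(self, chunk: Iterable[float]) -> List[float]:
        out = []
        for x in chunk:
            # Newest sample first: x[n], x[n-1], x[n-2], ...
            window = [x] + list(reversed(self.history))
            out.append(sum(h * v for h, v in zip(self.taps, window)))
            self.history.append(x)
        return out

    def get_state(self) -> Dict[str, List[float]]:
        # Plain lists, so the state can be written to any storage format.
        return {"taps": list(self.taps), "history": list(self.history)}

    @classmethod
    def from_state(cls, state: Dict[str, List[float]]) -> "StreamingFIR":
        fir = cls(state["taps"])
        fir.history.clear()
        fir.history.extend(state["history"])
        return fir
```

Feeding a signal in one piece or in several chunks gives the same output, and a filter rebuilt with `from_state` continues exactly where the saved one stopped. For symmetric (linear-phase) taps the output delay is `(len(taps) - 1) / 2` samples, which is the kind of offset the graph would need to track per node.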

III. Complex algorithms (decomposition)
A more complex analysis is constructed in the form of a directed graph, with algorithms at its nodes and data streams along its edges.
Different algorithms cover different time window lengths: e.g. some use a 1-second window for automatic event detection, others a 6-hour window for retrospective manual analysis.
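As a rough illustration of the graph idea, the sketch below runs a set of named processing nodes in dependency order. It is Python, uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and ignores streaming, windows, and delays entirely. `run_graph` and the node layout are assumptions made for the example, not the existing design.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# name -> (function, names of the data items it reads)
NodeTable = Dict[str, Tuple[Callable, List[str]]]


def run_graph(nodes: NodeTable, sources: Dict[str, object]) -> Dict[str, object]:
    """Evaluate every node once, after all of its inputs are available."""
    deps = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    data = dict(sources)  # source data items are already "computed"
    for name in TopologicalSorter(deps).static_order():
        if name in data:
            continue
        func, inputs = nodes[name]
        data[name] = func(*(data[i] for i in inputs))
    return data


# Usage: two nodes chained on one source stream.
nodes = {
    "doubled": (lambda xs: [2 * x for x in xs], ["raw"]),
    "total": (sum, ["doubled"]),
}
result = run_graph(nodes, {"raw": [1, 2, 3]})
```

Because each node only names its inputs and outputs, swapping one algorithm for another with the same output data items is a one-line change in the node table.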

IV. Whole system configuration and management
To set up and run processing sessions, the system should be able to:

  • store buffered data,
  • configure the list of data items,
  • configure the list of algorithms,
  • configure (connect) data and algorithms into a processing graph,
  • save and load configurations and processing states.
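One possible shape for such a configuration is plain data that can be serialized and checked before a session starts. The sketch below is Python with JSON as the storage format. The field names (`data`, `algorithms`, `inputs`, `outputs`) and the example entries are invented for illustration only.

```python
import json
from typing import Dict, List

# Hypothetical configuration: data items, algorithms, and their wiring.
example_config = {
    "data": [
        {"name": "ecg", "kind": "regular", "fs": 250.0},
        {"name": "ecg_filtered", "kind": "regular", "fs": 250.0},
        {"name": "beats", "kind": "events"},
    ],
    "algorithms": [
        {"name": "fir", "inputs": ["ecg"], "outputs": ["ecg_filtered"]},
        {"name": "detector", "inputs": ["ecg_filtered"], "outputs": ["beats"]},
    ],
}


def validate_config(cfg: Dict) -> List[str]:
    """Return a list of wiring problems; an empty list means every port is declared."""
    data_names = {d["name"] for d in cfg["data"]}
    problems = []
    for alg in cfg["algorithms"]:
        for port in alg["inputs"] + alg["outputs"]:
            if port not in data_names:
                problems.append(f'{alg["name"]}: unknown data item {port!r}')
    return problems


def dumps_config(cfg: Dict) -> str:
    return json.dumps(cfg, indent=2)


def loads_config(text: str) -> Dict:
    return json.loads(text)
```

Keeping the configuration as plain data means the same file can drive processing, debugging logs, and visualization, and processing states (such as filter histories) could be stored next to it in the same way.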


The problems:

This is a very abstract description derived from the development of a more specific application.
That application was written in C++ and C#, and over time it became pretty hard to keep consistent, maintainable, and readable. This description comes from the need to simplify development in terms of:

  1. Interchangeability of parts of the system (graph nodes), where:
    – the same algorithm can be applied in many places,
    – one algorithm can be replaced with another, if its output data items are the same.

  2. Data-based debugging: tracking down and understanding unsatisfactory results by inspecting intermediate data logs, with no need to iteratively add debug code, recompile, re-run the whole app, and dig into debug mode.

  3. Flexible data visualization tools over the saved datasets.


Questions:

  1. If I can manage this in Julia, I would like to publish it as a package containing the core system plus some commonly used algorithms, as a tool for constructing more specific analyses and result visualizers. Or maybe I should merge it with some existing package?

  2. What Julia packages / features / design patterns should I take into account when migrating to Julia and for further development? In general, what can you advise about this description (I to IV) from a Julian point of view?

  3. What can you advise about using these packages:
    – HDF5 for chunked data storage.
    – DataFrames.jl as a solution for AoS/SoA + named data, to keep named data items (with the same time indexing) within one dataset?

  4. Maybe there are similar applications that would be useful to know about? Or maybe there are people who have worked on similar tasks?

Hi,

This sounds interesting! I’ve a couple of questions:

  • Which type of biomedical signals are you thinking about?
  • Is this system designed for exploratory analysis, or rather production & high throughput?

Concerning your problems: Are you sure that making the system more generic will make development simpler? I’m not sure how much effort you have already put into this, but in my experience these generic, nice, composable data-flow apps sound good at the beginning of the design phase but quickly grow in complexity…

My input on packages:

  • DSP.jl will be useful for filtering
  • DataFrames.jl might not be the best solution: they aren’t type stable, so performance-wise there are better alternatives (others know more about this).

I’m interested in this because I plan something similar, but with fewer bells and whistles (e.g. no graph-based algorithm composition, etc.)


Might be useful:

In particular the interval indexing.


Almost any sensor that can be mounted on a human: ECG, blood oxygen, breath signals, pressure, movement (excluding EEG and invasive sensors, which I simply haven’t worked with), for long-term monitoring.

The original system is a final, high-throughput product.
The new one is intended to be a high-performance exploratory system in the first place. For now, I don’t want to think about static compiling / embedding for production. On the other hand, if you can produce files with good results, that is a sort of data production too.

As far as I know, “non-generic” systems also tend to become more complex and obscure over time. This is partly due to legacy and third-party code improvement, the number of dependencies, hard-coded fixes, and enhancing the existing system without rewriting the whole thing.

Also, over time you need to add much more functionality for exploratory algorithm design, which is absent from production code. In some “high-production” settings it may reduce to just a more constant expression.

Here, I want to write “core” or basic tools that can ease the development of specialized algorithms of this type.
E.g. you don’t have to solve similar tasks in multiple places in low-abstraction code; you have interchangeable components; data, algorithms, and visualization are decoupled; and you get flexible debugging, because you can set up any additional data log with any visualizer.

I need help with understanding the Julia environment for this purpose, as well as with finding similar projects that I can rely on.

What exactly do you plan?

My plan is to write a couple of tools that let me quickly analyse biomedical data. My main starting point will be acceleration data, similar to the GGIR package (available for GNU R). I guess the overlap with your project is the need to batch-transform data, apply filters, and fuse data in a composable way.
The first things I want to start on:

  • abstraction of different file formats
  • dealing with different sampling frequencies (fixed/variable)

This is in flux, and I have other things to do, so I can’t commit full time.

I posted this issue explaining multi-rate time series indexing functionality:
https://github.com/JuliaStats/TimeSeries.jl/issues/376