I am working on an application that analyzes long-term multichannel biomedical signals coming from different sensors. The core of such an application consists of:
I. Data
Array buffers and storage for streaming data input and output (chunk- and/or point-wise).
Data can be:
- time series of base frequency,
- decimated time series,
- irregular series - time events, parametrized time events, time intervals (e.g. constructed from “begin” and “end” events), etc.
These data types represent the results of different analysis steps (filtering, event detection, parametrization, segmentation, identification of physiological states and diagnostic phenomena, etc.).
II. Algorithms
Data-processing algorithms are connected to buffers for data input and output. (The simplest case is a stateful FIR filter.)
Algorithms differ from simple functions:
- they take input data from buffers, and push to output buffers,
- have their own states and output data delays or offsets,
- require old buffered data of some length, or have inner data buffers,
- can be saved and loaded to resume processing.
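A minimal sketch of what such an algorithm could look like, using the stateful FIR filter as the example. All names here (StreamingFir, process, save_state, load_state) are hypothetical, std::vector stands in for the real input and output buffers, and output delays are not modeled.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: a FIR filter that keeps its own history between chunks.
class StreamingFir {
public:
    explicit StreamingFir(std::vector<double> taps)
        : taps_(std::move(taps)),
          history_(taps_.empty() ? 0 : taps_.size() - 1, 0.0) {}

    // Consume one input chunk and append the filtered samples to `output`.
    void process(const std::vector<double>& input, std::vector<double>& output) {
        for (double x : input) {
            // y[n] = sum over k of taps[k] * x[n - k]
            double y = taps_.empty() ? 0.0 : taps_[0] * x;
            for (std::size_t k = 1; k < taps_.size(); ++k) {
                y += taps_[k] * history_[k - 1];  // history_[0] holds x[n - 1]
            }
            // Shift the history so that the newest sample comes first.
            if (!history_.empty()) {
                history_.pop_back();
                history_.insert(history_.begin(), x);
            }
            output.push_back(y);
        }
    }

    // The state needed to resume processing is just the retained old samples.
    std::vector<double> save_state() const { return history_; }

    // Restore a saved state; returns false if it does not fit this filter.
    bool load_state(const std::vector<double>& state) {
        if (state.size() != history_.size()) return false;
        history_ = state;
        return true;
    }

private:
    std::vector<double> taps_;
    std::vector<double> history_;  // the last taps_.size() - 1 input samples
};
```

Because the history travels with the filter, processing a signal in two chunks gives the same output as processing it in one, and a filter restored from a saved state continues where the saved one stopped.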
III. Complex algorithms (decomposition)
A more complex analysis is constructed in the form of a directed graph, with algorithms at its nodes and data streams along its edges.
Different algorithms cover different time window lengths: for example, some use a 1-second window for automatic event detection, while others use a 6-hour window for retrospective manual analysis.
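One way to run such a graph is to order the nodes so that every algorithm executes after the algorithms that produce its inputs. The sketch below does this with a standard topological sort (Kahn's algorithm). The function name execution_order and the representation of the graph as a list of dependencies per node are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical representation: deps[i] lists the nodes whose output node i reads.
// Returns an order in which every node comes after all of its inputs.
std::vector<std::size_t> execution_order(
        const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::size_t> pending(n, 0);              // unmet inputs per node
    std::vector<std::vector<std::size_t>> consumers(n);  // reverse edges

    for (std::size_t node = 0; node < n; ++node) {
        pending[node] = deps[node].size();
        for (std::size_t producer : deps[node]) {
            consumers.at(producer).push_back(node);  // throws on a bad index
        }
    }

    // Start from the nodes that have no inputs (for example, raw signal sources).
    std::queue<std::size_t> ready;
    for (std::size_t node = 0; node < n; ++node) {
        if (pending[node] == 0) ready.push(node);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t node = ready.front();
        ready.pop();
        order.push_back(node);
        // A consumer becomes ready once all of its producers have been scheduled.
        for (std::size_t consumer : consumers[node]) {
            if (--pending[consumer] == 0) ready.push(consumer);
        }
    }

    if (order.size() != n) {
        throw std::runtime_error("processing graph contains a cycle");
    }
    return order;
}
```

With this kind of ordering, replacing one node with another that produces the same output data items leaves the rest of the graph untouched.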
IV. Whole system configuration and management
To set up and run processing sessions, the system should be able to:
- store buffered data,
- configure the list of data items,
- configure the list of algorithms,
- configure (connect) data and algorithms into a processing graph,
- save and load configurations and processing states.
The problems:
This is a very abstract description, derived from the development of a more specific application.
That application was written in C++ and C#, and over time it became hard to keep consistent, maintainable, and readable. This description comes from the need to simplify development in terms of:
- Interchangeability of parts of the system (graph nodes), where:
– the same algorithm can be applied in many places,
– one algorithm can be replaced with another, if its output data items are the same.
- Data-based debugging: tracking down and understanding unsatisfactory results by inspecting intermediate data logs, with no need to iteratively add debug code, compile, re-run the whole app, and dig into debug mode.
- Flexible data visualization tools over the saved datasets.
Questions:
- If I can manage this in Julia, I would like to publish the core system, with some commonly used algorithms, as a public package: a tool for constructing more specific analyses and result visualizers. Or should I perhaps merge it with an existing package?
- Which Julia packages, features, and design patterns should I take into account to start migrating to Julia and for further development? In general, what can you advise about this description (I to IV) from a Julian point of view?
- What can you advise about using these packages:
– HDF5 for chunked data storage,
– DataFrames.jl as a solution for AoS/SoA plus named data, to keep named data items (with the same time indexing) within one dataset?
- Maybe there are similar applications that would be useful to know about? Or maybe there are people who have worked on similar tasks?