EEG.jl -> Present and Future

Hi everybody,

  1. Present
    Today I tried to use the EEG.jl package but I could not add it. Is anybody using it?
(@v1.5) pkg> add https://github.com/rob-luke/EEG.jl
   Updating git-repo `https://github.com/rob-luke/EEG.jl`
  Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Eglob [ba70085b]:
 Eglob [ba70085b] log:
 ├─Eglob [ba70085b] has no known versions!
 └─restricted to versions * by EEG [70dcca16] — no versions left
   └─EEG [70dcca16] log:
     ├─possible versions are: 0.0.0 or uninstalled
     └─EEG [70dcca16] is fixed to version 0.0.0
  2. Future
    Any Julia-EEGistas over here?
    The EEG.jl package looks promising, thanks to the author.
    I think it can be way better. Anyone interested in improving Julia-EEG?

Have a nice day!


I’ve messaged with the author in an issue in the past and I believe the consensus was that we should move away from one package being responsible for all of Julia’s EEG analysis and interface. I want to dig into this today and form a sort of game plan. I believe @ElectronicTeaCup, @Marco-Congedo, @jrevels, @laborg, and @dave.f.kleinschmidt use EEG. Anyone else is more than welcome to jump in. I’ll start by throwing out some ideas here and maybe something more actionable will come from it.

If we treat EEG data conceptually as a channels × time × epochs array, then pretty much all relevant file formats could be read into a common interface. NamedDims.jl and TimeAxes.jl should provide most of the internal machinery for indexing by dimension name or by time. Once that’s in place, a lot of functionality becomes generically available throughout the Julia ecosystem. There are a ton of different formats out there, but as far as I’m aware most store each channel as a separate time series. Would it be accurate to say that the biggest obstacle to the interface I’m proposing is that the on-disk structure maps more literally to something like ChannelVector{TimeSeriesVector{EpochVector}}?

Of course, file-specific metadata is a whole other issue, but I think that’s going to require more one-on-one maintenance for each file type.


I think it is better to treat EEG data as it is, that is, as a channels × time array. Stimulations and other markers, for example those used to define epochs, are usually given in an accessory channel. Cheers.
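To make the marker-channel idea concrete, here is a minimal sketch in plain Julia. The helper name `extract_epochs`, the convention that a nonzero marker sample means "epoch onset", and the fixed epoch length are all assumptions for illustration and do not come from any existing package.

```julia
# Hypothetical helper: cut fixed-length epochs out of a channels × time matrix,
# using the nonzero samples of one accessory (marker) channel as epoch onsets.
function extract_epochs(data::AbstractMatrix, marker_row::Integer, epoch_len::Integer)
    signal_rows = setdiff(axes(data, 1), marker_row)
    # Keep only onsets whose full epoch fits inside the recording.
    onsets = [t for t in axes(data, 2)
              if data[marker_row, t] != 0 && t + epoch_len - 1 <= size(data, 2)]
    epochs = Array{eltype(data)}(undef, length(signal_rows), epoch_len, length(onsets))
    for (k, t) in enumerate(onsets)
        epochs[:, :, k] = data[signal_rows, t:(t + epoch_len - 1)]
    end
    return epochs
end

# Toy recording: 2 signal channels plus 1 marker channel, 10 samples.
data = zeros(3, 10)
data[1, :] = 1:10
data[2, :] = 11:20
data[3, [2, 6]] .= 1                  # stimulus onsets at samples 2 and 6

epochs = extract_epochs(data, 3, 3)   # channels × time × epochs
```

The raw recording stays a plain channels × time matrix, and the epoched 3-D array is derived from it only when needed.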


Hello, I have never used EEG.jl. I am collecting code that may be used to build upon it or to make a new package for EEG processing. Cheers.

The nice part of working with named dimensions is that we could even have time × channels × observations and create a super simple method that produces an iterator over each observation of channels × time.
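A rough sketch of such an iterator, using only Base rather than NamedDims.jl (so the dimension order here is a convention, not something enforced by names). The function name `each_observation` is hypothetical.

```julia
# Assumed layout: time × channels × observations, as described above.
x = reshape(collect(1.0:24.0), 4, 3, 2)   # 4 samples, 3 channels, 2 observations

# Lazily iterate over observations, presenting each one as channels × time.
# `eachslice` yields views; `transpose` lazily swaps the two remaining dimensions.
each_observation(x) = (transpose(obs) for obs in eachslice(x; dims=3))

for obs in each_observation(x)
    @assert size(obs) == (3, 4)   # channels × time
end
```

With named dimensions, the `dims=3` and the `transpose` could presumably be expressed by name instead of by position, which is the main convenience being argued for.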

I am super into it

So you say one IO package and many for analysis?
Would you say one for pre-processing (e.g. PCA, rereferencing), another for spatial stuff (e.g. EEG topography), and so on?

What @Marco-Congedo says makes sense to me, at first glance.

So you say one IO package and many for analysis?

It wouldn’t necessarily need to be one IO package. We have FileIO.jl already for a generic IO interface. As long as the loaded type has a generic interface, we can generically build algorithms around it. Once that’s in place, it’s pretty straightforward to do stuff like PCA.

Correction: I shouldn’t have said build algorithms around it. We just need to have a method that does something like get_channels_by_time_matrix(x) and pass the result to already built stuff.
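As a sketch of what that could look like: the container `ToyRecording` and the helper `pca_scores` below are made up for illustration, and `get_channels_by_time_matrix` is just one possible reading of the accessor named above. The PCA is a plain SVD from the standard library, standing in for whatever existing implementation one would actually call.

```julia
using LinearAlgebra, Statistics

# Hypothetical container; the field name is illustrative, not from any package.
struct ToyRecording
    samples::Array{Float64,3}   # channels × time × epochs
end

# One possible accessor: concatenate epochs along time into a channels × time matrix.
get_channels_by_time_matrix(r::ToyRecording) = reshape(r.samples, size(r.samples, 1), :)

# Generic PCA via SVD; it only ever sees an AbstractMatrix.
function pca_scores(X::AbstractMatrix, k::Integer)
    Xc = X .- mean(X; dims=2)   # center each channel
    F = svd(Xc)
    return F.U[:, 1:k]' * Xc    # k components × time
end

rec = ToyRecording(randn(4, 50, 3))
scores = pca_scores(get_channels_by_time_matrix(rec), 2)
```

The analysis function knows nothing about EEG or file formats; only the one-line accessor is specific to the container.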

Having a common interface for (de)serializing LPCM sample data from arbitrary domain-specific formats (living in arbitrary storage layers: local filesystems, S3, etc.) to a common matrix representation (that can be backed by any <:AbstractMatrix) is exactly the motivation behind https://github.com/beacon-biosignals/Onda.jl :) The package currently implements a TimeSpan type and an Annotation type which wraps it; both can be used to index sample data. Onda’s Paths/Serialization API enables users to efficiently request discontiguous data chunks on a per-TimeSpan basis as long as the underlying storage layer/file format supports it (if not, a slower but still correct fallback path is used). In a similar vein, since Onda.Samples types just wrap an AbstractMatrix, you can also e.g. just use mmap on top of raw LPCM blobs.
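For readers unfamiliar with the mmap idea, here is a minimal standard-library sketch. It deliberately does not use Onda's API; the file layout (channel-interleaved 16-bit samples in host byte order) and the variable names are assumptions for illustration.

```julia
using Mmap

# Assumed layout (not Onda's API): interleaved 16-bit LPCM samples,
# i.e. all channels for sample 1, then all channels for sample 2, and so on.
n_channels, n_samples = 2, 5
original = Int16[1 2 3 4 5; 10 20 30 40 50]   # channels × time

path = tempname()
write(path, original)   # column-major write, so the file is channel-interleaved

# Map the file as a channels × time matrix without reading it all into memory.
samples = Mmap.mmap(path, Matrix{Int16}, (n_channels, n_samples))
```

The resulting `samples` is an ordinary `Matrix{Int16}`, so anything that accepts an `AbstractMatrix` can work on it directly.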

We’re also highly interested in @shashi’s recent release of FileTrees.jl, which we believe will compose quite well as a compute framework on top of Onda Datasets.


But doesn’t Onda.jl conform to a specific dataset structure and use its own time series interface? The point of TimeAxes.jl is that you can arbitrarily define which dimension is the time dimension, get the same type of functionality as you get from TimeSeries.jl, and have it work with things like FFTW.jl. If we rely on Onda.jl for describing the generic interface, then packages that don’t care about any sort of file reading would have to depend on it.

What I’m proposing is that binding to something like Flux.jl could be completely agnostic to where the data originally came from. For example, we could do something like this…

(c::Flux.Chain)(x::SomeEEGType) = c(x[channels=:,time=:,observation=:])

BTW, I built TimeAxes.jl after people kept mentioning the need for a generic way of dealing with time data in Julia. If it’s absolutely horrible and we need to start from the ground up on it, I’m fine with that as long as it’s still accomplishing the same basic goal.


I don’t work too much with EEG, but I’m happy to help to the extent that it’s a common data structure with MEA, ECoG, and VSD (and I don’t see why it shouldn’t be – they’re all just MultiChannelTimeSeries with some specialized info tacked on)

To be honest, I’m not super familiar with the Onda format. I think the utility of the Onda format and a common interface are orthogonal issues. To be clear, nothing I’ve said so far should be interpreted as being against the Onda format. What I’m proposing is something that is implemented entirely independently of the file format or dataset organization, but that IO routines can rely on to make imported data compatible with the larger Julia ecosystem.


I would say this is an important issue. Maybe the package needs to be named differently to include all extracellular recordings, in this sense: https://www.nature.com/articles/nrn3241

This would, at least:

  1. Invite more people.
  2. Offer an integrated environment to work in. For instance, you may want to analyse LFP-ECoG cross-correlation, etc.

Oh sure! I was more replying to your first post in agreement with "EEG data is conceptually just channels x time x epochs" and in hopes of reinforcing multimodality (I interpreted previous uses of “common interface” as still being within the confines of the many different EEG formats. Though I do see that Onda is more generic, and of course your packages are also more generic, but the conversation still felt like “use these generic tools to implement an EEG.jl package”). Basically I agree with what @VMHidalgo said.

I also didn’t mean to imply anything about Onda, as I also don’t know anything about it. Whatever it may be, in general I support the Julian approach of implementing AbstractX.jl that creates a common interface, completely agnostic to implementation. It seems like Onda is a specific implementation of a data structure and associated methods to handle large timeseries signal data well, in which case I would like to have something like a hypothetical AbstractTimeAxes on top of it so that someone else with a different approach could implement in a different way and my code could simply not care. But I could be completely wrong about that.
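A toy sketch of that "AbstractX" pattern follows. Every name in it (`AbstractMultiChannelSeries`, `samples`, `sampling_rate`, `InMemorySeries`) is hypothetical and only illustrates the shape such an interface might take; it is not the API of Onda, TimeAxes.jl, or any other package.

```julia
# Hypothetical interface; none of these names come from an existing package.
abstract type AbstractMultiChannelSeries end

# Methods every implementation is expected to provide:
#   samples(x)        -> channels × time AbstractMatrix
#   sampling_rate(x)  -> samples per second
function samples end
function sampling_rate end

# Generic code written only against the interface.
n_channels(x::AbstractMultiChannelSeries) = size(samples(x), 1)
duration(x::AbstractMultiChannelSeries) = size(samples(x), 2) / sampling_rate(x)

# One possible implementation, backed by an in-memory matrix.
struct InMemorySeries <: AbstractMultiChannelSeries
    data::Matrix{Float64}
    rate::Float64
end
samples(x::InMemorySeries) = x.data
sampling_rate(x::InMemorySeries) = x.rate

eeg = InMemorySeries(zeros(8, 512), 256.0)
```

A memory-mapped or file-backed type could implement the same two methods, and `n_channels` and `duration` would work on it unchanged.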

While we’re on the topic of different modalities, I’d love to have something like this for ECG and/or PPG as well. Working with high-frequency, irregular length waveform data in Flux requires quite a bit of massaging right now, so a common interface for slicing/batching/padding/truncating/etc. would be very much welcome.

When you say “irregular length” do you mean different channels have different lengths?

Different records, I think it’s safe to assume channels in the same record will have the same length for a given modality.
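To illustrate the kind of massaging involved, here is a small sketch that pads or truncates variable-length records into one dense array. The helper name `pad_or_truncate_batch`, the zero padding, and the channels × time × records output layout are assumptions for illustration, not an existing API.

```julia
# Hypothetical helper: pad (with `pad_value`) or truncate each channels × time
# record to `target_len` samples, then stack into channels × time × records.
function pad_or_truncate_batch(records::Vector{<:AbstractMatrix{T}}, target_len::Integer;
                               pad_value::T = zero(T)) where {T}
    n_channels = size(first(records), 1)
    batch = fill(pad_value, n_channels, target_len, length(records))
    for (k, rec) in enumerate(records)
        n = min(size(rec, 2), target_len)
        batch[:, 1:n, k] = rec[:, 1:n]
    end
    return batch
end

records = [ones(2, 3), 2 .* ones(2, 6)]   # same channels, different lengths
batch = pad_or_truncate_batch(records, 4)
```

Here the first record is padded with one zero column and the second is truncated from six samples to four.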


So is it really about conveniently being able to do the “slicing/batching/padding/truncating/etc” so you can pipeline it to Flux?

More or less? Having just had a look through Onda, I think something like the Onda data structures without the on-disk format (for those of us who have to work with existing datasets) would be a good sweet spot.

Just saw this and feel very happy that people are interested in working with EEG data in Julia. I have mainly worked with MNE in the past, and while I generally like the idea of relying on Julia’s existing ecosystem, I think that the most common analysis steps should be centralized in a specific package. The scope of such a package would definitely be debatable, but having all the basic tools for working with EEG/MEG/etc. data in one place while honoring existing best practices (e.g. filtering) seems like a good idea. From there on it should be easy to pipe data into any appropriate package for more sophisticated analysis steps.

I would also like to stress the need to not just look at this from a software engineering standpoint, but also consider how a modern analysis package could improve research practices, encourage data sharing, reproducibility and “good science” in general.

Anyway, I find this exciting, and I’m happy to contribute!
