ANN: ShiftedArrays and support for ShiftedArrays in GroupedErrors

announcement

#1

This is an announcement of two things at the same time:

  1. A package for lazily shifting arrays: ShiftedArrays
  2. Support for aligning, grouping and averaging time varying signals in GroupedErrors

Lazily shifting arrays

ShiftedArrays provides the lag and lead functions, which shift an array and put missing (or a custom default value, in the latest, not yet released version) where the data is not available, and circshift, which shifts circularly. All of them are lazy (non-allocating):

julia> v = [1.2, 2.3, 3.4]
3-element Array{Float64,1}:
 1.2
 2.3
 3.4

julia> lag(v)
3-element ShiftedArrays.ShiftedArray{Float64,Missings.Missing,1,Array{Float64,1}}:
  missing
 1.2
 2.3

julia> lag(v, default = NaN)
3-element ShiftedArrays.ShiftedArray{Float64,Float64,1,Array{Float64,1}}:
 NaN
   1.2
   2.3

julia> ShiftedArrays.circshift(v, 1)
3-element ShiftedArrays.CircShiftedArray{Float64,1,Array{Float64,1}}:
 3.4
 1.2
 2.3

To get a regular Array, just copy the returned custom array type.
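To make the "lazy" part concrete, here is a minimal sketch of how such a shifted view could work, using only Base. The LazyShift type below is a hypothetical illustration written for this post and is not the actual ShiftedArrays implementation: it stores the parent vector and an offset, and only computes the shifted index when an element is read.

```julia
# Hypothetical minimal lazy shift, for illustration only
# (NOT the real ShiftedArrays.ShiftedArray type).
struct LazyShift{T, V<:AbstractVector{T}} <: AbstractVector{Union{T, Missing}}
    parent::V   # the original data, stored by reference (no copy)
    shift::Int  # positive values behave like a lag, negative like a lead
end

Base.size(s::LazyShift) = size(s.parent)

function Base.getindex(s::LazyShift, i::Int)
    @boundscheck checkbounds(s, i)
    j = i - s.shift
    # Return the parent's value when the shifted index exists, `missing` otherwise
    return checkbounds(Bool, s.parent, j) ? s.parent[j] : missing
end

v = [1.2, 2.3, 3.4]
lagged = LazyShift(v, 1)  # same values as lag(v) above, nothing is copied
dense = copy(lagged)      # materialize into a regular Vector

v[1] = 0.0                # the lazy view sees this change, the copy does not
```

Because the view only wraps the parent, creating many of them is cheap; the cost of allocating is paid only when copy is called.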

Time varying signal

A particular use case is the analysis of a time-varying signal. For example, imagine you run an experiment where you measure the heart rate of a subject (a vector h) and show her some images (at times ts, another vector). You want to compute the average heart rate around the time an image is shown:

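using ShiftedArrays
using Plots

h = randn(100)     # heart rate signal
ts = [13, 45, 76]  # indices at which an image is shown
# create the vector of ShiftedArrays: as they do not allocate this is cheap
signal = [ShiftedArray(h, -t) for t in ts]
# offsets around each event over which to average the heart rate: -5:5 is our x axis
# (named offsets rather than range to avoid shadowing Base.range)
offsets = -5:5
avg = reduce_vec(mean, signal, offsets)
plot(offsets, avg)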

The usual GroupedErrors machinery can be used to split the data or to compute the error across subjects; see a slightly more realistic example. That example uses JuliaDB, but it will also work with a DataFrame with the same columns.
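For readers who want to see what the event-aligned average above amounts to, here is a sketch with plain indexing and only the standard library, assuming Julia 1.x (where mean lives in Statistics). The helper event_average is a hypothetical name used for illustration, not a ShiftedArrays or GroupedErrors function; it assumes that offsets falling outside the signal are simply skipped. A deterministic signal replaces randn so the result is easy to check by hand.

```julia
using Statistics  # standard library, provides `mean`

# Hypothetical helper (not part of ShiftedArrays or GroupedErrors):
# for each offset k, average h[t + k] over all event times t,
# skipping events for which t + k falls outside the signal.
function event_average(h::AbstractVector, ts, offsets)
    map(offsets) do k
        vals = [h[t + k] for t in ts if checkbounds(Bool, h, t + k)]
        isempty(vals) ? missing : mean(vals)
    end
end

h = collect(1.0:100.0)  # deterministic stand-in for the heart rate signal
ts = [13, 45, 76]       # indices at which an image is shown
offsets = -5:5          # window around each event

avg = event_average(h, ts, offsets)
```

This version allocates a small temporary vector per offset, which is exactly what the lazy ShiftedArray approach avoids.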

Feedback

I’m very happy to get feedback on the time-varying signal analysis part, as I do not work with such signals a lot (but some of my colleagues, whom I carelessly convinced to use Julia, do).


#2

I had implemented a few array-shifting operations in some of my packages, which I would be glad to port to use ShiftedArrays instead. As for the time series, I worked at a neuroeconometrics lab for a few years, so I am familiar with those operations. It might be nice to provide a function that resamples a signal and computes statistics over various intervals. For example, given a Vector{<:DateTime} of timestamps, it could resample the mean from seconds to minutes and so on.


#3

Thanks for the input. I should probably look into how to add actual DateTime vectors to the picture (here everything is index based, so I’m tacitly assuming some fixed timestep, but there is no reason to impose this limitation). How is it generally done in your field? Would you have a vector signal and t::Vector{<:DateTime} of the same length, and expect some utility function averageby(signal, t, Dates.Minute) that gives two new vectors s, tm, where tm holds the new timestamps (this time one per minute) and s the average per minute?
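To make the question concrete, here is a rough sketch of what such an averageby could look like using only the Dates and Statistics standard libraries (assuming Julia 1.x). This is a hypothetical implementation of the function proposed above, not existing ShiftedArrays or GroupedErrors API: it truncates each timestamp to the start of its period and averages the samples that land in the same bin.

```julia
using Dates, Statistics  # both are standard libraries

# Hypothetical sketch of the proposed `averageby`; not an existing package function.
function averageby(signal::AbstractVector, t::AbstractVector{<:DateTime},
                   period::Type{<:Period})
    length(signal) == length(t) ||
        throw(DimensionMismatch("signal and t must have the same length"))
    # Truncate every timestamp to the start of its period (e.g. its minute)
    bins = [floor(ti, period) for ti in t]
    tm = sort!(unique(bins))                     # one timestamp per occupied bin
    s = [mean(signal[bins .== b]) for b in tm]   # average of the samples in each bin
    return s, tm
end

# Six samples taken 30 seconds apart, i.e. two samples per minute
t = [DateTime(2018, 1, 1) + Second(30 * k) for k in 0:5]
signal = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

s, tm = averageby(signal, t, Minute)
```

Nothing here requires a fixed timestep, and bins that contain no samples simply do not appear in tm. Rescanning bins for every output timestamp is quadratic in the worst case, so a version built on sorted keys would scale better.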

This can be done quite elegantly with JuliaDB tables with sorted keys and could easily be incorporated in GroupedErrors without extra dependencies (plus, I already have support for binning data in GroupedErrors so I should extend it to time data).