# Refactoring searching through a time-series

I’ve written a short script to search a time-series for a specific sub-sequence. I’m having trouble refactoring the code. I’ve linked to question on Code Review StackExchange if someone wants to do a complete review. However, I was hoping I could get some general refactoring advice here for this type of code smell.

Yours is a non - trivial refactoring question. Perhaps http://www.catb.org/esr/writings/taoup/html/optimizationchapter.html may be of help, not on optimizing your problem but on philosophically reconsidering it.

That’s fair and has been something in the back of mind this whole time. My personal reason for posting this is it’s actually a reduced example of a library which has this pattern of looping and searching replicated/copy-pasted across several functions at this point. I’d really like to generalize the iteration pattern and re-use it somehow? Maybe with these “macros” which I don’t understand very clearly?

I did not give your code a very close reading so it is possible I missed something, but I would do the following:

1. write a function that returns the subsequences (using `view`) that the original one breaks up to,
2. write two functions that normalize a subsequence, one by the mean, one by the standard deviation,
3. write a function that does the comparisons.
1 Like

Macros do syntax transformations, I don’t think you need them here.

2 Likes

That’s a very healthy attitude! I would not recommend macros for this. The way I usually solve it is to have a single function with your iteration pattern, and extract any usage-specific code to functions that you pass as arguments. Example:

``````function pairwise_iteration(op, vector::AbstractVector)
s = zero(eltype(vector))
for i = 1:2:length(vector)
s += op(vector[i], vector[i+1])
end
s
end
``````

Which can be called like this:

``````julia> pairwise_iteration(+, [1,2,3,4])
10

julia> pairwise_iteration(*, [1,2,3,4])
14

julia> my_op(a, b) = a^2 + b^3
my_op (generic function with 1 method)

julia> pairwise_iteration(my_op, [1,2,3,4])
82
``````

Btw, also note the `zero(eltype(vector))` for type stability. I saw that you didn’t use this in your `euc_dist` function. It’s not required, but may increase performance.

I ended up making an iterator, since I was repeatedly calculating the same view on an array.

``````"""
Duplicate data and indexing arithmetic to avoid
using a LIFO queue or negative indexing.
"""
mutable struct t_iter
data::Vector{Float64}
t::Vector{Float64}
length::Int
q_len::Int

function t_iter(data::Vector{Float64}, q_len::Int)
return new(data, zeros(Float64, 2*length(data)), length(data), q_len)
end
end

function Base.iterate(data::t_iter, d_i=1)
if d_i >= data.length
return nothing
end

dat = data.data[d_i]

t_idx = ((d_i - 1) % data.q_len) + 1
data.t[t_idx] = dat
data.t[t_idx + data.q_len] = dat

return ((d_i, dat, data.t), d_i+1)
end
``````

Is `IterTools.partition` what you need? There is also `Base.Iterators.partition` but I guess you need overlapping sliding windows. Note that `IterTools.partition` returns a tuple while `Base.Iterators.partition` returns a `view`. If your sub-sequence is small enough, `IterTools.partition` sounds like a way to go. (It would be nice if `Base.Iterators.partition` had the `step` argument as in `IterTools.partition`…)

2 Likes

Maybe? I’m still not at the point with my code (which is adapted from a C library) where I can confidently hold all the operations in my head.

Just in case you don’t know this approach, here the simple way for your ignore_bias using the convolution product.

Let m=length(query), N=length(data) and D[k]=data[k:k+m], let Q be the normalized query and DM[k] the constant vector such that the mean of D[k]-DM[k] is zero. Then, you need to compute for each k the following:

obj[k] = (D[k] - DM[k] -Q[k])^2 = D[k]^2 + DM[k]^2 + Q[k]^2 - 2<D[k],DM[k]> - 2 <D[k],Q> - 2 <DM[k],Q>

Now, <DM[k], Q> = <D[k], DM[k]> = 0 for all k, and hence your objective function is

obj[k] = D[k]^2 +Q^2 - 2<D[k],Q>

Now Q^2 is independent of the offset, and <D[k], Q>=convolution(D,Q)[k], and D[k]^2 can be computed for all k in O(N)

Computing the convolution: https://github.com/aamini/FastConv.jl might work, if you locally merge the PRs that fix this for julia 1.0. Otherwise, someone here will find you a package that does the job (I am not up to date with that part of the ecosystem).

If m is large, then you win a complexity class (via fast fourier transform for the convolution); if m is small, then you still win, because somebody else wrote fast simd/gpu code for convolutions with small kernels.

Homework exercise: Represent ignore_scale via convolutions.

1 Like

I think this is what is done in MASS. I had forgotten about it. Thank you for reminding me.