I’ve written a short script to search a time-series for a specific sub-sequence. I’m having trouble refactoring the code. I’ve linked to question on Code Review StackExchange if someone wants to do a complete review. However, I was hoping I could get some general refactoring advice here for this type of code smell.

# Refactoring searching through a time-series

**zgornel**#2

Yours is a non - trivial refactoring question. Perhaps http://www.catb.org/esr/writings/taoup/html/optimizationchapter.html may be of help, not on optimizing your problem but on philosophically reconsidering it.

**Seanny123**#3

That’s fair and has been something in the back of mind this whole time. My personal reason for posting this is it’s actually a reduced example of a library which has this pattern of looping and searching replicated/copy-pasted across several functions at this point. I’d really like to generalize the iteration pattern and re-use it somehow? Maybe with these “macros” which I don’t understand very clearly?

**Tamas_Papp**#4

I did not give your code a very close reading so it is possible I missed something, but I would do the following:

- write a function that returns the subsequences (using
`view`

) that the original one breaks up to, - write
*two*functions that normalize a subsequence, one by the mean, one by the standard deviation, - write a function that does the comparisons.

**bennedich**#6

That’s a very healthy attitude! I would not recommend macros for this. The way I usually solve it is to have a single function with your iteration pattern, and extract any usage-specific code to functions that you pass as arguments. Example:

```
function pairwise_iteration(op, vector::AbstractVector)
s = zero(eltype(vector))
for i = 1:2:length(vector)
s += op(vector[i], vector[i+1])
end
s
end
```

Which can be called like this:

```
julia> pairwise_iteration(+, [1,2,3,4])
10
julia> pairwise_iteration(*, [1,2,3,4])
14
julia> my_op(a, b) = a^2 + b^3
my_op (generic function with 1 method)
julia> pairwise_iteration(my_op, [1,2,3,4])
82
```

Btw, also note the `zero(eltype(vector))`

for type stability. I saw that you didn’t use this in your `euc_dist`

function. It’s not required, but may increase performance.

**Seanny123**#7

I ended up making an iterator, since I was repeatedly calculating the same view on an array.

```
"""
Duplicate data and indexing arithmetic to avoid
using a LIFO queue or negative indexing.
"""
mutable struct t_iter
data::Vector{Float64}
t::Vector{Float64}
length::Int
q_len::Int
function t_iter(data::Vector{Float64}, q_len::Int)
return new(data, zeros(Float64, 2*length(data)), length(data), q_len)
end
end
function Base.iterate(data::t_iter, d_i=1)
if d_i >= data.length
return nothing
end
dat = data.data[d_i]
t_idx = ((d_i - 1) % data.q_len) + 1
data.t[t_idx] = dat
data.t[t_idx + data.q_len] = dat
return ((d_i, dat, data.t), d_i+1)
end
```

**tkf**#8

Is `IterTools.partition`

what you need? There is also `Base.Iterators.partition`

but I guess you need overlapping sliding windows. Note that `IterTools.partition`

returns a tuple while `Base.Iterators.partition`

returns a `view`

. If your sub-sequence is small enough, `IterTools.partition`

sounds like a way to go. (It would be nice if `Base.Iterators.partition`

had the `step`

argument as in `IterTools.partition`

…)

**Seanny123**#9

Maybe? I’m still not at the point with my code (which is adapted from a C library) where I can confidently hold all the operations in my head.

**foobar_lv2**#10

Just in case you don’t know this approach, here the simple way for your ignore_bias using the convolution product.

Let m=length(query), N=length(data) and D[k]=data[k:k+m], let Q be the normalized query and DM[k] the constant vector such that the mean of D[k]-DM[k] is zero. Then, you need to compute for each k the following:

obj[k] = (D[k] - DM[k] -Q[k])^2 = D[k]^2 + DM[k]^2 + Q[k]^2 - 2<D[k],DM[k]> - 2 <D[k],Q> - 2 <DM[k],Q>

Now, <DM[k], Q> = <D[k], DM[k]> = 0 for all k, and hence your objective function is

obj[k] = D[k]^2 +Q^2 - 2<D[k],Q>

Now Q^2 is independent of the offset, and <D[k], Q>=convolution(D,Q)[k], and D[k]^2 can be computed for all k in O(N)

Computing the convolution: https://github.com/aamini/FastConv.jl might work, if you locally merge the PRs that fix this for julia 1.0. Otherwise, someone here will find you a package that does the job (I am not up to date with that part of the ecosystem).

If m is large, then you win a complexity class (via fast fourier transform for the convolution); if m is small, then you still win, because somebody else wrote fast simd/gpu code for convolutions with small kernels.

Homework exercise: Represent ignore_scale via convolutions.