Refactoring searching through a time-series


I’ve written a short script to search a time-series for a specific sub-sequence. I’m having trouble refactoring the code. I’ve linked to question on Code Review StackExchange if someone wants to do a complete review. However, I was hoping I could get some general refactoring advice here for this type of code smell.


Yours is a non - trivial refactoring question. Perhaps may be of help, not on optimizing your problem but on philosophically reconsidering it.


That’s fair and has been something in the back of mind this whole time. My personal reason for posting this is it’s actually a reduced example of a library which has this pattern of looping and searching replicated/copy-pasted across several functions at this point. I’d really like to generalize the iteration pattern and re-use it somehow? Maybe with these “macros” which I don’t understand very clearly?


I did not give your code a very close reading so it is possible I missed something, but I would do the following:

  1. write a function that returns the subsequences (using view) that the original one breaks up to,
  2. write two functions that normalize a subsequence, one by the mean, one by the standard deviation,
  3. write a function that does the comparisons.


Macros do syntax transformations, I don’t think you need them here.


That’s a very healthy attitude! I would not recommend macros for this. The way I usually solve it is to have a single function with your iteration pattern, and extract any usage-specific code to functions that you pass as arguments. Example:

function pairwise_iteration(op, vector::AbstractVector)
    s = zero(eltype(vector))
    for i = 1:2:length(vector)
        s += op(vector[i], vector[i+1])

Which can be called like this:

julia> pairwise_iteration(+, [1,2,3,4])

julia> pairwise_iteration(*, [1,2,3,4])

julia> my_op(a, b) = a^2 + b^3
my_op (generic function with 1 method)

julia> pairwise_iteration(my_op, [1,2,3,4])

Btw, also note the zero(eltype(vector)) for type stability. I saw that you didn’t use this in your euc_dist function. It’s not required, but may increase performance.