Refactoring searching through a time-series


#1

I’ve written a short script to search a time-series for a specific sub-sequence. I’m having trouble refactoring the code. I’ve linked to question on Code Review StackExchange if someone wants to do a complete review. However, I was hoping I could get some general refactoring advice here for this type of code smell.


#2

Yours is a non - trivial refactoring question. Perhaps http://www.catb.org/esr/writings/taoup/html/optimizationchapter.html may be of help, not on optimizing your problem but on philosophically reconsidering it.


#3

That’s fair and has been something in the back of mind this whole time. My personal reason for posting this is it’s actually a reduced example of a library which has this pattern of looping and searching replicated/copy-pasted across several functions at this point. I’d really like to generalize the iteration pattern and re-use it somehow? Maybe with these “macros” which I don’t understand very clearly?


#4

I did not give your code a very close reading so it is possible I missed something, but I would do the following:

  1. write a function that returns the subsequences (using view) that the original one breaks up to,
  2. write two functions that normalize a subsequence, one by the mean, one by the standard deviation,
  3. write a function that does the comparisons.

#5

Macros do syntax transformations, I don’t think you need them here.


#6

That’s a very healthy attitude! I would not recommend macros for this. The way I usually solve it is to have a single function with your iteration pattern, and extract any usage-specific code to functions that you pass as arguments. Example:

function pairwise_iteration(op, vector::AbstractVector)
    s = zero(eltype(vector))
    for i = 1:2:length(vector)
        s += op(vector[i], vector[i+1])
    end
    s
end

Which can be called like this:

julia> pairwise_iteration(+, [1,2,3,4])
10

julia> pairwise_iteration(*, [1,2,3,4])
14

julia> my_op(a, b) = a^2 + b^3
my_op (generic function with 1 method)

julia> pairwise_iteration(my_op, [1,2,3,4])
82

Btw, also note the zero(eltype(vector)) for type stability. I saw that you didn’t use this in your euc_dist function. It’s not required, but may increase performance.