[ANN] UcrDtw.jl: a "fast" implementation of Dynamic Time Warping

Dynamic Time Warping (DTW) is a way to search a time-series for a given pattern. Basically, instead of Euclidian Distance (ED) for comparing the pattern to the time-series, DTW can be used and allow for more useful matches.

Typically, DTW is significantly slower than ED, if implemented naively. Luckily, Eammon Keogh has spent a chunk of his career optimizing the heck out of it. This package, UcrDtw.jl, is an re-implementation/translation of the UCR Suite of optimizations, previously published as C code. It fixes some bugs and adds some features.

The term ā€œfastā€ is in quotation marks, because I havenā€™t done that much profiling, but I have implemented all the optimizations. Help in this area would be appreciated!

Edit for additional details:

Currently, it does not support multi-variate time series, but Iā€™m currently working on the feature. Additionally, for the multi-variate case, I want to compare it to Matrix Profile, which I think may be better for correlated dimensions.

22 Likes

I donā€™t understand the choice of using GitLab. This means that you wonā€™t be able to register it in the general registry and we will never be able to do ]add UcrDtw. What is the rationale for this?

My company is using GitLab, so I have to use it.

4 Likes

I donā€™t think this is true. The registry contains a git URL, but I donā€™t think it cares where that URL is hosted. Sean wonā€™t be able to use Attobot to do releases (which is pretty convenient for maintainers) but that shouldnā€™t affect userā€™s ability to install the package as normal.

9 Likes

This is correct, and things will eventually change so that Gitlab and other git platforms are supported for automatic merging into General.

2 Likes

8 posts were split to a new topic: Licensing of UcrDtw.jl (split from its announcement)

I think everyone would agree that ultimately we are going to need a way to register and update packages that are not on github, regardless.

6 Likes

Iā€™m super glad to see more DTW support. DTW is really useful in a lot of audio processing contexts, and the other implementations donā€™t seem like theyā€™re maintained, so this is a welcome development. Thanks for putting the work in!

3 Likes

Very interesting!

I have several questions:

  1. How do you handle noisy edges, when S is a squeezed version of Q?

  2. According to the project description,

Return the maximum distance encountered during the search. This allows for mapping the distance values onto a similarity range [0, 1].

What other normalization methods do you use to threshold the resulted similarity? How about transforming DTW distance to the correlation coefficient? I know that Euclidean distance can be easily transformed to it, given running means, std and window length.

  1. Whatā€™s the speed difference comparing to the original C implementation?

  2. What about smoothing path projections, like shapeDTW, in particular DTW of subsequences / motifs?

1 Like

That isnā€™t covered yet, but I did find a promising paper for this use-case, which I document in an issue.

The goal of the similarity was to give a more human-readable number. Do people generally intuitively understand the correlation coefficient?

Working on that. First, I want to finish the multi-variate code, then Iā€™ll start the comparison.

Iā€™ve never heard of shapeDTW! It seems quite promising. Iā€™ll curious how it compares to Matrix Profile, since they both seem concerned with sub-sequences. However, it does seem easier to implement than Matrix Profileā€¦

2 Likes

As far as I know, the main idea of this is that you can transform your series into some feature series/sequence and then optimize distance in feature space, just like multi-variate DTW.

One simple feature series is to pick not just one i-th sample, but some short motif of neighbouring samples (a region [i-d : i+d]). More than that, this region can be further decimated just to 3-dot motif: x[i-d], x[i], x[i+d] (if it is smooth enough). See my implementation of this, maybe it will help optimizing it:

Source code
@inline eucdist(x, y) = abs2(x-y) #(x-y)*(x-y)
@inline absdist(x, y) = abs(x-y)

# multi-variate version
function _dtw!(D::Matrix{T}, x1::AbstractArray{T}, x2::AbstractArray{T}, w::Int, dist = eucdist) where T<:Real
    len1, len2 = size(x1, 1), size(x2, 1)
    d1 = ndims(x1) == 1 ? 1 : size(x1, 2)
    d  = ndims(x2) == 1 ? 1 : size(x2, 2)
    d == d1 || error("DTW: Dimensions mismatch")
    size(D,1) >= len1 && size(D,2) >= len2 || error("DTW: Maximum length exceeded") # ŠæрŠ¾Š²ŠµŃ€ŠŗŠ° ŠæрŠµŠ²Ń‹ŃˆŠµŠ½Šøя Š“Š»ŠøŠ½Ń‹
    # dist = eucdist

    # Initialize first column and first row
    D[1,1] = dist(x1[1,1], x2[1,1])
    @inbounds for k=2:d
        D[1,1] += dist(x1[1,k], x2[1,k])
    end
    prevD = D[1,1]
    @inbounds for r=2:len1
        c = max(r-w, 1); # c = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) #(x1[r] - x2[1])*(x1[r] - x2[1])
        end
        prevD = D[r,c]
    end
    prevD = D[1,1]
    @inbounds for c=2:len2
        r = max(c-w, 1); # c = 1
        D[1,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) #(x1[1] - x2[c])*(x1[1] - x2[c])
        end
        prevD = D[r,c]
    end
    # Complete the cost matrix
    @inbounds for c=2:len2, r = max(c-w+1, 2) : min(c+w-1, len1) # r=2:len1
        D[r,c] = min(D[r-1,c], D[r-1,c-1], D[r,c-1])
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) # (x1[r] - x2[c])*(x1[r] - x2[c])
        end
    end
    D[len1, len2]
end

# 3-dot version
function _dtw!(D::Matrix{T}, x1::AbstractArray{T}, x2::AbstractArray{T}, w::Int, dt::Int, dist = eucdist) where T<:Real
    len1, len2 = size(x1, 1), size(x2, 1)
    d1 = ndims(x1) == 1 ? 1 : size(x1, 2)
    d  = ndims(x2) == 1 ? 1 : size(x2, 2)
    d == d1 || error("DTW: Dimensions mismatch")
    size(D,1) >= len1 && size(D,2) >= len2 || error("DTW: Maximum length exceeded") # ŠæрŠ¾Š²ŠµŃ€ŠŗŠ° ŠæрŠµŠ²Ń‹ŃˆŠµŠ½Šøя Š“Š»ŠøŠ½Ń‹

    dt2 = 2*dt
    len1 -= dt2
    len2 -= dt2

    # Initialize first column and first row
    D[1,1] = dist(x1[1,1], x2[1,1]) + dist(x1[1+dt,1], x2[1+dt,1]) + dist(x1[1+dt2,1], x2[1+dt2,1])
    @inbounds for k=2:d
        D[1,1] += dist(x1[1,k], x2[1,k]) + dist(x1[1+dt,k], x2[1+dt,k]) + dist(x1[1+dt2,k], x2[1+dt2,k])
    end
    prevD = D[1,1]
    @inbounds for r=2:len1
        c = max(r-w, 1); # c = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
        end
        prevD = D[r,c]
    end
    prevD = D[1,1]
    @inbounds for c=2:len2
        r = max(c-w, 1); # r = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
        end
        prevD = D[r,c]
    end
    # Complete the cost matrix
    @inbounds for c=2:len2
        for r = max(c-w+1, 2) : min(c+w-1, len1) # r=2:len1
            D[r,c] = min(D[r-1,c], D[r-1,c-1], D[r,c-1])
            for k=1:d
                D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
            end
        end
    end
    D[len1, len2]
end

What about Matrix Profile, there are many domain-specific performance gains not covered in it (as far as I know):

  • online clustering optimizations,
  • template shape restrictions;
  • template-specific ā€œcheapā€ rules for prior region exclusion (or interest region selection).

Such optimizations can dramatically improve performance for many concrete applications - in a sense that your number of actual DTW checks tend to num_of_templates x num_of_matches, or even lessā€¦

1 Like

So currently multivariate data isnā€™t supported?

This is the approach Iā€™m familiar from using DTW on audio. Generally youā€™re not matching up individual samples but some feature vector (usually some kind of spectral features).

Yeah, I havenā€™t finished writing that for-loop yetā€¦

This method seems really interesting! Can it deal with multivariate timeseries?

Not yet, but Iā€™m working on it. Gotta resolve a small bug first though.

3 Likes

sorry for not reading the entire thread, seems really exciting!

Just merged a PR which now allows for searching a multi-variate time-series!

10 Likes

It seems this package is no longer at the URL in the original post. Is it still available?

Unfortunately, the company I work for wanted to delay open-sourcing the code, so I had to make it private again. :frowning_face:

1 Like

Is it possible to host a copy of the UcrDtw.jl on GitHub? Would be more convenient.