[ANN] UcrDtw.jl: a "fast" implementation of Dynamic Time Warping

package
announcement

#1

Dynamic Time Warping (DTW) is a way to search a time series for a given pattern. Instead of comparing the pattern to the time series with Euclidean Distance (ED), DTW allows the pattern to stretch or compress in time, which yields more useful matches.
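
(For readers new to DTW, here is a minimal, hedged sketch of the naive algorithm next to plain ED; it is only an illustration, not UcrDtw.jl's API, and all names are made up.)

# Classic O(n*m) DTW with squared pointwise cost.
function naive_dtw(q::AbstractVector, s::AbstractVector)
    n, m = length(q), length(s)
    D = fill(Inf, n + 1, m + 1)
    D[1, 1] = 0.0
    for i in 2:n+1, j in 2:m+1
        cost = (q[i-1] - s[j-1])^2
        # each point may match a stretched/compressed region of the other series
        D[i, j] = cost + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    end
    return sqrt(D[n+1, m+1])
end

# Euclidean distance, point-by-point, equal lengths only.
euclidean(q, s) = sqrt(sum(abs2, q .- s))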

Typically, DTW is significantly slower than ED if implemented naively. Luckily, Eamonn Keogh has spent a chunk of his career optimizing the heck out of it. This package, UcrDtw.jl, is a re-implementation/translation of the UCR Suite of optimizations, previously published as C code. It fixes some bugs and adds some features.

The term “fast” is in quotation marks, because I haven’t done that much profiling, but I have implemented all the optimizations. Help in this area would be appreciated!
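
(To give a flavour of what those optimizations look like, here is a hedged sketch of one of them, the LB_Keogh lower bound, as I understand it from the UCR papers; it is not lifted from UcrDtw.jl and the names are mine.)

# LB_Keogh: a cheap lower bound on the (squared) DTW distance within a warping window r.
# If the bound already exceeds the best-so-far distance, the expensive DTW call can be skipped.
function lb_keogh(q::AbstractVector, c::AbstractVector, r::Int)
    n = length(q)
    @assert length(c) == n
    lb = 0.0
    for i in 1:n
        lo, hi = max(1, i - r), min(n, i + r)
        U = maximum(@view q[lo:hi])   # upper envelope of q (usually precomputed once per query)
        L = minimum(@view q[lo:hi])   # lower envelope of q
        if c[i] > U
            lb += (c[i] - U)^2
        elseif c[i] < L
            lb += (L - c[i])^2
        end
    end
    return lb
end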


Dynamic Time Warping in Julia
#2

I don’t understand the choice of using GitLab. This means that you won’t be able to register it in the general registry and we will never be able to do ]add UcrDtw. What is the rationale for this?


#3

My company is using GitLab, so I have to use it.


#4

I don’t think this is true. The registry contains a git URL, but I don’t think it cares where that URL is hosted. Sean won’t be able to use Attobot to do releases (which is pretty convenient for maintainers), but that shouldn’t affect users’ ability to install the package as normal.
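
(As a concrete, hedged example of what “install as normal” looks like for an unregistered package; the GitLab path below is only a placeholder, not the package’s actual location:)

pkg> add https://gitlab.com/<group>/UcrDtw.jl.git

# or, equivalently, from the Julia REPL:
using Pkg
Pkg.add(PackageSpec(url="https://gitlab.com/<group>/UcrDtw.jl.git"))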


#5

This is correct, and things will eventually change so that Gitlab and other git platforms are supported for automatic merging into General.


#6

One thing I’m wondering is how this package’s license (GNU LGPL v3) will work with other packages which desire to depend on it. I’m not going to bash your choice of license; I’m a fan of the GPL/LGPL/AGPL license family in general. Is the license that you’ve chosen intended to signify intent for how this package should be used?

P.S. This package looks awesome, and regardless of the license or registry situation, I look forward to giving it a spin! :smile:


#7

I think everyone would agree that ultimately we are going to need a way to register and update packages that are not on github, regardless.


#8

I’m super glad to see more DTW support. DTW is really useful in a lot of audio processing contexts, and the other implementations don’t seem like they’re maintained, so this is a welcome development. Thanks for putting the work in!


#9

I’m not very good at this whole licensing thing. I chose the LGPL because it would let people include the code in a project without restriction (as opposed to the GPL), but would require them to share any improvements they made to the library itself. What usage intent have I signified?


#10

After reading into the LGPL a bit more, I think my original assumptions about how it works are incorrect, so nevermind! :stuck_out_tongue:

For reference: I had somehow assumed that an LGPL-licensed library is only allowed to be used with other non-*GPL code when that code is dynamically linked to the LGPL library (which Julia does not do with package code; it’s more accurate to consider JIT’ing to be static linking). So I was worried that the only other packages that could be used with your package were other *GPL packages. But from my more recent reading, I don’t think that’s how the LGPL works.

EDIT: My confusion was that I conflated the LGPL with the “linking exception” used by some GPL software.


#11

Please consider a MIT license (or the equivalent ISC) like the rest of the ecosystem! That saves every Julian from thinking twice before depending on your nice work!


#12

Definitely will do that for my personal work! This work was done via my employer, Emagin, so this was the most liberal license I could get.


#13

I don’t think that LGPL is meaningfully different from GPL in Julia since we don’t dynamically link anything (although it may allow Julia to dynamically link GPL-incompatible dynamic libraries, e.g. MKL). So basically the effect of this is that any Julia program that uses this package will have to be effectively GPLv3 in the end but may depend on GPL-incompatible dynamic libraries. You may also want to consider the Apache 2.0 license.


#14

Very interesting!

I have several questions:

  1. How do you handle noisy edges, when S is a squeezed version of Q?

  2. According to the project description,

Return the maximum distance encountered during the search. This allows for mapping the distance values onto a similarity range [0, 1].

What other normalization methods do you use to threshold the resulting similarity? How about transforming the DTW distance into a correlation coefficient? I know that Euclidean distance can easily be transformed into it, given running means, stds and the window length (a sketch of that identity follows after this list).

  3. What’s the speed difference compared to the original C implementation?

  4. What about smoothing path projections, like shapeDTW, in particular DTW of subsequences / motifs?
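
For reference, the Euclidean-distance-to-correlation transform mentioned in question 2 is the standard identity d² = 2m·(1 − ρ) for z-normalized windows of length m. Here is a hedged numerical check (helper names are mine, nothing here comes from UcrDtw.jl):

using Statistics

# z-normalize with the population (1/m) standard deviation
znorm(x) = (x .- mean(x)) ./ std(x, corrected=false)

q, s = randn(64), randn(64)
m = length(q)
d2 = sum(abs2, znorm(q) .- znorm(s))   # squared ED of the z-normalized windows
ρ  = cor(q, s)                          # Pearson correlation coefficient
@assert isapprox(d2, 2m * (1 - ρ); rtol=1e-6)

Whether an analogous closed-form transform exists for the DTW distance is exactly the open question being asked here.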


#15

Of course I’m not a lawyer, but my understanding of the LGPL is that it doesn’t require a derived work to be released under the same terms as a whole. The requirements should be to state the license of the LGPL library and to provide any modifications to its source code.


#16

1. That isn’t covered yet, but I did find a promising paper for this use-case, which I document in an issue.

2. The goal of the similarity was to give a more human-readable number. Do people generally have an intuitive understanding of the correlation coefficient?

3. Working on that. First, I want to finish the multi-variate code, then I’ll start the comparison.

4. I’ve never heard of shapeDTW! It seems quite promising. I’m curious how it compares to Matrix Profile, since they both seem concerned with sub-sequences. However, it does seem easier to implement than Matrix Profile…


#17

As far as I know, the main idea of this is that you can transform your series into some feature series/sequence and then optimize distance in feature space, just like multi-variate DTW.

One simple feature series is to pick not just the single i-th sample, but some short motif of neighbouring samples (a region [i-d : i+d]). More than that, this region can be further decimated to just a 3-dot motif: x[i-d], x[i], x[i+d] (if it is smooth enough). See my implementation below; maybe it will help with optimizing it:

Source code
@inline eucdist(x, y) = abs2(x-y) #(x-y)*(x-y)
@inline absdist(x, y) = abs(x-y)

# multi-variate version
function _dtw!(D::Matrix{T}, x1::AbstractArray{T}, x2::AbstractArray{T}, w::Int, dist = eucdist) where T<:Real
    len1, len2 = size(x1, 1), size(x2, 1)
    d1 = ndims(x1) == 1 ? 1 : size(x1, 2)
    d  = ndims(x2) == 1 ? 1 : size(x2, 2)
    d == d1 || error("DTW: Dimensions mismatch")
    size(D,1) >= len1 && size(D,2) >= len2 || error("DTW: Maximum length exceeded") # length bounds check
    # dist = eucdist

    # Initialize first column and first row
    D[1,1] = dist(x1[1,1], x2[1,1])
    @inbounds for k=2:d
        D[1,1] += dist(x1[1,k], x2[1,k])
    end
    prevD = D[1,1]
    @inbounds for r=2:len1
        c = max(r-w, 1); # c = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) #(x1[r] - x2[1])*(x1[r] - x2[1])
        end
        prevD = D[r,c]
    end
    prevD = D[1,1]
    @inbounds for c=2:len2
        r = max(c-w, 1); # r = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) #(x1[1] - x2[c])*(x1[1] - x2[c])
        end
        prevD = D[r,c]
    end
    # Complete the cost matrix
    @inbounds for c=2:len2, r = max(c-w+1, 2) : min(c+w-1, len1) # r=2:len1
        D[r,c] = min(D[r-1,c], D[r-1,c-1], D[r,c-1])
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) # (x1[r] - x2[c])*(x1[r] - x2[c])
        end
    end
    D[len1, len2]
end

# 3-dot version
function _dtw!(D::Matrix{T}, x1::AbstractArray{T}, x2::AbstractArray{T}, w::Int, dt::Int, dist = eucdist) where T<:Real
    len1, len2 = size(x1, 1), size(x2, 1)
    d1 = ndims(x1) == 1 ? 1 : size(x1, 2)
    d  = ndims(x2) == 1 ? 1 : size(x2, 2)
    d == d1 || error("DTW: Dimensions mismatch")
    size(D,1) >= len1 && size(D,2) >= len2 || error("DTW: Maximum length exceeded") # length bounds check

    dt2 = 2*dt
    len1 -= dt2
    len2 -= dt2

    # Initialize first column and first row
    D[1,1] = dist(x1[1,1], x2[1,1]) + dist(x1[1+dt,1], x2[1+dt,1]) + dist(x1[1+dt2,1], x2[1+dt2,1])
    @inbounds for k=2:d
        D[1,1] += dist(x1[1,k], x2[1,k]) + dist(x1[1+dt,k], x2[1+dt,k]) + dist(x1[1+dt2,k], x2[1+dt2,k])
    end
    prevD = D[1,1]
    @inbounds for r=2:len1
        c = max(r-w, 1); # c = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
        end
        prevD = D[r,c]
    end
    prevD = D[1,1]
    @inbounds for c=2:len2
        r = max(c-w, 1); # r = 1
        D[r,c] = prevD
        for k=1:d
            D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
        end
        prevD = D[r,c]
    end
    # Complete the cost matrix
    @inbounds for c=2:len2
        for r = max(c-w+1, 2) : min(c+w-1, len1) # r=2:len1
            D[r,c] = min(D[r-1,c], D[r-1,c-1], D[r,c-1])
            for k=1:d
                D[r,c] += dist(x1[r,k], x2[c,k]) + dist(x1[r+dt,k], x2[c+dt,k]) + dist(x1[r+dt2,k], x2[c+dt2,k])
            end
        end
    end
    D[len1, len2]
end
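
Not part of the snippet above, but a hedged driver showing how one might call it (workspace shape, window, and dt values are my own assumptions):

# Univariate series of unequal length; the window w must exceed |len1 - len2|
# so that the Sakoe-Chiba band reaches the final cell.
x1, x2 = randn(300), randn(290)
D = zeros(length(x1), length(x2))   # preallocated cost-matrix workspace
w = 20
cost = _dtw!(D, x1, x2, w)          # multi-variate version (here with one dimension)

# Multi-variate series as len × dims matrices (e.g. frames × features)
X1, X2 = randn(300, 4), randn(290, 4)
D2 = zeros(size(X1, 1), size(X2, 1))
cost_mv = _dtw!(D2, X1, X2, w)

# 3-dot motif version with spacing dt = 5
cost3 = _dtw!(D, x1, x2, w, 5)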

As for Matrix Profile, there are many domain-specific performance gains not covered by it (as far as I know):

  • online clustering optimizations;
  • template shape restrictions;
  • template-specific “cheap” rules for prior region exclusion (or interest-region selection).

Such optimizations can dramatically improve performance for many concrete applications, in the sense that the number of actual DTW checks tends toward num_of_templates × num_of_matches, or even fewer…


#18

So currently multivariate data isn’t supported?

This is the approach I’m familiar with from using DTW on audio. Generally you’re not matching up individual samples but some feature vectors (usually some kind of spectral features).
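
(As a hedged illustration of that workflow, here is one way to turn a signal into a frames × features matrix of log-magnitude spectra, which could then be fed to a multivariate DTW such as the _dtw! snippet earlier in the thread. Frame length, hop, and the helper name are all my own choices, not any package’s API.)

using FFTW

# Slice a signal into overlapping frames and take the log-magnitude spectrum of each.
function spectral_features(x::AbstractVector{<:Real}; framelen::Int=256, hop::Int=128)
    @assert length(x) >= framelen
    nframes = fld(length(x) - framelen, hop) + 1
    F = Matrix{Float64}(undef, nframes, framelen ÷ 2 + 1)
    for i in 1:nframes
        frame = x[(i-1)*hop + 1 : (i-1)*hop + framelen]
        F[i, :] = log1p.(abs.(rfft(Float64.(frame))))
    end
    return F
end

Fq = spectral_features(randn(8_000))    # query features, frames × features
Fs = spectral_features(randn(32_000))   # longer series to search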


#19

Yeah, I haven’t finished writing that for-loop yet…