Dynamic time warping (DTW)¶
Robot 1
It is my understanding that dynamic time warping attempts to align the endpoint of series that may not be entirely overlapping.
Consider my client's use case, which involves series of movie KPIs from upcoming releases. They get 1020 weeks of KPIs leading up to a movie's opening weekend. Clearly many time series are not overlapping, but relatively they could be lined up (like 1 week from opening, 2 weeks from opening, etc.). They could do this in R/Python, but I was thinking time series clustering might be able to handle this.
What do I need to know—like series length limitations or minimal overlapping time periods, etc.? Is my understanding of dynamic time warping even correct?
Robot 2
Well it would be more about the points in the middle generally rather than the ends
Robot 3
For running time series clustering, you need:

10 or more series.

If you want K clusters, you need at least K series with 20+ time steps. (So if you specify 3 clusters, at least three of your series need to be of length 20 or greater.)

If you took the union of all your series, the union needs to collectively span at least 35 time steps.
Robot 3
In DR, the process of DTW is handled during model building—it shouldn’t require any adjustment from the user. If it errors out, flag it for us so we can see why.
Under the hood
 When you press Start in clustering for time series, some kmeans blueprints involve DTW (others involve a related technique called SoftDTW) and then kmeans is applied to cluster the series.

The goal of DTW is to align the series. For example,
sin(x)
, sin(x)
, andcos(x)
are all different functions, but follow the same pattern. DTW will (trivially) shift these functions so their peaks and valleys line up. Their distance would effectively be zero, after applying DTW. 
But... DTW can do more than shifting; it can impute repeated values in the middle of a series, like Robot 2 mentioned.
(Image pulled from here. That site has lots of good moving images; I’m texting from the car so can’t get them to copy over.)
In the image, the left example is straight up Euclidean distance. You just take each time point, calculate the distance between the two series at that moment in time, then square and add them. That’s your Euclidean distance.
In DTW, the top series (I’ll call it T
) is mapped to the bottom (B
) in a more complicated way.
 We’re gonna create 2 new series:
T*
andB*
, and calculate the Euclidean distance between those.
T1 (the first observation in T
) is mapped to B1 through B6. This means, to “match up” T
and B
, T1 is going to be copied 6 times. So, T*1 = T1, T*2 = T1, … T*6 = T1
.
 That is, DTW takes
T
and stretches it out by making five copies of T1, so for thatregionoftime
, T and B are aligned. It’s kind of like shifting part of the series, but using “last value carried forward imputation” to do it. (I’m sure the video on the site will show it better than I’ve described, sorry.)
Then, T*7 = T2, T*8 = T3
and so on.
So far, B* = B
for like the first 25 steps or so. Fast forward to the right side of the valley. Since B
is mapped to multiple places in T
, we’re gonna define B
here to take on multiple copies of B
.
B*25 = B25
B*26 = B25
...
B*30 = B25
Then, B31 = B26, B32 = B27
, and so on.
At the end, this should mean that B*
and T*
are the same length. Then, we calculate the Euclidean distance between B*
and T*
.
Robot 3
So, Robot 1, in more detail than you wanted (and almost surely way sloppier!), the starting and ending points get aligned but there’s additional imputation operations that happen inside that stretch the series.
Robot 3
And in DataRobot all this happens under the hood. So, thank a developer 😉
Robot 1
Hey Robot 3, the client was very appreciative of this information, so thank you! They did ask if there was any documentation on the guardrails/constraints around time series clustering. Do we have them published somewhere?
Robot 4
We have that information in the documentation!