An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Authors

1 Computer, Bu-Ali Sina University

2 Computer Engineering, Bu-Ali Sina University

Abstract

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative studies using quantitative and large scale evaluations.  In order to provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering were conducted. The 14 well-known similarity measures with their variants and testing their effectiveness on 23 MTS datasets coming from a wide variety of application domains were re-implemented. In this paper, an overview of these different techniques is given and the empirical comparison regarding their effectiveness based on agglomerative clustering task is presented. Furthermore, the statistical significance tests were used to derive meaningful conclusions. It has been found that all of similarity measures are equivalent, in terms of clustering F-measure, and there is no significant difference between similarity measures based on our datasets. The results provide a comparative background between similarity measures to find the most proper method in terms of performance and computation time in this field.

Keywords