A Shapelet-based Framework for Unsupervised Multivariate Time Series Representation Learning
Zhiyu Liang, Harbin Institute of Technology, Harbin, China, zyliang@hit.edu.cn
Hongzhi Wang*, Harbin Institute of Technology, Harbin, China, wangzh@hit.edu.cn
Jianfeng Zhang, Huawei Noah's Ark Lab, Shenzhen, China, zhangjianfeng3@huawei.com
Chen Liang, Harbin Institute of Technology, Harbin, China, 1190201818@stu.hit.edu.cn
Zheng Liang, Harbin Institute of Technology, Harbin, China, lz20@hit.edu.cn
Lujia Pan, Huawei Noah's Ark Lab, Shenzhen, China, panlujia@huawei.com
Abstract
Recent studies have shown great promise in unsupervised representation learning (URL) for multivariate time series, because URL can learn generalizable representations for many downstream tasks without using inaccessible labels. However, existing approaches usually adopt models originally designed for other domains (e.g., computer vision) to encode the time series data and rely on strong assumptions to design learning objectives, which limits their ability to perform well. To deal with these problems, we propose a novel URL framework for multivariate time series that learns a time-series-specific shapelet-based representation through the popular contrastive learning paradigm. To the best of our knowledge, this is the first work that explores shapelet-based embedding in unsupervised general-purpose representation learning. A unified shapelet-based encoder and a novel learning objective with multi-grained contrasting and multi-scale alignment are designed to achieve this goal, and a data augmentation library is employed to improve generalization. We conduct extensive experiments using tens of real-world datasets to assess the representation quality on many downstream tasks, including classification, clustering, and anomaly detection. The results demonstrate the superiority of our method against not only URL competitors, but also techniques specially designed for downstream tasks. Our code has been made publicly available at https://github.com/real2fish/CSL.
PVLDB Reference Format:
Zhiyu Liang, Jianfeng Zhang, Chen Liang, Hongzhi Wang, Zheng Liang, and Lujia Pan. A Shapelet-based Framework for Unsupervised Multivariate Time Series Representation Learning. PVLDB, xx(xx): xxx-xxx, xx. doi:xx.xxx/xx.xxx
1 INTRODUCTION
Multivariate time series (MTS) generally describe a group of dependent variables evolving over time, each of which represents a monitoring metric (e.g., temperature or CPU utilization) of an entity (e.g., system or service). MTS data play a vital role in many practical scenarios, such as manufacturing, medicine, and finance [4, 24, 42].
While MTS are being increasingly collected from various applications, a particular challenge in modeling them is the lack of labels. Unlike images or text, which usually contain human-recognizable patterns, time series are much harder to label, because the underlying state of these time-evolving signals can be too complicated even for domain specialists [46]. For this reason, unsupervised (a.k.a. self-supervised) representation learning (URL) for MTS has recently become a research focus [10, 56, 59, 60]. URL aims to train a neural network (called the encoder) that embeds the data into feature vectors without accessing the labels, using carefully designed learning objectives that leverage the inherent structure of the raw data. The learned representations (a.k.a. features or embeddings) can then be used to train models for a downstream analysis task using far less annotated data than traditional supervised methods require [60]. The features are also more general-purpose, since they can serve several tasks.
Unfortunately, unlike in domains such as computer vision (CV) [8, 27, 29] or natural language processing (NLP) [12, 65], URL in the context of time series is still under-explored. MTS are typically continuous-valued data with high noise, diverse temporal patterns, varying semantic meanings, etc. [3]. These unique complexities make it difficult for advanced URL methods from the aforementioned domains to perform well [10, 46]. Although several studies have attempted to fill this gap by considering the characteristics of time series, such as the time-evolving nature [46] and the multi-scale semantics [11, 59], existing approaches can still be weak at learning representations that perform well, partly for the following reasons.
First, existing representation encoder designs are heavily inspired by experience in the CV and NLP domains, which may not be well-suited for MTS. Specifically, the convolutional neural network (CNN) [16, 49] and the Transformer [51] are commonly used encoders in recent studies [10, 11, 46, 56, 59, 60]. However, these encoders still face many difficulties when applied to MTS because they lack the capability to deal with the characteristics of time series [32, 45, 66].
Second, some existing approaches rely on domain-specific assumptions, such as neighbor similarity [11, 46] and contextual consistency [59], and are thus difficult to generalize to various scenarios. For instance, Franceschi et al. [11] and Tonekaboni et al. [46] assume that subsequences distant in time should be dissimilar, which can be easily violated in periodic time series [43].
To tackle the issues mentioned above, we explore a time-series-specific representation encoder for URL that does not rely on strong assumptions. In particular, we consider an encoder based on a nonparametric time series analysis concept named shapelet [58], i.e., a salient subsequence tailored to extract time series features from only the important time windows, avoiding the noise outside them. The main reason is that shapelet-based representations have shown superior performance in specific tasks such as classification [25, 33, 54] and clustering [61]. Besides, compared to features extracted by other neural networks such as CNNs, shapelet-based features can be more intuitive to understand [58]. However, shapelets have never been explored in the recently rising topic of URL for general-purpose representation. To fill this gap, we take the first step and propose to learn a shapelet-based encoder using contrastive learning, a popular paradigm that has shown success in URL [8, 10, 59, 64].
We highlight three challenges in learning high-quality and general-purpose shapelet-based representations. The first is how to design a shapelet-based encoder that captures diverse temporal patterns over various time ranges, considering that a shapelet was originally proposed to represent only a single shape feature, and that exhaustive search or prior knowledge is needed to determine the encoding scale [5, 25, 61]. The second is how to design a URL objective that learns general information for downstream tasks through this shapelet-based encoder, which has never been studied. Last, while contrastive learning leverages the representation similarity of the augmentations of one sample [8] to learn the encoder, it remains an open problem how to augment time series properly so that this similarity is preserved [46, 59].
To cope with these challenges, we propose a novel unsupervised MTS representation learning framework named Contrastive Shapelet Learning (CSL). Specifically, we design a unified architecture that uses multiple shapelets with various (dis)similarity measures and lengths to jointly encode a sample, so as to capture diverse temporal patterns from short to long term. As shapelets of different lengths can separately embed one sample into different representation spaces that are complementary to each other, we propose a multi-grained contrasting objective that simultaneously considers the joint embedding and the representations at each time scale. In parallel, we design a multi-scale alignment loss to encourage the representations of different scales to reach consensus. The basic idea is to automatically capture the varying semantics by leveraging the intra-scale and inter-scale dependencies of the shapelet-based embedding. Besides, we develop an augmentation library using diverse types of data augmentation methods to further improve the representation quality. To the best of our knowledge, CSL is the first general-purpose URL framework based on shapelets. The main contributions are summarized as follows:
This paper studies how to improve URL performance using a time-series-specific shapelet-based representation, which has achieved success in specific tasks but has never been explored for general-purpose URL.
A novel framework is proposed that adopts contrastive learning to learn shapelet-based representations. A unified shapelet-based encoder architecture and a learning objective with multi-grained contrasting and multi-scale alignment are designed to capture diverse patterns in various time ranges. A library containing various types of data augmentation methods is constructed to improve the representation quality.
Experiments on tens of real-world datasets from various domains show that i) our learned representations generalize to many downstream tasks, such as classification, clustering, and anomaly detection; ii) the proposed method outperforms existing URL competitors and can be comparable to (or even better than) techniques tailored for classification and clustering. Additionally, we study the effectiveness of the key components proposed in CSL and the model's sensitivity to the key parameters, demonstrate the superiority of CSL against fully-supervised competitors on partially labeled data, and explain the shapelets learned by CSL. We also study our method on long time series representation and assess its running time.
2 RELATED WORK
There are two lines of research closely related to this paper.
Unsupervised MTS representation learning. Unlike in domains such as CV [8, 27, 29, 55] and NLP [12, 65], the study of URL for time series is still in its infancy.
Inspired by word representation [36], Franceschi et al. [11] adapt the triplet loss to time series to achieve URL. Similarly, Zerveas et al. [60] explore the utility of the Transformer [51] for URL, motivated by its success in modeling natural language. Oord et al. [39] propose to learn the representation by predicting the future in latent space. Eldele et al. [10] extend this idea by conducting both temporal and contextual contrasting to improve the representation quality. Instead of using prediction, Yue et al. [59] combine timestamp-level contrasting with contextual contrasting to achieve a hierarchical representation. Tonekaboni et al. [46] assume consistency between overlapping temporal neighborhoods to model dynamic latent states, while Yang and Hong [56] utilize the consistency between the temporal and spectral domains to enrich the representation. Although these methods have improved representation quality, they still have limitations, such as the lack of intuition in encoder design and the dependency on specific assumptions, as discussed in Section 1.
Time series shapelet. The concept of the shapelet was first proposed by Ye and Keogh [58] for supervised time series classification. It focuses on extracting features in a notable time range to reduce the interference of noise, which is prevalent in time series.
In early studies, shapelets are selected by enumerating subsequences of the training time series [5, 17, 38, 58], which suffers from non-optimal representation and high computational overhead [13]. To address these problems, Grabocka et al. [13] first proposed a shapelet learning method, which directly learns the optimal shapelets through a supervised objective. Since then, many approaches [30, 33, 34, 54] have been proposed to improve effectiveness and efficiency for classification. Beyond supervised classification, some works [47, 61, 62] employ shapelets for time series clustering and also show competitive performance.
ShapeNet [25] is a special case related to both URL and shapelets. However, it aims to "select" shapelets from existing candidates for MTS classification, and it merely adopts a CNN-based URL method extended from [11] to assist the selection. It even contains a supervised feature selection step that uses the true labels. In contrast, both CSL and other URL methods target a different problem: automatically "learning" new features that are not present in an existing feature set, without using labels, in order to tackle more than one task.
In summary, although shapelet-based representation has been widely studied for classification and clustering tasks, it has never been explored for the unsupervised learning of general-purpose representations that serve various tasks, as CSL does.
3 PROBLEM STATEMENT
This section defines the key concepts used in the paper. First, we define the data type we are interested in, multivariate time series.
Definition 3.1 (Multivariate Time Series). A multivariate time series (MTS) is a set of variables, each consisting of observations ordered by successive time. Formally, we denote a multivariate time series sample with $D$ variables (a.k.a. dimensions or channels) and $T$ timestamps (a.k.a. length) as $x \in \mathbb{R}^{D \times T}$, and a dataset containing $N$ samples as $X = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{N \times D \times T}$.
Then, the problem that we are addressing, i.e., unsupervised representation learning for MTS, is formulated as follows.
Definition 3.2 (Unsupervised Representation Learning for MTS). Given an MTS dataset $X$, the goal of unsupervised representation learning (URL) is to train a neural network model (encoder) $f: \mathbb{R}^{D \times T} \mapsto \mathbb{R}^{D_{\text{repr}}}$, such that the representation $z_i = f(x_i)$ is informative for downstream tasks, e.g., classification and anomaly detection. Here unsupervised means that the labels of downstream tasks are unavailable when training $f$. To simplify the notation, we denote $Z = f(X) = \{z_1, z_2, \ldots, z_N\}$ in the following sections.
It is worth noting that some works limit "unsupervised (representation) learning" to unsupervised tasks (e.g., clustering [61]), so their competitors are only unsupervised methods. In contrast, the URL problem addressed in this paper is to learn features that can not only tackle unsupervised tasks, but also achieve performance comparable to supervised competitors on the classification task, which is more general yet more challenging.
4 METHODOLOGY
In this section, the proposed framework and all its components are elaborated.
4.1 Overview
We illustrate the overall framework of the proposed Contrastive Shapelet Learning (CSL) in Fig. 1. Given the input $X$, two data augmentation methods, denoted as $A'(x)$ and $A''(x)$, are randomly selected from a library (to be discussed later) to produce two correlated views of $X$ as $X' = A'(X)$ and $X'' = A''(X)$, where $A'(X) = \{A'(x_1), \ldots, A'(x_N)\}$ and likewise for $A''(X)$. Then these two views are fed into a time-series-specific encoder named Shapelet Transformer (ST), which embeds the samples into a latent space (see Section 4.2). CSL explores the representation in this latent space, where different shapelets serve as the basis (see Sections 4.3 and 4.4). We believe that our method is more general, as it does not depend on task-specific assumptions like [11, 46, 59].
Figure 1: Overview framework of CSL.
Formally, given $X'$ and $X''$, we have the shapelet-based representations as:
$$Z' = f(X'), \quad Z'' = f(X''). \tag{1}$$
Following the paradigm of contrastive learning [8, 10, 64], for each $x_i$, the embedding $z_i'$ should be close to $z_i''$ but far away from any $z_j''$ derived from another sample, where $j \neq i$. The encoder is learned by maximizing the similarity of the positive pairs $(z_i', z_i'')$ and minimizing the similarity of the negative pairs $(z_i', z_j'')$. Note that using data augmentation to generate the positive pairs is a common practice in contrastive learning and is required by most URL methods, including TS2Vec [59], TS-TCC [10], etc. [56, 60]. Alternatively, T-Loss [11] and TNC [46] select subsequences as positive samples. Both augmented and sampled time series serve as the self-supervised signals in URL, which play a role similar to that of the labels used by supervised methods (e.g., OSCNN [45]).
Despite the success of contrastive learning in URL [10, 56, 59, 64], an open question is how to determine proper data augmentation methods that ensure representation similarity of positive samples [8], which could be data- and model-dependent [22]. It is beyond the scope of this paper to develop new augmentation techniques or augmentation selection algorithms. Instead, we construct a data augmentation library that contains diverse types of methods for random selection at each training step (illustrated in Fig. 1), so that they can complement each other and adapt to various time series data. The library consists of five well-established time series augmentation methods: jittering $J(x)$, which adds random noise to each observation; cropping $C(x)$, which crops the time series into a randomly selected subsequence; time warping $TW(x)$, which stretches or contracts randomly selected subsequences; quantizing $Q(x)$, which quantizes each observation to the nearest level; and pooling $P(x)$, which reduces the temporal resolution using average pooling over consecutive observations. We illustrate how they are performed using the running examples in Fig. 2, and we refer interested readers to [20, 48] for more details.
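The five augmentations can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the released CSL code: the function names, the default parameters (`sigma`, `ratio`, `scale`, `levels`, `size`), the way time warping is realized through interpolation, and the choice to draw the two methods independently with replacement are all assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.05):
    """J(x): add Gaussian noise to every observation (sigma is an assumed default)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def crop(x, ratio=0.8):
    """C(x): keep one randomly located subsequence covering `ratio` of the length."""
    T = x.shape[1]
    L = max(1, int(round(ratio * T)))
    start = rng.integers(0, T - L + 1)
    return x[:, start:start + L]

def time_warp(x, scale=1.5):
    """TW(x): stretch one random segment by `scale`, resampled back to length T."""
    D, T = x.shape
    a, b = np.sort(rng.choice(np.arange(1, T - 1), size=2, replace=False))
    speed = np.ones(T - 1)
    speed[a:b] = 1.0 / scale                  # move slowly through [a, b): stretching
    pos = np.concatenate([[0.0], np.cumsum(speed)])
    pos *= (T - 1) / pos[-1]                  # rescale so both endpoints stay fixed
    # the same warping path is applied to every dimension
    return np.stack([np.interp(pos, np.arange(T), x[j]) for j in range(D)])

def quantize(x, levels=8):
    """Q(x): snap each observation to the nearest of `levels` evenly spaced values."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    grid = np.linspace(lo, hi, levels)
    idx = np.abs(x[..., None] - grid).argmin(axis=-1)
    return grid[idx]

def pool(x, size=2):
    """P(x): average every `size` consecutive observations (a ragged tail is dropped)."""
    D, T = x.shape
    T_out = T // size
    return x[:, :T_out * size].reshape(D, T_out, size).mean(axis=2)

AUGMENTATIONS = [jitter, crop, time_warp, quantize, pool]

def two_views(x):
    """Draw two augmentations at random and return the two views of x."""
    i, j = rng.integers(0, len(AUGMENTATIONS), size=2)
    return AUGMENTATIONS[i](x), AUGMENTATIONS[j](x)
```

Note that cropping and pooling change the series length, so an encoder consuming these views must accept inputs of varying length; a sliding-window shapelet match does, as long as the view is no shorter than the shapelet.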
The encoder ST is designed to capture patterns at different time scales using separate shapelets of different lengths. Thus, we propose a multi-grained contrasting objective that simultaneously performs contrastive learning on the shapelet-based embedding of every single scale (fine-grained contrasting) and on the representations in the joint space $\mathbb{R}^{D_{\text{repr}}}$ over all scales (coarse-grained contrasting). Additionally, inspired by the consensus principle in multi-view learning [53], we design a multi-scale alignment term to encourage the features at different scales to reach agreement.
Figure 2: Illustration of the data augmentation methods using a two-dimensional time series. All methods are identically performed on each dimension of the original time series.
In the rest of this section, we elaborate on the key components of the proposed CSL framework, including Shapelet Transformer, multi-grained contrasting, and multi-scale alignment.
4.2 Shapelet Transformer
The shapelet was originally designed to extract representative shape features of univariate time series [58]. In this paper, we extend it to the more general multivariate case. Given a sample $x \in \mathbb{R}^{D \times T}$, a multivariate shapelet $s \in \mathbb{R}^{D \times L}$ ($L < T$), which has the same dimension $D$, encodes $x$ using the Euclidean norm between $s$ and the best-matching subsequence relative to $s$ within $x$, defined as:
$$g(x, s) = \min_{t \in \{1, 2, \ldots, T-L+1\}} \left\| x[t, L] - s \right\|_2, \tag{2}$$
where $x[t, L]$ denotes the subsequence of $x$ starting at timestamp $t$ and lasting $L$ steps. By taking $s$ as a trainable parameter, we can directly learn the optimal shapelet like any neural network, using an optimization algorithm such as stochastic gradient descent (SGD) [33]. However, the original definition in Eq. (2) has difficulty capturing patterns beyond shapes, such as spectral information in the frequency domain, which can limit the capability of the shapelet-based encoder. To address this problem, we extend the representation to a general form as:
where $x^j$ ($s^j$) represents the series (shapelet) at the $j$-th dimension, and $\operatorname{agg}_d$ is the aggregator that produces the result of $d$ between the most similar pair of $(x[i, L], s)$. $d(\cdot, \cdot)$ can be any (dis)similarity measure for equal-length series. Eq. (2) is a special case of Eq. (3) when $d$ is the Euclidean norm, and $g$ corresponds to 1-D convolution when $d$ is the cross-correlation function [9].
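As a rough illustration of the generalized transform $g(x, s, d)$, the sketch below slides a shapelet over a sample and aggregates the per-window scores. Several points are assumptions made for this example rather than details taken from the paper: the aggregator is taken to be `min` for a distance and `max` for a similarity, and, because the exact rule for combining per-dimension values is not recoverable from the text above, the measure is simply applied to the flattened $D \times L$ window, which makes the Euclidean case coincide with Eq. (2). The names `MEASURES` and `shapelet_feature` are hypothetical.

```python
import numpy as np

# (measure d, aggregator agg_d): min picks the best match for a distance,
# max picks the best match for a similarity (assumed convention).
MEASURES = {
    "euclidean": (lambda a, b: float(np.linalg.norm(a - b)), min),
    "cosine": (lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)), max),
    "cross_correlation": (lambda a, b: float(a @ b), max),
}

def shapelet_feature(x, s, measure="euclidean"):
    """Score of the best-matching length-L window of x (D x T) against shapelet s (D x L)."""
    d, agg = MEASURES[measure]
    T, L = x.shape[1], s.shape[1]
    flat_s = s.ravel()
    # one score per window start position: T - L + 1 windows in total
    scores = [d(x[:, i:i + L].ravel(), flat_s) for i in range(T - L + 1)]
    return agg(scores)

# Usage: plant the shapelet inside an otherwise flat series.
s = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]])
x = np.zeros((2, 10))
x[:, 3:6] = s
print(shapelet_feature(x, s, "euclidean"))  # distance of the exact match
```

In a trainable encoder, `s` would be a learnable parameter and the loop would be replaced by a batched, differentiable operation; the loop form is kept here only for readability.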
Based on Eq. (3), we design a shapelet-based encoder named Shapelet Transformer (ST), which is shown in Fig. 3. To extract diverse temporal patterns, ST is a combination of multiple sub-modules with shapelets of $R$ various lengths (scales) and $M$ different (dis)similarity measures. The core idea comes from the observations that i) time series can possess both short-term and long-term patterns in practice [45, 57], and ii) different measures can complement each other to produce more informative features [31]. However, our design is fundamentally different from existing approaches, since we consider these two aspects simultaneously in a unified shapelet-based architecture.
Figure 3: Architecture of Shapelet Transformer (ST).
We denote the sub-module with shapelets of length $L_r$ (scale $r$) and (dis)similarity measure $d_m$ as $f_{r,m}$. Each $f_{r,m}$ has $V$ shapelets $s_{r,m,1}, \ldots, s_{r,m,V}$ that separately embed the input. The outputs of all sub-modules for one sample are concatenated to jointly represent the sample. Formally, the encoder ST is defined as:
where $\oplus$ is the concatenation operator, $z_{i,r} \in \mathbb{R}^{K}$ ($K = MV$) denotes the representation of $x_i$ at scale $r$, and $f_{r,m,v}(x_i) = g(x_i, s_{r,m,v}, d_m)$.
Note that although extracting multi-scale features is a widely adopted idea for time series [45, 57], the proposed Shapelet Transformer provides a simple yet effective way to perform contrastive learning on the features of different scales (Section 4.3) and to maximize the agreement between scales (Section 4.4), which we show to be essential for improving performance (Section 5.3). The reason is that we simply concatenate the features encoded by each of the shapelets, so the representations of different scales can be separated from each other, whereas the features of existing multi-scale networks [45, 57] are usually fused through complicated layer-by-layer structures. Moreover, integrating multiple measures into the multi-scale shapelet-based architecture is also a simple yet effective idea for improving URL performance (Section 5.3).
Shapelet Transformer is a unified architecture that can be flexibly changed by varying $R$, $M$, $V$, $L_r$, and $d_m$ if one has prior knowledge, such as the time scale of the MTS patterns. Considering that prior knowledge is not always easy to access, we introduce a general configuration of the model structure. Specifically, we fix $R$ to a moderate value of 8 and adaptively set $L_r$ as evenly spaced numbers over $[0.1T, 0.8T]$, i.e., $L_r = r \times 0.1T$ ($r \in \{1, \ldots, R\}$), to approximately match the patterns from short to long term. Three widely adopted (dis)similarity measures are considered: Euclidean distance ($d_1$), cosine similarity ($d_2$), and cross-correlation ($d_3$). As such, given the dimension $D_{\text{repr}}$ of the output embedding, the encoder structure can be automatically determined.
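The general configuration can be written down as a small helper. This is a sketch under stated assumptions: the relation $D_{\text{repr}} = R \cdot M \cdot V$ is inferred here from $z_{i,r} \in \mathbb{R}^{K}$ with $K = MV$ and $R$ concatenated scales, and the rounding of $L_r$ to an integer and the use of floor division for $V$ are choices made for the example. The function name `general_config` is hypothetical.

```python
def general_config(T, D_repr, R=8, M=3):
    """Derive shapelet lengths and shapelets-per-sub-module from T and D_repr.

    R = 8 scales and M = 3 measures follow the general configuration in the text;
    rounding and floor division are assumptions of this sketch.
    """
    # L_r = r * 0.1 * T for r = 1..R, rounded to an integer length of at least 1
    lengths = [max(1, round(r * T / 10)) for r in range(1, R + 1)]
    # D_repr = R * M * V  =>  V = D_repr / (R * M)
    V = D_repr // (R * M)
    return lengths, V

lengths, V = general_config(T=100, D_repr=240)
print(lengths)  # [10, 20, 30, 40, 50, 60, 70, 80]
print(V)        # 10
```

With these values, each of the $R \times M = 24$ sub-modules holds 10 shapelets, and each scale contributes $K = MV = 30$ of the 240 embedding dimensions.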
4.3 Multi-grained Contrasting
Figure 4: Illustration of multi-grained contrasting and multi-scale alignment. One shapelet is displayed at each scale for clarity.
After encoding the training samples, we employ contrastive learning, a popular paradigm that has achieved success in URL, to learn the Shapelet Transformer. The principle of contrastive learning is to pull the positive pairs close together and push the negative pairs apart in the embedding space. In this paper, we adopt the InfoNCE loss [39] to separate positive from negative samples because it is one of the most popular loss functions in contrastive learning and has been widely shown to be effective [15, 39, 41], but a contrastive loss of any other form can also fit in our framework. Given an embedding $z_i$ of a sample $x_i$ and a set $Z$ that contains the embeddings of one positive sample $x_i^+$ and $N-1$ negative samples of $x_i$, the contrastive loss is defined as:
where $z_i^+$ is the embedding of $x_i^+$, $\operatorname{sim}(\boldsymbol{u}, \boldsymbol{v}) = \boldsymbol{u} \cdot \boldsymbol{v} / \|\boldsymbol{u}\| \|\boldsymbol{v}\|$ is the cosine similarity, and $\tau$ is a temperature parameter that controls the strength of penalties on hard negative samples [39].
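As a concrete reading of these definitions, the sketch below computes the standard InfoNCE form: the negative log of the softmax probability assigned to the positive among the $N$ candidates in $Z$, with cosine similarity scaled by $1/\tau$. It assumes this standard form from [39]; the function names are illustrative and the embeddings are plain Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity sim(u, v) = u . v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z, z_pos, z_negs, tau=0.1):
    """Standard InfoNCE: -log softmax of the positive over Z = {z_pos} + z_negs."""
    logits = [cosine(z, z_pos) / tau] + [cosine(z, n) / tau for n in z_negs]
    m = max(logits)  # subtract the maximum for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# A well-aligned positive gives a small loss; a misaligned one a large loss.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

A smaller $\tau$ sharpens the softmax, so negatives that lie close to the anchor contribute more to the loss.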
Recall that our Shapelet Transformer embeds one MTS sample into $R$ separate embedding spaces which capture temporal features at different time scales. Thus, we propose a multi-grained contrasting objective that explicitly considers not only the joint embedding space of the $R$ scales, but also the latent space of each single scale, as illustrated in Fig. 4. Specifically, we consider contrastive learning in the joint embedding space of all scales as the coarse-grained contrasting. The loss is defined as:
In parallel, the fine-grained contrasting is performed for the embedding at each time scale $r \in \{1, \ldots, R\}$, defined as:
One may wonder why the coarse-grained contrasting is required, given that the optimal representations at each single scale seem to be learned using the fine-grained losses. The intuition is that with only the fine-grained contrasting, the learning process could be dominated by some scales, i.e., the embeddings of some scales are well learned but the others are not, such that the joint embedding is not optimal. Thus, we design the coarse-grained loss to explicitly encourage feature similarity in the joint embedding space, so as to "balance" the representation quality of each scale and improve the final performance. We further clarify this issue in Section 5.3.
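The relation between the two granularities can be illustrated with a small sketch. It assumes that the joint embedding is the concatenation of $R$ contiguous per-scale blocks of equal width $K$ and that both losses use the standard InfoNCE form; how positives and negatives are drawn is not modeled here, and all names are illustrative.

```python
import math


def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def _info_nce(z, z_pos, z_negs, tau):
    logits = [_cos(z, c) / tau for c in [z_pos] + z_negs]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]


def multi_grained(z, z_pos, z_negs, R, tau=0.1):
    """Return (coarse loss on the joint embedding, list of R fine per-scale losses).

    Assumes each embedding is laid out as R contiguous blocks of width K
    and that no block is an all-zero vector.
    """
    K = len(z) // R
    coarse = _info_nce(z, z_pos, z_negs, tau)
    fine = []
    for r in range(R):
        block = slice(r * K, (r + 1) * K)
        fine.append(_info_nce(z[block], z_pos[block], [n[block] for n in z_negs], tau))
    return coarse, fine


# Two scales: the first is well aligned with the positive, the second is not.
coarse, fine = multi_grained(
    z=[1.0, 0.0, 1.0, 0.0],
    z_pos=[1.0, 0.0, 0.0, 1.0],
    z_negs=[[0.0, 1.0, 1.0, 0.0]],
    R=2,
)
print(coarse, fine)
```

In this toy case the first scale has a near-zero fine-grained loss while the second has a large one, which is the kind of imbalance across scales that the paragraph above describes.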
4.4 Multi-scale Alignment
As illustrated in Fig. 4, the joint embedding space $\mathbb{R}^{D_{\text{repr}}}$ ($D_{\text{repr}} = RK$) is composed of the spaces $\mathbb{R}^{K}$ of the single scales. For one time series, the features of different scales are extracted using shapelets of different lengths, and thus can be seen as different views of the sample (similar to images of a 3-D object taken from different viewpoints [53]). From the perspective of multi-view learning, the representations of different lengths carry not only complementary information but also consensus, because shapelets of different lengths can match some correlated time regions. Since we have leveraged the complementary information by using the joint embeddings, we propose to enhance the consensus over different scales, which can help to reduce the error rate of each view (scale) [53].
Inspired by Canonical Correlation Analysis (CCA) [1], we design a multi-scale alignment strategy to promote the consensus. The basic idea is to encourage the embeddings of one sample at different scales to be maximally correlated. Formally, given representations $Z_1, Z_2, \ldots, Z_R \in \mathbb{R}^{N \times K}$ that have been column-wise normalized, the objective is to minimize the $L_2$ distance between the orthogonal features of each scale and their mean centers:
where $\bar{Z} = \frac{1}{R} \sum_{r=1}^{R} Z_r$ are the mean centers of the representations.
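The distance term of this objective, $\sum_{r=1}^{R} \|Z_r - \bar{Z}\|_F^2$ (the quantity that reappears in the complexity analysis of Section 4.5), can be sketched as follows. The orthogonality constraint is deliberately left out, and the representations are plain nested lists; the function name is illustrative.

```python
def alignment_distance(scales):
    """Sum over scales of the squared Frobenius distance to the mean center.

    scales: list of R matrices, each an N x K nested list.
    """
    R = len(scales)
    N, K = len(scales[0]), len(scales[0][0])
    # Mean center: element-wise average of the R per-scale representations.
    mean = [[sum(Z[i][j] for Z in scales) / R for j in range(K)] for i in range(N)]
    return sum(
        (Z[i][j] - mean[i][j]) ** 2
        for Z in scales
        for i in range(N)
        for j in range(K)
    )


# Identical scales are perfectly aligned; disagreeing scales are penalized.
print(alignment_distance([[[1.0, 0.0]], [[1.0, 0.0]]]))
print(alignment_distance([[[1.0, 0.0]], [[0.0, 1.0]]]))
```

The value is zero exactly when all scales produce the same representation, which is why minimizing it promotes consensus.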
Eq. (9) has orthogonality constraints and thus cannot be optimized end-to-end with other objective functions, which could limit its effectiveness [6]. Inspired by the soft decorrelation method proposed in [6], we formulate the orthogonality constraints as a loss function to achieve end-to-end learning. The core idea is to approximate the full-batch covariance matrix at each training step by stochastic incremental learning and to encourage sparsity in the off-diagonal elements of the approximated covariance matrix using $L_1$ regularization. Consider the mini-batch representation $Z_B \in \mathbb{R}^{B \times K}$ of size $B$, which has been batch normalized [21]. At the $t$-th training step, we compute the mini-batch covariance matrix $C_B^t = \frac{1}{B-1} Z_B^T Z_B$ and an accumulative covariance matrix over the mini-batches as:
$$C_{\text{accu}}^{t} = \alpha C_{\text{accu}}^{t-1} + C_B^{t} \tag{10}$$
where $\alpha \in [0, 1)$ is a forgetting/decay rate, and $C_{\text{accu}}^{0}$ is initialized as an all-zero matrix. As such, the full-batch covariance matrix $C^t = Z^T Z$ can be approximated by $\hat{C}^t$ as:
$$\hat{C}^{t} = \frac{C_{\text{accu}}^{t}}{c^{t}} \tag{11}$$
where $c^t = \alpha c^{t-1} + 1$ is a normalizing factor with $c^0 = 0$.
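The incremental estimate in Eq. (10) and Eq. (11) amounts to an exponentially weighted average of the mini-batch covariance matrices. A minimal sketch, with a hypothetical class name and nested lists in place of tensors, is:

```python
class RunningCovariance:
    """Exponentially weighted estimate of the full-batch covariance matrix."""

    def __init__(self, K, alpha=0.5):
        self.alpha = alpha                              # forgetting/decay rate in [0, 1)
        self.c_accu = [[0.0] * K for _ in range(K)]     # C_accu^0 = all-zero matrix
        self.c = 0.0                                    # c^0 = 0

    def update(self, Z_B):
        """Consume one (already normalized) mini-batch of shape B x K, B >= 2."""
        B, K = len(Z_B), len(Z_B[0])
        # Mini-batch covariance C_B^t = Z_B^T Z_B / (B - 1).
        cov = [
            [sum(Z_B[b][i] * Z_B[b][j] for b in range(B)) / (B - 1) for j in range(K)]
            for i in range(K)
        ]
        # Eq. (10): C_accu^t = alpha * C_accu^{t-1} + C_B^t.
        self.c_accu = [
            [self.alpha * self.c_accu[i][j] + cov[i][j] for j in range(K)]
            for i in range(K)
        ]
        # Normalizing factor c^t = alpha * c^{t-1} + 1.
        self.c = self.alpha * self.c + 1.0
        # Eq. (11): C_hat^t = C_accu^t / c^t.
        return [[value / self.c for value in row] for row in self.c_accu]


estimator = RunningCovariance(K=2, alpha=0.5)
batch = [[1.0, 0.0], [-1.0, 0.0]]
print(estimator.update(batch))
print(estimator.update(batch))
```

Dividing by $c^t$ keeps the estimate on the scale of a single covariance matrix: feeding the same batch repeatedly returns that batch's covariance at every step.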
Given the approximate full-batch covariance matrix $\hat{C}^t$ in Eq. (11), the orthogonality constraints $Z^T Z = I$ can be achieved in a soft manner by minimizing an $L_1$ loss on the off-diagonal elements to penalize the correlation. Denote the element of $\hat{C}^t$ at entry $(i, j)$ as $\phi_{i,j}^{t}$. The soft orthogonality loss is defined as:
Based on Eq. (9) and (12), we define our multi-scale alignment loss on top of both $Z'$ and $Z''$ as:
where $\lambda_S$ controls the importance of the soft orthogonality loss.
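The soft orthogonality penalty described above, an $L_1$ loss on the off-diagonal entries $\phi_{i,j}^{t}$ of $\hat{C}^t$, can be sketched as follows. The plain unnormalized sum over $i \neq j$ is an assumption made for illustration; the exact scaling in Eq. (12) may differ.

```python
def soft_orthogonality(C_hat):
    """L1 penalty on the off-diagonal entries of an approximated covariance matrix.

    Assumed form: sum over i != j of |phi_ij|, with no normalization.
    """
    K = len(C_hat)
    return sum(abs(C_hat[i][j]) for i in range(K) for j in range(K) if i != j)


# Decorrelated features incur no penalty; correlated features do.
print(soft_orthogonality([[1.0, 0.0], [0.0, 1.0]]))
print(soft_orthogonality([[1.0, 0.2], [-0.3, 1.0]]))
```

Because the penalty is an ordinary differentiable-almost-everywhere loss term, it can be added to the contrastive losses and weighted by $\lambda_S$, which is what makes end-to-end training possible.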
It is noteworthy that the idea behind $\mathcal{L}_A$ of aligning multi-scale information can be a general scheme for modeling MTS, which is worth further exploration in the future.
4.5 Summary and Complexity Analysis
Summary. Based on the discussion in Section 4.3 and Section 4.4, the total loss of the proposed CSL is defined as:
where $\lambda$ controls the importance of multi-scale alignment.
The encoder (i.e., ST) $f(\boldsymbol{x}; \boldsymbol{\theta})$ is trained in an unsupervised manner by minimizing the loss $\mathcal{L}$ using the popular back-propagation algorithm [7], where $\boldsymbol{\theta}$ denotes the trainable parameters, which are updated within each mini-batch. The learned encoder maps an MTS into a latent representation $z_i = f(\boldsymbol{x}_i; \boldsymbol{\theta})$, and $z_i$ is used for downstream tasks.
Complexity analysis. All data augmentation methods used in CSL take $O(BTD)$ time for the MTS samples in a mini-batch of size $B$ [20]. The encoder ST takes $O(B L_s (T - L_s + 1) D D_{\text{repr}})$ time to embed the time series into representations, where $L_s$ is the shapelet length. Recall that $RK = D_{\text{repr}}$. Therefore, both $\mathcal{L}_C$ and $\sum_{r=1}^{R} \mathcal{L}_{F,r}$ can be computed in $O(B^2 D_{\text{repr}})$ time, and computing $\sum_{r=1}^{R} \|Z_r - \bar{Z}\|_F^2$ takes $O(B D_{\text{repr}})$ time. The computation of $\mathcal{L}_S(Z_r)$ takes $O(B^2 K)$ time, dominated by computing $C_B^t = \frac{1}{B-1} Z_B^T Z_B$. Since each training epoch has $\lfloor N/B \rfloor$ batches, the total time complexity for training CSL is $O(NTD + N L_s (T - L_s + 1) D D_{\text{repr}} + N B D_{\text{repr}})$. Considering that $B$ and $D_{\text{repr}}$ are constants and that we set $L_s$ to be proportional to $T$, the time complexity simplifies to $O(NT^2 D)$. Similarly, the space complexity of the CSL training algorithm is $O(TD)$.
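The simplification to $O(NT^2 D)$ can be traced step by step. In the sketch below, the proportionality constant $c$ in $L_s = cT$ is notation introduced here for illustration, and the lower-order per-batch terms $O(B D_{\text{repr}})$ and $R \cdot O(B^2 K) = O(B^2 D_{\text{repr}})$ are absorbed into the $O(B^2 D_{\text{repr}})$ term.

```latex
\begin{align*}
\text{per batch:}\quad
  & O(BTD) + O\big(B L_s (T - L_s + 1) D D_{\text{repr}}\big) + O\big(B^2 D_{\text{repr}}\big) \\
\text{per epoch } (\lfloor N/B \rfloor \text{ batches}):\quad
  & O\big(NTD + N L_s (T - L_s + 1) D D_{\text{repr}} + N B D_{\text{repr}}\big) \\
L_s = cT,\ 0 < c < 1:\quad
  & L_s (T - L_s + 1) = cT\big((1 - c)T + 1\big) = O(T^2) \\
B,\ D_{\text{repr}} \text{ constant}:\quad
  & O\big(NTD + NT^2 D + N\big) = O\big(NT^2 D\big)
\end{align*}
```

The quadratic dependence on $T$ therefore comes entirely from sliding shapelets whose length grows with the series length.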
We compare the complexity of CSL to that of the advanced URL baselines in Table 1. CSL is theoretically more scalable than TS2Vec and TST when $D \gg T$; otherwise they have the same time complexity. CSL also has lower space complexity than TS2Vec and TST. Compared to TS-TCC, the time complexity of CSL is the same or somewhat greater, depending on the relation between $T$ and $D$, but CSL has lower space complexity. T-Loss and TNC are more scalable in both time and space. However, the two methods rely on many sequential operations which cannot be accelerated by GPUs. The experimental results in Section 5.8 show that they run much slower than CSL, TS2Vec, TST and TS-TCC at a considerably large input scale. Moreover, we show that our CSL, though primarily designed for improving the representation quality, also trains faster on real-world tasks, meaning that it can achieve better performance with equal or less training time.
Table 2: Statistics of the 30 UEA datasets. All datasets are used for classification evaluation and the 12 subsets marked by * are used for clustering evaluation following [61].
We conduct extensive experiments using a total of 34 real-world datasets to assess the representation quality of CSL. Three main tasks are investigated: the supervised classification task and the unsupervised clustering and anomaly detection tasks. Note that URL considers the MTS representation at the segment level, thus we work on segment-level anomaly detection (rather than observation-level [28, 44]). Specifically, we consider the series in each sliding window $x_i[t, w]$, $t = 1, 2, \ldots, N - w + 1$, as an anomaly if it contains at least one anomalous observation. We train the popular SVM, K-means, and Isolation Forest on top of the learned representations to solve the three tasks, respectively. We describe the datasets, baselines, implementations and evaluation metrics as follows.
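The segment-level labeling rule can be sketched directly: a window of length $w$ starting at position $t$ is anomalous if any observation inside it is anomalous. The function name is illustrative, and the code uses 0-based indexing where the text uses 1-based.

```python
def window_labels(point_labels, w):
    """Derive one label per sliding window (stride 1) from per-observation labels.

    point_labels: list of 0/1 flags, 1 marking an anomalous observation.
    Returns N - w + 1 labels; a window is 1 if it contains at least one anomaly.
    """
    n = len(point_labels)
    return [int(any(point_labels[t:t + w])) for t in range(n - w + 1)]


# A single anomalous observation marks every window that covers it.
print(window_labels([0, 0, 1, 0, 0], w=2))
```

One consequence of this rule is that larger windows turn a single anomalous observation into more positive segments, so the label distribution shifts with $w$.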
Datasets. We use 34 MTS datasets with varying sample sizes, dimensions, series lengths, numbers of classes and application scenarios to evaluate the representation quality on the three downstream tasks. We use the default train/test split for all datasets, where only the training data are used for learning the encoder and the task-specific models. The datasets used for each task are presented below.
(1) Classification. To benchmark the result, we evaluate the performance of MTS classification on all 30 datasets of the popular UEA archive [3]. These data are collected from various domains, e.g., human action recognition, electrocardiography monitoring and audio classification. The dataset statistics are presented in Table 2.
(2) Clustering. Following a recent work on multivariate time series clustering [61], we evaluate the clustering performance using 12 UEA subsets which are highly heterogeneous in train/test size, length, and the number of dimensions and classes. The statistics of these 12 datasets are shown in Table 2 (marked by *).
(3) Anomaly Detection. Four recently published datasets collected from several challenging real-world applications are used for anomaly detection. Soil Moisture Active Passive satellite (SMAP) and Mars Science Laboratory rover (MSL) are two spacecraft anomaly detection datasets from NASA [18]. Server Machine Dataset (SMD) is a 5-week-long dataset collected by [44] from a large Internet company. Application Server Dataset (ASD) is a 45-day-long MTS characterizing the status of servers, recently collected by [28]. Following [28], for SMD, we use the 12 entities that do not suffer from concept drift for evaluation. Table 3 shows the dataset statistics.
Baselines. We use 21 baselines for comparison, which are categorized into two groups:
(1) URL methods. We compare our CSL with 5 URL baselines specially designed for time series, including TS2Vec [59], T-Loss [11], TNC [46], TS-TCC [10], and TST [60]. All URL competitors are evaluated in the same way as CSL for a fair comparison. More details of these methods are discussed in Section 2.
(2) Task-specific methods. We also include baselines tailored for downstream tasks. For classification, we select outstanding approaches, namely the most popular baseline DTWD, which adopts the one-nearest-neighbor classifier with dynamic time warping as the distance metric [3], and five supervised techniques: the RNN-based MLSTM-FCNs [23], the attentional prototype-based TapNet [63], the shapelet-based ShapeNet [25], and the CNN-based OSCNN [45] and DSN [52]. To avoid an unfair comparison, we leave out ensemble methods such as [31]. Recall that the supervised classification methods use the true labels to learn the features, whereas URL methods rely on data augmentation or sampling instead. Thus, the comparison is fair without additionally applying the data augmentation methods of CSL to the baseline approaches.
We consider six advanced clustering baselines, including the dimension reduction-based MC2PCA [26] and TCK [35], the distance-based m-kAVG+ED and m-kDBA [40], the deep learning-based DeTSEC [19], and the shapelet-based MUSLA [61]. In addition, we design ShapeNet-Clustering (SN-C), an adaptation of the classification baseline ShapeNet, which is also based on shapelets. In SN-C, we dismiss the supervised feature selection of ShapeNet and use K-means rather than SVM on the features for clustering.
Since no evaluation has been reported on the anomaly detection datasets under the segment-level setting, we develop 2 baselines on top of the raw MTS, also using Isolation Forest for a fair comparison. The models take as input the observations either at each timestamp (denoted as IF-p) or within each sliding window (denoted as IF-s). Similarly, we also adapt ShapeNet for anomaly detection (SN-AD) by dismissing the supervised feature selection of ShapeNet and using Isolation Forest on the shapelet-transformed features.
Implementations. We implement the CSL model using PyTorch 1.10.2 and run all experiments on an Ubuntu machine with a Tesla V100 GPU. The SVM, K-means, and Isolation Forest are implemented using Scikit-learn 1.1.1, and the data augmentation methods are implemented using tsaug [20] with default parameters. Most of the hyper-parameters of CSL are set to fixed values for all experiments without tuning. We adopt the SGD optimizer to learn the ST encoder. The learning rate is set to 0.01. Batchnorm is applied after the encoding. We set $\alpha = 0.5$ in the soft orthogonality and $\lambda = 0.01$, $\lambda_S = 1$ in the loss functions. The batch size is set to 8 for all UEA datasets and 256 for the anomaly detection datasets. The temperature $\tau$ is selected from $\{0.1, 0.01, 0.001\}$ by cross validation for the UEA datasets and is fixed to 0.1 for the anomaly detection datasets. Following previous works [59, 61], the embedding dimension is fixed to $D_{\text{repr}} = 320$ for classification and is chosen from $\{80, 240, 320\}$ for clustering for a fair comparison. On the anomaly detection tasks, we set $D_{\text{repr}}$ to 240, 320, 48, and 32 for SMAP, MSL, SMD, and ASD, respectively.
We reproduce the URL baselines using the open source code from the authors' implementations with the recommended configurations. The results of the classification baselines and the task-specific clustering baselines are taken from the published papers [3, 25, 45, 52, 59, 61]. Other results are based on our reproduction.
Metrics. Standard metrics are employed to evaluate the performance on the downstream tasks. We use Accuracy (Acc) [3] for the classification tasks. Clustering results are evaluated using the Rand Index (RI) and Normalized Mutual Information (NMI) [61, 62]. The F1-score is adopted for anomaly detection [14, 28].
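Of these metrics, the Rand Index is perhaps the least familiar. A minimal sketch of its standard (unadjusted) definition is given below: the fraction of sample pairs on which the predicted clustering and the ground truth agree about whether the two samples belong together. It is illustrative only and assumes the unadjusted variant is the one reported.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs grouped consistently by both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Cluster identifiers are arbitrary, so a relabeled perfect clustering scores 1.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the score depends only on pairwise co-membership, it needs no matching between cluster identifiers and class labels.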
5.2 Main Results
Tables 4, 5 and 6 summarize the results on the classification, clustering, and anomaly detection tasks. We report the average ranking (AR) of the algorithms over the datasets, and count the number of datasets on which CSL wins/ties/loses (W/T/L) against each counterpart in the one-versus-one comparisons. The p-values (p-val) of the Wilcoxon rank test are employed to quantitatively evaluate the significance.
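The two summary statistics can be sketched as follows. The scores in the example are made-up placeholders rather than results from the paper, the function names are illustrative, and assigning tied methods the mean of their ranks is an assumption about how ties are handled.

```python
def win_tie_loss(ours, other):
    """Count datasets where `ours` is better than, equal to, or worse than `other`."""
    wins = sum(a > b for a, b in zip(ours, other))
    ties = sum(a == b for a, b in zip(ours, other))
    return wins, ties, len(ours) - wins - ties


def average_ranks(scores):
    """scores: {method: [score per dataset]}, higher is better.

    Rank 1 is best; tied methods share the mean of the ranks they occupy.
    """
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        for m in methods:
            better = sum(scores[o][d] > scores[m][d] for o in methods)
            equal = sum(scores[o][d] == scores[m][d] for o in methods)  # counts m itself
            totals[m] += better + (equal + 1) / 2
    return {m: totals[m] / n_datasets for m in methods}


# Placeholder accuracies on three datasets for two hypothetical methods.
scores = {"A": [0.9, 0.8, 0.9], "B": [0.8, 0.8, 0.7]}
print(win_tie_loss(scores["A"], scores["B"]))
print(average_ranks(scores))
```

The average rank is less sensitive than the average score to a single dataset with an unusually large gap, which is why both are commonly reported together.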
In summary, the proposed CSL outperforms the URL competitors on most of the tasks and datasets, achieving the best overall performance. Moreover, CSL can achieve performance comparable to the approaches customized for classification and clustering. The results show the excellent ability of CSL in unsupervised learning of high-quality and general-purpose MTS representations. Below we discuss the results in detail for each task.
Classification. As shown in Table 4, CSL achieves competitive performance on most of the datasets. It has the highest average accuracy and the best accuracy ranking. Specifically, among the 30 datasets, CSL achieves the best accuracy on 21 of them when compared to URL methods only, and the highest accuracy on 12 of them (best among all algorithms) when all methods are considered. In the one-versus-one comparison, CSL outperforms all URL competitors in terms of the number of wins on the datasets. These results are in line with our expectations, as shapelets were originally designed to extract time series patterns that can effectively distinguish different classes. CSL further enhances the advantages of shapelets by jointly using shapelets of different lengths and multiple (dis)similarity measures, and by using a novel objective for model training. To our surprise, CSL achieves better overall accuracy than the fully supervised counterparts. Compared to the supervised learning methods and based on the Wilcoxon rank test, our CSL surpasses MLSTM-FCNs and TapNet and is on par with ShapeNet, OSCNN, and DSN, showing that CSL has reached a level comparable to supervised learning. This implies that class-specific features can be learned from the inherent structure of the data without supervised information, so that labels are only needed for classifier training. Furthermore, we observe that CSL performs poorly on DuckDuckGeese (DD), which has a very high dimension of 1345 (see Table 2). This may indicate a relative weakness of CSL in dealing with high-dimensional MTS, which is a possible direction for further improving our method.
Figure 5: Two-dimensional t-SNE [50] visualization of the unsupervised learned representation for the ERing test set. Classes are distinguishable by their respective marker shapes and colors.
Table 4: Performance comparison on MTS classification. The best results among URL methods are highlighted in bold and † indicates the best among all competitors. An underlined value indicates a significant difference at a statistical level of 0.05.
Clustering. The results of the clustering tasks are shown in Table 5. CSL outperforms all the other competitors except MUSLA. We note that the best performance on most of the datasets is achieved by either CSL or MUSLA, which are both based on time series shapelet methods. This result shows the superiority of shapelet features for the MTS clustering tasks. CSL outperforms MUSLA in terms of average ranking, the number of best performances, and the number of wins in one-versus-one comparisons, while slightly underperforming MUSLA in terms of average RI and NMI. Overall, there is no statistically significant difference between these two methods. We would like to emphasize that MUSLA is a specialized clustering method, while our CSL is a generic URL algorithm that can be used for a variety of downstream tasks. MUSLA also relies on exhaustive search or prior knowledge to determine the length of the shapelets, while CSL can achieve comparable performance without any effort in this regard. Besides, we notice that SN-C has almost the worst overall performance, which indicates that the URL-based shapelet selection method of ShapeNet, which is customized for classification, cannot be well generalized to the clustering problem.
Table 5: Performance comparison on MTS clustering. The best results among URL methods are highlighted in bold, and † indicates the best among all competitors. An underlined value indicates a significant difference at a statistical level of 0.05.
Table 6: Performance comparison on MTS anomaly detection. $w$ represents the length of the sliding window and the best results are highlighted in bold. An underlined value indicates a significant difference at a statistical level of 0.05.
Anomaly Detection. In Table 6, we can see that CSL outperforms the baselines in every setting, except for SMD with a window length of 100, where CSL is slightly inferior to IF-s and TS2Vec. This may indicate that these two algorithms are more effective in detecting outliers with long SMD windows. For each dataset, the performance of all methods tends to improve as the sliding window size increases, because larger windows allow more normal observations to be seen, which helps to detect the outliers. CSL achieves superior performance on the MSL dataset, outperforming the second-best TS-TCC by more than 30% at each window size. Although the difference in performance is not as large on the other three datasets, CSL is almost always the best and significantly outperforms each competitor. This indicates that CSL has an outstanding ability to identify anomalies. We also observe that the second-best method is different for each dataset, with T-Loss on SMAP, TS-TCC on MSL, and IF-s on SMD, while TS2Vec and T-Loss are the second-best methods with about the same performance on ASD. The reason may be that these URL algorithms are developed based on specific assumptions that may fail in other domains. In contrast, CSL exhibits more general capabilities. The variant of ShapeNet, i.e., SN-AD, does not achieve the competitive performance it shows in classification, showing again the limitation of ShapeNet in terms of task generality.
Finally, we visualize the representation of the ERing test data learned without supervision using t-SNE [50]. We compare CSL with the five URL baselines and the variant of ShapeNet which excludes the supervised feature selection. As Fig. 5 shows, the representation learned by the proposed CSL forms more clearly separated clusters, which also suggests a representation of lower entropy. This helps explain why CSL can outperform the competitors on downstream analysis tasks.
5.3 Ablation Study
To validate the effectiveness of the key components in CSL, we conduct ablation studies using classification tasks on all 30 UEA datasets. Due to space limitations, only the statistical results are reported here. The best value in a comparison is highlighted in bold and underlining indicates a significant difference at a statistical level of 0.05. The results are discussed as follows.
Table 7: Effectiveness of multi-scale shapelets.
Effectiveness of components in Shapelet Transformer. There are two major designs within the Shapelet Transformer: using shapelets of different scales (lengths) and using diverse (dis)similarity measures. As they improve the representation in orthogonal directions, we assess their effectiveness individually.
(1) Multi-scale shapelets. ST contains shapelets ranging from short to long. Here we compare CSL with its three variants: short scale only (where the shapelet length ranges from $0.1T$ to $0.4T$), long scale only (from $0.5T$ to $0.8T$) and the better of the two. To make a fair comparison, we fix the embedding dimension $D_{\text{repr}}$ and the number of scales $R$ for all experiments. The results are shown in Table 7. Both the short- and long-scale variants perform much worse than CSL. Even the better of the two variants still performs slightly worse than CSL. These results demonstrate the necessity of using shapelets with a wider range of time scales.
(2) Diverse (dis)similarity measures. To investigate the role of the (dis)similarity measures in the Shapelet Transformer, we compare CSL with three variants, each using a single measure: the Euclidean norm, cosine similarity, or cross-correlation. The results are summarized in Table 8. Cross-correlation is the best performer among the three single measures, while the Euclidean norm variant, which uses the original definition of shapelet, is the worst. This validates our hypothesis that the Euclidean norm-based shapelet has limitations in representing time series. All three variants perform much worse than CSL, with p-values less than 0.05. This shows the need to combine the different types of measures in the shapelet-based MTS representation.
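To make the three single-measure variants concrete, the following is a minimal sketch of how one multivariate shapelet could be compared against every subsequence of a series under each measure. It is an illustration under stated assumptions, not the paper's Eq. (3): the function name `shapelet_features`, the min/max aggregation over subsequences, and the use of an unnormalized inner product for cross-correlation are all assumptions made here.

```python
import numpy as np

def shapelet_features(x: np.ndarray, s: np.ndarray) -> dict:
    """Compare a (L, D) shapelet `s` with all length-L subsequences of a (T, D) series `x`.

    Returns one scalar per measure. Aggregation choices (min for the
    distance, max for the similarities) are illustrative assumptions.
    """
    L = s.shape[0]
    n = x.shape[0] - L + 1
    subs = np.stack([x[t:t + L] for t in range(n)])   # (n, L, D)
    flat = subs.reshape(n, -1)                        # flatten time and variables
    s_flat = s.reshape(-1)
    eps = 1e-12                                       # guards against division by zero
    euclid = np.linalg.norm(flat - s_flat, axis=1)    # dissimilarity
    cosine = flat @ s_flat / (np.linalg.norm(flat, axis=1) * np.linalg.norm(s_flat) + eps)
    xcorr = flat @ s_flat                             # unnormalized inner product
    return {
        "euclidean": euclid.min(),
        "cosine": cosine.max(),
        "cross_correlation": xcorr.max(),
    }

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))   # hypothetical series: T=50, D=3
s = x[10:20].copy()            # a shapelet cut from the series itself
features = shapelet_features(x, s)
print(features["euclidean"])   # 0.0, since the shapelet occurs exactly in x
```

The Euclidean feature is sensitive to amplitude and offset, the cosine feature ignores scale, and the inner product rewards both alignment and magnitude, which gives one intuition for why combining measures could be more expressive than using any single one.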
Effectiveness of components in loss function. There are three terms in our loss function, i.e., the coarse-grained contrastive loss $\mathcal{L}_{C}$, the fine-grained contrastive loss $\mathcal{L}_{F}=\sum_{r=1}^{R} \mathcal{L}_{F, r}$, and the multi-scale alignment loss $\mathcal{L}_{A}$. We investigate the effect size of each term by removing them one by one. As we can see in Table 9, CSL is significantly better than the variant without the term $\mathcal{L}_{F}$ or $\mathcal{L}_{A}$, which demonstrates the importance of these two components. In contrast, removing the coarse-grained loss $\mathcal{L}_{C}$ has the least impact. This may imply that, when the representations on each time scale have been sufficiently trained and aligned, the joint version is already near-optimal, so the coarse-grained contrasting yields a smaller (but still statistically significant) improvement than the other two terms.
Figure 6: Study of the multi-grained contrasting and the multi-scale alignment on UWaveGestureLibrary. Each dashed line corresponds to the joint embedding in $\mathbb{R}^{D_{\text{repr}}}$ with multiple scales and each bar corresponds to the embedding at a single scale.
Table 10: Effectiveness of the data augmentation library.
We further explore how multi-grained contrasting and multi-scale alignment work using a case study in Fig. 6. We find that removing $\mathcal{L}_{F}$ decreases the representation quality of every single scale (the orange bars), and thus has a great negative impact on the joint embedding (the orange line). Similar phenomena can be observed for $\mathcal{L}_{A}$. This indicates that $\mathcal{L}_{F}$ and $\mathcal{L}_{A}$ improve the final performance by improving the quality of each scale. Compared to the variant without $\mathcal{L}_{C}$ (the green bars), the representation quality of the single scales is balanced when the loss is used (the red bars): the quality of scales 1, 4, 5 and 8 is improved, while the quality of scales 2 and 7 is slightly decreased. As a result, the joint embedding learned using $\mathcal{L}_{C}$ (the red line) is better than that without the loss (the green line). This supports the hypothesis in Section 4.3 that $\mathcal{L}_{C}$ can coordinate the multiple scales to improve the joint embedding.
From the above exhaustive analysis, we can conclude that all the components included in the loss function of CSL are necessary.
Effectiveness of the data augmentation library. We remove the methods in the data augmentation library one by one to evaluate their effectiveness. As can be seen in Table 10, the variant without time warping gets the worst average rank (4.52), implying that removing time warping has a broader negative effect across the 30 datasets than removing any of the other data augmentation methods. The libraries without each of the other four methods have close average ranks. In contrast, the complete version has the best average rank (1.98), suggesting that the performance of CSL can probably be further improved when more types of data augmentation approaches are included in the library. This is an interesting finding and may imply a general data-independent data augmentation scheme for unsupervised representation learning of MTS. We leave further exploration to future work.
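The average ranking used in this comparison can be computed as in the sketch below, where rank 1 is the best method on a dataset and ties share the mean of the ranks they occupy. The function name `average_ranks` and the toy accuracy values are hypothetical and are not results from the paper.

```python
def average_ranks(acc):
    """acc[d][m] is the accuracy of method m on dataset d.

    Returns the mean rank of each method over all datasets (1 = best).
    Tied methods receive the average of the ranks they jointly occupy.
    """
    n_methods = len(acc[0])
    totals = [0.0] * n_methods
    for row in acc:
        for m, a in enumerate(row):
            better = sum(1 for b in row if b > a)
            tied = sum(1 for b in row if b == a)   # includes the method itself
            totals[m] += better + (tied + 1) / 2   # mean of ranks better+1 .. better+tied
    return [t / len(acc) for t in totals]

# Hypothetical accuracies: 2 datasets (rows) x 3 methods (columns).
toy_acc = [
    [0.9, 0.8, 0.8],
    [0.7, 0.9, 0.6],
]
print(average_ranks(toy_acc))  # [1.5, 1.75, 2.75]
```

A lower average rank means the method is closer to the top across datasets, which is how the values 1.98 and 4.52 should be read.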
5.4 Sensitivity Analysis
We perform sensitivity analysis to study the key parameters, including the number of shapelet scales $R$ (default 8), the minimum and maximum lengths of the shapelets $L_{\text{min}}$ (default $0.1T$) and $L_{\text{max}}$ (default $0.8T$), the decay rate $\alpha$ (default 0.5), and the regularization coefficients $\lambda$ (default 0.01) and $\lambda_{S}$ (default 1).
Figure 7: Sensitivity analysis of the key parameters.
Figure 8: Accuracy of OSCNN, fully supervised and fine-tuned CSL w.r.t. the ratio of labeled data on UWaveGestureLibrary.
Similar to the setting in Section 3.2, given $L_{\text{min}}$, $L_{\text{max}}$ and $R$, the shapelet lengths are simply set to the evenly spaced numbers over $\left[L_{\text{min}}, L_{\text{max}}\right]$, i.e., $L_{r}=L_{\text{min}}+(r-1) \frac{L_{\text{max}}-L_{\text{min}}}{R-1}$ ($r \in\{1, \ldots, R\}$). The performance is evaluated using classification accuracy. The results on three diverse UEA datasets are shown in Fig. 7, and similar trends can be observed on the other datasets. Please note that the performance on AtrialFibrillation seems sensitive only because the dataset has just 15 testing samples, the minimum number among the 30 UEA datasets, which is a corner case of our evaluation. We discuss the results in detail as follows.
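The evenly spaced shapelet lengths can be computed directly from the formula for $L_r$. The sketch below expresses $L_{\text{min}}$ and $L_{\text{max}}$ as fractions of the series length $T$, as in the default settings; the function name `shapelet_lengths` and the rounding to integer lengths are assumptions made for illustration.

```python
def shapelet_lengths(T: int, R: int = 8, l_min: float = 0.1, l_max: float = 0.8) -> list:
    """Return L_r = L_min + (r - 1) * (L_max - L_min) / (R - 1) for r = 1..R.

    L_min = l_min * T and L_max = l_max * T. Rounding to the nearest
    integer (and clamping to at least 1) is an illustrative assumption.
    """
    if R < 2:
        raise ValueError("R must be at least 2 for the formula to be defined")
    lo, hi = l_min * T, l_max * T
    return [max(1, round(lo + (r - 1) * (hi - lo) / (R - 1))) for r in range(1, R + 1)]

# Default settings on a series of length T = 100.
print(shapelet_lengths(100))  # [10, 20, 30, 40, 50, 60, 70, 80]
```

With the defaults, each of the $R=8$ scales differs from its neighbor by $0.1T$, so the short-only ($0.1T$ to $0.4T$) and long-only ($0.5T$ to $0.8T$) ablation variants correspond to the two halves of this range.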
The sensitivity analysis of $R$. To capture multi-scale information, the number of scales $R$ cannot be too small. But too large an $R$ leaves only a small number of shapelets $K$ per scale under a fixed representation dimension $D_{\text{repr}}$, which can also decrease the representation quality. As the result in Fig. 7a shows, the model is relatively more sensitive to small values of $R$ than to large values, while a moderate value of 8 leads to good overall performance across the datasets.
Figure 9: Explanation of the shapelets learned by CSL.
The sensitivity analysis of $L_{\text{min}}$ and $L_{\text{max}}$. From Fig. 7b-7c, we can see that for UWaveGestureLibrary, the best choice of $L_{\text{min}}$ is about $0.4T$ and values of $0.8T$ to $0.9T$ are the best for $L_{\text{max}}$, which indicates that the long-term features can be more effective than the short-term ones. For ArticularyWordRecognition, relatively small values of $L_{\text{min}}$ ($0.1T$ to $0.2T$) and large values of $L_{\text{max}}$ ($0.8T$ to $0.9T$) are better, showing the importance of both short- and long-term features. On the AtrialFibrillation dataset, $L_{\text{min}}=0.1T$ and $0.8T$ to $0.9T$ for $L_{\text{max}}$ are empirically the best choice. Overall, the default settings of $L_{\text{min}}=0.1T$ and $L_{\text{max}}=0.8T$ are decent for different datasets without any tuning (also validated in Section 5.3), while one can manually optimize them for further improvement.
The sensitivity analysis of $\alpha$. As the result in Fig. 7d shows, our model is more sensitive to small values of the decay rate $\alpha$ than to large values for the UWaveGestureLibrary and AtrialFibrillation datasets, and the opposite holds for ArticularyWordRecognition. Overall, our model is less sensitive to $\alpha$ than to the other parameters, and a moderate value around 0.5 works well across datasets.
The sensitivity analysis of $\lambda$ and $\lambda_{S}$. As shown in Fig. 7e, by varying $\lambda$ from 1 to 0.0001, we observe that our model achieves good performance across different datasets when $\lambda$ is around 0.01. Similarly, we vary $\lambda_{S}$ from 100 to 0.01. The result in Fig. 7f indicates that our model is more robust to the larger values of $\lambda_{S}$ (1 to 100) for UWaveGestureLibrary and the opposite for AtrialFibrillation (where the model is more sensitive when $\lambda_{S}>1$). Overall, a moderate value around 1 can be a good choice for different datasets.
5.5 Study of Partially Labeled Classification
To further demonstrate the superiority of CSL, we perform a case study on UWaveGestureLibrary under a practical setting of partially labeled MTS classification. Specifically, we compare CSL with the best-performing supervised method, OSCNN, on the dataset where only a randomly selected portion of the data is labeled. For CSL, we first use all the data to train the Shapelet Transformer without using labels. Then, we append a linear classifier on top of the representations, and fine-tune the encoder and the linear layer using the available labeled data by minimizing the standard cross-entropy loss as used in OSCNN. In contrast, OSCNN is trained in a fully supervised manner using the same labeled data. For comparison, we also train a CSL model in the same fully supervised way as OSCNN.
As Fig. 8 shows, the fully supervised CSL performs very closely to OSCNN. The fine-tuned CSL consistently outperforms the two competitors, especially when the proportion of labeled data is small. Taking advantage of URL, which can "pre-train" the encoder using all available data regardless of annotations, the fine-tuned CSL uses only 20% labeled data to achieve accuracy comparable to the fully supervised OSCNN and CSL trained with 50% labeled data. The results show the superiority of our URL method in partially labeled settings, compared to the traditional fully supervised techniques.
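The partially labeled setting in this section can be simulated by hiding the labels of all but a random fraction of the training samples. The sketch below shows only that selection step; the function name `labeled_mask`, the uniform (non-stratified) sampling, and the sample count of 120 are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def labeled_mask(n_samples: int, ratio: float, seed: int = 0) -> np.ndarray:
    """Return a boolean mask marking which samples keep their labels.

    Samples are drawn uniformly at random without replacement;
    class-stratified sampling would be an equally plausible choice.
    """
    rng = np.random.default_rng(seed)
    n_labeled = max(1, int(round(ratio * n_samples)))
    mask = np.zeros(n_samples, dtype=bool)
    mask[rng.choice(n_samples, size=n_labeled, replace=False)] = True
    return mask

# Hypothetical training set of 120 samples with 20% labeled.
mask = labeled_mask(120, ratio=0.2)
print(int(mask.sum()))  # 24
```

Under this setup, unsupervised pre-training would see all 120 samples, while fine-tuning and the fully supervised baselines would see only the 24 samples selected by the mask.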
5.6 Study of the Learned Shapelets
To provide an intuitive understanding of the features CSL extracts, we study the learned shapelets using the easy-to-understand BasicMotions problem from the UEA archive. The time series are sensor records of four human motions, i.e., Standing, Walking, Running and Badminton. Each sample has six dimensions and a length of $T=100$.
Table 11: Accuracy and SVM training time on long time series.
We plot four time series of the four classes and two shapelets with different lengths and measures learned by our CSL (see Fig. 9a). Shapelet 1 is of length 30 and encodes the samples using Euclidean distance (Green). Shapelet 2 has a length of 40 and is used in conjunction with the cross-correlation function for encoding (Orange). The shapelets are matched to the most similar subsequences for each of the time series samples. Note that CSL uses multivariate shapelets to jointly capture the information among different variables, where each shapelet has the same dimensions as the time series. The two shapelets encode each time series sample into a two-dimensional representation (Fig. 9b), where each axis is the (dis)similarity between the shapelet and the matching subsequence according to Eq. (3).
From Fig. 9, we observe that the representation based on Shapelet 1 (X-axis) can distinguish the Standing motion, while the other three motions can be effectively classified by the features extracted using both shapelets. Thus, the shapelets can be seen as prototypes of some classes and the representations can be explained as the degree to which the shapelets exist in the time series, which is intuitive to understand. Our proposal not only extends the original shapelet, which is designed only for supervised classification, to general-purpose URL, but also retains its benefit in terms of explainability or interpretability [58]. Although the interpretation method is ad hoc, it remains a nice property of the shapelet compared to complex neural networks, which are harder to explain [37].
5.7 Study of Long Time Series Representation
We assess the ability of the URL methods on long time series representation. Four datasets from the Time Series Machine Learning Website [2] are used, including BinaryHeartbeat (BH), CatsDogs (CD), DucksAndGeese (DA) and UrbanSound (US). The series lengths of the four datasets are 18530, 14773, 236784 and 44100 respectively, and the other statistics can be found on the website. Following Section 5.1, we train an SVM using either the representation learned without supervision or the raw values of the training data, and report the test accuracy and the SVM training time (marked in italics) in Table 11. In terms of accuracy, CSL performs the best on three datasets and the second-best on DA, showing its higher ability in long series representation. T-Loss also performs well, but is still inferior to CSL on CD and US. TS2Vec and TST cannot handle long series due to high space complexity, so they have to shorten the raw data by truncation or subsampling following [59], which may cause information loss and result in their low performance. Compared to analysis on the raw values, using the representation learned by CSL not only improves the accuracy, but also achieves a speedup of more than 140x for the SVM training. This indicates the superiority of the proposed CSL in long time series analysis.
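The two shortening strategies mentioned above, truncation and subsampling, can be sketched as follows. This is a generic illustration and not the exact procedure of [59]; the function name `shorten`, the evenly spaced index selection, and the target length of 2000 are assumptions.

```python
import numpy as np

def shorten(series: np.ndarray, max_len: int, mode: str = "subsample") -> np.ndarray:
    """Reduce a (T, D) series to at most `max_len` time steps.

    "truncate" keeps the first `max_len` steps; "subsample" keeps
    `max_len` roughly evenly spaced steps. Both discard information.
    """
    T = series.shape[0]
    if T <= max_len:
        return series
    if mode == "truncate":
        return series[:max_len]
    if mode == "subsample":
        idx = np.linspace(0, T - 1, num=max_len).round().astype(int)
        return series[idx]
    raise ValueError(f"unknown mode: {mode}")

# A univariate series as long as those in BinaryHeartbeat (T = 18530).
long_series = np.arange(18530, dtype=float).reshape(-1, 1)
print(shorten(long_series, 2000).shape)              # (2000, 1)
print(shorten(long_series, 2000, "truncate").shape)  # (2000, 1)
```

Truncation drops everything after the cut, while subsampling lowers the temporal resolution, so either can remove the patterns a downstream classifier needs.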
5.8 Running Time Analysis
Figure 10: Accuracy w.r.t. total training time.
Figure 11: Training time per epoch of varying input size ($N$), dimension ($D$) and series length ($T$) on InsectWingbeat, DuckDuckGeese and EigenWorms respectively.
Although our main goal is to improve the representation quality of URL, we show that the running time of the proposed CSL is also less than or comparable to that of the URL baselines. We first assess the accuracy with respect to the training time using two medium-sized datasets. As shown in Fig. 10, CSL and TS2Vec achieve the same or higher accuracy using much less time, showing that they are faster to train than the other URL methods, while CSL is also faster than TS2Vec. Next, we evaluate the training time per epoch on InsectWingbeat, DuckDuckGeese and EigenWorms, the UEA datasets with the largest input size, dimension and series length. The results are shown in Fig. 11a-11c respectively. TS-TCC runs fast in most cases. TS2Vec is also time-efficient, but it cannot scale to large length $T$ due to high memory consumption (Fig. 11c). TST is slower than CSL for high-dimensional time series (Fig. 11b) and runs out of memory for large $T$ (Fig. 11c). T-Loss and TNC, though having lower time complexity, are much slower than the others when $N$, $D$ and $T$ are considerably large. The reason is that they consist of many sequential operations which cannot be sped up with GPUs. CSL is fairly efficient among the URL methods in terms of running time per epoch. More importantly, as illustrated above, CSL can be faster to train overall because it converges in fewer epochs. Besides, we observe that the time spent on data augmentation (CSLAug) is very small during the CSL training.
6 CONCLUSION
This paper presents a novel URL framework named CSL, which leverages contrastive learning for MTS-specific representation. In particular, we design a unified shapelet-based encoder and an objective with multi-grained contrasting and multi-scale alignment to capture information in various time ranges. We also build a data augmentation library including diverse types of methods to improve the generality. Extensive experiments on tens of real-world datasets demonstrate the superiority of CSL over the baselines on downstream classification, clustering, and anomaly detection tasks.
ACKNOWLEDGMENTS
This paper was supported by NSFC grants (62232005, 62202126) and the National Key Research and Development Program of China (2020YFB1006104).
REFERENCES
[1] Galen Andrew, Raman Arora, Jeff Bilmes, and Karen Livescu. 2013. Deep canonical correlation analysis. In International conference on machine learning. PMLR, 1247-1255.
[2] Anthony Bagnall, Eamonn Keogh, Jason Lines, Aaron Bostrom, James Large, and Matthew Middlehurst. [n.d.]. Time Series Machine Learning Website. www.timeseriesclassification.com.
[3] Anthony J. Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J. Keogh. 2018. The UEA multivariate time series classification archive, 2018. CoRR abs/1811.00075 (2018). arXiv:1811.00075 http://arxiv.org/abs/1811.00075
[4] Stefanos Bennett, Mihai Cucuringu, and Gesine Reinert. 2022. Detection and clustering of lead-lag networks for multivariate time series with an application to financial markets. (2022).
[5] Aaron Bostrom and Anthony Bagnall. 2017. Binary shapelet transform for multiclass time series classification. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXII. Springer, 24-46.
[6] Xiaobin Chang, Tao Xiang, and Timothy M Hospedales. 2018. Scalable and effective deep CCA via soft decorrelation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1488-1497.
[7] Yves Chauvin and David E Rumelhart. 2013. Backpropagation: theory, architectures, and applications. Psychology Press.
[8] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research), Vol. 119. PMLR, 1597-1607. http://proceedings.mlr.press/v119/chen20j.html
[9] Timothy Derrick and Joshua Thomas. 2004. Time series analysis: the cross-correlation function. (2004).
[10] Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee Keong Kwoh, Xiaoli Li, and Cuntai Guan. 2021. Time-Series Representation Learning via Temporal and Contextual Contrasting. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Zhi-Hua Zhou (Ed.). International Joint Conferences on Artificial Intelligence Organization, 2352-2359. https://doi.org/10.24963/ijcai.2021/324 Main Track.
[11] Jean-Yves Franceschi, Aymeric Dieuleveut, and Martin Jaggi. 2019. Unsupervised scalable representation learning for multivariate time series. Advances in neural information processing systems 32 (2019).
[12] Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 6894-6910. https://doi.org/10.18653/v1/2021.emnlp-main.552
[13] Josif Grabocka, Nicolas Schilling, Martin Wistuba, and Lars Schmidt-Thieme. 2014. Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 392-401.
[14] Siho Han and Simon S Woo. 2022. Learning Sparse Latent Graph Representations for Anomaly Detection in Multivariate Time Series. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2977-2986.
[15] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. 2020. Momentum Contrast for Unsupervised Visual Representation Learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020. Computer Vision Foundation / IEEE, 9726-9735. https://doi.org/10.1109/CVPR42600.2020.00975
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770-778.
[17] Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. 2014. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery 28, 4 (2014), 851-881.
[18] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Söderström. 2018. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, Yike Guo and Faisal Farooq (Eds.). ACM, 387-395. https://doi.org/10.1145/3219819.3219845
[19] Dino Ienco and Roberto Interdonato. 2020. Deep Multivariate Time Series Embedding Clustering via Attentive-Gated Autoencoder. In Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11-14, 2020, Proceedings, Part I (Lecture Notes in Computer Science), Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, and Sinno Jialin Pan (Eds.), Vol. 12084. Springer, 318-329. https://doi.org/10.1007/978-3-030-47426-3_25 [19] 迪诺·伊恩科和罗伯托·因特多纳托. 2020.基于注意力门控自编码器的深度多变量时间序列嵌入聚类. 收录于《知识发现与数据挖掘进展——第 24 届亚太会议(PAKDD 2020)论文集,第 1 卷》(Lecture Notes in Computer Science 系列),新加坡,2020 年 5 月 11 日至 14 日,主编:Hady W. Lauw,Raymond Chi-Wing Wong、Alexandros Ntoulas、Ee-Peng Lim、See-Kiong Ng 和 Sinno Jialin Pan(编),第 12084 卷。Springer,第 318-329 页。https://doi.org/10.1007/978-3-030-47426-3_25
[20] Arundo Analytics Inc. 2019. tsaug: An open-source package for time series data augmentation. Retrieved January 1, 2023 from https://tsaug.readthedocs.io/en/stable/references.html
[21] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015 (JMLR Workshop and Conference Proceedings), Francis R. Bach and David M. Blei (Eds.), Vol. 37. JMLR.org, 448-456. http://proceedings.mlr.press/v37/ioffe15.html
[22] Brian Kenji Iwana and Seiichi Uchida. 2021. An empirical survey of data augmentation for time series classification with neural networks. PLOS ONE 16, 7 (2021), e0254841.
[23] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. 2019. Multivariate LSTM-FCNs for time series classification. Neural Networks 116 (2019), 237-245. https://doi.org/10.1016/j.neunet.2019.04.014
[24] Eunji Kim, Sungzoon Cho, Byeongeon Lee, and Myoungsu Cho. 2019. Fault detection and diagnosis using self-attentive convolutional neural networks for variable-length sensor data in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing 32, 3 (2019), 302-309.
[25] Guozhong Li, Byron Choi, Jianliang Xu, Sourav S. Bhowmick, Kwok-Pan Chun, and Grace Lai-Hung Wong. 2021. ShapeNet: A Shapelet-Neural Network Approach for Multivariate Time Series Classification. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, 8375-8383. https://ojs.aaai.org/index.php/AAAI/article/view/17018
[26] Hailin Li. 2019. Multivariate time series clustering based on common principal component analysis. Neurocomputing 349 (2019), 239-247. https://doi.org/10.1016/j.neucom.2019.03.060
[27] Junnan Li, Pan Zhou, Caiming Xiong, and Steven C. H. Hoi. 2021. Prototypical Contrastive Learning of Unsupervised Representations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=KmykpuSrjcq
[28] Zhihan Li, Youjian Zhao, Jiaqi Han, Ya Su, Rui Jiao, Xidao Wen, and Dan Pei. 2021. Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding. In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021, Feida Zhu, Beng Chin Ooi, and Chunyan Miao (Eds.). ACM, 3220-3230. https://doi.org/10.1145/3447548.3467075
[29] Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, et al. 2022. Univip: A unified framework for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14627-14636.
[30] Zhiyu Liang and Hongzhi Wang. 2021. Efficient class-specific shapelets learning for interpretable time series classification. Information Sciences 570 (2021), 428-450.
[31] Jason Lines, Sarah Taylor, and Anthony Bagnall. 2018. Time series classification with HIVE-COTE: The hierarchical vote collective of transformation-based ensembles. ACM Transactions on Knowledge Discovery from Data 12, 5 (2018).
[32] Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X. Liu, and Schahram Dustdar. 2022. Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=0EXmFzUn5I
[33] Qianli Ma, Wanqing Zhuang, and Garrison Cottrell. 2019. Triple-shapelet networks for time series classification. In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 1246-1251.
[34] Qianli Ma, Wanqing Zhuang, Sen Li, Desen Huang, and Garrison Cottrell. 2020. Adversarial dynamic shapelet networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 5069-5076.
[35] Karl Øyvind Mikalsen, Filippo Maria Bianchi, Cristina Soguero-Ruíz, and Robert Jenssen. 2018. Time series cluster kernel for learning similarities between multivariate time series with missing data. Pattern Recognit. 76 (2018), 569-581. https://doi.org/10.1016/j.patcog.2017.11.030
[36] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1301.3781
[37] Christoph Molnar. 2022. Interpretable Machine Learning (2 ed.). https://christophm.github.io/interpretable-ml-book
[38] Abdullah Mueen, Eamonn Keogh, and Neal Young. 2011. Logical-shapelets: an expressive primitive for time series classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1154-1162.
[39] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
[40] Mert Ozer, Anna Sapienza, Andrés Abeliuk, Goran Muric, and Emilio Ferrara. 2020. Discovering patterns of online popularity from time series. Expert Syst. Appl. 151 (2020), 113337. https://doi.org/10.1016/j.eswa.2020.113337
[41] Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang. 2020. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (Eds.). ACM, 1150-1160. https://doi.org/10.1145/3394486.3403168
[42] Ann Riley and Elvira Nica. 2021. Internet of things-based smart healthcare systems and wireless biomedical sensing devices in monitoring, detection, and prevention of COVID-19. American Journal of Medical Research 8, 2 (2021), 51-64.
[43] Saeid Sanei and Jonathon A Chambers. 2013. EEG signal processing. John Wiley & Sons.
[44] Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. 2019. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis (Eds.). ACM, 2828-2837. https://doi.org/10.1145/3292500.3330672
[45] Wensi Tang, Guodong Long, Lu Liu, Tianyi Zhou, Michael Blumenstein, and Jing Jiang. 2022. Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=PDYs7Z2XFGv
[46] Sana Tonekaboni, Danny Eytan, and Anna Goldenberg. 2021. Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=8qDwejCuCN
[47] Liudmila Ulanova, Nurjahan Begum, and Eamonn J. Keogh. 2015. Scalable Clustering of Time Series with U-Shapelets. In Proceedings of the 2015 SIAM International Conference on Data Mining, Vancouver, BC, Canada, April 30 - May 2, 2015, Suresh Venkatasubramanian and Jieping Ye (Eds.). SIAM, 900-908. https://doi.org/10.1137/1.9781611974010.101
[48] Terry Taewoong Um, Franz Michael Josef Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, and Dana Kulic. 2017. Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, ICMI 2017, Glasgow, United Kingdom, November 13-17, 2017, Edward Lank, Alessandro Vinciarelli, Eve E. Hoggan, Sriram Subramanian, and Stephen A. Brewster (Eds.). ACM, 216-220. https://doi.org/10.1145/3136755.3136817
[49] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W. Senior, and Koray Kavukcuoglu. 2016. WaveNet: A Generative Model for Raw Audio. CoRR abs/1609.03499 (2016). arXiv:1609.03499 http://arxiv.org/abs/1609.03499
[50] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579-2605. http://jmlr.org/papers/v9/vandermaaten08a.html
[51] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998-6008. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[52] Qiao Xiao, Boqian Wu, Yu Zhang, Shiwei Liu, Mykola Pechenizkiy, Elena Mocanu, and Decebal Constantin Mocanu. 2022. Dynamic Sparse Network for Time Series Classification: Learning What to “see”. CoRR abs/2212.09840 (2022). https://doi.org/10.48550/arXiv.2212.09840 arXiv:2212.09840
[53] Chang Xu, Dacheng Tao, and Chao Xu. 2013. A Survey on Multi-view Learning. CoRR abs/1304.5634 (2013). arXiv:1304.5634 http://arxiv.org/abs/1304.5634
[54] Akihiro Yamaguchi, Ken Ueno, and Hisashi Kashima. 2022. Learning Evolvable Time-series Shapelets. In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022. IEEE, 793-805. https://doi.org/10.1109/ICDE53745.2022.00064
[55] Jinyu Yang, Jiali Duan, Son Tran, Yi Xu, Sampath Chanda, Liqun Chen, Belinda Zeng, Trishul Chilimbi, and Junzhou Huang. 2022. Vision-language pre-training with triple contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15671-15680.
[56] Ling Yang and Shenda Hong. 2022. Unsupervised time-series representation learning with iterative bilinear temporal-spectral fusion. In International Conference on Machine Learning. PMLR, 25038-25054.
[57] Junchen Ye, Zihan Liu, Bowen Du, Leilei Sun, Weimiao Li, Yanjie Fu, and Hui Xiong. 2022. Learning the evolutionary and multi-scale graph structure for multivariate time series forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2296-2306.
[58] Lexiang Ye and Eamonn Keogh. 2011. Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Mining and Knowledge Discovery 22, 1-2 (2011), 149-182.
[59] Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. 2022. Ts2vec: Towards universal representation of time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8980-8987.
[60] George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, and Carsten Eickhoff. 2021. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2114-2124.
[61] Nan Zhang and Shiliang Sun. 2022. Multiview Unsupervised Shapelet Learning for Multivariate Time Series Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022), 1-16. https://doi.org/10.1109/TPAMI.2022.3198411
[62] Qin Zhang, Jia Wu, Peng Zhang, Guodong Long, and Chengqi Zhang. 2019. Salient Subsequence Learning for Time Series Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 41, 9 (2019), 2193-2207. https://doi.org/10.1109/TPAMI.2018.2847699
[63] Xuchao Zhang, Yifeng Gao, Jessica Lin, and Chang-Tien Lu. 2020. TapNet: Multivariate Time Series Classification with Attentional Prototypical Network. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 6845-6852. https://ojs.aaai.org/index.php/AAAI/article/view/6165
[64] Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, and Marinka Zitnik. 2022. Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency. CoRR abs/2206.08496 (2022). https://doi.org/10.48550/arXiv.2206.08496 arXiv:2206.08496
[65] Yanzhao Zhang, Richong Zhang, Samuel Mensah, Xudong Liu, and Yongyi Mao. 2022. Unsupervised sentence representation via contrastive learning with mixing negatives. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11730-11738.
[66] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.), Vol. 162. PMLR, 27268-27286.
*Corresponding author.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. xx, No. xx ISSN 2150-8097. doi:xx.xxx/xx.xxx