DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting
Xiangfei Qiu, East China Normal University $^{1}$, Shanghai, China, xfqiu@stu.ecnu.edu.cn
Xingjian Wu, East China Normal University $^{1}$, Shanghai, China, xjwu@stu.ecnu.edu.cn
Yan Lin, Aalborg University, Aalborg, Denmark, lyan@cs.aau.dk
Chenjuan Guo, East China Normal University $^{1}$, Shanghai, China, cjguo@dase.ecnu.edu.cn
Jilin Hu $\boxtimes$, East China Normal University $^{1,2}$, Shanghai, China, jlhu@dase.ecnu.edu.cn
Bin Yang, East China Normal University $^{1}$, Shanghai, China, byang@dase.ecnu.edu.cn
Abstract
Multivariate time series forecasting is crucial for various applications, such as financial investment, energy management, weather forecasting, and traffic optimization. However, accurate forecasting is challenging due to two main factors. First, real-world time series often show heterogeneous temporal patterns caused by distribution shifts over time. Second, correlations among channels are complex and intertwined, making it hard to model the interactions among channels precisely and flexibly.
In this study, we address these challenges by proposing a general framework called DUET, which introduces DUal clustering on the temporal and channel dimensions to Enhance multivariate Time series forecasting. First, we design a Temporal Clustering Module (TCM) that clusters time series into fine-grained distributions to handle heterogeneous temporal patterns. For different distribution clusters, we design various pattern extractors to capture their intrinsic temporal patterns, thus modeling the heterogeneity. Second, we introduce a novel Channel-Soft-Clustering strategy and design a Channel Clustering Module (CCM), which captures the relationships among channels in the frequency domain through metric learning and applies sparsification to mitigate the adverse effects of noisy channels. Finally, DUET combines TCM and CCM to incorporate both the temporal and channel dimensions. Extensive experiments on 25 real-world datasets from 10 application domains demonstrate the state-of-the-art performance of DUET.
multivariate time series; forecasting; dual-clustering
ACM Reference Format:
Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan Guo, Jilin Hu $\boxtimes$, and Bin Yang. 2025. DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1 (KDD '25), August 3-7, 2025, Toronto, ON, Canada. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3690624.3709325
1 Introduction
Figure 1: A non-stationary time series with three intervals $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, exhibiting varying value distributions ($P_{A} \neq P_{B} \neq P_{C}$) and temporal patterns.
Multivariate time series is a type of time series that organizes timestamps chronologically and involves multiple channels (a.k.a., variables) at each timestamp [22, 23, 29, 59, 62, 64]. In recent years, multivariate time series analysis has seen remarkable progress, with key tasks such as anomaly detection [25, 43, 45, 63, 73, 79], classification [6, 81], and imputation [18, 65-67], among others [24, 28, 80, 82], gaining attention. Among these, multivariate time series forecasting (MTSF) [9, 27, 70, 84, 85, 89, 91] stands out as a critical and widely studied task. It has been extensively applied in diverse domains, including economics [30, 57], traffic [11, 17, 32, 74, 75, 77, 78], energy [20, 60, 68], and AIOps [5, 8, 37, 40, 53], highlighting its importance and impact.
Figure 2: Channel strategies. Different colors represent different channels, with squares representing features before processing with various channel strategies, and squares with rounded corners representing features after processing.
Building an MTSF method typically involves modeling correlations on the temporal and channel dimensions. However, real-world time series often exhibit heterogeneous temporal patterns caused by the shifting of distribution over time, a phenomenon called Temporal Distribution Shift (TDS) [4, 15]. Additionally, the correlation among multiple channels is complex and intertwined [56]. Therefore, developing a method that can effectively extract heterogeneous temporal patterns and channel dependencies is essential yet challenging. Specifically, achieving these goals is hindered by two major challenges.
Researchers have explored various strategies to manage multiple channels, including 1) treating each channel independently [46, 52] (Channel-Independent, CI), 2) assuming each channel correlates with all other channels [88] (Channel-Dependent, CD), and 3) grouping channels into clusters [44] (Channel-Hard-Clustering, CHC).
Figure 3: Performance of DUET. Results (MSE) are averaged over all forecasting horizons. DUET outperforms strong baselines on $\mathbf{10}$ commonly used datasets.
Figure 2 illustrates these three strategies. CI imposes the constraint of using the same model across different channels. While it offers robustness [52], it overlooks potential interactions among channels and can be limited in generalizability and capacity for unseen channels [21, 56]. CD, on the other hand, considers all channels simultaneously and generates joint representations for decoding [88], but may be susceptible to noise from irrelevant channels, reducing the model's robustness. CHC partitions multivariate time series into disjoint clusters through hard clustering, applying CD modeling methods within each cluster and CI methods among clusters [44]. However, this approach only considers relationships within the same cluster, limiting flexibility and versatility. In conclusion, there is not yet an approach that models the complex interactions among channels precisely and flexibly.
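One way to picture the three strategies is as binary channel-interaction masks, where entry (i, j) is 1 if channel i may use information from channel j. The sketch below is only an illustration under that assumption; the function names and the example cluster assignment are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def ci_mask(n_channels: int) -> np.ndarray:
    """Channel-Independent: each channel only sees itself (identity mask)."""
    return np.eye(n_channels, dtype=int)

def cd_mask(n_channels: int) -> np.ndarray:
    """Channel-Dependent: every channel sees every channel (all-ones mask)."""
    return np.ones((n_channels, n_channels), dtype=int)

def chc_mask(cluster_ids) -> np.ndarray:
    """Channel-Hard-Clustering: channels interact only inside their own
    disjoint cluster, which yields a block-structured mask."""
    ids = np.asarray(cluster_ids)
    return (ids[:, None] == ids[None, :]).astype(int)

# Hypothetical assignment of 4 channels to 2 disjoint clusters.
example = chc_mask([0, 0, 1, 1])
print(example)
```

Under this view, CI and CD are the two extremes, and CHC sits between them but forces every channel pair into an all-or-nothing, symmetric relationship fixed by the cluster assignment.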
In this study, we address the above two challenges by proposing a general framework, DUET, which introduces DUal clustering on the temporal and channel dimensions to Enhance multivariate Time series forecasting. First, to model heterogeneous temporal patterns caused by TDS, we design a Temporal Clustering Module (TCM). This module clusters time series into fine-grained distributions, allowing us to use various pattern extractors to capture their intrinsic temporal patterns, thereby modeling the heterogeneity of temporal patterns. This method effectively handles both stationary and non-stationary data, demonstrating strong generality. Second, to flexibly model relationships among channels, we propose a Channel Clustering Module (CCM). Using a channel-soft-clustering strategy, this module captures relationships among channels in the frequency domain through metric learning and applies sparsification. This approach enables each channel to focus on those beneficial for downstream prediction tasks, while mitigating the impact of noisy or irrelevant channels, thereby achieving effective channel soft clustering. Finally, the Fusion Module (FM), based on a masked attention mechanism, efficiently combines the temporal features extracted by the TCM with the channel mask matrix generated by the CCM. Experimental results show that the proposed DUET achieves SOTA performance on real-world forecasting datasets (see Figure 3).
Our contributions are summarized as follows.
To address MTSF, we propose a general framework called DUET. It learns an accurate and adaptive forecasting model through dual clustering on both temporal and channel dimensions.
We design the TCM that clusters time series into fine-grained distributions. Various pattern extractors are then designed for different distribution clusters to capture their unique temporal patterns, modeling the heterogeneity of temporal patterns.
We design the CCM that flexibly captures the relationships among channels in the frequency domain through metric learning and applies sparsification.
We conduct extensive experiments on 25 datasets. The results show that DUET outperforms state-of-the-art baselines. Additionally, all datasets and code are available at https://github.com/decisionintelligence/DUET.
2 Related Works
2.1 Temporal Distribution Shift in MTSF
Time series forecasting suffers from Temporal Distribution Shift (TDS), as the distribution of real-world series changes over time [1, 13, 41]. In recent years, various methods have been proposed to address this issue. Some works tackle TDS from a normalization perspective. DAIN [54] adaptively normalizes the series with nonlinear neural networks. RevIN [33] utilizes instance normalization on input and output sequences by normalizing the input sequences and then denormalizing the model output sequences. Dish-TS [16] identifies intra- and inter-space distribution shifts in time series and mitigates these issues by learning distribution coefficients. Non-stationary Transformer [47] presents de-stationary attention that incorporates non-stationary factors in self-attention, significantly improving transformer-based models. Some works address TDS from a distribution perspective. DDG-DA [35] predicts the evolving data distribution in a domain adaptation fashion. AdaRNN [15] proposes an adaptive RNN to alleviate the impact of non-stationary factors by characterizing and matching distributions. Other works address TDS from a time-varying model parameters perspective. Triformer [10] proposes a light-weight approach to enable variable-specific model parameters, making it possible to capture distinct temporal patterns from different variables. ST-WA [11] uses distinct sets of model parameters for different time periods.
Despite the effectiveness of existing methods, our work explicitly models heterogeneous temporal patterns separately under different distributions, which can further improve the performance.
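The normalize-then-denormalize idea attributed to RevIN [33] above can be sketched as follows. This is a simplified illustration, not the RevIN implementation: it omits RevIN's learnable affine parameters, and the function names, the synthetic data, and the placeholder "model" that repeats the last value are all assumptions made for the example.

```python
import numpy as np

def instance_normalize(x: np.ndarray, eps: float = 1e-5):
    """Normalize each channel of x (shape N x T) with its own mean and std."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps), mean, std

def instance_denormalize(y_norm: np.ndarray, mean, std, eps: float = 1e-5):
    """Map model outputs back to the original scale of each channel."""
    return y_norm * (std + eps) + mean

# Synthetic input: 3 channels, 96 timestamps (made-up data).
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(3, 96))

x_norm, mean, std = instance_normalize(x)

# Placeholder forecaster (hypothetical): repeat the last normalized value.
horizon = 24
y_norm = np.repeat(x_norm[:, -1:], horizon, axis=1)

# Forecast expressed in the original units of each channel.
y = instance_denormalize(y_norm, mean, std)
print(y.shape)
```

Because the statistics are computed per input instance, the forecaster always sees inputs on a comparable scale, and the stored statistics restore the original level and spread afterwards.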
2.2 Channel Strategies in MTSF
It is essential to consider the correlations among channels in MTSF. Most existing methods adopt either a Channel-Independent (CI) or Channel-Dependent (CD) strategy to utilize the spectrum of information in channels. CI strategy approaches [12, 38, 52, 76] share the same weights across all channels and make forecasts independently. Conversely, CD strategy approaches [7, 10, 26, 42, 46, 88] consider all channels simultaneously and generate joint representations for decoding. The CI strategy is characterized by low model capacity but high robustness, whereas the CD strategy exhibits the opposite characteristics. DGCformer [44] proposes a relatively balanced channel strategy called Channel-Hard-Clustering (CHC), trying to mitigate this polarization effect and improve predictive capabilities. Specifically, DGCformer designs a graph clustering module to assign channels with significant similarities into the same cluster, utilizing the CD strategy inside each cluster and the CI strategy across them. This approach adopts the CHC strategy, focusing solely on channel correlations within the same cluster. As a result, this method suffers from the limitations of rigidly adhering to channel-similarity rules defined by human experience.
Different from the above methods, we adopt a Channel-Soft-Clustering (CSC) strategy and devise a fully adaptive sparsity module to dynamically build a group for each channel, which is a more comprehensive design that covers the CHC strategy.
3 Preliminaries
3.1 Definitions
Definition 3.1 (Time series). A time series $X \in \mathbb{R}^{N \times T}$ is a time-oriented sequence of $N$-dimensional time points, where $T$ is the number of timestamps, and $N$ is the number of channels. If $N=1$, a time series is called univariate, and multivariate if $N>1$.
For convenience, we separate dimensions with commas. Specifically, we denote $X_{i, j} \in \mathbb{R}$ as the $i$-th channel at the $j$-th timestamp, $X_{n,:} \in \mathbb{R}^{T}$ as the time series of $n$-th channel, where $n=1, \cdots, N$. We also introduce some definitions used in our methodology:
Definition 3.2 (Temporal Distribution Shift [15]). Given a time series $\mathcal{X} \in \mathbb{R}^{N \times L}$, by sliding the window, we get a set of time series with the length of $T$, denoted as $\mathcal{D}=\left\{\mathcal{X}_{n, i: i+T} \mid n \in[1, N] \ \&\ i \in[1, L-T]\right\}$, where each $\mathcal{X}_{n, i: i+T}$ corresponds to a series $X_{n,:}$ as in Definition 3.1. Then, temporal distribution shift refers to the case in which $\mathcal{D}$ can be clustered into $K$ sets, i.e., $\mathcal{D}=\bigcup_{i=1}^{K} \mathcal{D}_{i}$, where each $\mathcal{D}_{i}$ denotes the set with data distribution $P_{\mathcal{D}_{i}}(x)$, and $P_{\mathcal{D}_{i}}(x) \neq P_{\mathcal{D}_{j}}(x), \forall i \neq j$ with $1 \leq i, j \leq K$.
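The window set $\mathcal{D}$ in Definition 3.2 can be built with a short sketch such as the one below. It uses 0-based start positions $0, \dots, L-T-1$ to mirror the $L-T$ start positions in the definition; whether the final window is included is a convention choice assumed here. The synthetic series with a level shift and the use of per-window means as a rough distribution summary are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sliding_windows(series: np.ndarray, T: int) -> np.ndarray:
    """Cut a series of shape (N, L) into windows of length T.

    Returns an array of shape (N, L - T, T): one window per channel
    and per start position.
    """
    N, L = series.shape
    starts = range(L - T)
    return np.stack([series[:, i:i + T] for i in starts], axis=1)

# Made-up single-channel series whose level shifts halfway through.
rng = np.random.default_rng(0)
low = rng.normal(0.0, 1.0, 200)
high = rng.normal(5.0, 1.0, 200)
series = np.concatenate([low, high])[None, :]  # shape (1, 400)

windows = sliding_windows(series, T=50)

# A crude per-window summary: windows from the two regimes have
# clearly different means, hinting at different distributions.
window_means = windows.mean(axis=-1)[0]
print(windows.shape, window_means[0], window_means[-1])
```

In the terms of Definition 3.2, the early and late windows of this toy series would fall into different sets $\mathcal{D}_{i}$, since their value distributions differ.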
3.2 Problem Statement
Multivariate Time Series Forecasting aims to predict the next $F$ future timestamps, formulated as $Y=\left\langle X_{:, T+1}, \cdots, X_{:, T+F}\right\rangle \in \mathbb{R}^{N \times F}$ based on the historical time series $X=\left\langle X_{:, 1}, \cdots, X_{:, T}\right\rangle \in \mathbb{R}^{N \times T}$ with $N$ channels and $T$ timestamps.
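To make the shapes in the problem statement concrete, the following sketch splits a longer series into a history $X \in \mathbb{R}^{N \times T}$ and a target $Y \in \mathbb{R}^{N \times F}$. The helper name `make_forecasting_pair`, its `t_end` argument, and the toy data are hypothetical and only serve the illustration.

```python
import numpy as np

def make_forecasting_pair(series: np.ndarray, t_end: int, T: int, F: int):
    """Split series (N, L) into history X (N, T) and target Y (N, F).

    t_end is the 0-based index of the first future timestamp, so X covers
    [t_end - T, t_end) and Y covers [t_end, t_end + F).
    """
    if t_end - T < 0 or t_end + F > series.shape[1]:
        raise ValueError("window does not fit inside the series")
    X = series[:, t_end - T:t_end]
    Y = series[:, t_end:t_end + F]
    return X, Y

# Toy series with N = 2 channels and L = 20 timestamps.
series = np.arange(2 * 20, dtype=float).reshape(2, 20)
X, Y = make_forecasting_pair(series, t_end=12, T=8, F=4)
print(X.shape, Y.shape)
```

A forecasting model is then any mapping from `X` to an estimate of `Y` with the same $N \times F$ shape.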
4 Methodology
4.1 Structure Overview
Figure 4 shows the architecture of DUET, which adopts a dual clustering on both temporal and channel dimensions, simultaneously mining intrinsic temporal patterns and dynamic channel correlations. Specifically, we first use the Instance Norm [33] to unify the distribution of training and testing data. Then, the Temporal Clustering Module (TCM) utilizes a specially designed Distribution Router (Figure 5a) to capture the potential latent distributions of each time series $X_{n,:} \in \mathbb{R}^{T}$ in a channel-independent way, and then clusters time series with similar latent distributions by assigning them to the same group of Linear-based Pattern Extractors (Figure 5b). In this way, we can mitigate the issue that a single structure cannot fully extract temporal features due to the heterogeneity of temporal patterns, even with millions of parameters. Meanwhile, the Channel Clustering Module (CCM) captures the correlations among
$^{1}$ School of Data Science & Engineering.
$^{2}$ Engineering Research Center of Blockchain Data Management, Ministry of Education.