
Support matrix machine: A review

Anuradha Kumari a, Mushir Akhtar a, Rupal Shah b, M. Tanveer a,*
a Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
b Department of Electrical Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India

ARTICLE INFO

Keywords:
Support matrix machine
Electroencephalogram (EEG)
Fault detection
Support vector machine

ABSTRACT

Support vector machine (SVM) is one of the most studied paradigms in the domain of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of real-world data exists in matrix format, which must be reshaped into vectors before being fed to an SVM. The reshaping process disrupts the spatial correlations inherent in matrix data. Moreover, converting matrices into vectors yields high-dimensional input data, which introduces significant computational complexity. To address these issues in classifying matrix input data, the support matrix machine (SMM) was proposed. It represents an emerging method designed to handle matrix input data. SMM preserves the structural information of matrix data through the spectral elastic net property, a combination of the nuclear norm and the Frobenius norm. This article provides the first in-depth analysis of the development of SMM models, serving as a comprehensive summary for both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class-imbalance, and multi-class classification models. We also analyze the applications of SMM and conclude the article by outlining future research avenues and possibilities that may motivate researchers to advance the SMM algorithm further.

Contents

1. Introduction
2. Search methodology
3. Basics of SMM
    3.1. Notations
    3.2. Related work
    3.3. Mathematical formulation of SMM
4. SMM for classification
    4.1. Least square SMM
    4.2. Robust and sparse SMM
    4.3. SMM for multi-class classification
    4.4. SMM for imbalanced learning
    4.5. Deep variants of SMM
    4.6. Other variants of SMM
5. SMM for regression
6. SMM for semi-supervised learning
7. Applications
    7.1. EEG signal classification
    7.2. Fault diagnosis
    7.3. Other applications
    7.4. Experimental results
        7.4.1. EEG datasets
        7.4.2. Fault datasets
    7.5. Challenges of SMM in real-world applications
8. Conclusion and future directions
CRediT authorship contribution statement
Declaration of competing interest
Data availability
Acknowledgments
References

1. Introduction

Support vector machine (SVM) (Cortes & Vapnik, 1995) has grown widely in recent years and become one of the most popular classification techniques owing to its solid theoretical foundations and high generalization capability (Cervantes, Garcia-Lamont, Rodríguez-Mazahua, & Lopez, 2020). It is among the most effective and reliable algorithms for classification and regression in various real-world applications, such as economics (Wang, Wang, & Lai, 2005), healthcare (Zhu, Liu, Lu, & Li, 2016), text classification (Zhang, Yoshida, & Tang, 2008), regression (Chuang, 2007), and so on. Based on the structural risk minimization (SRM) principle and the large margin principle (Deng, Tian, & Zhang, 2012), it creates an optimal classification hyperplane with maximum margin between the samples of the two classes. SVM has been combined with other techniques to improve classification and training efficiency, for example, SVM with sequential minimal optimization (SMO) (Keerthi, Shevade, Bhattacharyya, & Murthy, 2001; Platt, 1998), SVM with the successive overrelaxation (SOR) technique (Mangasarian & Musicant, 1999), SVMlight (Joachims, 1999), and so on.
SVMs are suitable classifiers for input data in vector form. However, certain applications generate data in the form of higher-order tensors, a more natural way of expressing the data, e.g., gait silhouette sequences (3rd-order tensors), grayscale images (2nd-order tensors), color videos (4th-order tensors), and so on. Tucker tensor decomposition (TTD) allows every higher-order tensor to be converted into a matrix (Kotsia & Patras, 2011). Moreover, data in several practical applications exist in matrix form, such as medical images, photorealistic face images, palmprint images, and so on. It is therefore important to study classification techniques whose input data are of matrix type. Furthermore, matrix data contain correlations between rows and columns and thus convey more information than vector data, e.g., the correlations among different EEG channels (Zhou & Li, 2014), the spatial relationships of neighborhood pixels in image data (Wolf, Jhuang, & Hazan, 2007), and so on.
If matrix input data are fed to a traditional classifier such as SVM, the matrices must be reshaped into vectors, which destroys the structural information, i.e., the spatial correlations present in the matrix (Wolf et al., 2007). Moreover, converting matrices into vectors produces high-dimensional vectors from relatively few input samples, which leads to the so-called curse of dimensionality; the high dimensionality increases the computational complexity of classification. Several classification models that exploit the spatial correlations of matrix data have been proposed, such as the rank-$k$ SVM (Wolf et al., 2007), which represents the regression matrix using orthogonal rank-one matrices, and the bilinear SVM (Pirsiavash, Ramanan, & Fowlkes, 2009), which generates two low-rank matrices from the regression matrix. However, in the rank-$k$ SVM and the bilinear SVM, the rank of the regression matrix needs to be fixed in advance. In addition, the aforementioned approaches lead to non-convex optimization problems. Furthermore, Gao, Lv et al. (2021) proposed a structure-constrained matrix factorization framework relying on the structural information of behavior continuity and high-similarity neighborhood frames, and Gao, Qin, Hu, Liu, and Li (2021) proposed short-term prediction and long-term trajectory prediction for intelligent vehicles.
In the realm of machine learning, to address the curse of dimensionality while retaining the spatial correlations, Luo, Xie, Zhang, and Li (2015) introduced an advancement in the supervised learning domain known as the support matrix machine (SMM).
SMM uses the nuclear norm of the matrix as a convex surrogate of the matrix rank (Kobayashi & Otsu, 2012). The use of the nuclear norm is inspired by matrix completion (Candes & Recht, 2012; Huang, Nie, & Huang, 2013) and low-rank matrix approximation (Srebro & Shraibman, 2005), and it exploits the existing correlations between the columns and rows of the input matrix data. In addition, SMM also employs the Frobenius norm in its optimization process and penalizes the data points through the hinge loss, which leads to sparsity of the model. The combination of the squared Frobenius norm and the nuclear norm is known as the spectral elastic net, parallel to the traditional elastic net proposed by Zou and Hastie (2005). Hence, SMM improves generalization performance for classifying matrix input data while retaining a convex optimization problem. To emphasize the comparison between SMM and SVM, we provide a comparative analysis of the two models in Table 1.
In recent years, there has been significant growth in SMM and its applications to several real-world scenarios. The different SMM variants introduced over the past decade include robust and sparse models, models to handle imbalanced data, least square adaptations of SMM, and models extending SMM to multi-class classification tasks. SMM has also been combined with deep variants. It has major applications in the domains of electroencephalogram (EEG) data classification and fault diagnosis. In this article, we discuss the growth of SMM, its variants, and its applications. Fig. 1 highlights the overall structure and progression of the article.
The rest of the article is organized as follows: Section 2 discusses the search methodology related to the review. Section 3 contains the basics of SMM along with the related work. Section 4 briefly discusses the different variants of SMM and their recent developments for classification problems. Section 5 analyzes the SMM model for regression problems, and Section 6 discusses SMM for semi-supervised learning. We present various applications of SMM in real-world domains in Section 7. Section 8 concludes the article with several future research directions.

2. Search methodology

The papers included in this review were sourced from two search engines: Google Scholar and Scopus. The primary search was conducted in May 2023, with a supplementary search performed in April 2024. The search focused on keywords such as "support matrix machine" and "SMM", and papers were screened based on their titles. The initial screening excluded studies that did not focus on support matrix machines. In total, 61 papers published since 2015 were included.

3. Basics of SMM

In this section, we delve into the basic components of SMM, beginning with a study of the notations, followed by an examination of the related prior work and a detailed description of the formulation of the SMM framework.

3.1. Notations

First, we discuss the notations used throughout the paper. Following generally accepted conventions, we use lowercase letters to represent scalar values, lowercase bold letters to represent vectors, and uppercase bold letters to represent matrices. The singular value decomposition (SVD)
Table 1
Comparison between SVM and SMM.

| Properties | SVM | SMM |
| :--- | :--- | :--- |
| *Similarities* | | |
| Plane-based learning | Hyperplane separates the data points | Hyperplane separates the data points |
| Principle | Large margin principle | Large margin principle |
| Decision function | Sign function | Sign function |
| *Differences* | | |
| Input data | Vector | Matrices |
| Behavior with input matrix data | Vectorizes the matrix, which does not preserve the spatial correlations inherent in it | Preserves the inherent spatial correlations |
| Optimization problem | No spectral elastic net property | Employs the spectral elastic net penalty by using the Frobenius norm and the nuclear norm |
| Time complexity | $O(N^{3})$ (Kumari, Ganaie, & Tanveer, 2022) | $O(\text{poly}(N, pq))$ (Duan, Yuan, Liu, & Li, 2017) |
Fig. 1. A visual illustration showing the structure and progression of the paper.
Table 2
Notations used in the paper.

| Symbol | Description |
| :--- | :--- |
| C | Number of classes in multi-class problems |
| $N$ | Number of training samples |
| $p \times q$ | Order of the input data matrices |
| $\kappa$ | Condition number of the matrix for inversion |
| $\epsilon$ | Expected accuracy of the output space |
| H | Hilbert space containing the input matrices |
| V | Symmetric matrix |
| $\zeta$ | Trade-off parameter/regularization parameter |
| $\lambda$ | Nuclear norm constraint |
| $\rho$ | Inherent coefficient of ADMM |
| $s$ | Number of iterations |
| L | Number of layers in deep variants |
| $K$ | Rank of the matrix |
for a given matrix $\mathbf{X} \in \mathbb{R}^{p \times q}$ is condensed as $\mathbf{X}=\mathbf{E} \boldsymbol{\Sigma} \mathbf{F}^{T}$, where $\mathbf{E} \in \mathbb{R}^{p \times r}$ and $\mathbf{F} \in \mathbb{R}^{q \times r}$ are unitary matrices and $\boldsymbol{\Sigma}$ is a diagonal matrix whose diagonal entries are the singular values $\nu_{1}, \nu_{2}, \ldots, \nu_{r}$, arranged in the order $\nu_{1} \geq \nu_{2} \geq \cdots \geq \nu_{r} \geq 0$. The number of nonzero singular values gives the rank of $\mathbf{X}$, so that $r \leq \min(p, q)$. Furthermore, we let $\|\mathbf{X}\|_{*}=\sum_{i=1}^{r} \nu_{i}$ denote the nuclear norm of $\mathbf{X}$ and $\|\mathbf{X}\|_{F}=\sqrt{\sum_{i, j} x_{i, j}^{2}}=\sqrt{\sum_{i=1}^{r} \nu_{i}^{2}}$ the Frobenius norm. Other notations used in the paper, along with their descriptions, are given in Table 2.
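These definitions are easy to check numerically; below is a minimal NumPy sketch (the matrix and the rank tolerance are our own illustrative choices, not from the paper):

```python
import numpy as np

# A 3x2 matrix whose nonzero singular values are 4 and 3 (rank 2).
X = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

# SVD: X = E @ diag(nu) @ F.T, singular values returned in decreasing order.
E, nu, Ft = np.linalg.svd(X, full_matrices=False)

rank = int(np.sum(nu > 1e-10))             # number of nonzero singular values
nuclear_norm = nu.sum()                    # ||X||_* = sum of singular values
frobenius_norm = np.sqrt((nu ** 2).sum())  # ||X||_F via singular values

# The Frobenius norm also equals the entrywise definition sqrt(sum x_ij^2).
assert np.isclose(frobenius_norm, np.linalg.norm(X, 'fro'))
# Here rank = 2, ||X||_* = 4 + 3 = 7, and ||X||_F = sqrt(16 + 9) = 5.
```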
3.2. Related work

In this subsection, we first give a brief description of the matrix classification problem. Given a set of training matrix samples $\{\mathbf{X}_{i}, y_{i}\}_{i=1}^{N}$, where $\mathbf{X}_{i} \in \mathbb{R}^{p \times q}$ is the $i^{th}$ input matrix and $y_{i} \in \{-1, 1\}$ is its corresponding class label, the primary goal is to use the provided training samples to train a function $f: \mathbb{R}^{p \times q} \rightarrow \mathbb{R}$ that can reliably determine the class of a new, unseen matrix sample.
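As a concrete illustration of such a function $f$, a linear matrix classifier predicts through the sign of $\operatorname{tr}(\mathbf{W}^{T}\mathbf{X})+b$; a minimal sketch follows (the weight matrix and samples are our own toy values, not from the paper):

```python
import numpy as np

def predict(W, b, X):
    """Linear matrix classifier: sign of tr(W^T X) + b (toy illustration)."""
    return 1 if np.trace(W.T @ X) + b >= 0 else -1

W = np.array([[1.0, 0.0],
              [0.0, -1.0]])  # a hypothetical learned regression matrix

X_pos = np.array([[2.0, 0.0], [0.0, 1.0]])  # tr(W^T X) = 2 - 1 = 1  -> class +1
X_neg = np.array([[0.0, 0.0], [0.0, 3.0]])  # tr(W^T X) = -3        -> class -1
```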
To solve the matrix classification problem using traditional techniques, a commonly used heuristic involves converting each matrix data instance $\mathbf{X}_{i}$ into vector format. The model is then trained on the vectorized data. Among traditional classifiers, an efficient approach is the soft margin SVM model (Cortes & Vapnik, 1995), whose optimization problem is given as follows:
$\min _{\mathbf{w}, b} \frac{1}{2} \mathbf{w}^{T} \mathbf{w}+\zeta \sum_{i=1}^{N}\left\{1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)\right\}_{+},$  (1)
where $\mathbf{w} \in \mathbb{R}^{pq}$, $b \in \mathbb{R}$, $\mathbf{x}_{i}=\operatorname{vec}\left(\mathbf{X}_{i}\right)$ denotes the vectorized form of the matrix $\mathbf{X}_{i}$, $\{h\}_{+}:=\max \{0, h\}$ denotes the classical hinge loss function, and $\zeta>0$ is a trade-off parameter.
Note that $\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right)$ is equivalent to $\operatorname{vec}(\mathbf{W})^{T} \operatorname{vec}(\mathbf{W})$, and $\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{X}_{i}\right)$ is equivalent to $\operatorname{vec}(\mathbf{W})^{T} \operatorname{vec}\left(\mathbf{X}_{i}\right)$. Hence, in terms of computational
Table 3
Comparison among different least square variants.

| | ILS-TSMM (Gaoa, Fanb, & Xub, 2015) | BP-LSSMM (Xia & Fan, 2016) | NPLSSMM (Li, Yang, Pan, Cheng, & Cheng, 2020) | LSISMM (Li, Shao, Lu, Xiang, & Cai, 2022) | AMK-TMM (Liang, Hang, Lei et al., 2022) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| SRM principle | $\checkmark$ | $\times$ | $\times$ | $\checkmark$ | $\times$ |
| Hyperplanes | Two non-parallel hyperplanes | A decision boundary | Two non-parallel hyperplanes | Two non-parallel hyperplanes | A decision boundary |
| Transfer learning | $\times$ | $\times$ | $\times$ | $\times$ | $\checkmark$ |
| Optimization problems | Two | One | Two | Two | One |
Table 4
Least square variants of SMM.

| Model | Author | Characteristics | Loss function | Datasets | Advantages | Technique to solve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ILS-TSMM (2015) | Gaoa et al. (2015) | - | Least square loss | Face databases ORL and YALE | Considers the SRM principle; improved time efficiency over SMM (Luo et al., 2015). | Solves a system of equations. |
| BP-LSSMM (2016) | Xia and Fan (2016) | Introduces the least square method to solve bilevel programming SMM. | - | PALM400, ORL, Yale datasets | More efficient than SMM. | Augmented Lagrange multiplier method; a system of linear equations is solved. |
| NPLSSMM (2020) | Li et al. (2020) | Solves the matrix classification problem by constructing two non-parallel hyperplanes. | Least square loss | Fault dataset from CWRU | Distinguishes the classes by obtaining a maximum margin hyperplane in matrix form; reduced complexity as it solves a system of linear equations. | ADMM |
| LSISMM (2022) | Li, Shao et al. (2022) | Constructs non-parallel hyperplanes and uses small infrared thermal images for fault diagnosis. | Least square loss | - | Flexible to maximize the distance between non-parallel hyperplanes; higher computational efficiency than SMM. | ADMM |
| AMK-TMM (2022) | Liang, Hang, Lei et al. (2022) | Introduces a novel adaptive multi-model knowledge transfer framework with equality constraints. | Least square loss | Dataset IVa of BCI III and IIa of BCI IV | Utilizes CV with a leave-one-out strategy to automatically find the correlated source domains and their corresponding weights. | ADMM |
considerations, Eq. (1) is identical to the subsequent formulation for performing matrix classification directly: 
$\min _{\mathbf{W}, b} \frac{1}{2} \operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right)+\zeta \sum_{i=1}^{N}\left\{1-y_{i}\left[\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{X}_{i}\right)+b\right]\right\}_{+}.$  (2)
This illustrates that directly employing Eq. (2) for classification is insufficient to capture the inherent structure present within each input matrix. As a consequence, this approach leads to a loss of information. 
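The equivalence between Eq. (1) on vectorized data and the matrix form of Eq. (2) rests on the trace identities above, which are easy to verify numerically; a quick NumPy check (the random shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # regression matrix
X = rng.standard_normal((4, 3))  # one input sample

w, x = W.ravel(), X.ravel()      # vec(W), vec(X)

# tr(W^T W) == vec(W)^T vec(W) and tr(W^T X) == vec(W)^T vec(X),
# so the two objectives assign identical values to corresponding solutions.
assert np.isclose(np.trace(W.T @ W), w @ w)
assert np.isclose(np.trace(W.T @ X), w @ x)
```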
In order to consider the structural characteristics, an intuitive strategy involves capturing the correlations inherent within each input matrix by introducing a low-rank restriction on the matrix W W W\mathbf{W}. To tackle this issue, a number of approaches are suggested such as the low-rank SVM (Wolf et al., 2007) and the bi-linear SVM (Pirsiavash et al., 2009). However, these approaches have limitations as they demand manual pre-specification of the latent rank of W W W\mathbf{W} tailored to various applications. To address the aforementioned issues, Luo et al. (2015) proposed an advancement in the realm of supervised learning, SMM, which overcomes the pre-specified rank criteria, preserves the structural information of the input matrix by employing the spectral elastic net penalty for the regression matrix, and produces a better result for matrix-form data. 

3.3. Mathematical formulation of SMM

SMM is a proficient matrix-based adaptation of SVM, capitalizing on the strengths of SVM, which include robust generalization capabilities. Further, it has the ability to comprehensively harness the structural insights embedded within matrix data. The objective function of SMM, 
based on the large margin principle, is expressed as follows:
$\frac{1}{2} \operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right)+\lambda\|\mathbf{W}\|_{*}+\zeta \sum_{i=1}^{N}\left\{1-y_{i}\left[\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{X}_{i}\right)+b\right]\right\}_{+}.$  (3)
Here, the initial term 1 2 tr ( W T W ) + λ W 1 2 tr W T W + λ W (1)/(2)tr(W^(T)W)+lambda||W||_(**)\frac{1}{2} \operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right)+\lambda\|\mathbf{W}\|_{*} pertains to the utilization of spectral elastic net regularization, which serves the purpose of capturing correlations inherent within individual matrices. On the other hand, the last summation term represents the hinge loss function. The term 1 2 tr ( W T W ) 1 2 tr W T W (1)/(2)tr(W^(T)W)\frac{1}{2} \operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right) can also be written as 1 2 W F 2 1 2 W F 2 (1)/(2)||W||_(F)^(2)\frac{1}{2}\|\mathbf{W}\|_{F}^{2}, which represents the square Frobenius norm. 
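Putting these terms together, the value of the SMM objective in Eq. (3) can be evaluated directly for any candidate solution; a minimal sketch (the function name, toy data, and hyperparameter values are our own, not from the paper):

```python
import numpy as np

def smm_objective(W, b, Xs, ys, lam, zeta):
    """Spectral elastic net (0.5*tr(W^T W) + lam*||W||_*) plus zeta-weighted hinge loss."""
    reg = 0.5 * np.trace(W.T @ W) + lam * np.linalg.svd(W, compute_uv=False).sum()
    margins = np.array([y * (np.trace(W.T @ X) + b) for X, y in zip(Xs, ys)])
    return reg + zeta * np.maximum(1.0 - margins, 0.0).sum()

rng = np.random.default_rng(1)
Xs = [rng.standard_normal((3, 2)) for _ in range(4)]
ys = [1, -1, 1, -1]

# With W = 0 and b = 0, both regularizers vanish and every hinge term equals 1.
val = smm_objective(np.zeros((3, 2)), 0.0, Xs, ys, lam=0.5, zeta=1.0)  # -> 4.0
```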
The Frobenius norm of matrix W W W\mathbf{W} serves as a regularization term, aiming to find a weight matrix with a reduced rank. It is also crucial to highlight that the nuclear norm serves as a regularization factor to ascertain the rank of matrix W W W\mathbf{W}. Estimating the rank of a matrix can be a complex problem with NP-hard characteristics (Wang, Wang, Hu, & Yan, 2015), however, the nuclear norm is widely acknowledged as the optimal convex approximation method for assessing the rank of the matrix (Candes & Recht, 2012; Zhou & Li, 2014). Additionally, the low-rank parameter λ λ lambda\lambda governs the level of structure information incorporated for constructing the classification hyperplane. The presence of the term W W ||W||_(**)\|\mathbf{W}\|_{*} introduces non-smoothness to the objective function of SMM. This characteristic poses a challenge when attempting to directly solve Eq. (3). Consequently, the solution for SMM is derived through the application of the alternating direction method of multipliers (ADMM) (Goldstein, O’Donoghue, Setzer, & Baraniuk, 2014). Now, by introducing an auxiliary matrix variable Q Q Q\mathbf{Q}, the objective function of SMM can be reformulated in the following manner: 
$\underset{(\mathbf{W}, b), \mathbf{Q}}{\arg \min }\ F(\mathbf{W}, b)+G(\mathbf{Q})$  (4)
Table 5
Robust and sparse models of SMM.

| Model | Author | Characteristics | Loss function | Datasets | Advantages | Technique to solve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| RSMM (2018) | Zheng, Zhu and Heng (2018) | Decomposes each input signal into a low-rank clean signal and sparse intra-sample outliers; employs the $l_{1}$ norm for sparseness. | Hinge loss | IVa of brain computer interface (BCI) competition III (Dornhege, Blankertz, Curio, & Muller, 2004), IIb and IIa of BCI competition IV (Leeb et al., 2007) | Enhances the robustness of SMM. | ADMM |
| SSMM (2018) | Zheng, Zhu, Qin, Chen and Heng (2018) | Performs feature selection to remove redundant features; involves a new regularization term, a linear combination of the nuclear norm and the $l_{1}$ norm. | Hinge loss | INRIA Person dataset (Dalal & Triggs, 2005), Caltech Face dataset (Fergus, Perona, & Zisserman, 2003), IIa and IIb of BCI competition IV (Ang, Chin, Wang, Guan, & Zhang, 2012) | Enhances the sparseness of SMM. | Generalized forward-backward (GFB) algorithm |
| RMSMM (2019) | Qian, Tran-Dinh, Fu, Zou, and Liu (2019) | Constructed in the angle-based classification framework; condenses the binary and multi-class problems into a single framework. | Truncated hinge loss | Daily and sports activities dataset (Altun & Barshan, 2010) | Better prediction performance and faster computation than SMM. | Inexact proximal DC algorithm |
| RSSMM (2021) | Gu, Zheng, Pan, and Tong (2021) | Employs the $l_{1}$ norm and a sparse constraint in the objective function to weaken the redundant information of the input matrix. | Smooth ramp loss | Fault dataset of roller bearing from AHUT | Reduces the influence of outliers. | GFB algorithm |
| SWSSMM (2021) | Li, Yang et al. (2021) | Automatically extracts inherent fault features from raw signals using the symplectic coefficient matrix (SCM); a variable entropy-based weight coefficient is added to the SCM to enhance the fault features. | Hinge loss | Vibration signal dataset from the University of Connecticut | Eliminates the effect of noise in the raw signal and enhances the fault features. | GFB algorithm |
| TRMM (2022) | Pan, Xu, Zheng, Tong and Cheng (2022) | Employs the truncated nuclear norm for low-rank approximation. | Ramp loss | Fault dataset of roller bearing from AHUT | Insensitive and robust to outliers; more efficient than RSMM. | Accelerated proximal gradient (APG) algorithm |
| Pin-SMM (2022) | Feng and Xu (2022) | Maximizes the quantile distance rather than the shortest distance. | Pinball loss | INRIA Person (Dalal & Triggs, 2005), Caltech (Fei-Fei, Fergus, & Perona, 2006), IIa of BCI IV | Robust to noise. | ADMM |
| SNMM (2022) | Wang, Xu, Pan, Xie, and Zheng (2022) | Employs the $l_{1}$ norm distance as a constraint of the hyperplane; avoids the need to compute inverse matrices. | Hinge loss | Fault dataset of roller bearing from AHUT, HNU, and CWRU | Improves robustness and reduces storage requirements. | An alternating iteration method |
| ACF-SSMM (2022) | Li, Wang and Liu (2022) | Extends the input matrix by adding data through an auto-correlation function (ACF) transform, which contains data information at previous/current instants. | Hinge loss | SEED-VIG fatigue dataset (Zheng & Lu, 2017) | Enhances the generalization performance of SMM. | GFB algorithm |
| SMMRe (2023) | Razzak, Bouadjenek, Saris, and Ding (2023) | Decomposes each input signal into a low-rank clean signal and sparse intra-sample outliers; employs joint $l_{2,1}$ and nuclear norms. | Hinge loss | Caltech (Fei-Fei et al., 2006), INRIA Person (Dalal & Triggs, 2005), IVa of BCI III, IIb and IIa of BCI IV | Enhances the robustness of SMM. | ADMM |
| TPin-SMM (2024) | Li and Xu (2024) | Incorporates the truncated pinball loss, yielding robustness to outliers, noise insensitivity, and sparsity. | Truncated pinball loss | Caltech (Fei-Fei et al., 2006), IIa of BCI IV, Daimler pedestrian dataset | Enhances the generalization performance of SMM. | CCCP-ADMM |
s.t. $\mathbf{W}-\mathbf{Q}=\mathbf{0}$.
Here, $F(\mathbf{W}, b)=\frac{1}{2} \operatorname{tr}\left(\mathbf{W}^{T} \mathbf{W}\right)+\zeta \sum_{i=1}^{N}\left\{1-y_{i}\left[\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{X}_{i}\right)+b\right]\right\}_{+}$ and $G(\mathbf{Q})=\lambda\|\mathbf{Q}\|_{*}$. Then, using the augmented Lagrange multiplier $\boldsymbol{\beta}$, Eq. (4) is rewritten as follows:
$L(\mathbf{Q}, \boldsymbol{\beta},(\mathbf{W}, b))=F(\mathbf{W}, b)+G(\mathbf{Q})+\frac{\rho}{2}\|\mathbf{Q}-\mathbf{W}\|_{F}^{2}+\operatorname{tr}\left[\boldsymbol{\beta}^{T}(\mathbf{Q}-\mathbf{W})\right].$
Here, $\rho$ denotes the inherent coefficient of the ADMM method. Within this framework, we need to determine three variable matrices: $\mathbf{Q}$, $\boldsymbol{\beta}$, and $(\mathbf{W}, b)$. Obtaining the optimal solutions of these matrices involves an iterative approach. The general steps for
updating these variable matrices are outlined as follows: 
$$
\begin{aligned}
\mathbf{Q}^{(t+1)} & =\underset{\mathbf{Q}}{\arg \min }\, L\left(\mathbf{Q}, \boldsymbol{\beta}^{(t)},\left(\mathbf{W}^{(t)}, b^{(t)}\right)\right), \\
\left(\mathbf{W}^{(t+1)}, b^{(t+1)}\right) & =\underset{(\mathbf{W}, b)}{\arg \min }\, L\left(\mathbf{Q}^{(t+1)}, \boldsymbol{\beta}^{(t)},(\mathbf{W}, b)\right), \\
\boldsymbol{\beta}^{(t+1)} & =\boldsymbol{\beta}^{(t)}+\rho\left(\mathbf{Q}^{(t+1)}-\mathbf{W}^{(t+1)}\right) .
\end{aligned}
$$
The fundamental steps in this context involve calculating $\mathbf{Q}^{(t)}$ and the pair $(\mathbf{W}^{(t)}, b^{(t)})$ during each iteration. To update $\mathbf{Q}$ (for ease of exposition, we omit the superscripts in what follows), suppose that $\boldsymbol{\beta}$ and $(\mathbf{W}, b)$ remain constant; then the solution for $\mathbf{Q}$ can be determined from the following equation:
$\underset{\mathbf{Q}}{\arg \min }\; G(\mathbf{Q})+\frac{\rho}{2}\|\mathbf{Q}-\mathbf{W}\|_{F}^{2}+\operatorname{tr}\left[\boldsymbol{\beta}^{T}(\mathbf{Q}-\mathbf{W})\right]$.
By solving Eq. (5), we can derive the updating formula for the matrix $\mathbf{Q}$ at each iteration as follows:
$\mathbf{Q}=\frac{1}{\rho} S_{\lambda}(\rho \mathbf{W}-\boldsymbol{\beta})$,
where $S_{\lambda}$ is the singular value thresholding operator (Cai, Candès, & Shen, 2010). Similarly, the solution for $(\mathbf{W}, b)$ is obtained by solving the following equation:
$\underset{(\mathbf{W}, b)}{\arg \min }\; F(\mathbf{W}, b)-\operatorname{tr}\left(\boldsymbol{\beta}^{T} \mathbf{W}\right)+\frac{\rho}{2}\|\mathbf{W}-\mathbf{Q}\|_{F}^{2}$.
Subsequently, from the solution of Eq. (6), the formulas for updating $(\mathbf{W}, b)$ can be expressed as follows:
$\mathbf{W}=\frac{1}{\rho+1}\left(\rho \mathbf{Q}+\boldsymbol{\beta}+\sum_{i=1}^{N} \gamma_{i} y_{i} \mathbf{X}_{i}\right)$,

$b=\frac{1}{|\mathbf{Q}|} \sum_{i=1}^{N}\left(y_{i}-\operatorname{tr}\left(\mathbf{W}^{T} \mathbf{X}_{i}\right)\right)$.
The parameter-updating process terminates when the specified maximum number of iterations is reached, at which point the optimal solutions for $\mathbf{Q}$, $\boldsymbol{\beta}$, and $(\mathbf{W}, b)$ are obtained. The label of new matrix data $\tilde{\mathbf{X}}$ is then predicted using the following decision function:
$\tilde{y}=\operatorname{sign}\left(\operatorname{tr}\left(\tilde{\mathbf{X}}^{T} \mathbf{W}\right)+b\right)$.
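The update rules above can be sketched in code. Below is a minimal NumPy illustration of the ADMM loop for the linear SMM; as a stated simplification, the dual variables $\gamma_i$ of the hinge term are set via the hinge subgradient ($\gamma_i=\zeta$ for margin-violating samples) rather than by solving subproblem (6) exactly, so this shows the update structure rather than a faithful solver.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding operator S_tau (Cai et al., 2010)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def smm_admm(X, y, zeta=1.0, lam=0.1, rho=1.0, iters=50):
    """Illustrative ADMM loop for the linear SMM.
    X: (N, p, q) array of matrix samples; y: labels in {-1, +1}."""
    N, p, q = X.shape
    W = np.zeros((p, q)); Q = np.zeros((p, q))
    beta = np.zeros((p, q)); b = 0.0
    for _ in range(iters):
        # Q-update: Q = (1/rho) * S_lambda(rho * W - beta)
        Q = svt(rho * W - beta, lam) / rho
        # (W, b)-update: gamma_i approximated by the hinge subgradient,
        # a simplification of the exact dual solution of subproblem (6)
        margins = y * (np.tensordot(X, W, axes=([1, 2], [0, 1])) + b)
        gamma = np.where(margins < 1.0, zeta, 0.0)
        W = (rho * Q + beta + np.tensordot(gamma * y, X, axes=(0, 0))) / (rho + 1.0)
        sv = gamma > 0
        if sv.any():  # average the bias over margin-violating samples
            b = np.mean(y[sv] - np.tensordot(X[sv], W, axes=([1, 2], [0, 1])))
        # beta-update: dual ascent on the constraint W - Q = 0
        beta = beta + rho * (Q - W)
    return W, b

def predict(W, b, X_new):
    """Decision function: sign(tr(X~^T W) + b)."""
    return np.sign(np.trace(X_new.T @ W) + b)
```

On a toy problem with two well-separated classes of small matrices, this loop recovers a separating $\mathbf{W}$; a production solver would instead obtain $\gamma$ from the dual QP of subproblem (6).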

4. SMM for classification

SMM preserves the structural information of matrix input data and classifies matrices successfully, which has motivated further research in this area. In this section, we discuss some prominent variants of SMM, each offering different advantages over the classical SMM.

4.1. Least squares SMM

The classical SMM solves an optimization problem that combines the Frobenius and nuclear norms with the hinge loss. To improve the efficiency of SMM, and drawing inspiration from the support tensor machine (STM) (Cai, He, Wen, Han, & Ma, 2006; Tao, Li, Hu, Maybank, & Wu, 2005), the improved least squares twin SMM (ILS-TSMM) (Gao et al., 2015) was proposed. ILS-TSMM accounts for the SRM principle by including a regularization term, and adopts the twin idea (Jayadeva, Khemchandani, & Chandra, 2007) together with the least squares framework (Suykens & Vandewalle, 1999) to speed up computation. The incorporation of the twin idea yields two non-parallel hyperplanes. The article (Gao et al., 2015) covers both the linear and non-linear classification of matrix data: ILS-TSMM corresponds to the linear case, while its combination with a matrix kernel function yields the non-linear ILS-TSMM (NILS-TSMM).
Another least squares approach, the least squares SMM based on bilevel programming (BP-LSSMM), was introduced by Xia and Fan (2016). The basic principle of bilevel programming (BP) is that the parameters of the lower-level problem are the decision variables of the upper-level problem, and the optimal solution of the lower-level problem responds to the upper-level problem. The BP-LSSMM formulation is solved using the least squares technique and has a lower time complexity than the classical SMM.
Furthermore, combining the ideas of least squares and non-parallel hyperplanes, the non-parallel least squares SMM (NPLSSMM) (Li et al., 2020) was proposed for roller bearing fault diagnosis. This method draws two non-parallel hyperplanes such that each hyperplane lies as close as possible to the samples of its own class while keeping a distance of 1 from the data points of the other class. A least squares loss penalizes misclassified points, and the resulting optimization problem is solved with the ADMM method. Li, Shao et al. (2022) considered a similar least squares approach for SMM, named the least squares interactive SMM (LSISMM).
In addition, Liang, Hang, Lei et al. (2022) introduced the adaptive multimodel knowledge transfer matrix machine (AMK-TMM), which combines the least squares loss with transfer learning in SMM. Using limited labeled target data and equality constraints, AMK-TMM can automatically detect relevant multi-source models and weight them adaptively. Following a leave-one-out cross-validation (CV) strategy on the available target training data, it adopts a multi-model adaptation approach that adaptively selects knowledge from multiple related source models. For scenarios with limited training data, AMK-TMM improves the generalization ability of LS-SMM.
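The computational shortcut shared by these least squares variants, namely replacing the QPP by a single linear system, can be sketched as follows. This is the plain LS-SVM-style dual written with the matrix trace inner product $\operatorname{tr}(\mathbf{X}_i^{T}\mathbf{X}_j)$, given as a generic illustration rather than the formulation of any one cited paper.

```python
import numpy as np

def ls_smm_fit(X, y, zeta=1.0):
    """Least squares fit with the trace inner product tr(Xi^T Xj).
    Solves one (N+1) x (N+1) linear system instead of a QPP."""
    N = X.shape[0]
    Xv = X.reshape(N, -1)          # tr(Xi^T Xj) equals the vectorized dot product
    Omega = (Xv @ Xv.T) * np.outer(y, y)
    # LS-SVM-style KKT system: [[0, y^T], [y, Omega + I/zeta]] [b; alpha] = [0; 1]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / zeta
    rhs = np.concatenate([[0.0], np.ones(N)])
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    W = np.tensordot(alpha * y, X, axes=(0, 0))   # W = sum_i alpha_i y_i X_i
    return W, b
```

The equality constraints of the least squares formulation make every sample a support vector, which is why a single solve suffices.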
Thus, we have analyzed various existing SMM models based on the least squares concept, which reduces model training time and simplifies the model compared to the classical SMM. Table 3 outlines the various least squares variants in the SMM domain. These variants focus on solving systems of linear equations rather than QPPs, a strategic choice that significantly improves computational efficiency, especially when dealing with large-scale datasets. Among these approaches, LSISMM (Li, Shao et al., 2022) stands out with a time complexity of $O\left(s\left(\min \left(m^{2} n, m n^{2}\right)+N m n\right)\right)$. Detailed information on the different least squares variants is provided in Table 4. We next turn to a discussion of the various robust and sparse models within the SMM framework.

4.2. Robust and sparse SMM

SMM outperforms SVM owing to its preservation of the structural information in the input matrix. However, it uses the hinge loss function and the $l_{2}$ norm in the objective function, which reduce robustness (Wu & Liu, 2007) and sparsity (Tanveer, Sharma, Rastogi, & Anand, 2021), respectively. Moreover, the input data often contain distortions from measurement artifacts, outliers, and unconventional sources of noise; consequently, the obtained classifier may perform poorly. To tackle these challenges, various variants of SMM have been proposed in the literature. To counter intra-sample outliers, Zheng, Zhu, Heng (2018) proposed a robust SMM (RSMM). It decomposes the input matrix into a latent low-rank clean matrix plus a sparse noise matrix and uses only the clean matrix for training, which makes it robust to intra-sample outliers. To enhance sparseness, it employs the $l_{1}$ norm instead of the $l_{2}$ norm; $l_{1}$-norm optimization problems tend to drive some of the coefficients exactly to zero and encourage sparse solutions (Tanveer et al., 2021). The time complexity of RSMM is $O\left(N^{2} p q\right) \times s$. Following the same concept, Razzak et al. (2023) proposed an SMM that simultaneously performs matrix recovery (SMMRe), a variant of SMM via joint $l_{2,1}$ and nuclear norm minimization. The objective function of SMMRe combines the property of matrix recovery with low rank and joint sparsity to deal with complex high-dimensional noisy data.
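The decomposition idea behind RSMM and SMMRe (splitting each input into a low-rank clean part plus a sparse outlier part) can be illustrated with a robust-PCA-style ADMM that alternates singular value thresholding and entrywise soft thresholding. This is a generic illustration of the principle, not the exact algorithm of either paper.

```python
import numpy as np

def svt(M, tau):
    """Proximal step of the nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Proximal step of the l1 norm: entrywise soft thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def low_rank_sparse_split(X, lam=0.2, mu=1.0, iters=100):
    """Split X into L + S with L low-rank (nuclear norm) and S sparse (l1 norm),
    via two-block ADMM on: min ||L||_* + lam * ||S||_1  s.t.  L + S = X."""
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(iters):
        L = svt(X - S + Y / mu, 1.0 / mu)    # nuclear-norm proximal step
        S = soft(X - L + Y / mu, lam / mu)   # l1 proximal step
        Y = Y + mu * (X - L - S)             # dual update on L + S = X
    return L, S
```

Training only on the recovered clean part L is what makes these variants robust to intra-sample outliers.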
Another approach to building a robust model is to incorporate a robust classification loss function. In light of this, Qian et al. (2019) proposed the robust multicategory SMM (RMSMM), which makes SMM robust by using the truncated hinge loss function (Wu & Liu, 2007) rather than the hinge loss. The hinge loss is unbounded and can grow indefinitely for outliers far from the optimal hyperplane. In contrast, the truncated hinge loss limits the impact of such outliers by capping the loss at a predefined value, making it resistant to outliers. In a similar way, Gu et al. (2021) proposed the ramp sparse SMM (RSSMM) to improve the robustness of SMM. RSSMM uses a smooth ramp loss function instead of the hinge loss, which likewise limits the maximum loss and weakens the sensitivity to outliers.
Pan, Xu, Zheng, Tong et al. (2022) proposed a non-parallel SMM named the twin robust matrix machine (TRMM). It has two main features: first, it uses the truncated nuclear norm (Hong, Wei, Hu, Cai, & He, 2016) to capture the important structural information; second, it adopts the ramp loss function (Brooks, 2011), which limits the penalty for outliers and thus makes TRMM robust to them. The nuclear norm used in the original SMM may be suboptimal and is not necessarily the best choice (Chen, Dong, & Chan, 2013), because when it is minimized it aggregates all singular values and treats them equally. However, the importance of a singular value depends on the importance of the low-rank details in the matrix (Liu, Lai, Zhou, Kuang, & Jin, 2015): larger singular values correspond to more critical low-rank information, while smaller singular values may stem from less relevant information (Dixit, Verma, & Raj, 2020; Jia, Feng, Wang, Xu, & Zhang, 2018). The time complexity of TRMM is $O\left(\min \left(p^{2} q, p q^{2}\right)\right) \times s$.
Most of the above models use the hinge loss function or one of its variants (e.g., the truncated hinge loss), which are sensitive to noise and unstable under resampling (Huang, Shi and Suykens, 2013). To overcome this limitation, Feng and Xu (2022) introduced the pinball loss function into the SMM framework and proposed Pin-SMM. It maximizes the quantile distance rather than the shortest distance, which makes it robust to noise and stable under resampling (Huang, Shi et al., 2013). However, because the pinball loss function is unbounded, Pin-SMM is not robust to outliers. To make it so, Li and Xu (2024) incorporated the truncated pinball loss (Shen, Niu, Qi, & Tian, 2017) into SMM and proposed TPin-SMM. The truncated pinball loss is a bounded variant of the pinball loss that assigns a fixed loss to samples beyond a certain threshold, thereby making the model robust to outliers.
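The robustness arguments above turn on whether the loss is bounded. A small sketch comparing the losses discussed here, written in terms of the margin violation $u = 1 - y f(\mathbf{X})$; these are common textbook parameterizations, and the exact forms vary across the cited papers:

```python
import numpy as np

def hinge(u):
    return np.maximum(0.0, u)

def truncated_hinge(u, s=2.0):
    # caps the hinge loss at s, limiting the influence of far outliers
    return np.minimum(hinge(u), s)

def ramp(u, s=2.0):
    # ramp loss written as a difference of hinges: hinge(u) - hinge(u - s)
    return hinge(u) - hinge(u - s)

def pinball(u, tau=0.5):
    # also penalizes correctly classified points (quantile distance):
    # noise-insensitive, but unbounded on both sides
    return np.where(u >= 0.0, u, -tau * u)

def truncated_pinball(u, tau=0.5, s=2.0):
    # bounded variant: fixed loss beyond the threshold s
    return np.minimum(pinball(u, tau), s)

u = np.array([-3.0, 0.0, 1.0, 10.0])   # u = 10 mimics a far outlier
print(hinge(u))            # unbounded: grows with the outlier
print(truncated_hinge(u))  # capped at s = 2
print(ramp(u))             # coincides with the truncated hinge
```

Boundedness (truncated hinge, ramp, truncated pinball) is what caps an outlier's contribution, while the pinball branch for $u<0$ is what yields noise insensitivity.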
To emphasize sparsity, Zheng, Zhu, Qin, Chen et al. (2018) proposed a variant called the sparse SMM (SSMM). This method simultaneously considers the intrinsic structure of each input matrix and the process of feature selection, and introduces a new regularization term that combines the $l_{1}$ norm and the nuclear norm to enhance sparsity. The time complexity of SSMM is $O\left(N^{2} p q\right)$. However, SSMM is insufficient when fault features must be extracted and valuable features selected manually. To address this challenge, Li, Yang et al. (2021) introduced the symplectic weighted sparse SMM (SWSSMM). This method incorporates the principles of symplectic geometry to generate a symplectic coefficient matrix (SCM) as the representation of fault features, effectively mitigating the influence of noise in the input matrix. In addition, a variational-entropy-based weight coefficient is introduced and applied to the SCM to enhance the representation of fault features.
Furthermore, to develop a sparse SMM suitable for large-scale problems, Wang et al. (2022) proposed the sparse norm matrix machine (SNMM), which constructs a pair of non-parallel hyperplanes and combines the hinge loss with the $l_{1}$ norm distance. The optimization problem of SNMM avoids the need to compute matrix inverses, making it suitable for large-scale problems. However, it does not consider the influence of past information on the current detection. To overcome this limitation, Li et al. (2022) proposed the auto-correlation-function-based sparse SMM (ACF-SSMM). This algorithm also uses the $l_{1}$ norm and applies an auto-correlation function transform to the previous/current input data. This encapsulates how past information influences the current detection and accounts for the memory characteristics inherent in the input matrix.
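The sparsity mechanism behind these $l_{1}$-based variants is visible in a single proximal step: soft thresholding from the $l_{1}$ norm sets small coefficients exactly to zero, while $l_{2}$-style shrinkage only rescales them. A minimal illustration:

```python
import numpy as np

w = np.array([0.8, -0.05, 0.3, 0.02, -0.6])

# l2 (ridge-style) proximal step: uniform shrinkage, no exact zeros
w_l2 = w / (1.0 + 0.5)

# l1 proximal step (soft thresholding): coefficients below the
# threshold are driven exactly to zero, giving a sparse solution
w_l1 = np.sign(w) * np.maximum(np.abs(w) - 0.1, 0.0)
```

Here `w_l2` keeps all five entries nonzero, whereas `w_l1` zeroes the two small ones, which is exactly the feature-selection effect the sparse variants exploit.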
Here, we have provided a comprehensive analysis of the robust and sparse variants within the realm of SMM models, yielding insights into how to enhance model robustness and promote sparsity. To enhance robustness against outliers or noise, two primary strategies are considered in the literature: first, decompose the input matrix into clean and noise matrices and use only the clean matrix for training (Razzak et al., 2023; Zheng, Zhu, Heng, 2018); second, integrate robust loss functions, such as the truncated hinge/pinball loss, to limit the impact of outliers and enhance noise resistance (Gu et al., 2021; Qian et al., 2019). On the other hand, a key approach to enhancing sparsity is to introduce regularization terms that incorporate the $l_{1}$ norm (Wang et al., 2022; Zheng, Zhu, Qin, Chen et al., 2018). Further, novel regularization terms that encourage sparse solutions by driving some coefficients to zero may be incorporated. Table 5 presents an overview of the various robust and sparse variants of SMM.

4.3. SMM for multi-class classification 

In its original formulation, SMM was designed for binary classification problems; however, most real-world problems involve multi-class classification (Franc & Hlavác, 2002). To address this challenge, Zheng, Zhu, Qin and Heng (2018) developed a model called the multi-class SMM (MSMM). This approach combines a multi-class hinge loss function with a regularization term that joins the squared Frobenius norm and the nuclear norm; the multi-class hinge loss extends the concept of the margin-rescaling loss (Joachims, Finley & Yu, 2009) to data in matrix form. To improve the classification performance of MSMM, Razzak, Blumenstein and Xu (2019) proposed a new multi-class SMM (M-SMM), formed by fusing a binary hinge loss with an elastic-net penalty. The binary hinge loss employs $C$ functions to simulate multiple binary classifiers, thereby avoiding the computation of support vectors between every possible pair of classes. However, MSMM and M-SMM are not robust to outliers. To develop a robust SMM variant for multi-class problems, Qian et al. (2019) proposed the robust multicategory SMM (RMSMM). It is built upon the angle-based classification framework (Zhang & Liu, 2014) and embeds a truncated hinge loss function (Wu & Liu, 2007). The common approach to multi-class classification is to represent $C$ classes with $C$ classification functions; the angle-based framework, in contrast, requires training only $C-1$ classification functions and therefore enjoys faster computation (Sun, Craig, & Zhang, 2017). The non-convex optimization problem of RMSMM is solved via the DCA algorithm.
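The multi-class hinge losses above generalize the binary margin. As a hedged illustration, the generic margin-rescaling form over class scores (not the exact matrix-form losses of the cited papers) is:

```python
import numpy as np

def multiclass_hinge(scores, y):
    """Margin-rescaling multi-class hinge loss.
    scores: (C,) array of class scores; y: index of the true class.
    Loss = max(0, max_{j != y} (1 + s_j - s_y))."""
    margins = 1.0 + scores - scores[y]
    margins[y] = 0.0          # exclude the true class from the max
    return max(0.0, margins.max())

s = np.array([2.0, 0.5, -1.0])
loss_ok = multiclass_hinge(s, 0)   # true class wins by a margin >= 1
loss_bad = multiclass_hinge(s, 1)  # true class is beaten by class 0
```

The loss is zero only when the true class beats every other class by the required margin, which is the property the matrix-form multi-class losses inherit.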
Drawing on the idea of cooperative evolution (Rosales-Perez, García, Terashima-Marin, Coello, & Herrera, 2018), Razzak (2020) proposed a multi-class sparse SMM (MSMM-CE) that decomposes the multi-class SMM problem into binary classification subproblems in a cooperative manner. The objective function of MSMM-CE fuses several components. First, it integrates a binary hinge loss term to promote model fitting. It further introduces the Frobenius and nuclear norms as regularization penalties to encourage low-rankness and sparsity in the model. In addition, the objective function includes an extra penalty component for errors in multi-class classification.
The above models ignore the varying importance of different samples, which makes class separation with hyperplanes difficult. To address this challenge, Pan, Xu, Zheng, Su and Tong (2022) introduced a new approach in matrix modeling, using fuzzy theory (Hüllermeier, 2005) to construct the multi-class fuzzy SMM (MFSMM). This framework combines the principle of non-parallel hyperplanes with fuzzy attributes to optimize the separation interval between any two fuzzy hyperplanes. The introduction of fuzzy planes assigns different membership degrees to individual training samples, effectively reducing the influence of noise. Notably, this marks the first application of fuzzy theory in the development of SMM.
In the preceding discussion, we examined the state of development of multi-class classification models. Each variant has distinct characteristics and targets specific challenges of multi-class classification within the SMM domain. A comparative analysis of the multi-class SMM variants follows:
Table 6
SMM variants for multi-class classification.

| Model | Author | Characteristics | Loss function | Datasets | Advantages | Technique to solve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MSMM (2018) | Zheng, Zhu, Qin, Heng (2018) | Incorporates a multi-class hinge loss term and a regularization term involving the Frobenius and nuclear norms. | Multiclass hinge loss | Dataset IIIa of BCI III and IIa of BCI IV | Improves the performance of BCI systems with multiple tasks. | ADMM |
| M-SMM (2019) | Razzak, Blumenstein et al. (2019) | Maximizes the intra-class margin and employs $C$ functions to simulate all binary classifiers rather than computing support vectors between every two classes. | Binary hinge loss | Dataset IIIa of BCI III and IIa of BCI IV | Improves the classification performance for multi-class problems. | ADMM |
| RMSMM (2019) | Qian et al. (2019) | Utilizes an angle-based classification framework, condensing both binary and multi-class problems into a single framework. | Truncated hinge loss | Daily and sports activities dataset (Altun & Barshan, 2010) | Enjoys better prediction performance and faster computation. | Inexact proximal DC algorithm |
| MSMM-CE (2020) | Razzak (2020) | Solves multi-class classification problems and reduces data redundancy. | Hinge loss | BCI competition benchmark EEG datasets (IIIa and IIa) | Solves multi-class classification problems by finding support vectors in a single step. | Evolutionary technique |
| MFSMM (2022) | Pan, Xu, Zheng, Su et al. (2022) | Fuzzy attributes are introduced to assign different membership degrees to different samples. | Hinge loss | Fault datasets of roller bearing from AHUT and from Hunan University (HNU) | Improves the performance of multi-class classification in the presence of noise. | SOR |
Table 7
SMM for imbalanced learning.

| Model | Author | Characteristics | Loss function | Datasets | Advantages | Technique to solve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| ESMM (2017) | Zhu (2017) | Entropy-based fuzzy membership is employed. | Hinge loss | KEEL imbalance datasets (Derrac, Garcia, Sanchez, & Herrera, 2015) | Better generalization performance on imbalanced datasets. | ADMM |
| CWSMM (2021) | Li, Cheng, Shao, Liu and Cai (2021) | Distinct penalty factors are used for the different class samples, confidence weights are assigned based on prior knowledge, and a D-S evidence theory based fusion CWSMM is proposed. | Hinge loss | A customized rotating machinery dataset produced by Spectra Quest, Inc., Richmond, USA | Enhances the robustness to imbalanced data. | ADMM |
| DPAMM (2022) | Xu, Pan, Zheng, Liu, and Tong (2022) | An adaptive low-rank regularizer is introduced to obtain low-rank information; it adaptively chooses the singular values relevant to the matrix's most significant correlation data. | Hinge loss | Dataset of a bevel gear roller bearing fault simulation test rig, datasets collected from a fixed-shaft roller bearing test rig, and the bearing dataset provided by CWRU | Improves the performance on imbalanced datasets. | SOR |
• Robustness to outliers: MSMM (Zheng, Zhu, Qin, Heng, 2018) and M-SMM (Razzak, Blumenstein et al., 2019) have limited robustness to outliers, whereas RMSMM (Qian et al., 2019) and MSMM-CE (Razzak, 2020) are designed to be more robust.
• Computational efficiency: RMSMM (Qian et al., 2019) stands out for its angle-based classification framework, which enables faster computation.
• Introduction of fuzzy theory: MFSMM (Pan, Xu, Zheng, Su et al., 2022) innovatively applies fuzzy theory to class separation, addressing the challenge of samples having differing importance.
This comparative analysis provides insight into the strengths and distinctive features of each variant, aiding the selection of the most suitable model for a specific multi-class classification challenge. Table 6 presents an overview of the multi-class classification variants of SMM. We now turn our attention to models that have evolved to address the pressing challenge of class imbalance.

4.4. SMM for imbalanced learning

Imbalanced datasets, in which one class (the minority class) has significantly fewer samples than the other class or classes (the majority class or classes), are very common in real-world problems (Ganaie, Tanveer and Lin, 2022). This pattern is observed in various settings, such as medical diagnosis datasets (Majid, Ali, Iqbal, & Kausar, 2014), email spam detection tasks (Tang, Krasser, Judge, & Zhang, 2006), and so on. Class imbalance can pose difficulties for the original SMM, as it may favor the majority class and struggle to learn effectively from the minority class (Richhariya & Tanveer, 2020). To handle imbalanced datasets, Zhu (2017) proposed an entropy-based SMM (ESMM). It incorporates an entropy-based fuzzy membership evaluation technique that pays particular attention to the importance of certainty in patterns. ESMM therefore not only guarantees the significance of the minority class but also places greater emphasis on patterns with higher class certainty. Consequently, when faced with imbalanced datasets, ESMM can produce a
Table 8
Comparison among the different deep variants of SMM.

| Properties | DSSMM (Hang et al., 2020) | DSFR (Liang, Hang, Yin et al., 2022) | DST-LSSMM (Hang et al., 2023) | DSPTMM (Pan, Sheng et al., 2023) |
| :--- | :--- | :--- | :--- | :--- |
| Parameter tuning | Feed-forward instead of parameter fine-tuning through backpropagation. | Parameter fine-tuning instead of backpropagation. | Feed-forward instead of parameter fine-tuning through backpropagation. | - |
| Input data at each layer | Random projections of each layer's predictions modify the original features and are passed to the next layer. | Takes raw data as input. | The original input data, together with randomly projected outputs of the previous layers, is fed to the next layer. | PTMs retrieve the previous layer's output, randomly project it, combine it with the input, and pass it to the next layer. |
| Optimization problem | Convex | Convex | Convex | Convex |
| Loss function | Hinge loss | Hinge loss | Square loss | Pinball loss |
| Features considered | Pre-extracted information | CSP for high-level feature extraction | Pre-extracted information | Annotated samples of the source and target domains |
| Computational complexity | $O\left(L s N^{2} p q\right)$ | $O\left(L\left(N^{3}+s N^{2} p q\right)\right)$ | - | $O\left(L\left(N^{3}+s N^{2} p q\right)\right)$ |
Table 9
Deep variants of SMM.

| Model | Author | Characteristics | Loss function | Datasets | Advantages | Technique to solve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| DSSMM (2020) | Hang et al. (2020) | Inherits the powerful capability of deep representation learning. | Hinge loss | BCI competition III dataset IVa, BCI competition IV dataset IIb, BCI competition IV dataset IIa, lower limb MI-BCI dataset | Involves an efficient feed-forward pass rather than parameter fine-tuning with backpropagation, leading to a convex optimization problem. | ADMM |
| DSFR (2022) | Liang, Hang, Yin et al. (2022) | The base building blocks of DSFR are feature decoding modules (FDMs), with CSP as the feature extractor and SMM as the classifier. | Hinge loss | Dataset IVa of BCI competition III, dataset IIb of BCI competition IV, dataset IIa of BCI competition IV | Directly accepts raw EEG data as input and automatically learns feature representations; to improve classification performance, the FDM can collect structural information from the EEG feature matrix. | ADMM |
| DST-LSSMM (2023) | Hang et al. (2023) | A deep stacked network that uses LSSMM as the base building unit, with the projection of the previous layer as the stacking element. | Least squares loss | MI-based EEG competition datasets, including datasets IIIa and IVa of BCI competition III, and a self-collected lower limb MI-based BCI dataset (Lei et al., 2019) | Overcomes non-convexity and requires less training data than deep learning models; the multiple layers take advantage of adaptive learning. | Alternating iterative method |
| DSPTMM (2023) | Pan, Sheng et al. (2023) | A deep stacked network that uses pinball transfer modules as the base building units and random projections as the stacking elements. | Pinball loss | Used in roller bearing fault diagnosis | Overcomes non-convexity and requires less training data than deep learning models; utilizes adaptive learning, and the pinball loss leads to a robust model. | ADMM |
decision surface with greater adaptability than that of the conventional SMM.
A significant advance in addressing the challenges posed by imbalanced datasets is the work of Li et al. (2021). They introduced a fusion framework named the confidence-weighted SMM (CWSMM), which employs dynamic penalty factors tailored to the samples of individual classes. In addition, they integrated prior knowledge associated with the matrix samples to formulate a strategy for assigning confidence weights, an approach that contributes significantly to the overall robustness of the system. Notably, they utilized Dempster-Shafer (D-S) evidence theory (Tang et al., 2020) to harmoniously integrate the probabilistic outputs derived from CWSMMs with various measurement methods.
The aforementioned imbalance models utilize the nuclear norm to represent the low-rank characteristics within each matrix sample. However, nuclear norm minimization may not be optimal (Liu, Zhou et al., 2019; Zhang, Lei, Pan and Pedrycz, 2021), resulting in certain limitations in capturing weak correlation information. To address this concern, Xu et al. (2022) introduced the dynamic penalty adaptive matrix machine (DPAMM). This approach is built upon the adaptive framework for minimizing low-rank approximations (Gao et al., 2016), and dynamically selects and retains the singular values that pertain to highly significant correlation data within the input matrix. Moreover, a dynamic penalty factor is incorporated within the loss function component, enabling the penalty severity for samples to be adjusted according to the degree of imbalance. The computational complexity of DPAMM is $O\left(N^{2} p q\right) \times s$.
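The dynamic-penalty idea in ESMM, CWSMM, and DPAMM can be reduced to its simplest cost-sensitive core: scale each sample's penalty inversely to its class frequency so that minority-class errors cost more. A generic sketch, not the exact weighting scheme of any of the cited papers:

```python
import numpy as np

def class_penalties(y, zeta=1.0):
    """Per-sample penalty factors inversely proportional to class frequency:
    weight_c = n / (num_classes * count_c), so minority classes get > 1."""
    classes, counts = np.unique(y, return_counts=True)
    freq = dict(zip(classes, counts))
    n = len(y)
    return np.array([zeta * n / (len(classes) * freq[label]) for label in y])

y = np.array([1, 1, 1, 1, -1])   # 4:1 imbalance
weights = class_penalties(y)     # the minority (-1) sample gets weight 2.5
```

In an SMM objective, such weights would multiply each sample's loss term, which is how class-dependent penalty factors shift the decision surface toward the minority class.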
Each of the approaches discussed addresses the challenges posed by imbalanced datasets in SMM with a distinct methodology. ESMM (Zhu, 2017) focuses on fuzzy membership evaluation, ensuring the significance of the minority class. CWSMM (Li, Cheng et al., 2021) dynamically assigns penalty factors and integrates prior knowledge to enhance robustness. DPAMM (Xu et al., 2022) dynamically selects and retains significant
Table 10
Comparison among the different variants of SMM.

| Model | SRM principle | Decision hyperplane | Loss function | CV (five-/ten-fold) | Linear/non-linear variant | Computational complexity |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| SMM (Luo et al., 2015) | ✓ | One | Hinge loss | - | Linear | - |
| PTSMM (Xu, Fan, & Gao, 2015) | ✓ | A pair of projection axes | Hinge loss | Five-fold CV | Non-linear | - |
| TMRSMM (Gao, Fan, & Xu, 2016) | ✗ | Two non-parallel hyperplanes | Hinge loss | Five-fold CV | Both linear and non-linear | - |
| QSMM (Duan et al., 2017) | ✓ | One | Least squares loss | - | Linear | $O\left(\kappa^{3} \epsilon^{-3} \log (N p q)\right)+O(\log (p q))$ |
| MRMLTSMCM (Jiang & Yang, 2018) | ✓ | Two non-parallel hyperplanes | Hinge loss | Five-fold CV | Linear | - |
| KSMM (Ye, 2019) | ✓ | One | Hinge loss | CV | Non-linear | $O\left(N^{2} s m n^{2}\right)$ |
| MDSMM (Ye & Han, 2019) | ✓ | One | Hinge loss | Ten-fold CV | Linear | $O\left(N^{2} s(p q+p+q)\right)$ |
| WSMM (Maboudou-Tchao, 2019) | ✓ | One | Hinge loss | - | Non-linear | - |
| KL-SMM (Chen et al., 2020) | ✓ | One | Hinge loss | Five-fold CV | Linear | $O\left(s N^{2} p q\right)$ |
| WOA-SMM (Zheng, Gu, Pan, & Tong, 2020) | ✓ | One | Hinge loss | WOA | Linear | - |
| IQSMM (Zhang, Song and Wu, 2021) | - | One | - | - | Linear | $O\left(\kappa^{2} \log ^{1.5}(\kappa / \epsilon) \log (N p q)\right)+O(\log (p q))$ |
| PSMM (Zhang & Liu, 2022) | ✓ | One | Least squares loss | - | Linear | - |
| NPBSMM (Pan, Xu, Zheng and Tong, 2023) | ✓ | Non-parallel hyperplanes | Hinge loss | Five-fold CV | Linear | $O(s N(p+q) K)$ |
correlation data, improving low-rank approximation. A brief comparative analysis among the class-imbalance SMM variants is provided below: 
  • Handling Class Imbalance: ESMM (Zhu, 2017) and CWSMM (Li, Cheng et al., 2021) specifically address the challenges of imbalanced datasets by ensuring the significance of the minority class and employing dynamic penalty factors tailored to individual class samples. 
  • Adaptive Framework: DPAMM (Xu et al., 2022) utilizes an adaptive framework for minimizing low-rank approximations, contributing to the retention of highly significant correlation data within the input matrix. 
This analysis provides insights into the strengths and distinctive features of each variant. Depending on the specific characteristics of the dataset and requirements of the application, practitioners can choose the most suitable approach for effectively handling class-imbalance in SMM. Table 7 showcases a summary of the various class-imbalance variants of SMM. 
Having traversed an extensive expanse of SMM variants in the preceding subsections, encompassing least squares approaches, robust and sparse formulations, multi-class classification paradigms, and strategies to address class imbalance, we now transition our discussion to deep variants of SMM.

4.5. Deep variants of SMM

The shallow variants of SMM perform well in classifying matrix data; however, they do not explore the powerful principle of stacked generalization for automatically learning from data. In this subsection, we highlight the deep variants of the SMM model for classification, which use the concept of different layers of stacked generalization (Breiman, 1996; Wolpert, 1992).
The deep stacked SMM (DSSMM) (Hang et al., 2020) consists of SMMs in different layers that preserve the structural information of the data.
The random projections of the predictions of each layer modify the original features and are passed to the next layer of DSSMM. Instead of parameter fine-tuning via backpropagation, DSSMM uses an effective feed-forward method, where each layer forms a convex optimization problem. Moreover, using the hinge loss to penalize misclassification contributes to the effectiveness of the model (Vinyals, Jia, Deng, & Darrell, 2012). Although DSSMM performs better than SMM, its training relies on pre-extracted information, which may degrade the classification accuracy of the model when that information is insufficient or corrupted by noise. The following models overcome this issue.
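The feed-forward stacking just described can be sketched in a few lines. In this sketch, scikit-learn's vector-input `LinearSVC` stands in for the per-layer SMM solver (an assumption for illustration only), and the layer count and projection width are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

def dssmm_forward_train(X, y, n_layers=3, rng=np.random.default_rng(0)):
    """Feed-forward stacking in the spirit of DSSMM: each layer is trained
    on the original features augmented with a random projection of the
    previous layer's decision values (LinearSVC replaces the SMM solver)."""
    layers, feats = [], X
    for _ in range(n_layers):
        clf = LinearSVC(C=1.0, max_iter=5000).fit(feats, y)
        layers.append(clf)
        scores = clf.decision_function(feats).reshape(len(X), -1)
        # Random projection of this layer's predictions, stacked onto X.
        proj = scores @ rng.standard_normal((scores.shape[1], 4))
        feats = np.hstack([X, proj])
    return layers

# Toy run on flattened 4x4 "matrix" samples from two classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 16)), rng.normal(1.5, 1, (30, 16))])
y = np.array([0] * 30 + [1] * 30)
model = dssmm_forward_train(X, y)
print(len(model))  # 3 stacked layers
```

Each layer sees the original features plus a random projection of the previous layer's decision values, so no backpropagation is needed.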
  • Most deep stacking network (DSN)-based models take pre-extracted features as input, which adversely affects the learning of high-level feature representations when informative neural patterns are not adequately represented by the input features. Moreover, if the pre-extracted features are inaccurate, the classification process also suffers. Liang, Hang, Yin et al. (2022) introduced the deep stacked feature representation (DSFR) using common spatial patterns (CSP) and SMM, a technique that resolves the aforementioned problems by allowing the model to acquire high-level representations and abstractions on its own. The basic unit of DSFR is a feature decoding module with SMM as the classifier and CSP as the feature extractor. The stacking elements are random projections of the outputs of the previous layer. Like DSSMM, DSFR is fine-tuned using a feed-forward method rather than backpropagation.
  • To overcome the problem of training models with insufficient data, the deep stacked transfer least squares support matrix machine (DST-LSSMM) (Hang et al., 2023) was introduced. It takes LSSMM as the basic building block of the DSN and random projections of the previous layers as its stacking elements. The original input data, together with random projections of the output values of the previous layers, is fed to the next layer. The required model parameters are obtained from the lower layers.


The adoption of an adaptive multi-layer model knowledge transfer learning system makes it easier to build higher-level models. It also uses an efficient feed-forward method instead of fine-tuning and parameter pre-training.
  • To enhance the robustness of deep SMM models against noise, the deep stacked pinball transfer matrix machine (DSPTMM) (Pan, Sheng et al., 2023) was proposed. The basic building block and the stacking element of DSPTMM are the pinball transfer module (PTM) and random projection, respectively. In DSPTMM, the PTM is used to obtain the output of the previous layer, which is then randomly projected, combined with the original input matrix, and fed into the next layer of DSPTMM. The pinball loss is used to maximize the quantile distance and provides robustness to noise (Huang, Shi et al., 2013).
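The noise-robustness argument for the pinball loss can be seen directly from the two loss curves; a minimal numpy sketch (the quantile parameter tau is an illustrative choice):

```python
import numpy as np

def hinge_loss(margins):
    # Hinge: zero once the margin y*f(x) exceeds 1 -> sensitive to feature noise.
    return np.maximum(0.0, 1.0 - margins)

def pinball_loss(margins, tau=0.5):
    # Pinball: also penalizes points beyond the margin (scaled by tau),
    # which maximizes the quantile distance and adds noise robustness.
    return np.where(margins < 1.0, 1.0 - margins, tau * (margins - 1.0))

m = np.array([-0.5, 0.9, 1.0, 2.0])  # margins y*f(x)
print(hinge_loss(m))    # 1.5, 0.1, 0.0, 0.0
print(pinball_loss(m))  # 1.5, 0.1, 0.0, 0.5
```

Both losses agree inside the margin; only the pinball loss keeps a (smaller) penalty on well-classified points, which is what moves the hyperplane toward a quantile rather than the nearest points.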

The deep SMM variants improve classification performance both when the training data are insufficient and when the training data are contaminated by outliers/noise. Hence, the deep SMM variants prove to be an effective advancement within the domain of matrix classification. We compare the different deep SMM variants in Table 8. Furthermore, to give a brief overview of the deep variant models, we constructed Table 9. Shifting our context to other SMM variants, we discuss them in the next subsection.

4.6. Other variants of SMM

In this subsection, our discussion focuses on the different existing SMM variants that play a major role in improving generalization performance compared to the classical SMM. They can be listed as:
  • Quantum SMM: The time complexity of SMM is $O(\operatorname{poly}(N, pq))$ (Duan et al., 2017). If the size of the training input matrices or the number of training samples is large, the complexity is high. To overcome this limitation, a quantum algorithm for SMM (QSMM) (Duan et al., 2017) was introduced. In QSMM, the QPP of SMM is converted into a linear programming problem using the least squares technique. The resulting least squares formulation is solved via quantum matrix inversion (QMI), i.e., the Harrow-Hassidim-Lloyd (HHL) technique (Harrow, Hassidim, & Lloyd, 2009). Quantum singular value thresholding (QSVT) is then employed to perform the singular value thresholding (SVT). The two key steps, HHL and QSVT, have time complexities of $O(\log (N p q))$ and $O(\log (p q))$, an exponential improvement over the classical SMM. Zhang, Song et al. (2021) proposed an improved version of QSMM (IQSMM), entitled "An improved quantum algorithm for support matrix machines", which uses an improved QMI although they also employed the HHL algorithm. More precisely, the complexity of the QMI (HHL) algorithm, $O\left(\kappa^{3} \epsilon^{-3} \log (N p q)\right)$ (Zhang, Song et al., 2021), becomes $O\left(\kappa^{2} \log ^{1.5}(\kappa / \epsilon) \log (N p q)\right)$ in the improved QSMM.
  • Non-linear kernel SMM: Inspired by the matrix Hilbert space (Ye, 2017), Yunfei Ye introduced the non-linear kernel SMM (KSMM) (Ye, 2019), which incorporates a matrix-form inner product to preserve the structural information in matrix data. The matrix-form inner product is defined such that $\left\langle\langle X, X\rangle_{H}, \frac{V}{\|V\|}\right\rangle \geq 0$ in both the linear and non-linear cases. The optimization problem obtained for KSMM is solved using an asymptotically convergent algorithm based on SMO (Platt, 1999) rather than ADMM as in the classical SMM. It exploits the structural information of matrix data by computing the weighted average distance of the input training matrices to construct the hyperplane, unlike traditional kernel methods (Luo et al., 2015).
  • Wavelet kernels for SMM: Wavelet techniques have potential for classifying and approximating non-stationary signals (Zhang & Benveniste, 1992) and can therefore be combined with classification techniques to improve generalization performance. Motivated by the wavelet SVM (Zhang, Zhou, & Jiao, 2004), wavelet kernels for SMM (WSMM) (Maboudou-Tchao, 2019) were proposed. The two SMM variants introduced using wavelet kernels are the SMM with the Mexican hat wavelet kernel and the SMM with the Morlet wavelet kernel.
  • Proximal SMM: Based on the proximal SVM (Fung & Mangasarian, 2001), Zhang and Liu (2022) introduced another SMM variant, termed the proximal SMM (PSMM). The objective of PSMM is to minimize the Euclidean distance of each proximal plane from its corresponding class while following the concept of SMM, i.e., minimizing the nuclear norm of the regression matrix. The resulting PSMM formulation is simpler than that of the classical SMM (Luo et al., 2015) and more efficient than SMM. These properties make PSMM highly effective for complex image classification.
  • Projection twin SMM: The regularized projection twin SVM (Shao, Wang, Chen, & Deng, 2013) implements the SRM principle and finds two projection directions, one for each class, such that the projected samples of each class are well separated from each other. Xu et al. (2015) adopted the idea of finding projection directions for second-order tensors and named the result the linear projection twin SMM (PTSMM). For each class, PTSMM finds a projection axis such that the within-class variance of the projected samples is minimized while the projected samples of the other class are scattered as far as possible. The article (Xu et al., 2015) also contains a new matrix kernel function for the non-linear case of PTSMM. The resulting QPPs of PTSMM are solved using the SOR technique.
  • Twin multiple-rank SMM: Transforming higher-order tensor data to matrix data via the Tucker tensor decomposition (Kotsia & Patras, 2011) leads to multiple-rank matrices. To classify multiple-rank matrices in the linear and non-linear cases, the linear twin multiple-rank SMM (LTMRSMM) (Gao et al., 2016) and the non-linear TMRSMM (NTMRSMM) (Gao et al., 2016) were proposed, respectively. Based on the idea of the multiple-rank multi-linear SVM (MRMLSVM) (Hou, Nie, Zhang, Yi, & Wu, 2014), LTMRSMM replaces the projection vectors of MRMLSVM with the left and right singular vectors of the singular value decomposition (SVD) of the regression matrix and forms QPPs. NTMRSMM uses the matrix kernel function of the non-linear PTSMM (Xu et al., 2015) and solves the QPPs using the left and right singular vectors.
Another approach to classifying multiple-rank matrices, named the multiple-rank multi-linear twin support matrix classification machine (MRMLTSMCM), was proposed by Jiang and Yang (2018). Unlike LTMRSMM, which uses singular vectors, MRMLTSMCM constructs each decision function using a pair of projection matrices. Moreover, it is more efficient than TWSVM (Jayadeva et al., 2007), overcomes the overfitting problem, and improves classification accuracy. To minimize the squared within-class sample distances, MRMLTSMCM uses the 2-norm distance metric. Owing to the squaring operation, the distances of outliers are amplified; hence, MRMLTSMCM is sensitive to outliers. Furthermore, the dual of MRMLTSMCM involves matrix inversion, making it unsuitable for large-scale data. To overcome the aforementioned problems, the non-parallel bounded SMM (NPBSMM) (Pan et al., 2023) was introduced, in which the influence of outliers is reduced owing to the constraint norm group (CNG) in the optimization problem. The CNG, composed of the 1-norm distance of within-class samples and the hinge loss, suppresses the effect of outliers on the model and leads to a sparse model. Therefore, NPBSMM is suitable for large-scale data since it avoids matrix inversion, and it achieves better generalization performance than SMM (Luo et al., 2015).
  • SMM using transfer learning: By incorporating transfer learning (Weiss, Khoshgoftaar, & Wang, 2016) into SMM, Chen et al. (2020) proposed the knowledge-leverage-based SMM (KL-SMM). In addition to using data from the target domain for training, KL-SMM uses information from the source-domain model, thereby compensating for training deficiencies caused by insufficiently labeled target data. Moreover, the indirect use of source-domain knowledge helps preserve privacy. Furthermore, the propagation of structural information from the source model to the target model enhances the generalization ability of the model. However, KL-SMM uses the hinge loss, which considers the shortest distance between the plane and the data points and is sensitive to feature noise (Huang et al., 2013). Incorporating the pinball loss into KL-SMM led to the pinball transfer SMM (Pin-TSMM) (Pan et al., 2022), which retains the merits of KL-SMM while reducing the extent to which noise affects the hyperplane.
  • Multisynchrosqueezing transform (MSST) with whale optimization algorithm (WOA)-SMM: The classical SMM (Luo et al., 2015) has three parameters: the trade-off parameter ($\zeta$), the hyperparameter of the ADMM method ($\rho$), and the nuclear norm constraint ($\lambda$). The selection of optimal parameters is a crucial step for the final performance of the model. WOA (Mirjalili & Lewis, 2016) is an adaptive parameter selection technique, and combining it with SMM led to WOA-SMM (Zheng et al., 2020). The iterative algorithm used in WOA-SMM to optimize the SMM parameters adaptively obtains the optimal values and resolves the problem of subjective parameter setting in SMM. WOA-SMM has the advantages of simple operation, fast convergence, good convergence accuracy, and few tuning parameters. Constructing the features of the input matrix to WOA-SMM is important; if the time-frequency (TF) features of the raw signal are fed in directly, WOA-SMM may fail to converge. To handle this issue, the MSST (Yu, Wang, & Zhao, 2018) is adopted, an iterative reassignment procedure that improves the energy concentration of the TF representation by applying the synchrosqueezing transform (Thakur & Wu, 2011) multiple times. MSST constructs the feature matrix by extracting TF-domain features with less time consumption and reduced computational cost.
  • Multi-distance SMM: SMM captures the structural information of matrix data by regularizing the regression matrix. Ye and Han (2019) proposed the multi-distance SMM (MDSMM), another technique for capturing structural information. From a geometric point of view, MDSMM differs from SMM in its optimization problem, which incorporates the notion of multi-distance (Ye & Han, 2019) and uses vector-based distances to quantify the cost and penalty functions. Moreover, assigning appropriate weights to the entries of the multi-distance array determines their relative importance. Compared with SMM, MDSMM improves generalization performance and speeds up training.
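A recurring ingredient across these variants is the nuclear-norm term inherited from the classical SMM, whose proximal step inside ADMM is singular value thresholding (SVT). A minimal numpy sketch of the spectral elastic-net penalty and the SVT operator (function and coefficient names are illustrative):

```python
import numpy as np

def spectral_elastic_net(W, lam, tau):
    """SMM-style penalty: 0.5*tau*||W||_F^2 + lam*||W||_* ,
    written via the singular values of W."""
    sv = np.linalg.svd(W, compute_uv=False)
    return 0.5 * tau * np.sum(sv ** 2) + lam * np.sum(sv)

def svt(W, lam):
    """Singular value thresholding: the proximal operator of
    lam*||.||_* , i.e. shrink each singular value by lam."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

W = np.array([[3.0, 0.0], [0.0, 1.0]])
print(spectral_elastic_net(W, lam=1.0, tau=1.0))  # 0.5*(9+1) + (3+1) = 9.0
print(svt(W, 1.0))  # singular values shrink from (3, 1) to (2, 0)
```

Because small singular values are set exactly to zero, SVT is what produces the low-rank regression matrices these variants rely on.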
Thus, the different concepts incorporated into the classical SMM improve its geometric structure along with its generalization performance. In Table 10, we examine the differences among the SMM variants. To better present the characteristics of the models, the different variants are summarized in Table 11.

5. SMM for regression

The concept of plane-based learning applied to regression problems is termed support vector regression (SVR) (Smola & Schölkopf, 2004). It is rigorous, and its convex QPP is solved to find the globally optimal solution, which resolves the local minima issue that neural network models face (Tang, Ma, Hu, & Tang, 2019). Taking motivation from SVR, Yuan and Weng (2021) introduced support matrix regression (SMR). SMR (Yuan & Weng, 2021) applies the idea of matrix input to regression problems while preserving the structural information of the matrix data. The objective of SMR is to maximize the
margin and minimize the squared Frobenius norm of the matrix, hence reducing the sensitivity of the regression to noisy data and leading to robustness against noise. SMR overcomes the lack of physical degradability in the input data and is effective for time asynchronization issues. 
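SMR inherits SVR's epsilon-insensitive loss, which ignores residuals inside an epsilon-tube and thereby desensitizes the fit to small measurement noise; a minimal numpy sketch (the tube width is illustrative):

```python
import numpy as np

def eps_insensitive(y_true, y_pred, eps=0.1):
    """SVR-style epsilon-insensitive loss: residuals inside the
    eps-tube cost nothing, which is what makes the regression
    robust to small noise in the targets."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

r = eps_insensitive(np.array([1.0, 2.0, 3.0]), np.array([1.05, 2.5, 2.0]))
print(r)  # residuals 0.05, 0.5, 1.0 -> losses 0.0, 0.4, 0.9
```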

6. SMM for semi-supervised learning 

The original SMM model relies heavily on large labeled datasets. However, labeled data are not common in real-world applications, which degrades model performance owing to the lack of supervised information (Bennett & Demiriz, 1998). To overcome this limitation, semi-supervised learning (SSL) (Reddy, Viswanath, & Reddy, 2018) methods, which use both labeled and unlabeled data, have attracted great attention from researchers. By adopting the SSL approach, Li et al. (2023) first proposed a novel semi-supervised probabilistic SMM (SPSMM). In SPSMM, a probability output strategy is designed to calculate the class-specific probability for each input. Furthermore, to address the scarcity of labeled samples, an SSL-based framework is employed to carefully select unlabeled samples with significant confidence for assigning pseudo-labels. Table 12 presents a summary of the regression and semi-supervised variants of SMM.
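The confidence-based pseudo-labeling step of SPSMM can be sketched as follows; the probability matrix and the threshold below are illustrative, not values taken from the paper:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only unlabeled samples whose maximum posterior class
    probability clears the confidence threshold; return their
    indices and the pseudo-labels to assign."""
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Posterior class probabilities for 4 unlabeled samples (synthetic).
P = np.array([[0.95, 0.05],
              [0.55, 0.45],
              [0.10, 0.90],
              [0.70, 0.30]])
idx, labels = select_pseudo_labels(P)
print(idx, labels)  # only samples 0 and 2 are confident enough
```

The confidently pseudo-labeled samples would then be appended to the labeled set and the classifier retrained.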
Fig. 2 illustrates the evolution and development of SMM over time.

7. Applications

SMM has attracted considerable attention for its remarkable ability to handle complex and high-dimensional data. The applications of SMM broadly involve domains dealing with multi-dimensional data that can be represented as matrices. In many classification problems, such as EEG classification, fault diagnosis, and image classification, the input features are high-dimensional and represented in matrix form. SMM encapsulates the structural information of the feature matrix by correlating the useful information provided by the rows and columns. In this section, we discuss the diverse applications of SMM across various domains, demonstrating its effectiveness in solving real-world problems and its potential for future development.

7.1. EEG signal classification

EEG classification plays a key role in various fields, including neuroscience (Srinivasan, 2007), clinical diagnosis (Praline et al., 2007), brain-computer interfaces (Värbu, Muhammad, & Muhammad, 2022), and so on. The application of SMM to classifying EEG signals has shown promising results, aiding the analysis and study of brain activity.
Electroencephalography (EEG) uses sensors to capture the dynamic electrical activity produced by the synchronized activity of billions of neurons in the brain. Over time, various techniques for acquiring EEG signals have been developed, and several sensor types are available, such as wet electrodes, dry electrodes, and wireless EEG systems (Tyagi, Semwal, & Shah, 2012). The dimensional characteristics of EEG data make it challenging to understand and analyze, requiring a comprehensive understanding and learning process (Mumtaz, Rasheed, & Irfan, 2021). This makes SMM a suitable choice for EEG classification tasks. The different EEG datasets used in applications of the SMM variants are listed in Table 13.
Classifying single-trial EEG signals is a challenging task and requires different techniques to improve the signal quality. These techniques aim to minimize noise, measurement artifacts, outliers, and irrelevant information in the EEG data (Lotte, Congedo, Lécuyer, Lamarche, & Arnaldi, 2007), ultimately improving the accuracy of subsequent analyses. To improve signal quality, filtering plays an important role. The common spatial pattern (CSP) (Devlaminck, Wyns, Grosse-Wentrup, Otte, & Santens, 2011)
Table 11
Other variants of SMM.
Model Author Characteristics Loss function Datasets Advantages Technique to solve
SMM (2015) Luo et al. (2015) Spectral elastic net penalty having Frobenius and nuclear norms. Hinge loss EEG alcoholism, EEG emotion, the students face and INRIA person Preserves the correlation within a matrix. ADMM
PTSMM (2015) Xu et al. (2015) Seeks a projection axis for each class with minimum within-class variance while scattering the projected samples of the other class as far as possible. Hinge loss 2D image classification using ORL, YALE and AR face databases Deals with non-linear cases using a new matrix kernel function. Considers the SRM principle. SOR to solve the QPP.
LTMRSMM, NTMRSMM (2016) Gao et al. (2016) Deals with matrix data having multiple ranks. Hinge loss Feret, ORL, FingerDB, Palm100 and Ar Reduced computational cost compared with the multiple-rank matrix vectorization method. Iteratively solving the QPPs.
QSMM (2017) Duan et al. (2017) The QPP of SMM is transformed into the solution of a system of linear equations by incorporating the least squares loss and solved using quantum matrix inversion (HHL) and QSVT. Square loss function - Exponential increase of speed over the classical SMM. Complexity: $O\left(\kappa^{3} \epsilon^{-3}(\log (N p q))\right)+O(\log (p q))$, whereas the complexity of SMM is $O(\operatorname{poly}(N, p q))$. HHL and QSVT algorithms
MRMLTSMCM (2018) Jiang and Yang (2018) Extension of TWSVM; uses pairs of projecting matrices to obtain the non-parallel hyperplanes. Hinge loss UCI datasets: Sonar, CMC, Hill-valley, Ionosphere, Madelon, Pedestrian, Pollen, FingerDB, Binucleate, RGB More efficient than TWSVM; implements the SRM principle. Optimizing the obtained QPPs alternately.
KSMM (2018) Ye (2019) Generates a matrix-based hyperplane by computing the weighted average distance. Hinge loss ORL face database, the Sheffield Face dataset, Columbia Object Image Library (COIL-20) and the binary alpha digits The matrix-form inner product exploits the structural information of matrix data and solves the optimization problem without using the alternating projection method. SMO
MDSMM (2019) Ye and Han (2019) Introduces multi-distance to extract the intrinsic information of the input matrix and uses vector-based distance to quantify the cost function and penalty function. Hinge loss IMM face dataset, the Japanese female facial expression (JAFFE) dataset (Lyons, Akamatsu, Kamachi, Gyoba and Budynek, 1998), the Jochen Triesch static hand posture dataset (von der Malsburg, 1996), the Columbia Object Image Library COIL-20 (Nene, Nayar, & Murase, 1996), and the Columbia Object Image Library COIL-100 (Nene et al., 1996) Improves the generalization performance. Alternating projection method is used to solve the optimization problem.
WSMM (2019) Maboudou-Tchao (2019) Wavelet kernels introduced for the non-linear case. Hinge loss EEG alcoholism dataset, INRIA person dataset Obtains a Mercer kernel in the matrix space. Improves performance on the EEG and INRIA datasets. QPP is solved using quadratic programming software.
KL-SMM (2020) Chen et al. (2020) Uses the concept of transfer learning. Hinge loss Motor imagery (MI) based EEG datasets Improved generalization capability of a target domain by leveraging information from the source domain. ADMM
WOA-SMM (2020) Zheng et al. (2020) Time-frequency domain features are extracted using the multisynchrosqueezing transform (MSST) to construct the feature matrix. Hinge loss Fault datasets from Case Western Reserve University (CWRU) and Anhui University of Technology (AHUT) Improves the classification performance, consumes less time and has lower calculation cost. WOA is used to solve the optimization problem.
IQSMM (2021) Zhang, Song et al. (2021) The QPP of SMM is transformed into the solution of a system of linear equations by incorporating the least squares loss and solved using improved quantum matrix inversion and QSVT. Square loss function - Complexity: $O\left(\kappa^{2} \log ^{1.5}(\kappa / \epsilon)(\log (N p q))\right)+O(\log (p q))$ Quantum matrix inversion and QSVT
PSMM (2022) Zhang and Liu (2022) Constructs proximal hyperplanes for the different classes. - MNIST digit database (LeCun, Bottou, Bengio, & Haffner, 1998), MIT face database, INRIA person database (Dalal & Triggs, 2005), students face database (Nazir, Ishtiaq, Batool, Jaffar, & Mirza, 2010), JAFFE (Lyons, Akamatsu, Kamachi and Gyoba, 1998) Simpler formulation than SMM; more efficient than SMM in time complexity. ADMM
NPBSMM (2023) Pan, Xu et al. (2023) A constraint norm group is introduced in the optimization problem. Hinge loss AHUT fault dataset of roller bearings Leads to a robust and sparse model. Suitable for large-scale data as matrix inversion is not required. Dual coordinate descent (DCD)
Table 12 
SMM for regression and semi-supervised learning. 
Model Author Characteristics Loss function Datasets Advantages Technique to solve
SMR (2021) Yuan and Weng (2021) Incorporates the idea of matrix learning for regression problems. $\varepsilon$-insensitive loss Test distribution grids: single-phase IEEE 8-bus system (Liao, Weng, Liu, & Rajagopal, 2018), IEEE 123-bus system, utility distribution network modified from Narang, Ayyanar, Gemin, Baggu, and Srinivasan (2015) Used for learning power flow mapping and overcomes the lack of physical degradability, thus overfitting. Robust to noise/outliers. Effective for time asynchronization issues. -
SPSMM (2023) Li, Li, Yan, Shao, and Lin (2023) A strategy based on probability output is utilized to estimate the posterior class probabilities for matrix inputs. Furthermore, a semi-supervised learning framework is implemented to facilitate the transfer of knowledge from unlabeled samples to labeled ones. - An infrared thermal imaging dataset Mitigates the issue of limited labeled samples and bolsters the generalization performance. SMO
Table 13 
EEG datasets.
Dataset Description Sampling frequency Papers Link
BCI competition III dataset IVa (BCIC34a) 118-channel, 5 subjects (280 trials per subject) during motor imagery tasks involving right-hand or foot movements 1000 Hz Hang et al. (2020, 2023), Liang, Hang, Lei et al. (2022), Liang, Hang, Yin et al. (2022), Razzak, Hameed and Xu (2019) and Zheng, Zhu, Heng (2018) Link
BCI competition III dataset IIIa (BCIC33a) 60-channel, single-trial, 3 subjects with 4 classes: right hand, left hand, tongue and feet 250 Hz Hang et al. (2023), Razzak, Blumenstein et al. (2019), Razzak, Hameed et al. (2019) and Zheng, Zhu, Qin, Heng (2018) Link
BCI competition IV dataset IIa (BCIC32a) 22-channel, 4 classes: left hand, right hand, feet and tongue, 9 subjects 250 Hz Chen et al. (2020), Hang et al. (2020), Liang, Hang, Yin et al. (2022), Razzak, Blumenstein et al. (2019), Razzak, Hameed et al. (2019), Zheng, Zhu, Heng (2018) and Zheng, Zhu, Qin, Heng (2018) Link
BCI competition IV dataset IIb (BCIC32b) 3 bipolar EEG channels, 2 classes: left hand and right hand, 9 subjects 250 Hz Chen et al. (2020), Hang et al. (2020), Liang, Hang, Yin et al. (2022), Razzak, Hameed et al. (2019) and Zheng, Zhu, Heng (2018) Link
Lower Limb MI-BCI dataset (LLMI-BCI) 32-channel self-collected EEG signals from 10 subjects 512 Hz Hang et al. (2020, 2023) -
The SJTU emotion EEG dataset (SEED-VIG) Fatigue 17-channel, 3 classes: awake, tired, drowsy, from 10 subjects 512 Hz Li, Wang et al. (2022) Link
is used as a preprocessing technique for BCIs. Another preprocessing technique is downsampling the EEG data to reduce the high computational costs (Bischof & Bunch, 2021). After preprocessing, the EEG data undergoes feature extraction or feature selection in order to extract the most helpful information from the input data matrix. The standard methods of feature extraction are time domain parameters (TDP) (Vidaurre, Krämer, Blankertz, & Schlögl, 2009), the fast Fourier transform (FFT) (Shakshi & Jaswal, 2016), principal component analysis (PCA) (Kuncheva & Faithfull, 2013), and so forth (Hu & Zhang, 2019).
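The CSP spatial filtering referred to above reduces to a generalized eigenvalue problem on the two class-covariance matrices; a minimal numpy/scipy sketch on synthetic two-channel trials (data, channel count, and filter count are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Common spatial patterns: spatial filters that maximize variance
    for one class while minimizing it for the other.  Each trial is a
    (n_channels, n_samples) matrix of band-passed EEG."""
    cov = lambda T: np.mean([X @ X.T / np.trace(X @ X.T) for X in T], axis=0)
    Ca, Cb = cov(trials_a), cov(trials_b)
    # Generalized eigenproblem Ca w = lam (Ca + Cb) w, eigenvalues in [0, 1].
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]  # both extremes
    return vecs[:, picks].T

rng = np.random.default_rng(0)
A = rng.normal(0, [[2.0], [0.5]], (20, 2, 100))  # class A: channel 0 strong
B = rng.normal(0, [[0.5], [2.0]], (20, 2, 100))  # class B: channel 1 strong
W = csp_filters(A, B)
print(W.shape)  # one variance-minimizing and one variance-maximizing filter
```

Projecting trials through the extreme filters and taking log-variance gives the low-dimensional features usually fed to the downstream classifier.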
The presence of noise and outliers in EEG data affects the classification of the data. To handle this, Zheng, Zhu, Heng (2018) proposed a classifier entitled RSMM for single-trial EEG classification. The preprocessing techniques applied to the raw data are a Chebyshev Type II filter (Sen, Mishra, & Pattnaik, 2023) followed by spatial filtering using the CSP algorithm. Feature extraction is done using the TDP algorithm, which contributes to the robustness of the model (Nicolas-Alonso & Gomez-Gil, 2012). EEG data is highly complex because of its high dimensionality. To overcome this complexity, Razzak, Hameed et al. (2019) proposed efficient feature extraction and presented comparative studies on the PCA algorithms, namely robust joint sparse PCA (RJSPCA) and outliers robust PCA (ORPCA), for dimensionality reduction (for simplicity, we denote the model as R-SMM in the rest of the paper). The preprocessing technique used is filter bank CSP (FBCSP) followed by the TDP algorithm for feature extraction. PCA is then applied to select the robust features from TDP, which is beneficial for dimensionality reduction. Li, Wang et al. (2022) discuss the application of EEG-based fatigue and attention detection, using SEED-VIG for experimentation. The EEG fatigue signals are classified using the ACF-SSMM (Li, Wang et al., 2022) method, which involves compressing the redundant features by use of the sparsity principle.
The aforementioned methods involve binary classification only. To address multi-class classification, a multi-class SMM (MSMM) was developed that aims to improve the performance of EEG-based BCIs involving multiple activities (Zheng, Zhu, Qin, Heng, 2018). The preprocessing is based on the techniques of Ang et al. (2012), which employ non-overlapping band-pass filters of a sixth-order Butterworth filter (Pise & Rege, 2021) to filter out artifacts and unrelated signals, followed by CSP to select the most dominant channels. Several feature extraction techniques were experimented with, including band powers (BPO), power spectral density (PSD), and TDP, among which TDP led to the best results. MSMM (Zheng, Zhu, Qin, Heng, 2018) is the first attempt to handle multi-class EEG data classification, which promotes a broader range of applications in BCI technology. To increase the generalization performance of multi-class SMM, Razzak, Blumenstein et al. (2019) proposed a multi-class SMM (M-SMM) that enhances and enlarges the inter-class margins, focusing on single-trial multi-class classification of EEG signals. To mitigate the effect of outliers/noise, spatial filtering is employed as an effective preprocessing technique to identify discriminative spatial patterns and eliminate uncorrelated information. Specifically, the FBCSP algorithm is used to filter out unrelated sensorimotor rhythms and artifacts by autonomously selecting a subject-specific frequency range for band-pass filtering of the EEG measurements.
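The CSP step that recurs in these pipelines can be sketched as follows: a minimal two-class CSP via the standard whitening formulation, run on synthetic trials (this is an illustrative sketch; real pipelines such as FBCSP add filter banks and feature selection on top, and the toy variance structure is an assumption).

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Common spatial patterns for two classes of (channels, samples) trials.

    Returns 2*n_pairs spatial filters (rows): the first n_pairs maximize
    variance for class A, the last n_pairs for class B.
    """
    def avg_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    ca, cb = avg_cov(trials_a), avg_cov(trials_b)
    # Whiten the composite covariance, then diagonalize class A in that space.
    evals, evecs = np.linalg.eigh(ca + cb)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    d, u = np.linalg.eigh(whiten @ ca @ whiten.T)  # ascending eigenvalues
    w = u.T @ whiten                               # rows are spatial filters
    idx = np.concatenate([np.arange(len(d))[::-1][:n_pairs],  # largest: class A
                          np.arange(n_pairs)])                # smallest: class B
    return w[idx]

rng = np.random.default_rng(1)
# Class A: channel 0 is high-variance; class B: channel 2 is high-variance.
trials_a = [rng.normal(size=(3, 200)) * np.array([[3.0], [1.0], [1.0]]) for _ in range(20)]
trials_b = [rng.normal(size=(3, 200)) * np.array([[1.0], [1.0], [3.0]]) for _ in range(20)]
w = csp_filters(trials_a, trials_b)
print(w.shape)  # (2, 3)
```

Projecting trials through these filters and taking (log-)variances yields the small matrix of discriminative features typically passed to the classifier.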
The collection of EEG data is extremely time-consuming and challenging from the clinical point of view due to the intricacies of recording the data and privacy laws (Vaid, Singh, & Kaur, 2015). Thus, techniques have been developed to leverage the little source data available and apply it to the target domain. Chen et al. (2020) proposed KL-SMM to improve the performance of EEG signal classification when very little data on the target domain is available. KL-SMM uses a fifth-order Butterworth filter followed by spatial filters as part of its preprocessing. Another article that addresses classification with insufficient data is AMK-TMM (Liang, Hang, Lei et al., 2022), based on the LS-SMM. The AMK-TMM framework introduces an adaptive approach that uses the leave-one-out CV strategy to identify multiple correlated source models and their corresponding weights. This enables the construction of the target classifier and the identification of the correlated source models to be integrated into a single learning framework.
Inspired by deep learning and transfer learning techniques for increasing model performance with less data, Hang et al. (2020) proposed SMM as the basic building block of a DSN and introduced DSSMM, which uses a fifth-order band-pass filter as
Table 14 
EEG classification applications.

| Model | Author | Dataset | Metric | Feature extraction | Preprocessing technique | Involves future directions |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| RSMM (2018) | Zheng, Zhu, Heng (2018) | BCIC34a, BCIC42b, and BCIC32a | Accuracy | TDP algorithm | Chebyshev Type II filter and CSP | Yes |
| MSMM (2018) | Zheng, Zhu, Qin, Heng (2018) | BCIC34a, BCIC42b, and BCIC32a | Kappa coefficient, precision, recall, and F-measure | TDP algorithm | Non-overlapping band-pass filters of a sixth-order Butterworth filter and CSP, based on Ang et al. (2012) | Yes |
| R-SMM (2019) | Razzak, Hameed et al. (2019) | BCIC34a, BCIC42a, and BCIC42b | Kappa coefficient, precision, recall, and F-measure | JSPCA over TDP algorithm | FBCSP | No |
| M-SMM (2019) | Razzak, Blumenstein et al. (2019) | BCIC33a and BCIC42b | Recall, precision, F-measure, and kappa coefficient | TDP algorithm | FBCSP | No |
| DSSMM (2020) | Hang et al. (2020) | BCIC34a, BCIC42b, BCIC42a, and LLMI-BCI | Accuracy, F1, AUC | TDP algorithm | Fifth-order Butterworth band-pass filter and spatial filtering | Yes |
| KL-SMM (2020) | Chen et al. (2020) | BCIC42a and BCIC42b | Accuracy, F1, AUC | - | Fifth-order Butterworth filter followed by spatial filters | Yes |
| DSFR (2022) | Liang, Hang, Yin et al. (2022) | BCIC34a, BCIC42b, and BCIC42a | Accuracy, F1, AUC | - | Fifth-order band-pass filter and CSP | Yes |
| AMK-TMM (2022) | Liang, Hang, Lei et al. (2022) | BCIC34a, BCIC42a, and LLMI-BCI | Accuracy, standard deviation, AUC | - | Fifth-order Butterworth filter and CSP | Yes |
| ACFSSMM (2022) | Li, Wang et al. (2022) | SEED-VIG dataset | Accuracy, F1, AUC | TDP algorithm | Band-pass filter and CSP | No |
| DSTLSSMM (2023) | Hang et al. (2023) | BCIC33a, BCIC34a, and LLMI-BCI | Accuracy, recall, kappa, F-score | - | Fifth-order Butterworth band-pass filtering and CSP | Yes |
Table 15
Datasets for roller bearing fault diagnosis.

| Dataset | Papers |
| :--- | :--- |
| AHUT dataset | Gu et al. (2021), Pan, Sheng et al. (2022, 2023), Pan, Xu and Zheng (2022), Pan, Xu, Zheng, Liu and Tong (2022), Pan, Xu, Zheng, Su et al. (2022), Pan, Xu et al. (2023), Pan, Xu, Zheng, Tong et al. (2022), Pan and Zheng (2021), Wang et al. (2022) and Zheng et al. (2020) |
| CWRU dataset | Li et al. (2020), Pan, Sheng et al. (2022, 2023), Pan, Xu et al. (2023), Pan, Xu, Zheng, Tong et al. (2022), Pan, Yang, Zheng, Li, and Cheng (2019), Pan and Zheng (2021), Wang et al. (2022) and Zheng et al. (2020) |
| HNU dataset | Li et al. (2020), Pan, Xu, Zheng, Liu et al. (2022), Pan, Xu, Zheng, Tong et al. (2022) and Pan, Xu et al. (2023) |
| Vibration signal dataset from the University of Connecticut (UCONN) | Li, Yang et al. (2021) |
| Dataset of Suzhou University (SUZ) | Gu et al. (2021) |
| Custom dataset | Li et al. (2023) and Pan and Zheng (2021) |
the preprocessing technique. Similar to DSSMM, Liang, Hang, Yin et al. (2022) proposed the DSFR method, which takes raw EEG data as input and learns directly from it. The DSFR method reduces the model's reliance on pre-extracted EEG features and can extract features more effectively than CSP followed by classification. Liang, Hang, Yin et al. (2022) suggested filtering the EEG signals with a fifth-order band-pass filter as the preprocessing technique. Further, Hang et al. (2023) proposed a deep stacked method termed DST-LSSMM, whose building-block module is made of LSSMM. The authors performed fifth-order Butterworth band-pass filtering followed by CSP as the preprocessing technique. The preserved CSP spatial filters consist of the initial filter along with the last three filters. Subsequently, the dynamic logarithmic power of the filtered EEG signals is calculated over time. This process yields a matrix-based representation of EEG features.
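The dynamic log-power computation described above can be sketched as follows, under illustrative assumptions for the window and step sizes (the surveyed papers do not fix these here):

```python
import numpy as np

def log_power_matrix(signals, win, step):
    """Dynamic log-power of multichannel signals over sliding windows.

    signals : (channels, samples) spatially filtered EEG
    Returns a (channels, n_windows) feature matrix: log of the mean
    power in each window, per channel.
    """
    c, n = signals.shape
    starts = range(0, n - win + 1, step)
    power = np.array([[np.mean(signals[ch, s:s + win] ** 2) for s in starts]
                      for ch in range(c)])
    return np.log(power + 1e-12)  # small constant guards against log(0)

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 1000))          # stand-in for CSP-filtered EEG
feats = log_power_matrix(x, win=250, step=125)
print(feats.shape)  # (4, 7)
```

The resulting channels × windows matrix is exactly the kind of structured input SMM-style classifiers are designed to consume without vectorization.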
The accurate identification of different brain states using SMM helps improve the diagnostic accuracy of neurological disorders and supports patient monitoring. Table 14 presents the different variants of SMM used to classify EEG signals. The success achieved in accurately classifying EEG patterns highlights the importance of this approach in advancing our understanding of brain activity and its applications in healthcare and beyond.

Fig. 2. Illustration of the evolution and development of SMM.

7.2. Fault diagnosis

Fault diagnosis is crucial in various industries, including manufacturing (Yan, Wang, Lu, Zhou, & Peng, 2023), the automotive industry (Pernestål, 2009), aerospace (Patton, 1990), power systems (Sekine, Akimoto, Kunugi, Fukui, & Fukui, 1992), and beyond. The health of mechanical instruments and equipment directly affects machine life and production safety. The application of SMM to fault diagnosis shows great potential, offering a robust and practical framework for identifying and classifying faults in complex systems. The different datasets used for fault diagnosis are listed in Table 15.
Fault diagnosis uses a two-dimensional vibration signal as input. However, the feature matrix may be contaminated, leading to noise and outliers. To mitigate these contaminations, Gu et al. (2021) suggested RSSMM, which uses MSST as its feature extraction technique and provides robust fault diagnosis for roller bearings. Similarly, the adaptive interactive deviation matrix machine (AIDMM) (Pan, Xu, Zheng, Liu et al., 2022) also contributes sparseness and robustness. A similar application is targeted by TRMM (Pan, Xu, Zheng, Tong et al., 2022), a non-parallel classifier for fault diagnosis. TRMM uses the symplectic geometry similarity transformation (SGST) to extract the two-dimensional feature matrix and is insensitive to noise as well as robust to outliers, which helps improve the health of mechanical equipment through accurate fault diagnosis. Meanwhile, Pan, Xu, Zheng, Su et al. (2022) proposed MFSMM for the fault diagnosis of roller bearings, which accurately diagnoses the working state of the roller bearing by classifying outliers using the concept of fuzzy hyperplanes. To handle complicated, i.e., compound, roller bearing faults, Li et al. (2020) proposed NPLSSMM, which uses the continuous wavelet transform (CWT) as its feature extraction technique and minimizes the effect of outliers. Its applications can also be extended to other rotating machinery for fault diagnosis. Applications on different rotating machinery include SWSSMM (Li, Yang et al., 2021), which extracts the distinct fault features of gears directly from the raw vibration signal. Gear fault diagnosis requires professional expertise and knowledge; the proposed model, however, extracts a symplectic weighted coefficient matrix using the symplectic similarity transform (SST).
Pan et al. (2019) proposed a multi-class classifier called the symplectic geometry matrix machine (SGMM) for roller bearing fault diagnosis, which is robust to noise and outliers. SGST is used to obtain a symplectic geometry coefficient matrix in SGMM that preserves the structural information and removes noise interference while preventing convergence problems. The time-frequency features of the roller bearings are insufficient to represent the whole information and complete functioning. For faster convergence of the optimization algorithm, Zheng et al. (2020) proposed combining the WOA with SMM, which involves MSST time-frequency analysis. The vibration signature from the drive-end bearing is chosen for analysis. An SKF bearing is employed for this purpose, and artificial defects are introduced at individual points on the ball, inner race, and outer race using spark machining techniques. Pan and Zheng (2021) proposed an improved version of SMM called the symplectic hyperdisk matrix machine (SHMM) for fault diagnosis, which uses SGST to obtain the input matrix in the form of a dimensionless feature matrix. Further, a hyperdisk is used in SHMM to cluster the different kinds of data, which makes the whole process robust and efficient.
Vibration data and infrared images have also been investigated as input data for fault diagnosis. Li, Cheng et al. (2021) discussed CWSMM, which employs dynamic penalty factors to address class imbalance by assigning appropriate weights to samples of different classes during training. A confidence-weight assignment strategy designed using prior knowledge of the matrix samples makes CWSMM robust. The vibration data and infrared thermography (IRT) images were collected from a test rig dedicated to diagnosing faults in rotating machinery.
In industrial practice, labeled samples may be insufficient. To cope with this challenge, Pan, Sheng et al. (2022) discussed the application of Pin-TSMM to roller bearing fault diagnosis. With insufficient labeled samples, the Pin-TSMM approach first trains the model using a large amount of labeled data from the source domain; the pre-trained model is then fine-tuned using a small amount of data from the target domain. Furthermore, Li et al. (2023) discussed the semi-supervised probability-based SPSMM, which uses infrared imaging for gearbox fault diagnosis rather than vibration signals, which are susceptible to noise. The SSL strategy addresses the problems of insufficient data samples and outliers, which makes the model robust in the fault detection domain. However, if the data are large-scale, NPBSMM (Pan, Xu et al., 2023) can be used, as it incorporates CNG and can handle big data. CNG also attempts to suppress the influence of outliers and helps make the results sparser. Consequently, the model has a higher capacity to fit the given data precisely.
The classical SMM suffers from insufficient probabilistic information, which the symplectic relevance matrix machine (SRMM) (Pan, Xu, Zheng, 2022) addresses in a roller bearing fault diagnosis scheme. In the SRMM approach, the input of the classifier is the sample signal matrix. The inherent robustness of symplectic geometry analysis contributes to the resilience of SRMM. Moreover, the classical SMM performs poorly on imbalanced data. Therefore, to deal with the imbalance problem, Xu et al. (2022) proposed DPAMM, which can learn from a feature matrix containing the structural information of the vibration signals.
Fault diagnosis can exploit not only vibration data but also thermal images. With this in mind, Li, Shao et al. (2022) introduced the least-squares support matrix machine with global structure information (LSISMM) based on infrared thermal images. This design effectively exploits the structural information present in infrared thermal images. SNMM (Wang et al., 2022) constructs a pair of hyperplanes to solve the fault diagnosis classification problem. Pan, Sheng et al. (2023) proposed DSPTMM, which uses stacked generalization to capture matrix-structured data and derive predictions. To improve fault diagnosis performance, DSPTMM employs SGST to process the raw signals, since the data are affected by noise.
SMM can effectively capture complex relationships and interactions between features and instances. It enables accurate fault identification in systems with complex fault patterns and can help experts make informed decisions about maintenance and system optimization. Several applications of SMM in fault diagnosis are given in Table 16.

7.3. Other applications

SMM can also be applied to image classification tasks and power grid applications (Xu et al., 2015; Yuan & Weng, 2021), exploiting the relationships among image features, matrix data, and class labels. Image data are usually handled as vectors in SVM algorithms, but with the advent of SMM we can now exploit the structural properties of images and obtain better results. For two-dimensional image classification, Xu et al. (2015) proposed PTSMM. The authors also discussed a nonlinear version of PTSMM obtained by introducing a new matrix kernel function, which achieved good results. Experiments were conducted on the ORL, YALE, and AR databases. Similarly, Liu, Jiao et al. (2019) proposed an image classification technique specifically for polarimetric synthetic aperture radar (PolSAR) images. The method converts PolSAR images into matrices through polarimetric scattering coding and then classifies the matrices using S-SMM (Liu, Jiao et al., 2019). The authors experimented with PolSAR images from an airborne system (NASA/JPL-Caltech AIRSAR). Some other applications involve regression techniques. Yuan and Weng (2021) proposed a variational SMR model for imputation in distribution grid power-flow calculation; this is the only application of SMM to regression. We list these applications in Table 17.
The applications of SMM can be extended to any data domain whose data can be represented in matrix form. It performs well in such cases, where the structural information of the input data needs to be exploited and incorporated. As data complexity increases, SMM provides a way to incorporate data correlations into training, thereby improving performance. Fig. 3 depicts the various applications of SMM.

7.4. Experimental results

SMM and its variants are applied in various scenarios, as discussed in the previous section. In this section, we present a comparative analysis of several variants that have proven effective on standard matrix datasets commonly used in EEG signal classification and fault diagnosis.
Experimental setup: We carried out the experiments in MATLAB 2023b on a desktop PC with an Intel® Xeon® Gold 6226R CPU @ 2.90 GHz and 128 GB RAM. The experimental setup involves a simple grid search over the different parameters of each model. The data are normalized using Z-score normalization. Initially, we divided the datasets into 70:30 for training and testing, respectively. The 70% of each dataset obtained for training is again split into training and validation in the ratio 70:30. Further, for the selection of optimal parameters, we employ a grid search over the parameters. The ranges of the different parameters in SSMM, RSMM, RSSMM, and MSMM are chosen as follows: $C \in \{0.1, 1, 10, 100\}$ and $\lambda \in \{0, 0.1, 0.5, 1, 2, 5, 10\}$ following Zheng, Zhu, Qin, Heng (2018), and $\rho = 0.01$ from Xu et al. (2022). For RSMM (Zheng, Zhu, Heng, 2018), the parameter $\lambda_{3} \in \{0.0001, 0.001, 0.01, 0.1, 1\}$ following Zheng, Zhu, Heng (2018). For RSSMM (Gu et al., 2021), the constraint of the sparse term is $\alpha \in \{10^{-4}, 10^{-3.75}, \ldots, 10^{-1}\}$ following Gu et al. (2021), and the truncation parameter $\epsilon$ is chosen from $\{0.1, 0.2, \ldots, 1\}$. The metric used to evaluate the performance of the models is accuracy, defined as:
$$\text{accuracy} = \frac{\text{Number of correctly classified samples}}{\text{Total number of samples}} \times 100.$$
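The evaluation protocol above (Z-score normalization, nested 70:30 splits, validation-based grid search, accuracy on the held-out test set) can be sketched as follows. For brevity, a closed-form ridge least-squares classifier stands in for the SMM solvers, and the toy data and parameter grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy binary dataset: 200 samples, 10 features, labels in {-1, +1}.
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=200))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # Z-score normalization

def split(X, y, frac):
    idx = rng.permutation(len(y))
    cut = int(frac * len(y))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

def fit_ridge(X, y, lam):
    # Regularized least squares: w = (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def accuracy(w, X, y):
    return 100.0 * np.mean(np.sign(X @ w) == y)

# Outer 70:30 split, then inner 70:30 split of the training portion.
X_tr, y_tr, X_te, y_te = split(X, y, 0.7)
X_fit, y_fit, X_val, y_val = split(X_tr, y_tr, 0.7)

# Grid search on the validation set, then refit on the full training set.
best = max((accuracy(fit_ridge(X_fit, y_fit, lam), X_val, y_val), lam)
           for lam in [0, 0.1, 0.5, 1, 2, 5, 10])
w = fit_ridge(X_tr, y_tr, best[1])
print(f"selected lambda={best[1]}, test accuracy={accuracy(w, X_te, y_te):.1f}%")
```

The same loop structure applies when the inner fit is replaced by an SMM solver and the grid covers $C$, $\lambda$, and the variant-specific parameters.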

7.4.1. EEG datasets

We conducted experiments on four well-known EEG datasets. Two of them are binary-class datasets (BCIC42a and BCIC34a), while the other two are multi-class datasets (BCIC33a and BCIC42b). The preprocessing of each dataset involves filtering the EEG data with a fourth-order Butterworth filter (Pise & Rege, 2021). The processed data are used to analyze the performance of the models. Table 18 presents the optimal parameters, training time (in seconds), and accuracy of the models.
For multi-class prediction, we observe that RSMM performs better than MSMM, with accuracies of 82.51% on BCIC33a and 67.05% on BCIC42b; however, RSMM has a longer training time. The high accuracy of RSMM is attributed to the removal of outliers and the reduction of the matrices to a low-rank structure for training, which helps reduce overfitting of the model. Similarly, for the binary-class datasets, we observe that SSMM performs better than RSMM, with accuracies of 98.90% on BCIC34a and 100% on BCIC42a, and with a shorter training time than RSMM. The performance and efficiency of SSMM are attributed to its sparsity and to the generalized GFB algorithm used to solve the convex QPP.

7.4.2. Fault datasets

The fault dataset used in the experiments is the publicly available CWRU binary-class dataset, which is used to predict the presence or absence of a fault in a bearing. As part of the preprocessing, we downsampled and normalized the data (Gu et al., 2021). Using the inner race data of drive-end faults and the normal baseline non-fault data, we performed binary classification for fault depths of 0.007 mm, 0.014 mm, and 0.021 mm. We report the accuracy of the RSMM and SSMM models and find that RSMM gives better accuracy, i.e., 100%, at all three fault depths. The results are shown in Table 19.
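The downsampling and normalization step can be sketched as follows. The decimation factor and signal are illustrative assumptions, and the decimation is naive (a production pipeline would low-pass filter first to avoid aliasing):

```python
import numpy as np

def preprocess(signal, factor):
    """Downsample a 1-D vibration signal and Z-score normalize it.

    Naive decimation: keep every `factor`-th sample, then standardize
    to zero mean and unit variance.
    """
    down = signal[::factor]
    return (down - down.mean()) / down.std()

rng = np.random.default_rng(4)
raw = rng.normal(loc=5.0, scale=2.0, size=12000)   # stand-in vibration record
x = preprocess(raw, factor=10)
print(x.shape)  # (1200,)
```

The standardized signal is then segmented and transformed (e.g., via MSST or SGST in the surveyed pipelines) into the feature matrices fed to the classifiers.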

7.5. Challenges of SMM in real-world applications

SMM provides a powerful framework for handling complex data represented in matrix form, but its application in real-world scenarios is not without challenges. In this subsection, we discuss some of the key challenges that researchers and practitioners may face when using SMM for real-world applications.
Table 16
Fault diagnosis applications.

| Model | Author | Dataset | Metric | Feature extraction | Optimization technique | Hyperparameter tuning | Involves future directions |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| SGMM (2019) | Pan et al. (2019) | CWRU dataset | Accuracy | SGST | ADMM | 5-fold CV | Yes |
| WOASMM (2020) | Zheng et al. (2020) | CWRU and AHUT datasets | Accuracy | MSST | WOA | 5-fold CV | Yes |
| NPLSSMM (2020) | Li et al. (2020) | CWRU and HNU datasets | Accuracy | CWT | ADMM | 5-fold CV | Yes |
| SWSSMM (2021) | Li, Yang et al. (2021) | UCONN | Accuracy | SST | ADMM | 5-fold CV | Yes |
| RSSMM (2021) | Gu et al. (2021) | AHUT and SUZ datasets | Accuracy | MSST | ADMM | 5-fold CV | Yes |
| SHMM (2021) | Pan and Zheng (2021) | CWRU, 6 types of roller-bearing data, and 12 types of roller-bearing data of AHUT | Kappa, recall, precision, F1, and accuracy | - | ADMM | 5-fold CV | No |
| CWSMM (2021) | Li, Cheng et al. (2021) | Vibration data and infrared thermography (IRT) images from Spectra Quest, Inc., Richmond, VA, USA | Accuracy | - | ADMM | - | Yes |
| TRMM (2022) | Pan, Xu, Zheng, Tong et al. (2022) | AHUT, CWRU, and HNU datasets | Accuracy, G-mean, F-measure, AUC, kappa, precision, sensitivity, specificity | SGST | APG | 5-fold CV | Yes |
| SRMM (2022) | Pan, Xu, Zheng (2022) | AHUT dataset | Recognition rate, time, kappa, accuracy, recall rate, F1, and statistical tests | SGST | - | 5-fold CV | Yes |
| DPAMM (2022) | Xu et al. (2022) | Two custom datasets and the CWRU dataset | Specificity, G-mean, and recall | SGST | ADMM | Grid search | No |
| LSISMM (2022) | Li, Shao et al. (2022) | Custom dataset | Specificity, G-mean, and recall | - | ADMM | 5-fold CV | Yes |
| MFSMM (2022) | Pan, Xu, Zheng, Su et al. (2022) | AHUT dataset | Precision, recall, F-score, kappa, accuracy, and operational efficiency | - | ADMM | - | Yes |
| Pin-TSMM (2022) | Pan, Sheng et al. (2022) | AHUT and CWRU datasets | Accuracy, precision, recall, F1-score, and kappa coefficient | - | ADMM | Grid search | Yes |
| SNMM (2022) | Wang et al. (2022) | AHUT and CWRU datasets | Accuracy, kappa, recall, F1 score, and precision | - | ADMM | 5-fold CV | Yes |
| AIDMM (2022) | Pan, Xu, Zheng, Liu et al. (2022) | AHUT and HNU datasets | Accuracy, kappa, recall, F1 score, and time | - | AIDMM | 5-fold CV | Yes |
| DSPTMM (2023) | Pan, Sheng et al. (2023) | AHUT dataset | Kappa, recall, precision, F1, and accuracy | - | ADMM | 5-fold CV | Yes |
| SPSMM (2023) | Li et al. (2023) | Custom dataset | Accuracy | - | ADMM | 5-fold CV | Yes |
| NPBSMM (2023) | Pan, Xu et al. (2023) | AHUT, CWRU, and HNU datasets | Accuracy, kappa, recall, and F1 score | - | Dual coordinate descent (DCD) algorithm | Grid search | Yes |
Table 17
Other applications of SMM.

| Model | Author | Dataset | Metric | Application |
| :--- | :--- | :--- | :--- | :--- |
| PTSMM (2015) | Xu et al. (2015) | ORL, YALE, and AR datasets | Accuracy and running time | 2D image classification |
| S-SMM (2019) | Liu, Jiao, Zhang and Liu (2019) | PolSAR images from an airborne system (NASA/JPL-Caltech AIRSAR) | Accuracy and kappa coefficient | 2D image classification |
| SMR (2021) | Yuan and Weng (2021) | - | Accuracy | Automatic power-flow calculation in the distribution grid without observability |
Table 18
Accuracy, training time (s), and optimal parameters of the binary- and multi-class classification models on the EEG datasets.

| Type | Dataset | Model | Optimal parameters | Training time (s) | Accuracy (%) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Multi-class | BCIC33a | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=0.01$, $\lambda_{3}=0.01$ | 1.68 | 82.51 |
| | | RSSMM (Gu et al., 2021) | $\epsilon=0.0001$, $\alpha=0.001$, $\lambda_{3}=0.1$ | 0.60 | 84.18 |
| | | MSMM (Zheng, Zhu, Qin, Heng, 2018) | $C=0.1$, $\lambda=0.01$, $\rho=0.01$ | 0.59 | 81.63 |
| | BCIC42b | RSMM (Zheng, Zhu, Heng, 2018) | $C=1$, $\lambda=0.01$, $\lambda_{3}=0.0001$ | 6.77 | 67.05 |
| | | RSSMM (Gu et al., 2021) | $\epsilon=0.1$, $\alpha=0.0001$, $\lambda_{3}=0.001$ | 0.64 | 75.00 |
| | | MSMM (Zheng, Zhu, Qin, Heng, 2018) | $C=1$, $\lambda=10$, $\rho=0.1$ | 6.26 | 60.54 |
| Binary class | BCIC34a | SSMM (Zheng, Zhu, Qin, Chen et al., 2018) | $C=1$, $\lambda=0.1$ | 0.58 | 98.90 |
| | | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=5$, $\lambda_{3}=0.01$ | 0.65 | 97.44 |
| | BCIC42a | SSMM (Zheng, Zhu, Qin, Chen et al., 2018) | $C=1$, $\lambda=10$ | 5.60 | 100 |
| | | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=0.01$, $\lambda_{3}=0.01$ | 6.02 | 100 |
Table 19
Accuracy, training time (s), and optimal parameters of the models for binary classification on the CWRU dataset.

| Fault depth | Model | Optimal parameters | Training time (s) | Accuracy (%) |
| :--- | :--- | :--- | :--- | :--- |
| 0.007 mm | SSMM (Zheng, Zhu, Qin, Chen et al., 2018) | $C=1$, $\lambda=0$ | 0.6859 | 98.63 |
| | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=5$, $\lambda_{3}=0.0001$ | 3.7764 | 100 |
| 0.014 mm | SSMM (Zheng, Zhu, Qin, Chen et al., 2018) | $C=1$, $\lambda=0.1$ | 0.03 | 99.00 |
| | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=5$, $\lambda_{3}=0.01$ | 1.38 | 100 |
| 0.021 mm | SSMM (Zheng, Zhu, Qin, Chen et al., 2018) | $C=1$, $\lambda=0$ | 0.8159 | 97.29 |
| | RSMM (Zheng, Zhu, Heng, 2018) | $C=0.1$, $\lambda=5$, $\lambda_{3}=0.0001$ | 3.6419 | 100 |
Fig. 3. Illustration of the diverse applications of SMM in real-world scenarios.
• Data complexity: Real-world datasets often exhibit high dimensionality, noise, and variability, making it challenging for SMM to effectively capture the underlying patterns. Preprocessing and feature selection become crucial yet challenging tasks for handling complex data structures.
• Computational complexity: The computational demands of SMM, especially when dealing with large-scale datasets, can be substantial. Training SMM models on large matrices may require significant computational resources and time, affecting the scalability and efficiency of the approach.
Fig. 4. Comprehensive representation of future research directions for SMM.
• Overfitting and generalization: Balancing the trade-off between overfitting and generalization in SMM is crucial. Ensuring that the model generalizes well to unseen data while avoiding overfitting to the training data remains a challenge, particularly in complex classification tasks.
• Hyperparameter tuning: SMM models typically involve tuning various hyperparameters, such as regularization parameters and kernel functions. Finding the optimal combination of hyperparameters for the best performance can be a challenging and time-consuming task, requiring expertise and computational resources.
By acknowledging and addressing these challenges, researchers and practitioners can enhance the applicability and effectiveness of SMM in real-world scenarios. Future research efforts can focus on developing solutions to overcome these challenges and on optimizing the implementation of SMM across diverse applications.
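As a concrete illustration of the tuning burden described above, a cross-validated grid search over SMM-style hyperparameters $C$ and $\lambda$ can be sketched as follows. The ridge-style stand-in classifier on vectorized matrices and the toy data are illustrative assumptions only; a real study would call an SMM solver inside `cv_score`:

```python
import itertools
import numpy as np

def cv_score(X, y, C, lam, k=3):
    """k-fold cross-validation accuracy for one (C, lambda) pair.

    The model is a ridge-style least-squares classifier used as a stand-in:
    w = (A^T A + (lam / C) I)^{-1} A^T y, prediction by sign(X w).
    """
    n = len(y)
    folds = np.array_split(np.random.RandomState(0).permutation(n), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A = X[train]
        w = np.linalg.solve(A.T @ A + (lam / C + 1e-8) * np.eye(A.shape[1]),
                            A.T @ y[train])
        accs.append(np.mean(np.sign(X[test] @ w) == y[test]))
    return float(np.mean(accs))

# Toy data: 60 samples in 20 dimensions, class means shifted by +/- 0.8.
rng = np.random.RandomState(1)
y = np.repeat([1.0, -1.0], 30)
X = rng.randn(60, 20) + 0.8 * y[:, None]

grid = {"C": [0.1, 1, 10], "lam": [0.0001, 0.01, 1]}
best = max(
    ((C, lam, cv_score(X, y, C, lam))
     for C, lam in itertools.product(grid["C"], grid["lam"])),
    key=lambda t: t[2],
)
print(f"best C={best[0]}, lambda={best[1]}, CV accuracy={best[2]:.3f}")
```

Even this 3 × 3 grid requires nine full cross-validation runs; SMM variants with additional hyperparameters multiply the cost further, which motivates automated search strategies.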

8. Conclusions and future directions

This paper presents a comprehensive analysis of the support matrix machine (SMM) algorithm, covering its variants as well as its real-world applications. SMM incorporates both the Frobenius norm and the nuclear norm in its optimization problem, thereby preserving the structural information of matrix input data. SMM-based models have achieved remarkable success across various real-world domains, owing to their ability to preserve the spatial correlations of the input matrix and to avoid the curse of dimensionality encountered in support vector machines (SVMs). SMM has also achieved state-of-the-art performance in electroencephalogram (EEG) signal classification and fault diagnosis. This review discusses the development of SMM from its theoretical roots to its diverse applications. To the best of our knowledge, this is the first review article dedicated to SMM. We hope this paper provides researchers with valuable information about SMM. While reviewing the literature, we identified several potential future research directions, depicted in Fig. 4 and elaborated as follows:
• Despite its remarkable performance on matrix data, SMM has limited scalability to large-scale datasets due to its high computational complexity. This challenge can be addressed by incorporating techniques such as parallel computing (Zhang, Li, & Yang, 2005) into SMM. Moreover, parameter tuning involves considerable complexity, which can be handled effectively by employing evolutionary algorithms (Das & Suganthan, 2010), such as the whale optimization algorithm (WOA) (Mirjalili & Lewis, 2016), to optimize the SMM parameters.
• The loss function plays a pivotal role during training, steering the model in the right direction. Despite this importance, research on loss functions in the context of SMM is limited. To date, only a few loss functions have been employed in SMM, including the hinge loss (Cortes & Vapnik, 1995) and the pinball loss (Huang, Shi, et al., 2013). However, to improve the efficiency, robustness, and generalization ability of SMM, there is a pressing need to study and incorporate recently developed loss functions, such as the RoBoSS loss (Akhtar, Tanveer, & Arshad, 2024b), the flexible pinball loss (Kumari, Akhtar, Tanveer, & Arshad, 2024), the HawkEye loss (Akhtar, Tanveer, & Arshad, 2024a), the wave loss (Akhtar, Tanveer, Arshad, & Alzheimer's Disease Neuroimaging Initiative, 2024c), and so on.
• In the presence of noise and outliers in the input matrix data, the traditional SMM can be susceptible to perturbed patterns, potentially compromising the generalization performance of the model (Rezvani, Wang, & Pourpanah, 2019). Fuzzy logic (Zadeh, 1965) offers a way to enhance the robustness of SMM against such challenges. Drawing on the various fuzzy SVM variants, improved versions of SMM based on fuzzy logic could mitigate the impact of outliers and noise, thereby advancing the field.
• The concept of incorporating other beneficial paradigms, such as privileged information (Vapnik & Vashist, 2009), multi-view learning (Zhao, Xie, Xu, & Sun, 2017), universum learning (Cherkassky, Dhar, & Dai, 2011), and so on, can be extended to SMM. Such extensions could improve the classification performance of SMM by leveraging insights from different perspectives.
• SMM and its variants typically rely on offline training, assuming that the complete training data is available at once. However, real-world scenarios often involve sequentially streaming data. Integrating online learning (Hoi, Sahoo, Lu, & Zhao, 2021) techniques with SMM presents an opportunity to enhance adaptability to dynamically evolving data. This approach facilitates real-time model updates, ensuring effectiveness in dynamic environments. Adopting online learning enhances the practicality of SMM, effectively addressing the challenges posed by evolving data landscapes.
• The application of SMM to regression problems remains relatively unexplored, offering an avenue for extension, potentially drawing on ideas from approaches such as twin support vector regression (Huang, Wei, & Zhou, 2022). Moreover, the versatility of SMM can be harnessed to address various real-world continuous-variable challenges, such as estimating brain age (Ganaie, Tanveer, & Beheshti, 2022), predicting bone age (Iglovikov, Rakhlin, Kalinin, & Shvets, 2018), forecasting stock market trends (Naeini, Taremian, & Hashemi, 2010), and so on.
• Class imbalance is a pervasive issue in machine learning that biases models toward the majority class, disproportionately affecting the minority class (Rezvani & Wang, 2023). Although it has received limited attention in the SMM literature, the existing class-imbalance approaches for SMM focus primarily on algorithm-level methods. Integrating data-level techniques, including undersampling (Laurikkala, 2001) and oversampling (Chawla, Bowyer, Hall, & Kegelmeyer, 2002), could enhance the effectiveness of SMM in handling imbalanced datasets. Hence, delving into these approaches is crucial for mitigating the class-imbalance problem.
• Significant research within the SMM domain has been devoted primarily to supervised learning. However, the abundance of unlabeled real-world data suggests that extending SMM to semi-supervised learning holds great promise, an area that remains largely unexplored. Furthermore, venturing into unsupervised learning broadens the scope of SMM, enhancing its practicality, utility, and ability to address diverse problem domains.
• SMM has found applications in the biomedical domain, primarily for tasks such as emotion detection, limb movement analysis, and epilepsy dataset analysis using electroencephalogram (EEG) data, but there is still room to broaden its scope. In particular, its application can be extended to the detection of other neurological disorders, such as Alzheimer's disease and schizophrenia.
• Ensembling multiple SMM variants has the potential to improve classification performance. Combining different SMM models trained with different hyperparameters or data subsets can reduce the bias of individual models and improve overall generalization. This approach leverages ensemble diversity to capture a broader range of data patterns, leading to robust predictions. In addition, ensembling helps mitigate overfitting by averaging the model outputs. Overall, ensembling SMM variants offers a promising strategy for improving classification accuracy.
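To make the loss-function direction above concrete, the hinge and pinball losses currently used in SMM can be compared numerically. Both are written in terms of the margin $m = y f(X)$; the pinball parameter $\tau = 0.5$ is an arbitrary illustrative choice:

```python
import numpy as np

def hinge_loss(margins):
    """Hinge loss: max(0, 1 - m), zero once the margin m = y*f(X) exceeds 1."""
    return np.maximum(0.0, 1.0 - margins)

def pinball_loss(margins, tau=0.5):
    """Pinball loss: also penalizes large margins (slope -tau for m > 1),
    which reduces sensitivity to feature noise around the decision boundary."""
    u = 1.0 - margins
    return np.where(u >= 0, u, -tau * u)

m = np.array([-0.5, 0.5, 1.0, 2.0])   # margins y*f(X)
print(hinge_loss(m))                  # [1.5 0.5 0.  0. ]
print(pinball_loss(m, tau=0.5))       # [1.5 0.5 0.  0.5]
```

The hinge loss is flat for margins above 1, so well-classified points contribute nothing; the pinball loss keeps a slope of $-\tau$ there, trading some sparsity for robustness to noise near the boundary.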

CRediT authorship contribution statement

Anuradha Kumari: Writing – original draft, Visualization, Validation, Software, Formal analysis, Conceptualization. Mushir Akhtar: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Conceptualization. Rupal Shah: Writing – original draft, Visualization, Software, Formal analysis. M. Tanveer: Writing – review & editing, Supervision, Resources, Project administration, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

We acknowledge the Science and Engineering Research Board (SERB) for funding under the Mathematical Research Impact-Centric Support (MATRICS) scheme, Grant No. MTR/2021/000787. We also acknowledge the Council of Scientific and Industrial Research (CSIR), New Delhi, for providing fellowships to Ms. Anuradha Kumari (File No. 09/1022(12437)/2021-EMR-I) and Mr. Mushir Akhtar (File No. 09/1022(13849)/2022-EMR-I).

References

Akhtar, M., Tanveer, M., & Arshad, M. (2024a). HawkEye: Advancing robust regression with bounded, smooth, and insensitive loss function. http://dx.doi.org/10.48550/arXiv.2401.16785, arXiv preprint.
Akhtar, M., Tanveer, M., & Arshad, M. (2024b). RoBoSS: A robust, bounded, sparse, and smooth loss function for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-13. http://dx.doi.org/10.1109/TPAMI.2024.3465535.
Akhtar, M., Tanveer, M., Arshad, M., & Alzheimer's Disease Neuroimaging Initiative (2024c). Advancing supervised learning with the wave loss function: A robust and smooth approach. Pattern Recognition, Article 110637. http://dx.doi.org/10.1016/j.patcog.2024.110637.
Altun, K., & Barshan, B. (2010). Human activity recognition using inertial/magnetic sensor units. In Human behavior understanding: First international workshop, HBU 2010, Istanbul, Turkey, August 22, 2010. Proceedings 1 (pp. 38-51). Springer.
Ang, K. K., Chin, Z. Y., Wang, C., Guan, C., & Zhang, H. (2012). Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b. Frontiers in Neuroscience, 6, 39.
Bennett, K., & Demiriz, A. (1998). Semi-supervised support vector machines. Advances in Neural Information Processing Systems, 11.
Bischof, B., & Bunch, E. (2021). Geometric feature performance under downsampling for EEG classification tasks. arXiv preprint arXiv:2102.07669.
Breiman, L. (1996). Stacked regressions. Machine Learning, 24, 49-64.
Brooks, J. P. (2011). Support vector machines with the ramp loss and the hard margin loss. Operations Research, 59(2), 467-479.
Cai, J.-F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956-1982.
Cai, D., He, X., Wen, J.-R., Han, J., & Ma, W.-Y. (2006). Support tensor machines for text categorization. https://hdl.handle.net/2142/11193.
Candès, E., & Recht, B. (2012). Exact matrix completion via convex optimization. Communications of the ACM, 55(6), 111-119.
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189-215.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
Chen, K., Dong, H., & Chan, K.-S. (2013). Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4), 901-920.
Chen, Y., Hang, W., Liang, S., Liu, X., Li, G., Wang, Q., et al. (2020). A novel transfer support matrix machine for motor imagery-based brain computer interface. Frontiers in Neuroscience, 14, Article 606949.
Cherkassky, V., Dhar, S., & Dai, W. (2011). Practical conditions for effectiveness of the universum learning. IEEE Transactions on Neural Networks, 22(8), 1241-1255.
Chuang, C.-C. (2007). Fuzzy weighted support vector regression with a fuzzy partition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(3), 630-640.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Vol. 1, In 2005 IEEE computer society conference on computer vision and pattern recognition (pp. 886-893). IEEE.
Das, S., & Suganthan, P. N. (2010). Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1), 4-31.
Deng, N., Tian, Y., & Zhang, C. (2012). Support vector machines: Optimization based theory, algorithms, and extensions. CRC Press.
Derrac, J., Garcia, S., Sanchez, L., & Herrera, F. (2015). KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17.
Devlaminck, D., Wyns, B., Grosse-Wentrup, M., Otte, G., & Santens, P. (2011). Multisubject learning for common spatial patterns in motor-imagery BCI. Computational Intelligence and Neuroscience, 2011, 8.
Dixit, V., Verma, P., & Raj, P. (2020). Leveraging tacit knowledge for shipyard facility layout selection using fuzzy set theory. Expert Systems with Applications, 158, Article 113423.
Dornhege, G., Blankertz, B., Curio, G., & Müller, K.-R. (2004). Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms. IEEE Transactions on Biomedical Engineering, 51(6), 993-1002.
Duan, B., Yuan, J., Liu, Y., & Li, D. (2017). Quantum algorithm for support matrix machines. Physical Review A, 96(3), 032301.
Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4), 594-611.
Feng, R., & Xu, Y. (2022). Support matrix machine with pinball loss for classification. Neural Computing and Applications, 34(21), 18643-18661.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. Vol. 2, In 2003 IEEE computer society conference on computer vision and pattern recognition. Proceedings (pp. II). IEEE.
Franc, V., & Hlaváč, V. (2002). Multi-class support vector machine. Vol. 2, In 2002 international conference on pattern recognition (pp. 236-239). IEEE.
Fung, G., & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (pp. 77-86). http://dx.doi.org/10.1145/502512.502527.
Ganaie, M., Tanveer, M., & Beheshti, I. (2022). Brain age prediction with improved least squares twin SVR. IEEE Journal of Biomedical and Health Informatics, 27(4), 1661-1669.
Ganaie, M., Tanveer, M., & Lin, C.-T. (2022). Large-scale fuzzy least squares twin SVMs for class imbalance learning. IEEE Transactions on Fuzzy Systems, 30(11), 4815-4827.
Gao, X., Fan, L., & Xu, H. (2016). A novel method for classification of matrix data using twin multiple rank SMMs. Applied Soft Computing, 48, 546-562.
Gao, H., Lv, C., Zhang, T., Zhao, H., Jiang, L., Zhou, J., et al. (2021). A structure constraint matrix factorization framework for human behavior segmentation. IEEE Transactions on Cybernetics, 52(12), 12978-12988.
Gao, H., Qin, Y., Hu, C., Liu, Y., & Li, K. (2021). An interacting multiple model for trajectory prediction of intelligent vehicles in typical road traffic scenario. IEEE Transactions on Neural Networks and Learning Systems, 34(9), 6468-6479.
Gao, X., Fan, L., & Xu, H. (2015). An improved least squares twin support matrix machine. International Journal of Applied Mathematics and Machine Learning, 2, 137-162.
Goldstein, T., O'Donoghue, B., Setzer, S., & Baraniuk, R. (2014). Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 7(3), 1588-1623.
Gu, M., Zheng, J., Pan, H., & Tong, J. (2021). Ramp sparse support matrix machine and its application in roller bearing fault diagnosis. Applied Soft Computing, 113, Article 107928.
Hang, W., Feng, W., Liang, S., Wang, Q., Liu, X., & Choi, K.-S. (2020). Deep stacked support matrix machine based representation learning for motor imagery EEG classification. Computer Methods and Programs in Biomedicine, 193, Article 105466.
Hang, W., Li, Z., Yin, M., Liang, S., Shen, H., Wang, Q., et al. (2023). Deep stacked least squares support matrix machine with adaptive multi-layer transfer for EEG classification. Biomedical Signal Processing and Control, 82, Article 104579.
Harrow, A. W., Hassidim, A., & Lloyd, S. (2009). Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15), Article 150502.
Hoi, S. C., Sahoo, D., Lu, J., & Zhao, P. (2021). Online learning: A comprehensive survey. Neurocomputing, 459, 249-289.
Hong, B., Wei, L., Hu, Y., Cai, D., & He, X. (2016). Online robust principal component analysis via truncated nuclear norm regularization. Neurocomputing, 175, 216-222.
Hou, C., Nie, F., Zhang, C., Yi, D., & Wu, Y. (2014). Multiple rank multi-linear SVM for matrix data classification. Pattern Recognition, 47(1), 454-469.
Hu, L., & Zhang, Z. (2019). EEG signal processing and feature extraction. Springer.
Huang, J., Nie, F., & Huang, H. (2013). Robust discrete matrix completion. Vol. 27, In Proceedings of the AAAI conference on artificial intelligence (pp. 424-430).
Huang, X., Shi, L., & Suykens, J. A. K. (2013). Support vector machine classifier with pinball loss. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 984-997.
Huang, H., Wei, X., & Zhou, Y. (2022). An overview on twin support vector regression. Neurocomputing, 490, 80-92.
Hüllermeier, E. (2005). Fuzzy methods in machine learning and data mining: Status and prospects. Fuzzy Sets and Systems, 156(3), 387-406.
Iglovikov, V. I., Rakhlin, A., Kalinin, A. A., & Shvets, A. A. (2018). Paediatric bone age assessment using deep convolutional neural networks. In Deep learning in medical image analysis and multimodal learning for clinical decision support: 4th international workshop, DLMIA 2018, and 8th international workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4 (pp. 300-308). Springer.
Jayadeva, Khemchandani, R., & Chandra, S. (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 905-910.
Jia, X., Feng, X., Wang, W., Xu, C., & Zhang, L. (2018). Bayesian inference for adaptive low rank and sparse matrix estimation. Neurocomputing, 291, 71-83.
Jiang, R., & Yang, Z.-X. (2018). Multiple rank multi-linear twin support matrix classification machine. Journal of Intelligent & Fuzzy Systems, 35(5), 5741-5754.
Joachims, T. (1999). Svmlight: Support vector machine. Vol. 19, In SVM-light support vector machine (p. 25). University of Dortmund, http://svmlight.joachims.org/.
Joachims, T., Finley, T., & Yu, C.-N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77, 27-59.
Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation, 13(3), 637-649.
Kobayashi, T., & Otsu, N. (2012). Efficient optimization for low-rank integrated bilinear classifiers. In Computer vision - ECCV 2012: 12th European conference on computer vision, Florence, Italy, October 7-13, 2012, Proceedings, Part II 12 (pp. 474-487). Springer.
Kotsia, I., & Patras, I. (2011). Support Tucker machines. In CVPR 2011 (pp. 633-640). IEEE, http://dx.doi.org/10.1109/CVPR.2011.5995663.
Kumari, A., Akhtar, M., Tanveer, M., & Arshad, M. (2024). Diagnosis of breast cancer using flexible pinball loss support vector machine. Applied Soft Computing, 157, Article 111454. http://dx.doi.org/10.1016/j.asoc.2024.111454.
Kumari, A., Ganaie, M., & Tanveer, M. (2022). Intuitionistic fuzzy universum support vector machine. In International conference on neural information processing (pp. 236-247). Springer, http://dx.doi.org/10.1007/978-3-031-30105-6_20.
Kuncheva, L. I., & Faithfull, W. J. (2013). PCA feature extraction for change detection in multidimensional unlabeled data. IEEE Transactions on Neural Networks and Learning Systems, 25(1), 69-80.
Laurikkala, J. (2001). Improving identification of difficult small classes by balancing class distribution. In Artificial intelligence in medicine: 8th conference on artificial intelligence in medicine in Europe, AIME, Proceedings 8 (pp. 63-66). Springer.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Leeb, R., Lee, F., Keinrath, C., Scherer, R., Bischof, H., & Pfurtscheller, G. (2007). Brain-computer communication: Motivation, aim, and impact of exploring a virtual apartment. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(4), 473-482.
Lei, B., Liu, X., Liang, S., Hang, W., Wang, Q., Choi, K.-S., et al. (2019). Walking imagery evaluation in brain-computer interfaces via a multi-view multi-level deep polynomial network. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(3), 497-506.
Li, X., Cheng, J., Shao, H., Liu, K., & Cai, B. (2021). A fusion CWSMM-based framework for rotating machinery fault diagnosis under strong interference and imbalanced case. IEEE Transactions on Industrial Informatics, 18(8), 5180-5189.
Li, X., Li, Y., Yan, K., Shao, H., & Lin, J. (2023). Intelligent fault diagnosis of bevel gearboxes using semi-supervised probability support matrix machine and infrared imaging. Reliability Engineering & System Safety, 230, Article 108921.
Li, X., Shao, H., Lu, S., Xing, J., & Cai, B. (2022). Highly efficient fault diagnosis of rotating machinery under time-varying speeds using LSISMM and small infrared thermal images. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 52(12), 7328-7340.
Li, Y., Wang, D., & Liu, F. (2022). The auto-correlation function aided sparse support matrix machine for EEG-based fatigue detection. IEEE Transactions on Circuits and Systems II: Express Briefs, http://dx.doi.org/10.1109/TCSII.2022.3211931.
Li, H., & Xu, Y. (2024). Support matrix machine with truncated pinball loss for classification. Applied Soft Computing, 154, Article 111311.
Li, X., Yang, Y., Pan, H., Cheng, J., & Cheng, J. (2020). Non-parallel least squares support matrix machine for rolling bearing fault diagnosis. Mechanism and Machine Theory, 145, Article 103676.
Li, X., Yang, Y., Shao, H., Zhong, X., Cheng, J., & Cheng, J. (2021). Symplectic weighted sparse support matrix machine for gear fault diagnosis. Measurement, 168, Article 108392.
Liang, S., Hang, W., Lei, B., Wang, J., Qin, J., Choi, K.-S., et al. (2022). Adaptive multimodel knowledge transfer matrix machine for EEG classification. IEEE Transactions on Neural Networks and Learning Systems, 1-14. http://dx.doi.org/10.1109/TNNLS.2022.3220551.
Liang, S., Hang, W., Yin, M., Shen, H., Wang, Q., Qin, J., et al. (2022). Deep EEG feature learning via stacking common spatial pattern and support matrix machine. Biomedical Signal Processing and Control, 74, Article 103531.
Liao, Y., Weng, Y., Liu, G., & Rajagopal, R. (2018). Urban MV and LV distribution grid topology estimation via group lasso. IEEE Transactions on Power Systems, 34(1), 12-27.
Liu, X., Jiao, L., Zhang, D., & Liu, F. (2019). PolSAR image classification based on polarimetric scattering coding and sparse support matrix machine. In IGARSS 2019 - 2019 IEEE international geoscience and remote sensing symposium (pp. 3181-3184). IEEE, http://dx.doi.org/10.1109/IGARSS.2019.8900267.
Liu, Q., Lai, Z., Zhou, Z., Kuang, F., & Jin, Z. (2015). A truncated nuclear norm regularization method based on weighted residual error for matrix completion. IEEE Transactions on Image Processing, 25(1), 316-330.
Liu, B., Zhou, Y., Liu, P., Sun, W., Li, S., & Fang, X. (2019). Saliency detection via double nuclear norm maximization and ensemble manifold regularization. Knowledge-Based Systems, 183, Article 104850.
Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., & Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 4(2), R1.
Luo, L., Xie, Y., Zhang, Z., & Li, W.-J. (2015). Support matrix machines. In F. Bach, & D. Blei (Eds.), Proceedings of machine learning research: vol. 37, Proceedings of the 32nd international conference on machine learning (pp. 938-947). Lille, France: PMLR, URL https://proceedings.mlr.press/v37/luo15.html.
Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In Proceedings third IEEE international conference on automatic face and gesture recognition (pp. 200-205). IEEE, http://dx.doi.org/10.1109/AFGR.1998.670949.
Lyons, M. J., Akamatsu, S., Kamachi, M., Gyoba, J., & Budynek, J. (1998). The Japanese female facial expression (JAFFE) database. In Proceedings of third international conference on automatic face and gesture recognition (pp. 14-16).
Maboudou-Tchao, E. M. (2019). Wavelet kernels for support matrix machines. Modern Statistical Methods for Spatial and Multivariate Data, 75-93. http://dx.doi.org/10.1007/978-3-030-11431-2_4.
Majid, A., Ali, S., Iqbal, M., & Kausar, N. (2014). Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. Computer Methods and Programs in Biomedicine, 113(3), 792-808.
Mangasarian, O. L., & Musicant, D. R. (1999). Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks, 10(5), 1032-1037.
Mirjalili, S., & Lewis, A. (2016). The whale optimization algorithm. Advances in Engineering Software, 95, 51-67.
Mumtaz, W., Rasheed, S., & Irfan, A. (2021). Review of challenges associated with the EEG artifact removal methods. Biomedical Signal Processing and Control, 68, Article 102741.
Naeini, M. P., Taremian, H., & Hashemi, H. B. (2010). Stock market value prediction using neural networks. In 2010 international conference on computer information systems and industrial management applications (pp. 132-136). IEEE.
Narang, D., Ayyanar, R., Gemin, P., Baggu, M., & Srinivasan, D. (2015). High penetration of photovoltaic generation study - flagstaff community power (final technical report, results of phases 2-5). http://dx.doi.org/10.2172/1171386, URL https://www.osti.gov/biblio/1171386.
Nazir, M., Ishtiaq, M., Batool, A., Jaffar, M. A., & Mirza, A. M. (2010). Feature selection for efficient gender classification. In Proceedings of the 11th WSEAS international conference (pp. 70-75).
Nene, S. A., Nayar, S. K., & Murase, H. (1996). Columbia object image library (COIL-20). (pp. 223-303).
Nicolas-Alonso, L. F., & Gomez-Gil, J. (2012). Brain computer interfaces, a review. Sensors, 12(2), 1211-1279.
Pan, H., Sheng, L., Xu, H., Tong, J., Zheng, J., & Liu, Q. (2022). Pinball transfer support matrix machine for roller bearing fault diagnosis under limited annotation data. Applied Soft Computing, 125, Article 109209.
Pan, H., Sheng, L., Xu, H., Zheng, J., Tong, J., & Niu, L. (2023). Deep stacked pinball transfer matrix machine with its application in roller bearing fault diagnosis. Engineering Applications of Artificial Intelligence, 121, Article 105991.
Pan, H., Xu, H., & Zheng, J. (2022). A novel symplectic relevance matrix machine method for intelligent fault diagnosis of roller bearing. Expert Systems with Applications, 192, Article 116400.
Pan, H., Xu, H., Zheng, J., Liu, Q., & Tong, J. (2022). An adaptive interactive deviation matrix machine method for intelligent fault diagnosis of roller bearings. Measurement Science and Technology, 33(7), Article 075103.
Pan, H., Xu, H., Zheng, J., Su, J., & Tong, J. (2022). Multi-class fuzzy support matrix machine for classification in roller bearing fault diagnosis. Advanced Engineering Informatics, 51, Article 101445.
Pan, H., Xu, H., Zheng, J., & Tong, J. (2023). Non-parallel bounded support matrix machine and its application in roller bearing fault diagnosis. Information Sciences, 624, 395-415.
Pan, H., Xu, H., Zheng, J., Tong, J., & Cheng, J. (2022). Twin robust matrix machine for intelligent fault identification of outlier samples in roller bearing. Knowledge-Based Systems, 252, Article 109391.
Pan, H., Yang, Y., Zheng, J., Li, X., & Cheng, J. (2019). A fault diagnosis approach for roller bearing based on symplectic geometry matrix machine. Mechanism and Machine Theory, 140, 31-43.
Pan, H., & Zheng, J. (2021). An intelligent fault diagnosis method for roller bearing using symplectic hyperdisk matrix machine. Applied Soft Computing, 105, Article 107284.
Patton, R. J. (1990). Fault detection and diagnosis in aerospace systems using analytical redundancy. In IEE colloquium on condition monitoring and fault tolerance (p. 1). IET.
Pernestål, A. (2009). Probabilistic fault diagnosis with automotive applications (Ph.D. thesis), Linköping University Electronic Press.
Pirsiavash, H., Ramanan, D., & Fowlkes, C. (2009). Bilinear classifiers for visual recognition. Advances in Neural Information Processing Systems, 22.
Pise, A. W., & Rege, P. P. (2021). Comparative analysis of various filtering techniques for denoising EEG signals. In 2021 6th international conference for convergence in technology (I2CT) (pp. 1-4). IEEE, http://dx.doi.org/10.1109/I2CT51068.2021.9417984.
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines: Technical report, Microsoft.
Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization, advances in kernel methods. Support Vector Learning, 185-208. http://dx.doi.org/10.7551/mitpress/1130.003.0016.
Praline, J., Grujic, J., Corcia, P., Lucas, B., Hommet, C., Autret, A., et al. (2007). Emergent EEG in clinical practice. Clinical Neurophysiology, 118(10), 2149-2155.
Qian, C., Tran-Dinh, Q., Fu, S., Zou, C., & Liu, Y. (2019). Robust multicategory support matrix machines. Mathematical Programming, 176, 429-463.
Razzak, I. (2020). Cooperative evolution multiclass support matrix machines. In 2020 international joint conference on neural networks (pp. 1-8). IEEE, http://dx.doi.org/10.1109/IJCNN48605.2020.9207164.
Razzak, I., Blumenstein, M., & Xu, G. (2019). Multiclass support matrix machines by maximizing the inter-class margin for single trial EEG classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(6), 1117-1127.
Razzak, I., Bouadjenek, M. R., Saris, R. A., & Ding, W. (2023). Support matrix machine via joint least squares and nuclear norm minimization under matrix completion framework for classification of corrupted data. IEEE Transactions on Neural Networks and Learning Systems, http://dx.doi.org/10.1109/TNNLS.2023.3293888.
Razzak, I., Hameed, I. A., & Xu, G. (2019). Robust sparse representation and multiclass support matrix machines for the classification of motor imagery EEG signals. IEEE Journal of Translational Engineering in Health and Medicine, 7, 1-8.
Reddy, Y., Viswanath, P., & Reddy, B. E. (2018). Semi-supervised learning: A brief review. International Journal of Engineering & Technology, 7(1.8), 81.
Rezvani, S., & Wang, X. (2023). A broad review on class imbalance learning techniques. Applied Soft Computing, Article 110415. http://dx.doi.org/10.1016/j.asoc.2023.110415.
Rezvani, S., Wang, X., & Pourpanah, F. (2019). Intuitionistic fuzzy twin support vector machines. IEEE Transactions on Fuzzy Systems, 27(11), 2140-2151.
Richhariya, B., & Tanveer, M. (2020). A reduced universum twin support vector machine for class imbalance learning. Pattern Recognition, 102, Article 107150.
Rosales-Perez, A., García, S., Terashima-Marin, H., Coello, C. A. C., & Herrera, F. (2018). MC2ESVM: Multiclass classification based on cooperative evolution of support vector machines. IEEE Computational Intelligence Magazine, 13(2), 18-29.
Sekine, Y., Akimoto, Y., Kunugi, M., Fukui, C., & Fukui, S. (1992). Fault diagnosis of power systems. Proceedings of the IEEE, 80(5), 673-683.
Sen, D., Mishra, B. B., & Pattnaik, P. K. (2023). A review of the filtering techniques used in EEG signal processing. In 2023 7th international conference on trends in electronics and informatics (pp. 270-277). IEEE, http://dx.doi.org/10.1109/ICOEI56765.2023.10125857.
Shakshi, R. J., & Jaswal, R. (2016). Brain wave classification and feature extraction of EEG signal by using FFT on LabVIEW. International Research Journal of Engineering and Technology, 3, 1208-1212.
Shao, Y.-H., Wang, Z., Chen, W.-J., & Deng, N.-Y. (2013). A regularization for the projection twin support vector machine. Knowledge-Based Systems, 37, 203-210.
Shen, X., Niu, L., Qi, Z., & Tian, Y. (2017). Support vector machine classifier with truncated pinball loss. Pattern Recognition, 68, 199-210.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199-222.
Srebro, N., & Shraibman, A. (2005). Rank, trace-norm and max-norm. In International conference on computational learning theory (pp. 545-560). Springer, http://dx.doi.org/10.1007/11503415_37.
Srinivasan, N. (2007). Cognitive neuroscience of creativity: EEG based approaches. Methods, 42(1), 109-116.
Sun, H., Craig, B. A., & Zhang, L. (2017). Angle-based multicategory distance-weighted SVM. Journal of Machine Learning Research, 18(1), 2981-3001.
Suykens, J. A., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9, 293-300.
Tang, X., Gu, X., Wang, J., He, Q., Zhang, F., & Lu, J. (2020). A bearing fault diagnosis method based on feature selection feedback network and improved D-S evidence fusion. IEEE Access, 8, 20523-20536.
Tang, Y., Krasser, S., Judge, P., & Zhang, Y.-Q. (2006). Fast and effective spam sender detection with granular SVM on highly imbalanced mail server behavior data. In 2006 international conference on collaborative computing: networking, applications and worksharing (pp. 1-6). IEEE, http://dx.doi.org/10.1109/COLCOM.2006.361856.
Tang, X., Ma, Z., Hu, Q., & Tang, W. (2019). A real-time arrhythmia heartbeats classification algorithm using parallel delta modulations and rotated linear-kernel support vector machines. IEEE Transactions on Biomedical Engineering, 67(4), 978-986.
Tanveer, M., Sharma, S., Rastogi, R., & Anand, P. (2021). Sparse support vector machine with pinball loss. Transactions on Emerging Telecommunications Technologies, 32(2), Article e3820.
Tao, D., Li, X., Hu, W., Maybank, S., & Wu, X. (2005). Supervised tensor learning. In Fifth IEEE international conference on data mining (p. 8). IEEE.
Thakur, G., & Wu, H.-T. (2011). Synchrosqueezing-based recovery of instantaneous frequency from nonuniform samples. SIAM Journal on Mathematical Analysis, 43(5), 2078-2095.
Tyagi, A., Semwal, S., & Shah, G. (2012). A review of EEG sensors used for data acquisition. International Journal of Computer Applications (IJCA), 13-17.
Vaid, S., Singh, P., & Kaur, C. (2015). EEG signal analysis for BCI interface: A review. In 2015 fifth international conference on advanced computing & communication technologies (pp. 143-147). IEEE, http://dx.doi.org/10.1109/ACCT.2015.72.
Vapnik, V., & Vashist, A. (2009). A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6), 544-557.
Värbu, K., Muhammad, N., & Muhammad, Y. (2022). Past, present, and future of EEG-based BCI applications. Sensors, 22(9), 3331.
Vidaurre, C., Krämer, N., Blankertz, B., & Schlögl, A. (2009). Time domain parameters as a feature for EEG-based brain-computer interfaces. Neural Networks, 22(9), 1313-1319.
Vinyals, O., Jia, Y., Deng, L., & Darrell, T. (2012). Learning with recursive perceptual representations. Advances in Neural Information Processing Systems, 25.
von der Malsburg, C. (1996). Robust classification of hand postures against complex backgrounds. In Proceedings of the second international workshop on automatic face and gesture recognition, Vermont (pp. 170-175). http://dx.doi.org/10.1109/AFGR.1996.557260.
Wang, J., Wang, M., Hu, X., & Yan, S. (2015). Visual data denoising with a unified Schatten-$p$ norm and $\ell_q$ norm regularized principal component pursuit. Pattern Recognition, 48(10), 3135-3144.
Wang, Y., Wang, S., & Lai, K. K. (2005). A new fuzzy support vector machine to evaluate credit risk. IEEE Transactions on Fuzzy Systems, 13(6), 820-831.
Wang, M., Xu, H., Pan, H., Xie, N., & Zheng, J. (2022). Sparse norm matrix machine and its application in roller bearing fault diagnosis. Measurement Science and Technology, 33(11), Article 115114.
Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(1), 1-40.
Wolf, L., Jhuang, H., & Hazan, T. (2007). Modeling appearances with low-rank SVM. In 2007 IEEE conference on computer vision and pattern recognition (pp. 1-6). IEEE, http://dx.doi.org/10.1109/CVPR.2007.383099.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.
Wu, Y., & Liu, Y. (2007). Robust truncated hinge loss support vector machines. Journal of the American Statistical Association, 102(479), 974-983.
Xia, W., & Fan, L. (2016). Least squares support matrix machine based on bilevel programming. International Journal of Applied Mathematics and Machine Learning (IJAMML), 1, 1-18.
Xu, H., Fan, L., & Gao, X. (2015). Projection twin SMMs for 2D image data classification. Neural Computing and Applications, 26, 91-100.
Xu, H., Pan, H., Zheng, J., Liu, Q., & Tong, J. (2022). Dynamic penalty adaptive matrix machine for the intelligent detection of unbalanced faults in roller bearing. Knowledge-Based Systems, 247, Article 108779.
Yan, W., Wang, J., Lu, S., Zhou, M., & Peng, X. (2023). A review of real-time fault diagnosis methods for industrial smart manufacturing. Processes, 11(2), 369.
Ye, Y. (2017). Matrix Hilbert space and its application to matrix learning. arXiv preprint arXiv:1706.08110.
Ye, Y. (2019). Nonlinear kernel support matrix machine for matrix learning. International Journal of Machine Learning and Cybernetics, 10(10), 2725-2738.
Ye, Y., & Han, D. (2019). Multi-distance support matrix machines. Pattern Recognition Letters, 128, 237-243.
Yu, G., Wang, Z., & Zhao, P. (2018). Multisynchrosqueezing transform. IEEE Transactions on Industrial Electronics, 66(7), 5441-5455.
Yuan, J., & Weng, Y. (2021). Support matrix regression for learning power flow in distribution grid with unobservability. IEEE Transactions on Power Systems, 37(2), 1151-1161.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.
Zhang, Q., & Benveniste, A. (1992). Wavelet networks. IEEE Transactions on Neural Networks, 3(6), 889-898.
Zhang, Y., Lei, X., Pan, Y., & Pedrycz, W. (2021). Prediction of disease-associated circRNAs via circRNA-disease pair graph and weighted nuclear norm minimization. Knowledge-Based Systems, 214, Article 106694.
Zhang, J.-P., Li, Z.-W., & Yang, J. (2005). A parallel SVM training algorithm on large-scale classification problems. Vol. 3, In 2005 international conference on machine learning and cybernetics (pp. 1637-1641). IEEE, http://dx.doi.org/10.1109/ICMLC.2005.1527207.
Zhang, C., & Liu, Y. (2014). Multicategory angle-based large-margin classification. Biometrika, 101(3), 625-640.
Zhang, W., & Liu, Y. (2022). Proximal support matrix machines. Journal of Applied Mathematics and Physics, 10(7), 2268-2291.
Zhang, Y., Song, T., & Wu, Z. (2021). An improved quantum algorithm for support matrix machines. Quantum Information Processing, 20, 1-12.
Zhang, W., Yoshida, T., & Tang, X. (2008). Text classification based on multi-word with support vector machine. Knowledge-Based Systems, 21(8), 879-886.
Zhang, L., Zhou, W., & Jiao, L. (2004). Wavelet support vector machine. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(1), 34-39.
Zhao, J., Xie, X., Xu, X., & Sun, S. (2017). Multi-view learning overview: Recent progress and new challenges. Information Fusion, 38, 43-54.
Zheng, J., Gu, M., Pan, H., & Tong, J. (2020). A fault classification method for rolling bearing based on multisynchrosqueezing transform and WOA-SMM. IEEE Access, 8, 215355-215364.
Zheng, W.-L., & Lu, B.-L. (2017). A multimodal approach to estimating vigilance using EEG and forehead EOG. Journal of Neural Engineering, 14(2), Article 026017.
Zheng, Q., Zhu, F., & Heng, P.-A. (2018). Robust support matrix machine for single trial EEG classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(3), 551-562.
Zheng, Q., Zhu, F., Qin, J., Chen, B., & Heng, P.-A. (2018). Sparse support matrix machine. Pattern Recognition, 76, 715-726.
Zheng, Q., Zhu, F., Qin, J., & Heng, P.-A. (2018). Multiclass support matrix machine for single trial EEG classification. Neurocomputing, 275, 869-880.
Zhou, H., & Li, L. (2014). Regularized matrix regression. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 76(2), 463-483.
Zhu, C. (2017). Entropy-based support matrix machine. In Intelligence science I: Second IFIP TC 12 international conference, ICIS 2017, Shanghai, China, October 25-28, 2017, Proceedings 2 (pp. 200-211). Springer, http://dx.doi.org/10.1007/978-3-319-68121-4_21.
Zhu, H., Liu, X., Lu, R., & Li, H. (2016). Efficient and privacy-preserving online medical prediagnosis framework using nonlinear SVM. IEEE Journal of Biomedical and Health Informatics, 21(3), 838-850.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 67(2), 301-320.

* Corresponding author.
E-mail address: mtanveer@iiti.ac.in (M. Tanveer).