Traffic encryption is widely used to protect communication privacy but is increasingly exploited by attackers to conceal malicious activities. Existing malicious encrypted traffic detection methods rely on large amounts of labeled samples for training, which limits their ability to respond quickly to new attacks. These methods are also vulnerable to traffic obfuscation strategies, such as injecting dummy packets. In this paper, we propose SmartDetector, a robust malicious encrypted traffic detection method based on contrastive learning. We first propose a novel traffic representation named Semantic Attribute Matrix (SAM), which can effectively distinguish between malicious and benign traffic. We also design a data augmentation method that generates diverse traffic samples, making the detection model more robust against different traffic obfuscation strategies. We then propose a malicious encrypted traffic classifier that first pre-trains a model via contrastive learning to learn deep representations from unlabeled data, and then fine-tunes the model with a supervised classifier to achieve accurate detection even with only a few labeled samples. We conduct extensive experiments on five public datasets to evaluate the performance of SmartDetector. The results demonstrate that it outperforms the state-of-the-art (SOTA) methods in three typical scenarios. Specifically, in the evasion attack detection scenario, SmartDetector achieves an F1 score and AUC above 93%, with average improvements of **19.84%** and **18.17%** over the SOTA method, respectively.
With the prosperity of Internet applications, network traffic volume has grown dramatically, accompanied by a surge in cyber attacks. According to the Check Point report [1], global weekly cyber attacks increased by 42% in the first half of 2022. To bypass security clearance, malicious traffic tries to hide its distinctive features through traffic encryption: attackers protect their traffic using encryption protocols such as SSL/TLS [2]. According to a report by the WatchGuard Threat Lab [3], 91.5% of malware arrived over encrypted connections by the end of 2023.

Received 21 October 2024; revised 18 March 2025; accepted 6 April 2025. Date of publication 15 April 2025; date of current version 25 April 2025. This work was supported in part by the National Key Research and Development Program of China under Grant 2023YFB2703800, in part by the NSFC Projects under Grant 62222201 and Grant U23A20304, and in part by the Beijing Natural Science Foundation under Grant M23020. The associate editor coordinating the review of this article and approving it for publication was Dr. Z. Berkay Celik. (Corresponding author: Meng Shen.)

Ke Ye is with the School of Computer Science, Beijing Institute of Technology, Beijing 100081, China (e-mail: yek_1010@163.com).

Ke Xu is with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: xuke@tsinghua.edu.cn). Gang Xiong is with the Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100190, China (e-mail: xionggang@iie.ac.cn). Digital Object Identifier 10.1109/TIFS.2025.3560560
Traditional malicious traffic detection methods try to find abnormal traffic by searching for predetermined signatures in packet payloads [4]. However, traffic encryption protocols such as SSL/TLS [2] make packet contents invisible, rendering methods based on deep packet inspection (DPI) ineffective. To detect malicious traffic hidden in encrypted traffic, recent studies resort to machine learning and extract statistical features [4] or sequence features [5] that are independent of packet contents to build classifiers. As a result, malicious traffic detection is usually formulated as a classification problem.
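To illustrate what "statistical features independent of packet contents" can look like, the sketch below computes a few flow-level statistics from packet metadata only. It is a minimal sketch under assumptions: each packet is a hypothetical `(timestamp_s, length_bytes, direction)` tuple, and the feature names are illustrative, not the feature set used in [4].

```python
from statistics import mean, pstdev


def flow_stats(packets):
    """Compute payload-independent statistics for one flow.

    packets: list of (timestamp_s, length_bytes, direction) tuples sorted by
    time, where direction is +1 for outbound and -1 for inbound (assumed).
    """
    if not packets:
        raise ValueError("empty flow")
    times = [t for t, _, _ in packets]
    lengths = [length for _, length, _ in packets]
    # Inter-arrival times between consecutive packets.
    iats = [b - a for a, b in zip(times, times[1:])]
    outbound = sum(1 for _, _, d in packets if d > 0)
    return {
        "n_packets": len(packets),
        "duration": times[-1] - times[0],
        "len_mean": mean(lengths),
        "len_std": pstdev(lengths),
        "iat_mean": mean(iats) if iats else 0.0,
        "out_ratio": outbound / len(packets),
    }


# Usage: a four-packet flow; only metadata is needed, never the payload.
example = flow_stats([(0.00, 517, 1), (0.05, 1500, -1), (0.06, 1500, -1), (0.20, 90, 1)])
```

Because none of these values depend on decrypted bytes, such features remain observable under SSL/TLS, which is why they were adopted after DPI became ineffective.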
However, existing methods still face a twofold challenge: efficiency and robustness. Efficiency means that a method can respond quickly to new types of attacks. Existing methods usually require a large volume of well-labeled data for model training [4], [6], [7]. When new attacks appear, collecting adequate malicious samples and retraining the classifiers is time-consuming. For instance, even with two dozen computers for traffic crawling, it still takes two weeks to collect a sufficient dataset for model training [8]. Robustness means that obfuscated malicious traffic can still be accurately detected. Existing methods can easily be defeated by evasion attacks [5], [8], [9], [10], which obfuscate the original traffic by adding artificial noise. For instance, evasion attacks [11] can change the features of malicious traffic by injecting dummy packets and adding time delays to encrypted traffic, making classifiers built on the original features less effective. Thus, evasion attacks pose new challenges for malicious traffic detection, requiring the extracted features to be resilient to traffic obfuscation.
In this paper, we propose SmartDetector, a robust malicious traffic detection method based on contrastive learning [12] to discover malicious traffic hidden in encrypted traffic. We propose a new traffic representation named Semantic Attribute Matrix (SAM), which captures the distinctive features of benign and malicious traffic in a simple matrix. Taking SAM as the starting point, we build a traffic classifier based on contrastive learning, which enables us to pre-train an encoder on a large volume of unlabeled traffic and then quickly adapt the pre-trained encoder to a new type of attack using only a few labeled samples for model tuning. Unlabeled instances are easy to collect because they do not require costly environment construction or traffic labeling.
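This section does not specify SmartDetector's contrastive objective. As a hedged illustration of contrastive pre-training in general, the sketch below implements the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR: two augmented views of the same unlabeled sample are pulled together, and all other samples in the batch are pushed apart. Embeddings are plain Python lists, and the temperature `tau=0.5` is an arbitrary assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of paired views.

    z1[i] and z2[i] are embeddings of two augmented views of sample i.
    No labels are used: the only supervision is which views form a pair.
    """
    n = len(z1)
    z = z1 + z2  # 2n embeddings; positive pairs are (i, i + n)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive view for anchor i
        denom = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        total += -math.log(math.exp(cosine(z[i], z[j]) / tau) / denom)
    return total / (2 * n)


# Usage: matching views give a lower loss than swapped (mismatched) views.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss trains an encoder from unlabeled data alone; a small labeled set is then only needed for the downstream classifier.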
To improve the robustness of SmartDetector, we propose a data augmentation method tailored to the traffic obfuscation strategies adopted by attackers. We simulate how an attacker perturbs traffic features by adding random noise [11], e.g., inserting dummy packets and adding random time delays. By training on augmented traffic samples, SmartDetector learns the correlation between the original and obfuscated traffic, which enables it to effectively resist evasion attacks.
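The two perturbations described above (dummy packet insertion and random time delay) can be sketched as follows. This is an illustrative assumption, not SmartDetector's actual augmentation procedure: packets are modeled as `(timestamp_s, length_bytes, direction)` tuples, and the parameter names `p_dummy`, `max_delay`, and `dummy_len` are hypothetical.

```python
import random


def obfuscate(packets, p_dummy=0.2, max_delay=0.05, dummy_len=(40, 120), rng=None):
    """Return an obfuscated copy of a flow (hypothetical augmentation).

    packets: list of (timestamp_s, length_bytes, direction) sorted by time.
    p_dummy: probability of inserting a dummy packet before each packet
             (except the first).
    max_delay: upper bound of the random delay added before each packet.
    dummy_len: inclusive (min, max) length range for dummy packets.
    """
    rng = rng or random.Random()
    out = []
    shift = 0.0  # cumulative delay, so original packet order is preserved
    prev_t = None
    for t, length, direction in packets:
        shift += rng.uniform(0.0, max_delay)
        new_t = t + shift
        if prev_t is not None and rng.random() < p_dummy:
            # Place the dummy between the previous and current packet;
            # min() guards against floating-point overshoot of new_t.
            dummy_t = min(rng.uniform(prev_t, new_t), new_t)
            out.append((dummy_t, rng.randint(*dummy_len), rng.choice((1, -1))))
        out.append((new_t, length, direction))
        prev_t = new_t
    return out


# Usage: two differently perturbed views of the same flow.
flow = [(0.00, 517, 1), (0.05, 1500, -1), (0.06, 1500, -1), (0.20, 90, 1)]
view_a = obfuscate(flow, rng=random.Random(1))
view_b = obfuscate(flow, rng=random.Random(2))
```

Since the original packets survive as an ordered subsequence of each view, a model trained to treat such views as equivalent is encouraged to rely on features that dummy packets and delays do not erase.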
We summarize our contributions as follows.
We propose a novel traffic representation named the Semantic Attribute Matrix (SAM). SAM extracts distinctive features that differentiate benign from malicious traffic and remains effective under different traffic obfuscation strategies. We provide a quantitative analysis demonstrating that SAM is more effective than the three typical representations employed by state-of-the-art (SOTA) methods.
We propose SmartDetector, a robust malicious traffic detection method based on contrastive learning. In the pre-training phase, we design a data augmentation method tailored to the characteristics of network traffic. The encoder in SmartDetector can be pre-trained with unlabeled traffic samples. During retraining, only a few labeled samples are needed to train the traffic classifier to detect new attacks with high accuracy.
We conduct extensive experiments on five representative public datasets to evaluate the performance of SmartDetector. The results show that SmartDetector outperforms the SOTA methods [6], [7], [8], [9] in all scenarios. In particular, in the scenario of detecting obfuscated malicious traffic, SmartDetector achieves an F1 score and AUC that both exceed 90%, with average improvements of 19.84% and 18.17%, respectively, over the SOTA method.
The remainder of this paper is organized as follows. Section II introduces the background and related work. Section III describes the design goals. Sections IV and V present SAM and SmartDetector, respectively. Section VI evaluates the performance of SmartDetector through experiments. Finally, Section VII concludes the paper.
II. Background and Related Work
In this section, we first describe the threat model of malicious traffic detection and then review existing methods.
A. Threat Model
There are usually two roles in malicious traffic detection, i.e., the network administrator and the attacker, as shown in Fig. 1. The attacker launches remote attacks on devices located in a local area network (LAN). We assume that communication traffic is encrypted by network encryption protocols, which makes malicious traffic detection more challenging.

Fig. 1. The threat model for malicious traffic detection.
Capability of Attackers: We assume that the attacker can manipulate packets within the traffic flow to evade detection [10], e.g., by inserting dummy packets or introducing delays. However, we assume that attackers cannot fully replicate the traffic features of benign traffic (e.g., packet length and direction), because fully replicating these features to imitate benign traffic would impose a significant burden on attackers and cause the attack to fail [4], [11]. For example, in a Denial of Service (DoS) attack, if the attacker adjusts the sending intervals to match those of benign traffic, they may no longer be able to overwhelm the target system [13]. We further assume that the attacker cannot manipulate ports within the target network; in particular, they cannot open well-known ports (such as port 80 for HTTP and port 443 for HTTPS), as this would require system-level privileges [14].
Capability of Network Administrator: The administrator can monitor encrypted traffic via the network gateway but cannot decrypt any individual packet. The administrator can continuously collect traffic data to distinguish between malicious and benign traffic. However, they cannot foresee the type of attack or malware that an attacker might launch, nor do they have any prior information about the specific traffic obfuscation method employed by the attacker.
B. Related Work on Malicious Traffic Detection
Early malicious traffic detection methods typically rely on signature matching. These methods extract byte sequences from malicious traffic [4] and detect malicious activity by comparing traffic against known malicious signatures. However, the widespread adoption of encryption has made packet contents inaccessible, rendering detection methods based on packet content scanning ineffective. To overcome this limitation, various detection methods specifically designed for malicious encrypted traffic have been proposed, as summarized in Table I.
1) Methods Based on Unsupervised Learning: These methods detect malicious traffic by identifying patterns that deviate from established benign traffic behaviors. Mirsky et al. [15] proposed Kitsune, a real-time plug-and-play framework that employs an ensemble of autoencoders to reconstruct statistical features for learning normal traffic patterns. Building on this foundation, Bovenzi et al. [16] introduced enhancements including ensemble equalization and advanced distance metrics (e.g., NAP), significantly improving adaptability and detection performance in dynamic environments. Zhang et al. [17] leveraged isolation forests for anomaly detection and proposed an adaptive evolution mechanism that enables real-time malicious traffic identification. These methods do not require labeled data during training. However, due to the lack of label information, they cannot directly learn the specific features of attacks, which may lead to higher false positive and false negative rates [18].

TABLE I
Summary of Existing Malicious Encrypted Traffic Detection Methods

| Method Category | Typical Method | Traffic Representation |
| :--- | :--- | :--- |
| Unsupervised Learning | Kitsune [15] | Statistical Features |
| Unsupervised Learning | Bovenzi et al. [16] | Statistical Features |
| Unsupervised Learning | OADSD [17] | Packet Header, Payload |
| Supervised Learning | Rahmat et al. [19] | Statistical Features |
| Supervised Learning | EvilHunter [20] | Ad Bid Request Features |
| Supervised Learning | ST-Graph [6] | Traffic Graph |
| Supervised Learning | Feng et al. [21] | Grey-scale Map |
| Supervised Learning | DFR [7] | Grey-scale Map |
| Meta-Learning | FC-Net [9] | Color Image |
| Meta-Learning | TF [8] | Direction Sequence |
| Meta-Learning | SmartDetector | Semantic Attribute Matrix |
2) Methods Based on Supervised Learning: These methods leverage labeled data to train models that differentiate between benign and malicious traffic based on their distinguishing features. Rahmat et al. [19] introduced an ensemble-learning method that builds the model with algorithms such as XGBoost and AdaBoost through techniques like bagging and boosting. Sun et al. [20] observed that fraudulent devices could be detected through encrypted traffic analysis and developed EvilHunter for this purpose. Fu et al. [6] proposed ST-Graph, which detects malware traffic using graph-based network analysis. They employed graph representation learning to capture the spatial and temporal features of network behaviors and used a random forest to build the classifier. Feng et al. [21] presented a two-layer deep learning approach for malware detection that combines convolutional neural networks (CNNs) and autoencoders. Zeng et al. [7] introduced a deep learning framework for detecting malicious traffic that uses CNNs and stacked autoencoders to extract features from raw traffic without manual feature engineering. These methods achieve high F1 scores when trained on an adequate number of labeled samples [6], [7]. However, they rely on predefined parameters and, when confronted with emerging attacks, require a significant number of labeled samples for model retraining [4].
3) Methods Based on Meta-Learning: Meta-learning [22], [23] is a machine learning paradigm designed to improve model transferability. In malicious traffic detection, it enables the model to generalize knowledge from known attack patterns (e.g., historical attack samples) to new attack types. By leveraging prior experience, the model can rapidly adapt to new malicious traffic patterns, even with limited labeled samples. Several studies have applied this approach to malicious traffic detection. Xu et al. [9] proposed FC-Net, a meta-learning-based malicious traffic detection framework in which distinguishing a pair of samples serves as the basic learning task. TF [8] is an encrypted traffic analysis method suited to few-shot learning. It consists of two parts, i.e., CNN-based feature extraction and a k-NN-based classification network. Although FC-Net [9] and TF [8] can detect new attacks with few labeled samples, they still depend on a large, well-labeled traffic dataset during the pre-training phase. Furthermore, their traffic representations are not informative enough to effectively distinguish between benign and malicious traffic (see Section VI-B). To evade detection [11], attackers may vary traffic features through traffic obfuscation, e.g., injecting dummy packets. Unfortunately, the above methods cannot cope with obfuscated malicious traffic.
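TF [8] classifies CNN-extracted embeddings with k-NN. The sketch below shows only that second stage, a k-NN vote over a small labeled support set, to illustrate why few labeled samples can suffice once an encoder exists. The embeddings here are hand-written 2-D vectors standing in for the output of some pre-trained encoder (an assumption), and `knn_predict` is a hypothetical helper, not TF's code.

```python
import math
from collections import Counter


def knn_predict(query, support, k=3):
    """Label a query embedding by majority vote among its k nearest neighbors.

    support: list of (embedding, label) pairs, i.e., the few labeled samples.
    Ties in the vote are broken by first occurrence among the sorted neighbors.
    """
    if not support:
        raise ValueError("support set is empty")
    nearest = sorted(support, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Usage: five labeled embeddings are enough to label new queries.
support = [
    ([0.0, 0.0], "benign"),
    ([0.0, 1.0], "benign"),
    ([5.0, 5.0], "malicious"),
    ([5.0, 6.0], "malicious"),
    ([6.0, 5.0], "malicious"),
]
label = knn_predict([5.5, 5.5], support)
```

Adapting to a new attack then only means adding its few labeled embeddings to the support set; no retraining of the classifier is involved, although the quality of the result depends entirely on the encoder's representation.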
III. Design Goals
In this section, we describe the design goals for malicious encrypted traffic detection.
A. Efficient Model Training
Model training requires substantial labeled traffic, which is often difficult to collect. For instance, constructing the CIC-IDS-2017 [24] dataset involved setting up a complex network with various systems to simulate attacks. In contrast, collecting unlabeled traffic at network gateways is more practical, as it allows large volumes of data to be captured without prior classification.
B. Transferable to New Attacks
To minimize economic losses, detection models must quickly adapt to new threats. Current methods require extensive retraining for novel attacks, which is time-consuming and often impractical due to limited attack samples. Therefore, a malicious traffic detection approach that transfers easily to new attack types is needed.
C. Robust to Obfuscated Traffic
Attackers can evade detection by modifying traffic features, such as inserting dummy packets or altering the packet rate, which can significantly change the statistical features of traffic [5], [11]. Network administrators often lack information on whether and how traffic is obfuscated. Therefore, detection methods need to operate effectively with limited knowledge and be resilient against obfuscated traffic.
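As a worked illustration of how much a single statistical feature can shift, consider the mean packet length. Assume (purely for illustration; these numbers are not from the paper's datasets) a flow of $n$ packets with mean length $\bar{\ell}$, into which an attacker inserts $k$ dummy packets of length $d$:

$$
\bar{\ell}' = \frac{n\,\bar{\ell} + k\,d}{n + k},
\qquad
n = 10,\; \bar{\ell} = 1000~\text{B},\; k = 5,\; d = 60~\text{B}
\;\Rightarrow\;
\bar{\ell}' = \frac{10\,000 + 300}{15} \approx 686.7~\text{B}.
$$

Injecting only five small packets lowers the observed mean by roughly 31%, so a classifier that learned a threshold on the original mean length could misclassify the flow even though the malicious packets themselves are unchanged.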
IV. Traffic Representation
In this section, we present key observations on the differences between benign and malicious traffic and introduce a new traffic representation, the Semantic Attribute Matrix (SAM), designed to capture these distinguishing features.