
DomainForensics: Exposing Face Forgery Across Domains via Bi-Directional Adaptation

Qingxuan Lv, Yuezun Li, Member, IEEE, Junyu Dong, Sheng Chen, Life Fellow, IEEE, Hui Yu, Huiyu Zhou, and Shu Zhang

Abstract

Recent DeepFake detection methods have shown excellent performance on public datasets, but their performance degrades significantly on new forgeries. Solving this problem is important, as new forgeries emerge daily with continuously evolving generative techniques. Many efforts have addressed this issue by empirically seeking commonly existing traces at the data level. In this paper, we rethink this problem and propose a new solution from the unsupervised domain adaptation perspective. Our solution, called DomainForensics, aims to transfer the forgery knowledge from known forgeries (fully labeled source domain) to new forgeries (label-free target domain). Unlike recent efforts, our solution does not focus on the data view but on the learning strategies of DeepFake detectors, capturing the knowledge of new forgeries through the alignment of domain discrepancies. In particular, unlike general domain adaptation methods, which consider knowledge transfer across semantic class categories and thus have limited application here, our approach captures the subtle forgery traces. We describe a new bi-directional adaptation strategy dedicated to capturing the forgery knowledge across domains. Specifically, our strategy considers both forward and backward adaptation: it transfers the forgery knowledge from the source domain to the target domain in forward adaptation and then reverses the adaptation from the target domain to the source domain in backward adaptation. In forward adaptation, we perform supervised training for the DeepFake detector in the source domain and jointly employ adversarial feature adaptation to transfer the ability to detect manipulated faces from known forgeries to new forgeries. In backward adaptation, we further improve the knowledge transfer by coupling adversarial adaptation with self-distillation on new forgeries. This enables the detector to expose new forgery features from unlabeled data and avoid forgetting the knowledge of known forgeries. Extensive experiments demonstrate that our method is surprisingly effective in exposing new forgeries and can be used in a plug-and-play manner with other DeepFake detection architectures.
Index Terms: Digital forensics, DeepFake detection, DomainForensics.

Manuscript received 1 November 2023; revised 27 April 2024 and 2 July 2024; accepted 4 July 2024. Date of publication 18 July 2024; date of current version 2 August 2024. This work was supported in part by the National Key Research and Development Program of China under Grant 2022ZD0117201 and in part by the Sanya Science and Technology Special Fund under Grant 2022KJCX92. The work of Yuezun Li was supported by the China Postdoctoral Science Foundation under Grant 2021TQ0314 and Grant 2021M703036. The associate editor coordinating the review of this article and approving it for publication was Dr. Benedetta Tondi. (Corresponding authors: Yuezun Li; Junyu Dong.)
Qingxuan Lv, Yuezun Li, Junyu Dong, and Shu Zhang are with the College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China (e-mail: lvqingxuan@stu.ouc.edu.cn; liyuezun@ouc.edu.cn; dongjunyu@ouc.edu.cn; zhangshu@ouc.edu.cn).
Sheng Chen is with the School of Electronics and Computer Science, University of Southampton, SO17 1BJ Southampton, U.K., and also with the College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China (e-mail: sqc@ecs.soton.ac.uk).
Hui Yu is with the School of Creative Technologies, Faculty of Creative and Cultural Industries, University of Portsmouth, PO1 2DJ Portsmouth, U.K. (e-mail: hui.yu@port.ac.uk).
Huiyu Zhou is with the School of Computing and Mathematical Sciences, University of Leicester, LE1 7RH Leicester, U.K. (e-mail: hz143@leicester.ac.uk).
Digital Object Identifier 10.1109/TIFS.2024.3426317

I. Introduction

The ever-growing family of convolutional neural network (CNN) based generative models [1], [2], [3], [4], [5], [6] has made face forgery easier than ever before, allowing people to manipulate a face's identity, appearance, and attributes with high realism and little effort. These CNN-based face forgery techniques, known as DeepFake, have drawn much attention, as their abuse can lead to impersonation videos, economic fraud, biometric attacks, and even national security problems [7]. Thus, it is urgent and important to counteract the misuse of DeepFakes.
During the past few years, a large number of DeepFake detection methods [8], [9], [10], [11], [12], [72], [73] have emerged. Trained on recently proposed large DeepFake datasets, such as FaceForensics++ (FF++) [13] and Celeb-DF [14], these detection methods have shown promising performance. However, these methods assume that the training and testing sets come from the same distribution, e.g., the same type of forgery or the same dataset. This assumption limits their practical application, as new types of forgeries emerge continuously and spread widely across social platforms. These new types of forgeries are very unlikely to be included in existing datasets and are therefore unseen by these detectors, causing significant performance degradation (see Fig. 1, top). This circumstance poses a big challenge to DeepFake detectors: how to detect constantly emerging new forgeries.
Recently, attempts have been made in the literature to solve this issue. One typical line of research uses a variety of data augmentations to increase generalization ability [15], [16], [17], [18]. These methods usually create forged faces by augmenting pristine videos to cover the known types of forgeries as much as possible. Despite the improved generalization, the types of augmentation are limited to known forgeries, which hinders performance when confronting unseen forgeries. Frequency information is also used to improve generalization ability [19], [20], [21], [22]. However, this clue is easily affected by data processing and is highly correlated with video quality, so it does not perform consistently across different datasets. A different direction of research applies transfer learning, such as zero- and few-shot learning [23], [24], [25], to improve generalization on new forgeries. Since zero-shot learning cannot access samples of new forgeries in training, its performance is severely limited. In contrast, few-shot learning methods relax this restriction in that they can access a few samples of new forgeries in training. However, this requires annotating these samples, and annotations may not be easy to obtain in practice, as we may not know whether a given face is forged, e.g., when multiple faces are in view but only video-level labels are provided. Thus, a fundamental question is: can we detect new forgeries by accessing only target samples without any labels, while achieving competitive performance?

Fig. 1. Overview of traditional forensics (top) and DomainForensics (bottom). Traditional forensics achieves excellent performance on known forgeries but performs poorly on new forgeries. In contrast, DomainForensics can effectively expose new forgeries by performing the proposed bi-directional adaptation, which learns the common forgery features across domains using adversarial training.
In this paper, we cast DeepFake detection into a new formulation as an unsupervised domain adaptation problem, transferring knowledge from the source domain to the target domain without using any annotations of target samples in training. This is very different from the existing strategies and offers significant advantages over them. Specifically, for DeepFake detection, we treat the known forgeries as the source domain and new forgeries as the target domain (see Fig. 1, bottom). Our goal is to push the DeepFake detector to learn the common forgery features across different domains using only label-free video collections of interest. It is worth noting that this DeepFake detection problem differs significantly from the general unsupervised domain adaptation problem, as we aim to learn the common forgery features within the same category of faces (real or fake), which are more subtle than the semantic features of different categories (e.g., cat, dog, etc.) in the general unsupervised domain adaptation problem.
To this end, we propose a new unsupervised domain adaptation framework, called DomainForensics, for DeepFake detection. The key to DomainForensics is a novel bi-directional adaptation strategy. This is very different from existing DeepFake detection frameworks, which consider only one direction: learning knowledge under source-domain supervision and transferring it to the target domain. Since the forgery features are subtle, one-directional adaptation will inevitably lose a certain amount of knowledge [26], [27], [28], thus limiting the achievable performance on the target domain. To overcome this problem, we design bi-directional adaptation, which first transfers the knowledge from the source domain to the target domain, referred to as forward adaptation, and then reverses the adaptation from the target domain to the source domain, called backward adaptation. The backward adaptation stage utilizes the results of the forward adaptation stage, further explores the knowledge from the target domain, and transfers it back to the source domain. With this mutual adaptation, the DeepFake detector can fully capture the common forgery features across domains.
To verify our idea, we adopt the Vision Transformer (ViT) [29] as our DeepFake detector in the experiments, due to its successful application to vision tasks. Other architectures, such as ResNet [30], Xception [31] and EfficientNet [32], can also be used in our framework, and this will also be demonstrated. Since the frequency space can reveal forgery traces [19], [20], we use color images and the corresponding frequency-transformed maps as the input. In the forward adaptation stage, we develop a discriminator that is trained together with the DeepFake detector in an adversarial manner: the discriminator aims to tell which domain a learned feature comes from, and the DeepFake detector aims to extract features that confuse the discriminator. By doing so, the distribution of the target domain is pulled close to that of the source domain. In the backward adaptation stage, the adaptation is reversed. Since no labels are provided, we employ self-distillation [27] to further mine the knowledge of the target domain, and then apply adversarial training to the distilled model in order to transfer the knowledge back to the source domain. Extensive experiments are conducted on the FF++ and Celeb-DF datasets in several cross-domain scenarios, including different manipulation methods, datasets and types, to demonstrate the effectiveness of our method.
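The adversarial interplay described above can be illustrated with a toy calculation. The following is only a sketch, not the authors' implementation: it assumes a binary cross-entropy domain loss in the label-flipping (GAN-style) form, with domain label 1 for source and 0 for target, which may differ from the exact objective defined in Section III. The function names `bce`, `discriminator_loss` and `detector_confusion_loss`, and all probability values, are hypothetical.

```python
import math

def bce(p, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(p_src, p_tgt):
    """The discriminator wants source features -> 1 and target features -> 0."""
    losses = [bce(p, 1) for p in p_src] + [bce(p, 0) for p in p_tgt]
    return sum(losses) / len(losses)

def detector_confusion_loss(p_tgt):
    """The detector wants its target features to be mistaken for source (label 1)."""
    return sum(bce(p, 1) for p in p_tgt) / len(p_tgt)

# Each value is the discriminator's estimated probability that a feature
# comes from the source domain (assumed convention, made-up numbers).
separable = {"src": [0.9, 0.8], "tgt": [0.1, 0.2]}  # domains easy to tell apart
aligned = {"src": [0.5, 0.5], "tgt": [0.5, 0.5]}    # domains indistinguishable

for name, p in (("separable", separable), ("aligned", aligned)):
    d = discriminator_loss(p["src"], p["tgt"])
    g = detector_confusion_loss(p["tgt"])
    print(f"{name}: discriminator loss={d:.3f}, detector confusion loss={g:.3f}")
```

When the two domains are easy to separate, the discriminator's loss is low and the detector's confusion loss is high; as the feature distributions align, both move toward log 2, the value at which the discriminator can do no better than guessing.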
The contributions of this work are summarized as follows.
  1. We propose a new DeepFake detection solution called DomainForensics to handle continuously emerging new forgeries. Different from recent efforts, our method focuses on pushing the detectors to learn the common forgery features across domains, that is, to transfer the forgery knowledge from known forgeries to unseen forgeries, instead of empirically blending faces on the data level.
  2. We propose a new bi-directional adaptation strategy, which first transfers the forgery knowledge from the source domain to the target domain in forward adaptation, and then reverses the adaptation from the target domain to the source domain in backward adaptation. Since the forgery traces are very subtle, we design the backward adaptation stage to further refine the results obtained from the forward adaptation stage with a self-distillation scheme.
  3. Extensive experiments are conducted on the FF++ and Celeb-DF datasets in several cross-domain scenarios, including crossing manipulation methods, crossing datasets, and crossing generative types, to demonstrate the effectiveness of our method. We also study the effects of various adaptation settings, various amounts of training samples and different components, to provide insights for future research.

The remainder of this paper is organized as follows. Section II reviews recent work on DeepFake detection and unsupervised domain adaptation. Section III details our proposed DomainForensics, including the problem formulation, network framework and bi-directional adaptation. Section IV presents extensive experiments and discusses the experimental results. The paper concludes in Section V.

II. Related Work

In this section, we first present an overview of the existing DeepFake detection approaches. We then provide a brief review of unsupervised domain adaptation and discuss the differences between the previous works and our approach.

A. Deepfake Detection

With the advent of large-scale DeepFake datasets, e.g., [13] and [14], DeepFake detection has made significant progress in recent years, e.g., [8], [9], [10], [11], [12], [16], [18], [33], [34], [35], [36], [37], [72], and [73]. One challenging problem in this task is how to detect constantly emerging new forgeries. The methods in [15], [16], [17], [18], [35], [38], [74] enhance generalization ability by exploring elaborate augmentations on pristine videos, with the aim of covering most of the known forgery types. The limitation of these methods is that the augmentation diversity is restricted to known forgeries. Hence, these methods can hardly handle unknown forgeries. Another line of methods [10], [19], [20], [21], [22], [39], [40] utilizes frequency features to improve generalization ability. However, frequency features can easily be disrupted by post-processing such as compression [41]. Inspired by transfer learning, the methods in [23], [24], [25], [37] employ zero-shot and few-shot learning to detect new forgeries. Since zero-shot learning cannot access samples of new forgeries, its performance gain is severely limited. Few-shot learning needs a small number of samples of new forgeries together with their labels. However, although video-level labels are easily obtained, face-level labels are extremely difficult to obtain in practice.

B. Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to address the challenge of transferring knowledge from a source domain to a target domain when labeled data is scarce or completely absent in the target domain. Ben-David et al. [42] theoretically showed that cross-domain common features serve as latent representations that encapsulate the features shared across diverse domains. The primary objective is to diminish or eliminate domain-specific variations while retaining domain-agnostic information. Acquiring a cross-domain common representation enhances the model's robustness to domain shifts by prioritizing task-relevant information that transcends domain-specific discrepancies. Consequently, the model achieves improved generalization to unlabeled target domains, even in scenarios with limited available data.
Existing works addressing UDA can be classified into two main categories, namely, the discrepancy-based approach and the adversarial approach. Concretely, discrepancy-based methods encourage the model to align the domains by minimizing metrics that measure the distribution discrepancy between the source and target domains [43], [44], [45], [46]. Inspired by the success of the generative adversarial network (GAN) [47], more recent works employ an extra adversarial discriminator to align the domain discrepancy, as the feature distributions of the source and target domains can be matched by confusing the discriminator [48], [49], [50]. In addition, some state-of-the-art methods build the feature extractor on modern transformer structures [51], [52], [53], which demonstrates that UDA not only helps traditional CNNs improve generalization but also benefits transformer-based networks. This motivates us to treat transformer networks as the cornerstone structure and further explore effective UDA methods for face forgery detection.
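As one concrete instance of a distribution discrepancy metric, the sketch below computes a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel between two small sets of feature vectors. MMD is used here only as a familiar example; the text does not state which metric each of [43], [44], [45], [46] relies on. The function names, the bandwidth `sigma`, and the feature values are all made up for illustration.

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y)."""
    return sum(rbf(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))

# Hypothetical 2-D features from a source domain and two candidate target domains.
source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
target_far = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]      # shifted distribution
target_near = [[0.1, 0.0], [0.0, -0.1], [-0.2, 0.1]]   # similar distribution

print("far :", round(mmd2(source, target_far), 4))
print("near:", round(mmd2(source, target_near), 4))
```

A discrepancy-based method would add a term of this kind to the training loss, so that the feature extractor is rewarded for making the source and target feature sets statistically similar.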
Note that the general UDA task targets transferring knowledge about semantic class categories. By contrast, our approach aims to explore the subtle forgery features within the face category only. We also find that the existing adaptation schemes, which only consider adaptation from the source domain to the target domain, are unlikely to perform well on our task. In contrast, our proposed bi-directional adaptation strategy can further explore the knowledge from the unlabeled data in the target domain, as such mutual adaptation, coupled with knowledge transfer through self-distillation, enables the model to learn common forgery features across known and new forgeries. To the best of our knowledge, Chen and Tan [54] is the first work that attempted to solve DeepFake detection using unsupervised domain adaptation. However, it directly applies an existing solution without improvement, and hence its detection performance is unsatisfactory. By contrast, our DomainForensics adopts a meticulously designed strategy, named bi-directional adaptation, which can fully learn the common forgery features across domains and is validated under several practical cross-domain scenarios.
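To make the self-distillation idea more tangible, here is a minimal sketch of a distillation loss on a single unlabeled target face. It assumes the standard knowledge-distillation form, a KL divergence between temperature-softened teacher and student predictions over the two classes [real, fake]; the loss actually used in DomainForensics may differ. All names, logits, and the temperature are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened [real, fake] predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

# The teacher stands in for the model obtained after forward adaptation;
# no ground-truth label of the target face is used anywhere.
teacher = [0.5, 2.0]            # leans "fake"
student_agree = [0.3, 1.6]      # similar belief
student_disagree = [2.0, 0.5]   # opposite belief

print("agreeing student   :", round(distillation_loss(teacher, student_agree), 4))
print("disagreeing student:", round(distillation_loss(teacher, student_disagree), 4))
```

The loss is small when the student reproduces the teacher's soft prediction and grows as they diverge, which is how unlabeled target samples can still provide a training signal.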

III. DomainForensics

To continuously expose new forgeries, we formulate DeepFake detection as an unsupervised domain adaptation problem, which transfers the forgery features from