
UNITED STATES DISTRICT COURT NORTHERN DISTRICT OF CALIFORNIA

RICHARD KADREY, et al., Plaintiffs, v. META PLATFORMS, INC., Defendant.

Case No. 23-cv-03417-VC
ORDER DENYING THE PLAINTIFFS' MOTION FOR PARTIAL SUMMARY JUDGMENT AND GRANTING META'S CROSS-MOTION FOR PARTIAL SUMMARY JUDGMENT

Re: Dkt. Nos. 482, 501
Companies are presently racing to develop generative artificial intelligence models: software products that are capable of generating text, images, videos, or sound based on materials they’ve previously been “trained” on. Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.
Although the devil is in the details, in most cases the answer will likely be yes. What copyright law cares about, above all else, is preserving the incentive for human beings to create artistic and scientific works. Therefore, it is generally illegal to copy protected works without permission. And the doctrine of “fair use,” which provides a defense to certain claims of copyright infringement, typically doesn’t apply to copying that will significantly diminish the ability of copyright holders to make money from their works (thus significantly diminishing the incentive to create in the future). Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.
Take, for example, biographies. If a company uses copyrighted biographies to train a model, and if the model is thus capable of generating endless amounts of biographies, the market for many of the copied biographies could be severely harmed. Perhaps not the market for Robert Caro’s Master of the Senate, because that book is at the top of so many people’s lists of biographies to read. But you can bet that the market for lesser-known biographies of Lyndon B. Johnson will be affected. And this, in turn, will diminish the incentive to write biographies in the future.
Or take magazine articles. If a company uses copyrighted magazine articles to train a model capable of generating similar articles, it’s easy to imagine the market for the copied articles diminishing substantially. Especially if the AI-generated articles are made available for free. And again, how will this affect the incentive for human beings to put in the effort necessary to produce high-quality magazine articles?
With some types of works, the picture is a bit murkier. For example, it’s not clear how generative AI would affect the market for memoirs or autobiographies, since by definition people read those works because of who wrote them. With fiction, it might depend on the type of book. Perhaps classic works of literature like The Catcher in the Rye would not see their markets diminished. But the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works. And again, the proliferation of such works would presumably diminish the incentive for human beings to write romance or spy novels in the first place.
Some students of copyright law respond that none of this matters because when companies use copyrighted works to train generative AI models, they are using the works in a way that’s highly creative in its own right. In the language of copyright law, the companies’ use of the works is “transformative.” As a factual matter, there’s no disputing that. And as a legal matter, it’s true that you’re less likely to be liable for copyright infringement if you’re copying the work for a transformative purpose. In that situation, you’re more likely to be protected by the fair use doctrine. But as the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.
Speaking of which, in a recent ruling on this topic, Judge Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on. Such harm would be no different, he reasoned, than the harm caused by using the works for “training schoolchildren to write well,” which could “result in an explosion of competing works.” Order on Fair Use at 28, Bartz v. Anthropic PBC, No. 24-cv-5417 (N.D. Cal. June 23, 2025), Dkt. No. 231. According to Judge Alsup, this “is not the kind of competitive or creative displacement that concerns the Copyright Act.” Id. But when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.
Another argument offered in support of the companies is more rhetorical than legal: Don’t rule against them, or you’ll stop the development of this groundbreaking technology. The technology is certainly groundbreaking. But the suggestion that adverse copyright rulings would stop this technology in its tracks is ridiculous. These products are expected to generate billions, even trillions, of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.
The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.
But that brings us to this particular case. The above discussion is based in significant part on this Court’s general understanding of generative AI models and their capabilities. Courts can’t decide cases based on general understandings. They must decide cases based on the evidence presented by the parties.
In this case, thirteen authors, mostly famous fiction writers, have sued Meta for downloading their books from online “shadow libraries” and using the books to train Meta’s generative AI models (specifically, its large language models, called Llama). The parties have filed cross-motions for partial summary judgment, with the plaintiffs arguing that Meta’s conduct cannot possibly be fair use, and with Meta responding that its conduct must be considered fair use as a matter of law. In connection with these fair use arguments, the plaintiffs offer two primary theories for how the markets for their works are affected by Meta’s copying. They contend that Llama is capable of reproducing small snippets of text from their books. And they contend that Meta, by using their works for training without permission, has diminished the authors’ ability to license their works for the purpose of training large language models. As explained below, both of these arguments are clear losers. Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data. As for the potentially winning argument (that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution), the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.
Given the state of the record, the Court has no choice but to grant summary judgment to Meta on the plaintiffs’ claim that the company violated copyright law by training its models with their books. But in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these thirteen authors - not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.
The goal of copyright law is to promote “broad public availability of literature, music, and the other arts.” Twentieth Century Music Corp. v. Aiken, 422 U.S. 151, 156 (1975). To this end, copyright law incentivizes creativity by giving authors of original works a bundle of exclusive rights, for instance, the rights to prevent others from reproducing or distributing the works. 17 U.S.C. § 106. At the same time, however, copyright law “trades off the benefits of incentives to create against the costs of restrictions on copying.” Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 526 (2023). For example, copyright only protects expression, not underlying ideas, and the duration of copyright protection is limited. See id. (citing 17 U.S.C. §§ 102, 302-305).
One major way the Copyright Act strikes a balance between protecting ownership and leaving room for innovation is through the affirmative defense of fair use. Under this doctrine, “the fair use of a copyrighted work . . . for purposes such as criticism, comment, news reporting, teaching . . . , scholarship, or research, is not an infringement of copyright.” 17 U.S.C. § 107. Fair use “permits courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.” Google LLC v. Oracle America, Inc., 593 U.S. 1, 18 (2021) (quoting Stewart v. Abend, 495 U.S. 207, 236 (1990)).
The Copyright Act lists four factors to be considered in determining whether a given use is fair:
  1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  2. the nature of the copyrighted work;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.

17 U.S.C. § 107.

While the statute lists these four factors, fair use is a “flexible concept.” Warhol, 598 U.S. at 527 (quotation marks omitted) (quoting Oracle, 593 U.S. at 20). The list is not exhaustive. A particular factor “may prove more important in some contexts than in others.” Oracle, 593 U.S. at 19. And application of the factors “requires judicial balancing, depending upon relevant circumstances, including ‘significant changes in technology.’” Id. (quoting Sony Corp. of America v. Universal City Studios, Inc., 464 U.S. 417, 430 (1984)). The factors may also overlap such that facts relevant to one factor are also relevant to others. See A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 642 (4th Cir. 2009). Overall, the factors are not meant to be applied mechanically, but to contribute “to a holistic inquiry”: whether the secondary work is likely to substitute for the original work in the marketplace and therefore undermine the incentive to create. See Romanova v. Amilus Inc., 138 F.4th 104, 117 n. 9 (2d Cir. 2025) (Leval, J.); see also Warhol, 598 U.S. at 528 (referring to substitution as “copyright’s bête noire”).
Because it “focuses on actual or potential market substitution,” Warhol, 598 U.S. at 536 n.12, the fourth factor is “undoubtedly the single most important element of fair use,” Harper & Row Publishers, Inc. v. Nation Enterprises, 471 U.S. 539, 566 (1985). If the law allowed people to copy your creations in a way that would diminish the market for your works, this would diminish your incentive to create more in the future. Thus, the key question in virtually any case where a defendant has copied someone’s original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original work. See Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 (1994).

Because fair use is an affirmative defense, the burden of proof is on the party invoking it. Dr. Seuss Enterprises, L.P. v. ComicMix LLC, 983 F.3d 443, 459 (9th Cir. 2020), abrogated on other grounds by Jack Daniel’s Properties, Inc. v. VIP Products LLC, 599 U.S. 140 (2023). In particular, because the fourth factor is the most important, the secondary user (generally, the defendant) will “have difficulty carrying the burden of demonstrating fair use without favorable evidence about relevant markets.” Campbell, 510 U.S. at 590. But while the rightsholder need not prove or present evidence of market harm, they “may bear some initial burden of identifying relevant markets.” Hachette Book Group, Inc. v. Internet Archive, 115 F.4th 163, 194 (2d Cir. 2024); see also Newegg Inc. v. Ezra Sutton, P.A., No. CV 15-01395, 2016 WL 6747629, at *2 (C.D. Cal. Sep. 13, 2016). Moreover, because fair use is a holistic inquiry, the party invoking it “bears the burden on the defense as a whole,” not as to each individual factor. William F. Patry, Patry on Fair Use § 2:5 (May 2025 ed.).
Fair use is a mixed question of law and fact, but the “question primarily involves legal work.” Oracle, 593 U.S. at 24. Therefore, fair use can be addressed at summary judgment where there are no genuine issues of material fact relevant to fair use. Leadsinger, Inc. v. BMG Music Publishing, 512 F.3d 522, 530 (9th Cir. 2008). By contrast, where there are genuine factual disputes that might affect whether the defendant’s use was fair, those disputes must be resolved by a jury. See Oracle, 593 U.S. at 23-25. Once a jury finds the facts, whether those facts support fair use “is a legal question for judges to decide.” Id. at 23-24.
It bears emphasis that where a fair use defense fails, the consequence isn’t necessarily that the defendant must stop whatever they were doing. The consequence will often be that the defendant needs to pay the copyright owner for a license that grants them permission to do whatever they were doing. This way, the defendant compensates the copyright owner for the fact that the defendant’s conduct will otherwise harm the market for the original work. The defendant will only be forced to stop what they’re doing if they’re unwilling or unable to pay for the right to do it.

II. FACTS AND PROCEDURAL HISTORY

A

“Generative AI” is a type of artificial intelligence that creates new content, such as text, images, videos, or sound.¹ Generative AI models do this, as Meta describes it, by extracting “increasingly complex mathematical patterns from training data, enabling the network to output a prediction or decision based on the patterns derived.” To put it more simply, generative AI models are “trained” to identify common patterns across large training datasets. They can then create, in response to user prompts, new content based on the patterns they have recognized in that training data. By the same token, a model’s outputs are limited based on the patterns that existed in its training data. For instance, if the only bridge in an image-generating model’s training data was the Golden Gate Bridge, and a user told that model to generate an image of a bridge, it would likely generate an orange-red suspension bridge, because that is the pattern of a bridge that would emerge from its training data.
A large language model, or LLM, is a particular type of generative AI model designed to understand and generate text. Users can prompt LLMs to do a wide range of things, such as draft emails, summarize documents, or write computer code. Well-known LLMs include OpenAI’s ChatGPT models and Google’s Gemini models.
LLMs learn to understand language by analyzing relationships among words and punctuation marks in their training data. The units of text (words and punctuation marks) on which LLMs are trained are often referred to as “tokens.” LLMs are trained on an immense amount of text and thereby learn an immense amount about the statistical relationships among words. Based on what they learned from their training data, LLMs can create new text by predicting what words are most likely to come next in sequences. This allows them to generate text responses to basically any user prompt. Model developers may also “post-train” or “finetune” their models to improve their performance at specific tasks or otherwise adjust their outputs, such as to prevent generation of offensive statements. Therefore, as with other generative AI models, LLMs’ outputs are limited by their training data. To be able to generate a wide range of text (in different languages or styles, or regarding different subject matter), an LLM’s training dataset must be large and diverse. As one Meta witness put it, “If a model only saw social media posts, for example, it would not do well in generating source computer code.”
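The idea of predicting the next word from statistical relationships in training text can be illustrated with a deliberately tiny sketch. This is not how Llama or any real LLM is built (those use neural networks over subword tokens); it is a toy word-pair counter, and the function names `train_bigram_model`, `predict_next`, and `generate` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # crude whitespace tokenization (an assumption)
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, token):
    """Return the most frequent follower of `token`, or None if never seen."""
    followers = model.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, start, max_tokens=5):
    """Extend `start` by repeatedly appending the most likely next token."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(model, out[-1])
        if nxt is None:  # the model has no pattern for this token
            break
        out.append(nxt)
    return " ".join(out)


corpus = "the bridge is red . the bridge is long . the bridge is red ."
model = train_bigram_model(corpus)
print(generate(model, "the", max_tokens=3))
```

Because the toy model has only ever seen sentences about a bridge, it cannot continue a prompt about anything else, which mirrors the point that a model's outputs are limited by the patterns in its training data.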
But while a variety of text is necessary for training, books make for especially valuable training data. This is because they provide very high-quality data for training an LLM’s “memory” and allowing it to work with larger amounts of text at once. (The technical term for how many tokens an LLM can hold in its memory at once is its “context window.”) For instance, an LLM with a better memory will be able to process and respond to longer prompts, incorporate more information into outputs, and remember things from earlier in an exchange, resulting in smoother “conversations.” Books are good data for training LLMs’ memories because, in the words of one of Meta’s expert witnesses, they are “long but consistent,” maintaining a particular style and coherent structure. They are also high quality in the sense that they generally are well written and use proper grammar (especially compared to text from the internet, which varies widely on these metrics).
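The notion of a context window can be sketched in a few lines. The example below assumes one simple policy, keeping only the most recent tokens once the limit is exceeded; real systems may handle overflow differently, and the function name `fit_to_context_window` is hypothetical.

```python
def fit_to_context_window(tokens, context_window):
    """Keep only the most recent `context_window` tokens.

    Anything earlier than that falls outside what the model can "see"
    at once under this (assumed) truncation policy.
    """
    if context_window <= 0:
        return []
    return tokens[-context_window:]


conversation = "please summarize chapter one and then compare it to chapter two".split()
# With a window of 4 tokens, the early part of the request is dropped.
print(fit_to_context_window(conversation, 4))
```

A larger window lets more of the earlier exchange survive, which is the sense in which a model with a better "memory" can handle longer prompts.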

B

Meta Platforms owns and operates social media services including Facebook, Instagram, and WhatsApp. It is also the developer of a series of LLMs named “Llama.” Meta released Llama 1 in February 2023 and Llama 2 that July. Llama 3, along with Meta AI, an easily accessible AI chatbot (analogous to ChatGPT) that incorporates Llama 3, was released in April 2024. Llama 4 is planned for release later in 2025. As Meta explains, each new Llama edition improved in certain ways over its predecessor: Llama 2 was finetuned “to improve the safety, quality, and consistency” of its outputs; Llama 3 made “significant improvements in performance and efficiency”; and Llama 4 is generally “larger” and “more advanced.” Subject to certain restrictions, members of the public can download all of the Llama models for free for noncommercial use; Llama 2 and 3 are also free to download for commercial use. While the Llama models are free to download, Meta estimates that its total revenue from generative AI will range from $2 to $3 billion in 2025, and from $460 billion to $1.4 trillion over the next ten years. See Pls. MSJ Ex. 8 at 12.
To get the varied and extensive text necessary to train its models, Meta cast a wide net. Approximately two-thirds of the data used to train Llama 1 and 2 came from Common Crawl, a nonprofit organization that collects and provides free access to website data, metadata, and text. The remainder came from websites and databases including Wikipedia, GitHub, ArXiv, Stack Exchange, and a combination of Project Gutenberg and Books3 (two book databases).² With the exception of Books3, none of the sources contained any copyrighted material at issue in this case.
Although Meta needed (and acquired and used) a wide range of training data, it especially needed books because, as discussed above, books make for high-quality data. Meta AI researchers and engineers repeatedly discussed the benefits of using books as training data, as well as the need to acquire more books for this use. One Meta employee said that the “best resources we can think of are definitely books.” Pls. MSJ Ex. 18 at 2. Another said it was “really important for us to get books data ASAP.” Id. Ex. 40 at 2. So as Meta expanded its datasets generally, it also continued to look for more books in particular.
At first, Meta wanted to license books and so tried to negotiate licensing deals with several major publishers. Meta’s head of generative AI discussed spending up to $100 million on licensing. But as negotiations proceeded, Meta realized that licensing would be more difficult than anticipated. For one thing, publishers generally do not hold the subsidiary rights to license books for AI training. These rights are instead held by individual authors, and there is no organization for collective licensing of such rights. Sinkinson Decl. ISO Meta MSJ ¶¶ 58-59, 62. Even where publishers do hold AI training licensing rights, they do so regionally rather than globally. Meta MSJ Ex. 34 at 22:22-25:15. For another thing, some publishers apparently ignored Meta’s outreach, and only one gave Meta a pricing proposal. Id. at 23:11-14, 24:2-10.

Eventually, Meta began investigating the possibility of procuring the books (and other text) needed for training by downloading them from “shadow libraries.” A shadow library is an online repository that provides things like books, academic journal articles, music, or films for free download, regardless of whether that media is copyrighted. Meta first used a shadow library in October 2022, when it downloaded the Library Genesis (“LibGen”) database to investigate whether there was value in training Llama on the works it contained. Pls. MSJ Ex. 32 at 3. If the answer was yes, the plan was to then set up licensing agreements for those or similar works. Id. But in spring 2023, after failing to acquire licenses and following escalation to CEO Mark Zuckerberg, Meta decided to just use the works acquired from LibGen as training data. Id. Ex. 61 at 5. And after confirming that LibGen contained most of the works available for license from certain publishers with which it had been negotiating, Meta abandoned its licensing efforts. Id. Ex. 50 at 131:1-132:10, 383:5-384:12; id. Ex. 57 at 2; id. Ex. 58 at 3; see also id. Ex. 92 at 12. In early 2024, Meta also downloaded Anna’s Archive, a compilation of shadow libraries including LibGen, Z-Library, and others. See id. Ex. 66 at 2-3.
To download these large datasets more quickly and without unnecessarily slowing down its networks, Meta torrented them. “Torrenting” is a filesharing technique that entails the simultaneous distribution of small portions of a larger file from many different sources. To be more precise, those sources are many other computer systems that also contain that file. So, for instance, one who torrented LibGen would download small pieces of each book LibGen contains from other users who had copies of LibGen on their computer and who were participating in the torrenting network. The torrenting software would then take those pieces and reassemble them into the original files on the downloader’s computer. [3]
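The piece-by-piece mechanics described above can be illustrated with a short Python sketch. It is an in-memory toy, not the BitTorrent protocol: the function names, the four-byte piece size, and the representation of each peer as a dictionary are all assumptions made for illustration. The one detail borrowed from practice is that each piece is checked against a known hash before it is accepted.

```python
import hashlib
import random

PIECE_SIZE = 4  # bytes; deliberately tiny (an assumption for this demo)


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut a file into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """Hash every piece so a downloader can verify what it receives."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]


def download(expected_hashes: list[str],
             peers: list[dict[int, bytes]],
             rng: random.Random) -> bytes:
    """Fetch each piece from some peer that holds it, verify it, and reassemble.

    Each peer is modeled as a dict mapping piece index -> piece bytes
    (a hypothetical stand-in for another computer on the network).
    """
    order = list(range(len(expected_hashes)))
    rng.shuffle(order)  # pieces need not arrive in file order
    received: dict[int, bytes] = {}
    for index in order:
        holders = [peer for peer in peers if index in peer]
        if not holders:
            raise LookupError(f"no peer holds piece {index}")
        piece = rng.choice(holders)[index]
        if hashlib.sha1(piece).hexdigest() != expected_hashes[index]:
            raise ValueError(f"piece {index} failed verification")
        received[index] = piece
    # Reassemble in index order to recover the original file.
    return b"".join(received[i] for i in range(len(expected_hashes)))


if __name__ == "__main__":
    original = b"pretend this is one large file"
    pieces = split_into_pieces(original)
    peers = [
        {i: p for i, p in enumerate(pieces) if i % 2 == 0},
        {i: p for i, p in enumerate(pieces) if i % 2 == 1},
    ]
    rebuilt = download(piece_hashes(pieces), peers, random.Random(0))
    print(rebuilt == original)
```

No single peer in the usage example holds the whole file, yet the downloader ends up with a complete copy, which is the feature of torrenting the paragraph above describes.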
Certain torrenting protocols (including the one used by Meta, called BitTorrent) are, by default, configured so that files downloaded via torrenting may also be reuploaded to other computer systems. This reuploading can occur both while files are still being downloaded (which the parties refer to as “leeching”) and after those files have been fully downloaded (which the parties refer to as “seeding”). Some torrenting protocols (including BitTorrent) are designed to prioritize downloads to users who are also uploading.
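The distinction the parties draw between “leeching” and “seeding” turns only on whether the reuploading computer has finished its own download. A minimal Python sketch of that distinction follows. The `Peer` class, its two settings, and the returned labels are hypothetical and do not correspond to any real torrent client's configuration options; the defaults simply mirror the statement above that reuploading is permitted by default.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy model of one participant; not a real BitTorrent client."""
    total_pieces: int
    have: set[int] = field(default_factory=set)
    # Both settings default to True, mirroring the default behavior described
    # in the text. The setting names themselves are invented for this sketch.
    allow_upload_while_downloading: bool = True
    allow_upload_after_complete: bool = True

    @property
    def complete(self) -> bool:
        return len(self.have) == self.total_pieces

    def receive(self, index: int) -> None:
        """Record that a piece has been downloaded."""
        self.have.add(index)

    def handle_request(self, index: int) -> str:
        """Classify a request from another peer, using the parties' terms."""
        if index not in self.have:
            return "refused: piece not held"
        if self.complete:
            if self.allow_upload_after_complete:
                return "seeding"
            return "refused: seeding disabled"
        if self.allow_upload_while_downloading:
            return "leeching"
        return "refused: leeching disabled"
```

In this model, a peer created with `allow_upload_after_complete=False` never seeds, but it still answers requests while its own download is in progress unless the other setting is also turned off.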
There is no dispute that Meta torrented LibGen and Anna’s Archive, but the parties dispute whether and to what extent Meta uploaded (via leeching or seeding) the data it torrented. A Meta engineer involved in the torrenting wrote a script to prevent seeding, but apparently not leeching. See Pls. MSJ at 13; id. Ex. 71 ¶¶ 16-17, 19; id. Ex. 67 at 3, 6-7, 13-16, 24-26; see also Meta MSJ Ex. 38 at 4-5. Therefore, say the plaintiffs, because BitTorrent’s default settings allow for leeching, and because Meta did nothing to change those default settings, Meta must have reuploaded “at least some” of the data Meta downloaded via torrent. The plaintiffs assert further that Meta chose not to take any steps to prevent leeching because that would have slowed its download speeds. Meta responds that, even if it reuploaded some of what it downloaded, that doesn’t mean it reuploaded any of the plaintiffs’ books. It also notes that leeching was not clearly an issue in the case until recently, and so it has not yet had a chance to fully develop evidence to address the plaintiffs’ assertions.
Either way, Meta added the books it downloaded to the datasets it used to train the Llama models. It also post-trained its models to prevent them from “memorizing” and outputting certain text from their training data, including copyrighted material. These training efforts, which Meta calls “mitigations,” appear to have been successful. Meta’s expert witness tested them using a method designed to get LLMs to regurgitate material from their training data (which Meta calls “adversarial prompting”). Even using that method, the expert could get no model to generate more than 50 words and punctuation marks (that is, “tokens”) from the plaintiffs’ books. And the plaintiffs’ expert could only get the Llama model best at regurgitation to generate 50 words and punctuation marks from the plaintiffs’ books in 60% of tests. She also testified that Llama was not able to reproduce “any significant percentage” of them. Meta MSJ Ex. 24 at 237:16-19; see also Pls. Ex. 79 ¶¶ 70-72, 79, 82-83, 92; Meta MSJ Ex. 23 at 179:22-25, 180:17-181:16. In short, Llama cannot currently be used to read or otherwise meaningfully access the plaintiffs’ books.
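To make concrete what it means to count how many “words and punctuation marks” of a book a model reproduces, the following Python sketch measures the longest run of consecutive tokens that a piece of generated text shares with a source text. It is not the method either expert used, which the record described here does not spell out. The regular-expression tokenizer and the function names are assumptions, and real LLM tokenizers split text differently.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into words and individual punctuation marks (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", text)


def longest_shared_run(output: str, source: str) -> int:
    """Length, in tokens, of the longest contiguous sequence found in both texts."""
    a, b = tokenize(output), tokenize(source)
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: shared run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    source = "It was a dark and stormy night; the rain fell in torrents."
    output = "He wrote: a dark and stormy night; the wind howled."
    # Shared run: a / dark / and / stormy / night / ; / the
    print(longest_shared_run(output, source))
```

A test of the kind described above would compare a number like this against a threshold (such as 50 tokens) across many prompts and many passages.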

C

The plaintiffs are thirteen published authors who have written, and who hold copyright in, various works. Those works are mostly novels, but also include plays, short stories, memoirs, essays, and nonfiction books. Examples include Sarah Silverman’s The Bedwetter, a comic memoir; Rachel Louise Snyder’s No Visible Bruises: What We Don’t Know About Domestic Violence Can Kill Us, a nonfiction book about domestic violence and how to combat it; Junot Díaz’s Pulitzer Prize-winning novel, The Brief Wondrous Life of Oscar Wao; and Andrew Sean Greer’s Less, also a Pulitzer Prize-winning novel. All of the books in which the plaintiffs hold copyright can be found in the datasets Meta downloaded, including both Books3 and the Anna’s Archive databases. In total, Meta downloaded at least 666 copies of books whose copyrights the plaintiffs hold.
Each plaintiff says that they would be open to licensing their books for use as generative AI training data, but that Meta did not approach them about this licensing. No plaintiff has licensed a book to any company for use as LLM training data or been asked by any company to license a book for that purpose.
The plaintiffs filed this lawsuit seeking to represent a class of all owners of copyrighted works used as training data for Llama. They brought claims for direct copyright infringement (based on Meta’s reproduction of their books), vicarious copyright infringement, removal of copyright management information in violation of the Digital Millennium Copyright Act (DMCA), unfair competition, unjust enrichment, and negligence. The plaintiffs seek damages, restitution, and injunctive and declaratory relief, although it is not entirely clear what exactly they seek to enjoin. They did not, for instance, seek a preliminary injunction preventing Meta from using their works as training data or requiring Meta to retrain the existing Llama models on data excluding their books. Cf. Concord Music Group, Inc. v. Anthropic PBC, No. 24-cv-3811, 2025 WL 904333, at *3-4 (N.D. Cal. Mar. 25, 2025) (discussing music publishers’ motion for a preliminary injunction seeking relief based on future training of AI models).
All of the claims except the direct copyright infringement claim were dismissed early on. The plaintiffs were later granted leave to amend to expand their copyright claim to encompass a theory of infringement by distribution (based on the allegation that Meta was reuploading the data it torrented), and to add both a different DMCA claim and a claim under the California Comprehensive Computer Data Access and Fraud Act (CDAFA). Meta moved to dismiss the new claims, and its motion was granted as to the CDAFA claim but denied as to the DMCA claim.
Often, the next step in a case like this is a motion for class certification by the named plaintiffs. But sometimes the parties will first move for summary judgment regarding the individual claims of the named plaintiffs. For defendants in such cases, there is a trade-off. On the one hand, a defendant could benefit by getting a favorable ruling and disposing of the case before being subjected to expensive and burdensome class-related discovery and motion practice. On the other hand, this favorable ruling for the defendant binds only the individual named plaintiffs, leaving all other members of the proposed class free to sue on the same claims. In this case, Meta proposed doing summary judgment regarding the individual claims of the named plaintiffs first, and the Court accepted this approach.
Thus, after the close of discovery relating to the merits of the named plaintiffs’ claims, the plaintiffs moved for partial summary judgment, arguing that they had made out a facial claim for copyright infringement and that Meta’s fair use defense could not possibly apply to negate that claim. Meta did not dispute that the plaintiffs established a facial case of infringement of their rights of reproduction. But Meta opposed the plaintiffs’ motion (and indeed filed its own cross-motion) on the ground that its reproduction was fair use as a matter of law.
Meta also moved for summary judgment as to the plaintiffs’ DMCA claim; that motion will be granted in a separate order. With respect to the plaintiffs’ claim that Meta infringed their copyrights by distributing their works (via leeching or seeding), neither side moved for summary judgment, so this will remain a live issue in the case. [4]

Six friend-of-the-court briefs (that is, amicus briefs) were also filed. An assortment of intellectual property law professors and the Electronic Frontier Foundation (a civil liberties organization) filed briefs in support of Meta. An assortment of copyright professors; the Copyright Alliance (an organization of creators); the Association of American Publishers; and the International Association of Scientific, Technical and Medical Publishers filed briefs in support of the plaintiffs.

III. FACTOR ONE: THE PURPOSE AND CHARACTER OF THE USE

The first factor “considers the reasons for, and nature of, the copier’s use of an original work.” Warhol, 598 U.S. at 528. Several things can be relevant to the “purpose and character” of a use. One is whether that use is “of a commercial nature or is for nonprofit educational purposes.” 17 U.S.C. § 107(1). Another might be whether it was made in good or bad faith (although whether this is relevant is unclear under current law). See Oracle, 593 U.S. at 32-33.
Primarily, however, the first factor focuses on whether the secondary use is “transformative,” that is, on whether and to what extent “the new work merely supersedes the objects of the original creation (supplanting the original), or instead adds something new, with a further purpose or different character.” Warhol, 598 U.S. at 528 (cleaned up). Allowing a use with a “distinct purpose” is often consistent with the goals of copyright because it encourages the development of new expression “without diminishing the incentive to create.” Id. at 531. On the other hand, a secondary use with the same purpose as the original work is “more likely to provide the public with a substantial substitute for” the original. Id. at 531-32 (cleaned up).
This factor favors Meta. There is no serious question that Meta’s use of the plaintiffs’ books had a “further purpose” and “different character” than the books; that is, it was highly transformative. The purpose of Meta’s copying was to train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions. Cf. Oracle, 593 U.S. at 30 (transformative to use copyrighted computer code “to create a new platform that could be readily used by programmers”). Users can ask Llama to edit an email they have written, translate an excerpt from or into a foreign language, write a skit based on a hypothetical scenario, or do any number of other tasks. The purpose of the plaintiffs’ books, by contrast, is to be read for entertainment or education.
The plaintiffs do not meaningfully disagree about Llama’s purpose. To the contrary, they acknowledge that LLMs have “end uses” including serving “as a personal tutor,” assisting “with creative ideation,” and helping users “generate business reports.” And several of the plaintiffs testified to using LLMs for various purposes, all distinct from creating or reading an expressive work like a novel or biography (for instance, to find recipes, get tax or medical advice, translate documents, or conduct research). All of these functions are different from the use to which the plaintiffs’ books are generally put. So copying the books to develop a tool that can perform those functions is a use with a different purpose and character than the books themselves.

A

The plaintiffs’ law professor amici argue that Meta’s use has the same purpose and character as the books because an LLM training on a book is akin to a human reading one. One might also analogize Meta’s copying of the books to train Llama to a situation in which a professor copies a book and gives it to a student so that the student can use the knowledge from the book (along with knowledge they get from other books) to go do great things. But there are a few important differences.
First, an LLM’s consumption of a book is different than a person’s. An LLM ingests text to learn “statistical patterns” of how words are used together in different contexts. It does so by taking a piece of text from its training data, removing a word from that text, predicting what that word will be, and updating its general understanding of language based on whether it was right or wrong, and then repeating this exercise billions or trillions of times with different text. This is not how a human reads a book.
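The hide-a-word, predict, update, and repeat cycle described above can be sketched in a few lines of Python. This is a deliberately crude stand-in: it keeps simple counts of which word follows which, whereas an actual LLM adjusts the numeric weights of a neural network. The class and function names are hypothetical, and nothing here reflects how Llama is implemented.

```python
import random
from collections import Counter, defaultdict
from typing import Optional


class ToyPredictor:
    """Learns which word tends to follow another; a stand-in for 'statistical patterns'."""

    def __init__(self) -> None:
        # previous word -> counts of the words observed right after it
        self.counts: defaultdict[str, Counter] = defaultdict(Counter)

    def predict(self, previous: str) -> Optional[str]:
        """Guess the hidden word from the word before it (None if never seen)."""
        followers = self.counts.get(previous)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def update(self, previous: str, actual: str) -> None:
        """Strengthen the observed pattern. A real model would instead change
        its weights by an amount that depends on how wrong the guess was."""
        self.counts[previous][actual] += 1


def train(model: ToyPredictor, text: str, steps: int, rng: random.Random) -> int:
    """Repeat the exercise `steps` times; return how many guesses were right."""
    words = text.split()
    correct = 0
    for _ in range(steps):
        i = rng.randrange(1, len(words))       # pick a word to hide
        guess = model.predict(words[i - 1])    # predict it from its context
        if guess == words[i]:
            correct += 1
        model.update(words[i - 1], words[i])   # learn from the revealed word
    return correct


if __name__ == "__main__":
    model = ToyPredictor()
    train(model, "the cat sat on the mat and the cat slept", 200, random.Random(0))
    print(model.predict("sat"))
```

What the toy retains after training is a table of word-to-word tendencies rather than a copy of the text it saw, which is the sense in which the paragraph above contrasts this process with reading.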
Second, unlike the hypothetical professor, Meta did not just give the plaintiffs’ books to one person. Meta copied the plaintiffs’ books as part of an effort to create a tool that can generate a wide range of text. Any person can use that tool to help them create further expression, whether by having it help them brainstorm or research for a creative writing project (like plaintiff David Henry Hwang, a playwright and screenwriter) or by having it write code to develop new software programs (like Lockheed Martin). By creating a tool that anyone can use, Meta’s copying has the potential to exponentially multiply creative expression in a way that teaching individual people does not. Cf. Oracle, 593 U.S. at 30.
In contrast to the copyright professors, the plaintiffs make different (and much weaker) arguments for why Meta’s use is not transformative. For example, the plaintiffs suggest that Llama has “no critical bearing” on their books, the way criticism or parody would. But “critique or commentary on the original” are not “the only uses that will furnish a justification ultimately qualifying as fair use.” Romanova, 138 F.4th at 115. To the contrary, a use that enables “the furnishing of valuable information on any subject of public interest” or renders “a valuable service to the public” might be justified, especially where that benefit is “provided without allowing public access to the copy.” Id.
In addition, the plaintiffs argue that Meta’s use is not transformative because Llama will output material that “mimics” the plaintiffs’ work or writing styles if prompted to do so. Therefore, the plaintiffs say, Meta’s use “merely amounts to a ‘repackaging’” of their books. The plaintiffs point to evidence that they say shows that Meta trained Llama to be able to emulate certain writers’ styles. Pls. Reply Exs. 111-14. But this evidence does not show that Meta trained Llama to repackage the plaintiffs’ works. To the contrary, as noted above, even using “adversarial” prompts designed to get Llama to regurgitate its training data, Llama will not produce more than 50 words of any of the plaintiffs’ books. Pls. Reply Ex. 79 ¶¶ 79, 82-83, 92. And there is no indication that it will generate longer portions of text that would function as “repackaging” of those books. Nor is there even any indication that, as the plaintiffs’ amici claim, Meta developed Llama with the purpose of enabling it to create books that compete with the plaintiffs’ (without rising to the level of repackaging them). [5] So at most, this evidence shows that Meta wanted Llama to be able to generate text in certain styles. But style is not copyrightable; only expression is. See 17 U.S.C. § 102(b); cf. Mattel, Inc. v. MGA Entertainment, Inc., 616 F.3d 904, 916 (9th Cir. 2010). Even if one possible use of Llama is to generate text with similarities to unprotectable aspects of the plaintiffs’ books, that does not mean Meta’s copying had the same purpose as those books. [6]

B

As noted earlier, whether the secondary use is transformative doesn’t dictate the outcome of the first factor analysis (let alone of the entire fair use inquiry). Also relevant is the commercial nature of Meta’s use. Although Llama is available under a free license, it was ultimately developed for commercial reasons, and Meta expects it to generate 460 billion to 1.4 trillion dollars in revenue over the next ten years. Pls. MSJ Ex. 8 at 2. That a use is commercial “tends to weigh against a finding of fair use” because, all else equal, commercial copying is less justified than noncommercial copying. Warhol, 598 U.S. at 537 & n. 13 (quoting Harper & Row, 471 U.S. at 562). So the fact that Llama may make Meta many billions of dollars is relevant and shouldn’t be completely brushed aside, as Meta tries to do. As discussed later, if copying would result in market harm to the protected works, it could matter a great deal whether the copying was part of a for-profit endeavor as opposed to, say, an academic endeavor. Nevertheless, commercialism isn’t dispositive of the first factor and tends to be less important when the secondary use is highly transformative. See Oracle, 593 U.S. at 32; Kelly v. Arriba Soft Corp., 336 F.3d 811, 818 (9th Cir. 2003). Thus, while the profit Meta stands to gain from its development of a product trained on the plaintiffs’ works is relevant to the fair use analysis overall, it does not tilt the first factor in the plaintiffs’ favor.
The same is true of the manner in which Meta acquired the plaintiffs’ books. The plaintiffs are wrong that the fact that Meta downloaded the books from shadow libraries and did not start with an “authorized copy” of each book gives them an automatic win. To say that Meta’s downloading was “piracy” and thus cannot be fair use begs the question because the whole point of fair use analysis is to determine whether a given act of copying was unlawful. See generally Amicus Br. of Electronic Frontier Foundation. Although the Federal Circuit once suggested the contrary in Atari Games Corp. v. Nintendo of America Inc., 975 F.2d 832, 843 (Fed. Cir. 1992), that opinion overread the cases on which it relied for its statement about the need to start with an authorized copy, see, e.g., Religious Technology Center v. Netcom On-Line Communication Services, Inc., 923 F. Supp. 1231, 1244 n. 14 (N.D. Cal. 1995) (discussing Atari’s reasoning and concluding that the cases it relied on misread Harper & Row).
But Meta is also wrong to suggest that its use of shadow libraries is irrelevant to whether its copying was fair use. It’s relevant, or at least potentially relevant, in a few different ways.
First, Meta’s use of shadow libraries is relevant to the issue of bad faith, which is “often taken up under the first factor.” Oracle, 593 U.S. at 32. The law is in flux about whether bad faith is relevant to fair use. Compare, e.g., id. at 32 (noting that “skepticism about whether bad faith has any role in a fair use analysis” is “justifiable”), with Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1164 n. 8 (9th Cir. 2007) (“[A] party claiming fair use must act in a manner generally compatible with principles of good faith and fair dealing.”), and Triller Fight Club II LLC v. H3 Podcast, No. CV21-3942, 2023 WL 11877604, at *8 (C.D. Cal. Sep. 15, 2023) (determining that Perfect 10’s language regarding good faith was still binding despite Oracle’s “skepticism”). It seems like good faith versus bad faith shouldn’t be especially relevant: The purpose of fair use is to allow new expression that won’t substitute for the original work, and whether a given use was made in good or bad faith wouldn’t seem to affect the likelihood of that use substituting for the original. [7] But even if bad faith is relevant, it doesn’t move the needle here, given the rest of the summary judgment record. See Oracle, 593 U.S. at 33 (describing bad faith as a “factbound consideration” that was “not determinative” in that case).
Second, downloading copyrighted material from shadow libraries would be relevant if it benefitted those who created the libraries and thus supported and perpetuated their unauthorized copying and distribution of copyrighted works. In the vast majority of cases, this sort of peer-to-peer file-sharing will constitute copyright infringement. Some of the libraries Meta used have themselves been found liable for infringement. See, e.g., Elsevier Inc. v. Sci-Hub, No. 15-cv-4282, 2017 WL 3868800, at *1-2 (S.D.N.Y. June 21, 2017) (entering default judgment and finding that LibGen was liable for willful copyright infringement). Some of their operators have even been indicted for criminal copyright infringement. See Indictment, United States v. Napolsky, No. 22-cr-525 (E.D.N.Y. Nov. 16, 2022), Dkt. No. 4 (indictment against founders of Z-Library). So if Meta’s act of downloading propped up these libraries or perpetuated their unlawful activities (for instance, if they got ad revenue from Meta’s visits to their websites), then that could affect the “character” of Meta’s use. But the plaintiffs have not submitted any evidence about this. In any event, because any such effects would be even more relevant to the fourth factor (insofar as they could contribute to use of the libraries and thus infringement by others), this issue is discussed below as part of the analysis of that factor. [8]

C

The last issue relating to the character of Meta’s use (and thus the first factor) is the relationship between Meta’s downloading of the plaintiffs’ books and Meta’s use of the books to train Llama. To the extent the plaintiffs suggest that the former must be considered wholly separately from the latter, they are wrong. To be sure, Meta’s downloading is a different use from any copying done in the course of LLM training. But that downloading must still be considered in light of its ultimate, highly transformative purpose: training Llama. See Authors Guild v. Google, Inc. (Google Books), 804 F.3d 202, 216-18 (2d Cir. 2015) (considering the creation of digital copies of books in light of the secondary user’s overall purpose of creating a searchable database); cf. Warhol, 598 U.S. at 533 (noting that different uses must be considered separately, but that “the same copying may be fair when used for one purpose but not another”); contra Order on Fair Use at 18, Bartz, No. 24-cv-5417. Because Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books.
The plaintiffs also assert that Meta downloaded multiple copies of the databases containing their books, and that only some of these copies were used for LLM training, so the downloading of the ones that were not used for training cannot be fair use. But all of the downloads the plaintiffs identify had the ultimate purpose of LLM training. The plaintiffs say that Meta only used its initial October 2022 download of LibGen to see whether the books in the database made for good training data. Pls. Reply at 12. This is a reasonable first step towards training an LLM. See Pls. MSJ Ex. 32 at 3. The plaintiffs say that Meta cross-referenced its next download of LibGen and its first download of Anna’s Archive with publisher catalogues to see whether it was still worth pursuing licensing (or whether all the books available for licensing were already included in those databases). But the plaintiffs concede that these downloads were also used as training data. See Pls. Reply at 13-14. And there is no indication that comparing the books in those databases to the books in another entailed any additional copying. So that cross-referencing alone cannot create infringement liability and does not need to separately constitute fair use. Cf. Warhol, 598 U.S. at 534 & n. 10 (discussing application of fair use test to different uses).
Finally, the plaintiffs say that after Meta abandoned licensing and decided to use the books it downloaded from shadow libraries as training data, it also downloaded several other “copies of pirated datasets, only some of which ever made it into an LLM for training.” But they provide no evidence that this was actually the case. They point only to deposition testimony in which a Meta employee said she didn’t know whether Meta used every LibGen copy it downloaded for training. Pls. Reply Ex. 109 at 66:17-20. Two other Meta AI employees, meanwhile, said that they weren’t aware of any downloads that weren’t used as training data or for related efforts like the experiments mentioned above. Pineau Decl. ISO Meta Reply ¶ 6; Kambadur Decl. ISO Meta Reply ¶ 7. [9] In any event, even if Meta did download some copies that weren’t ultimately used for training, fair use doesn’t require that the secondary user make the lowest number of copies possible. Cf. Sony Computer Entertainment, Inc. v. Connectix Corp., 203 F.3d 596, 601, 605 (9th Cir. 2000).

IV. FACTOR TWO: THE NATURE OF THE COPYRIGHTED WORK

The second factor recognizes that “some works are closer to the core of intended copyright protection than others, with the consequence that fair use is more difficult to establish when the former works are copied.” Campbell, 510 U.S. at 586. Works receiving greater copyright protection include creative ones like books and movies; works receiving lesser protection include computer code. Oracle, 593 U.S. at 29.
This factor favors the plaintiffs. Their books - mostly novels, memoirs, and plays - are highly expressive works “of the type that the copyright laws value and seek to protect.” Hachette, 115 F.4th at 187 (quoting Authors Guild, Inc. v. HathiTrust, 755 F.3d 87, 98 (2d Cir. 2014)). That some of their works may be factual (like an autobiography) as opposed to fictional does not meaningfully change this conclusion, because copyright still protects an author’s “manner of expressing” facts. Google Books, 804 F.3d at 220.
Meta argues that this factor favors it anyway because Meta only used the plaintiffs’ books to gain access to their “functional elements,” not to capitalize on their creative expression. Meta primarily relies on two Ninth Circuit cases involving “intermediate copying.” In both of those cases, a video game company copied a video game console manufacturer’s copyrighted code and reverse-engineered it to understand certain functional elements of that code. This allowed the game companies to build their own products that would work with the plaintiffs’. In each case, the Ninth Circuit held that the defendant’s fair use defense would likely succeed because, although the defendants copied expressive elements of the plaintiffs’ code, they only did so to access the code’s unprotected, functional elements. See Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510, 1520-26 (9th Cir. 1992); Connectix, 203 F.3d at 602.
But unlike the uses in those cases, Meta’s use of the plaintiffs’ books does depend on the books’ creative expression. As Meta itself notes, LLMs are trained through learning about “statistical relationships between words and concepts” and collecting “statistical data regarding word order, frequencies [what words are used and how often], grammar, and syntax.” Word order, word choice, grammar, and syntax are how people express their ideas. See Harper & Row, 471 U.S. at 548 (discussing how “ordering and choice of words” are expression under even a narrow interpretation of what counts as expression). So even though LLMs may only learn about “statistical relationships,” those relationships are the product of creative expression. This is true even though, as discussed earlier, Llama consumes that expression in a different way than a person would.
To support its argument that it copied the plaintiffs’ books to extract non-expressive information (such that the intermediate copying cases should apply), Meta cites Google Books. But that case is distinguishable. There, the plaintiffs were authors who alleged that Google committed copyright infringement by making digital copies of their books and creating a database that users could search to see what books in the database contained the search terms. 804 F.3d at 207-10. Unlike here, the technology at issue in Google Books was content agnostic: The database wouldn’t work any better or worse if it contained books full of complete gibberish or written in unknown languages. If someone searched for that text, those books would appear. Here, by contrast, if Meta’s LLMs are to generate high-quality text, they need coherent, reasonably high-quality training data. In other words, they need high-quality expression. Therefore, the “intermediate copying” cases don’t apply. See Disney Enterprises, Inc. v. VidAngel, Inc., 869 F.3d 848, 862 n. 12 (9th Cir. 2017).
The second factor, however, “has rarely played a significant role in the determination of a fair use dispute.” Google Books, 804 F.3d at 220. And it applies “with less force” when the copied works have already been published and the secondary user therefore cannot interfere with the creator’s right to control the first public appearance of their work. VHT, Inc. v. Zillow Group, Inc., 918 F.3d 723, 744 (9th Cir. 2019) (quoting Kelly, 336 F.3d at 820). So the fact that the second factor favors the plaintiffs doesn’t mean much for the analysis as a whole.

V. FACTOR THREE: THE AMOUNT AND SUBSTANTIALITY OF THE PORTION USED IN RELATION TO THE COPYRIGHTED WORK AS A WHOLE

This factor “asks whether ‘the amount and substantiality of the portion used’” are “reasonable in relation to the purpose of the copying.” Campbell, 510 U.S. at 586 (quoting 17 U.S.C. § 107(3)). This factor is therefore related to the first, because “the extent of permissible copying varies with the purpose and character of the use.” Id. at 586-87.
As an initial matter, the amount copied doesn’t seem especially relevant in this case. In a case involving, for instance, a musical parody, copying large portions of the original song might increase the parody’s “potential for market substitution.” See id. at 589. But given that Meta’s LLMs won’t output any meaningful amount of the plaintiffs’ books, it’s not clear how or why Meta’s copying would be less likely to lead to the creation of direct substitutes for the books if Meta had copied less of them. Cf. Hachette, 115 F.4th at 188-89 (“The relevant consideration . . . is not the amount of copyrighted material used by the copier, but ‘the amount of copyrighted material made available to the public.’” (quoting Fox News Network, LLC v. TVEyes, 883 F.3d 169, 179 (2d Cir. 2018))).
In any event, this factor favors Meta, even though it copied the plaintiffs’ books in their entirety. The amount that Meta copied was reasonable given its relationship to Meta’s transformative purpose. See Oracle, 593 U.S. at 34. Everyone agrees that LLMs work better if they are trained on more high-quality material. See Ungar Decl. ISO Meta MSJ ¶¶ 42-48; Pls. Reply Ex. 115 ¶¶ 79-80. So feeding a whole book to an LLM does more to train it than would feeding it only half of that book. With this in mind, it was “reasonably necessary” for Meta to “make use of the entirety of the works.” HathiTrust, 755 F.3d at 98.¹⁰

VI. FACTOR FOUR: THE EFFECT OF THE USE UPON THE POTENTIAL MARKET FOR OR VALUE OF THE COPYRIGHTED WORK

This factor looks to both the “extent of market harm caused by the particular actions of the alleged infringer” and to “‘whether unrestricted and widespread conduct of the sort engaged in by the defendant . . . would result in a substantially adverse impact on the potential market’ for the original.” Campbell, 510 U.S. at 590 (quoting 3 M. Nimmer & D. Nimmer, Nimmer on Copyright § 13.05 (1993)). The “only harm” relevant to this factor “is the harm of market substitution.” Id. at 593. When, by contrast, the secondary work kills demand for the first through criticism or parody, the harm is not cognizable under the Copyright Act. Id. at 591-92. Also relevant to this factor are “the public benefits the copying will likely produce.” Oracle, 593 U.S. at 35.
As noted previously, the fourth factor is “undoubtedly the single most important element of fair use.” Harper & Row, 471 U.S. at 566. Meta is therefore wrong to suggest that, because the first factor strongly favors it, the inquiry should basically end there. To the contrary, given the fourth factor’s importance, it’s easy to imagine a situation in which a secondary use is highly transformative but the secondary user nonetheless loses on fair use because allowing people to engage in that kind of use would have too great an effect on the market for the original work. But by the same token, in a case where the first factor cuts strongly in favor of the defendant, generally the plaintiff’s only chance to defeat fair use will be to win decisively on factor four.
In a case involving the use of copyrighted works to train generative AI models, there are at least three ways a plaintiff might try to argue that the defendant’s copying harmed the market for the works (or that the market would be harmed if that copying were widespread). First, the plaintiff might claim that the model will regurgitate their works (or outputs that are substantially similar), thereby allowing users to access those works or substitutes for them for free via the model. Second, the plaintiff might point to the market for licensing their works for AI training and contend that unauthorized copying for training harms that market (or precludes the development of that market). Third, the plaintiff might argue that, even if the model can’t regurgitate their own works or generate substantially similar ones, it can generate works that are similar enough (in subject matter or genre) that they will compete with the originals and thereby indirectly substitute for them. In this case, the first two arguments fail. The third argument is far more promising, but the plaintiffs’ presentation is so weak that it does not move the needle, or even raise a dispute of fact sufficient to defeat summary judgment.
A
If Llama could be used to generate significant portions of the plaintiffs’ books - or text so similar to their books as to be infringing in its own right - that would threaten the market for the books because people would read those outputs instead. But that theory of harm is not viable in this particular case because, as discussed above, Llama does not allow users to generate any meaningful portion of the plaintiffs’ books. Neither party’s expert opined that Llama was able to regurgitate more than 50 words from any of the plaintiffs’ books, even in response to “adversarial” prompting designed specifically to make LLMs regurgitate. See Pls. Ex. 79 ¶¶ 71-72, 82-84, 92. And the plaintiffs’ expert conceded that Llama would not generate “any significant percentage” of their books. Meta MSJ Ex. 24 at 237:16-19. In Google Books, by way of comparison, the Second Circuit held that the secondary use did “not threaten the rights holders with any significant harm to the value of their copyrights or diminish their harvest of copyright revenue” despite allowing users to see snippets adding up to as much as 16% of a book.¹¹ 804 F.3d at 224. Llama’s ability to regurgitate minuscule portions of the plaintiffs’ books if manipulated into doing so does not threaten to have a “meaningful or significant effect ‘upon the potential market for or value of’” the plaintiffs’ books. Id. (quoting 17 U.S.C. § 107(4)).

B

The plaintiffs’ primary theory of market harm is that Meta’s unauthorized use of their books for LLM training harms the market for licensing their books for that purpose. The plaintiffs devote nearly all of their discussion of the fourth factor to this theory. The parties therefore go back and forth at length about whether a market for licensing general trade books exists or is likely to develop.
But whether such a market exists or is likely to develop is irrelevant, because this market is not one that the plaintiffs are legally entitled to monopolize. In every fair use case, the “plaintiff suffers a loss of a potential market if that potential [market] is defined as the theoretical market for licensing” the use at issue in the case. Tresóna Multimedia, LLC v. Burbank High School Vocal Music Association, 953 F.3d 638, 652 (9th Cir. 2020) (emphasis omitted) (quoting 4 Melville B. Nimmer & David Nimmer, Nimmer on Copyright § 13.05 (2019)). Therefore, to prevent the fourth factor analysis from becoming circular and favoring the rightsholder in every case, harm from the loss of fees paid to license a work for a transformative purpose is not cognizable. Id.; Bill Graham Archives v. Dorling Kindersley Ltd., 448 F.3d 605, 614-15 (2d Cir. 2006); see also Oracle, 593 U.S. at 38 (cautioning against the “danger of circularity” (quoting 4 Nimmer § 13.05)).

¹¹ As Google Books noted, it isn’t the case that allowing someone to see 16% of a book could never threaten substantial harm. The tool there allowed users to see snippets that “were usually not sequential but scattered randomly throughout the book.” 804 F.3d at 222. If it “could be used to reveal a coherent block amounting to 16% of a book, that would raise a very different question.” Id. at 223. A portion of a work that is small in quantitative terms may also still be significant if it is “the heart of” the work or otherwise qualitatively important. Cf. Harper & Row, 471 U.S. at 565 (quoting Harper & Row, Publishers, Inc. v. Nation Enterprises, 557 F. Supp. 1067, 1072 (S.D.N.Y. 1983)).

C

The third way that using copyrighted books to train an LLM might harm the market for those works is by helping to enable the rapid generation of countless works that compete with the originals, even if those works aren’t themselves infringing. Assume for this discussion that people can (or will soon be able to) use LLMs to generate massive amounts of text in significantly less time than it would take to write that text, and using a fraction of the creativity. People could thus use LLMs to create books and then sell them, competing with books written by human authors for sales and attention. Indeed, to some extent, this appears to be occurring already: one expert for the plaintiffs briefly discusses reports of AI-generated books “flooding Amazon.” Pls. MSJ Ex. 76 ¶ 199; see id. ¶¶ 193-207. People might even be motivated to make those books available for free, given how easy it will presumably be to prompt an LLM to create them. Harm from this form of competition is the harm of market dilution. Or as one commentator describes it, the harm of “indirect” substitution, rather than “direct” substitution (which would be the first form of harm described). See Matthew Sag, Fairness and Fair Use in Generative AI, 92 Fordham L. Rev. 1887, 1916-20 (2024).
Of course, not all copyrighted works would have their markets diluted equally by AI-generated competitors. It seems unlikely, for instance, that AI-generated books would meaningfully siphon sales away from well-known authors who sell books to people looking for books by those particular authors. But it’s easy to imagine that AI-generated books could successfully crowd out lesser-known works or works by up-and-coming authors. While AI-generated books probably wouldn’t have much of an effect on the market for the works of Agatha Christie, they could very well prevent the next Agatha Christie from getting noticed or selling enough books to keep writing.¹²
This effect also seems likely to be more pronounced with respect to certain types of works. For instance, an AI model that can generate high-quality images at will might be expected to greatly affect the market for such images, diminishing the incentive for humans to create them. An LLM that could generate accurate information about current events might be expected to greatly harm the print news market. The market for certain nonfiction works - for example, books about how to take care of your garden - could be greatly diminished by the ability of LLMs to produce books on that topic. For fiction works, it might be more dependent on the author or the genre in which that author operates.
The difference might be in part because some works are relatively functional and generally less dependent on the author’s creativity. When picking a news article, readers want something that will tell them about a current (or past) event clearly, accurately, and concisely. When picking a novel, by contrast, readers may care about a much longer list of characteristics. They may care, for instance, about tone, thematic depth, writing style, plot, or characters; they may want a book that contains a number of plot twists or depicts a certain type of character development. These elements of a novel depend greatly on the creativity of the author. While a news article is also a product of its author’s creativity (especially with respect to things like structure and diction), there are many more creative choices in the average novel than the average news article, and those creative choices are more important to the average novel’s quality. Relatedly, one could imagine people caring more about whether a novel is AI-generated (as opposed to the product of human creativity) than whether a news article is AI-generated.¹³
It also should be noted that, when considering market dilution, the proper comparison isn’t to a world with no LLMs, but to a world where LLMs weren’t trained on copyrighted books. Perhaps an LLM trained only on public domain works could still be capable of quickly generating large numbers of books that could compete for sales with copyrighted books. But there is plenty of evidence in the record that training on books substantially benefits LLMs’ creativity and ability to generate long pieces of text. E.g., Pls. MSJ Ex. 25 at 2; id. Ex. 27 ¶ 183. And because LLMs perform better the more text they are trained on, an LLM trained only on public domain books would presumably, all else equal, lag significantly behind an LLM trained also on copyrighted ones. See Ungar Decl. ISO Meta MSJ ¶ 45. So training an LLM on copyrighted books would seem, in most circumstances, to make that LLM better able to generate works that could dilute the market for the books in its training data.
Meta and its law professor amici, as well as the Matthew Sag article cited above, argue that market dilution does not count under the fourth factor. They argue that harm caused by an LLM’s outputs is only relevant if the outputs are themselves infringing - that is, if the LLM regurgitates copyrighted material (or generates text that is substantially similar to copyrighted material). See May 1 Hr’g Tr. at 22:7-24:21; 108-09; Amicus Br. of Intellectual Property Law Professors at 9-10; see also Sag, 92 Fordham L. Rev. at 1919-20. But that can’t be right. To be sure, it would be easier to conclude that the market for copied books would be harmed by an LLM that is capable of regurgitating those books or generating substantially similar text. But less similar outputs, such as books on the same topics or in the same genres, can still compete for sales with the books in the training data. And by taking sales from those books, or by flooding stores and online marketplaces so that some of those books don’t get noticed and purchased, those outputs would reduce the incentive for authors to create - the harm that copyright aims to prevent.
The Supreme Court has said that the “only harm” that matters under the fourth factor “is the harm of market substitution.” Campbell, 510 U.S. at 593. But indirect substitution is still substitution: If someone bought a romance novel written by an LLM instead of a romance novel written by a human author, the LLM-generated novel is substituting for the human-written one. This is different from the (non-cognizable) harm caused by criticism or commentary, which can harm demand for an original work without serving as a replacement for it.
Relatedly, Meta argues that “legitimate” competition from noninfringing secondary works is not cognizable under the fourth factor. It cites the intermediate copying cases for this proposition. See Sega, 977 F.2d at 1523-24; Connectix, 203 F.3d at 607. But key to those cases’ reasoning was the fact that the secondary users’ competing products did not benefit from the creative expression in the works they copied. By contrast, as discussed, LLMs are better able to generate text (including competing works) because they are trained on the creative expression in copyrighted books. So this competition is not “legitimate” within the meaning of those cases.
It’s true that, in many copyright cases, this concept of market dilution or indirect substitution is not particularly important. That’s because, in a more typical case, an original work is being compared to a single secondary work. If the secondary work is somewhat similar, but not so similar as to effectively be a copy, it still might have a small indirect effect on the market for the original work. But that likely won’t matter. Recall that the fourth factor looks to whether “conduct of the sort engaged in by the defendant” would have a “substantially adverse impact on the potential market for the original.” Campbell, 510 U.S. at 590 (emphasis added) (quoting 3 Nimmer § 13.05). The existence of some harm from indirect substitution isn’t dispositive of the fourth factor or the fair use inquiry. Where, for instance, the first factor cuts in favor of the secondary user, the law might tolerate a little bit of competition. See Google Books, 804 F.3d at 224. In cases involving a single secondary work that’s similar-but-not-too-similar, it’s unlikely that harm from market dilution would be significant enough to matter. Even considering the effect of “widespread conduct of the sort engaged in by the defendant,” Oracle, 593 U.S. at 38 (quoting 4 Nimmer § 13.05), creating one indirectly substitutional work at a time could only have so great an effect on the market for the original.
This case is different. This is not a case where an original work is being compared to one secondary work. Nor is this case like the previous fair use cases involving creation of a digital tool. In those cases, like Google Books and Perfect 10, the tool could at most be used to access part or all of the original works. This case, unlike any of those cases, involves a technology that can generate literally millions of secondary works, with a minuscule fraction of the time and creativity used to create the original works it was trained on. No other use - whether it’s the creation of a single secondary work or the creation of other digital tools - has anything near the potential to flood the market with competing works the way that LLM training does. And so the concept of market dilution becomes highly relevant.
In arguing that this sort of harm doesn’t count just because it’s never made a difference in a case before, Meta makes the mistake the Supreme Court instructs parties and courts to avoid: robotically applying concepts from previous cases without stepping back to consider context. Fair use is meant to be a flexible doctrine that takes account of “significant changes in technology.” Oracle, 593 U.S. at 19 (quoting Sony, 464 U.S. at 430). Courts can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create, just because the issue has not come up before. Indeed, it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor - and thus win the fair use question overall - in cases like this.
But courts can’t decide cases based on what they think will or should happen in other cases. They must decide cases based on the arguments presented and the evidence submitted by the parties. The question, then, is whether these particular thirteen plaintiffs in this particular case have presented enough evidence to win on this factor. Or, to put it more precisely given the procedural posture of this case, whether these plaintiffs have presented enough evidence to raise a genuine dispute of material fact sufficient to give the question of market dilution to a jury. The answer is no.
In their complaint, the plaintiffs asserted only two types of market harm: that users of Llama can reproduce text from their books, and that Meta’s copying harmed the market for licensing copyrighted materials to companies for AI training. As for market dilution (the notion that allowing companies like Meta to copy their works to train products like Llama would inevitably cause the market for the plaintiffs’ works to be flooded with similar works), the plaintiffs never so much as mentioned it in their complaint. Nor did they mention it in their own summary judgment motion.
原告在诉状中只声称了两类市场损害--Llama 的用户可以复制其书籍中的文字,以及 Meta 的复制损害了将版权材料授权给公司用于人工智能训练的市场。至于市场稀释--允许 Meta 这样的公司复制他们的作品来训练 Llama 这样的产品将不可避免地导致原告作品的市场充斥着类似作品--原告从未在诉状中提及。他们在自己的简易判决动议中也没有提到这一点。
Naturally, given the allegations in the complaint, Meta’s cross-motion for summary judgment focused on defeating the first two theories. But Meta also noted in its motion that the plaintiffs hadn’t presented any evidence that Meta’s use of their books to train Llama had harmed book sales. See Meta MSJ Exs. 8-9. And Meta presented its own expert testimony explaining that Llama 3’s release did not have any discernible effect on the plaintiffs’ sales (or those of other books in Llama’s training data), at least in the period shortly after the release. See Sinkinson Decl. ISO Meta MSJ ¶¶ 18-35.
当然,考虑到原告在诉状中的指控,Meta 公司的即决判决交叉动议主要集中在推翻前两个理论上。但 Meta 也在其动议中指出,原告没有提供任何证据证明 Meta 使用他们的书籍训练 Llama 损害了书籍的销售。见 Meta MSJ Exs. 8-9。Meta 还提交了自己的专家证词,解释 Llama 3 的发布对原告的销售(或 Llama 训练数据中其他书籍的销售)没有任何明显影响,至少在发布后不久的一段时间内是如此。见 Sinkinson Decl. ISO Meta MSJ ¶¶ 18-35。
In opposition, the plaintiffs’ primary response was that this was beside the point because of their first two theories. They did make fleeting reference to a report by one of their experts, who briefly discussed the concept of indirect substitution and mentioned articles discussing how AI-created books are starting to flood Amazon. See Pls. Reply Ex. 126 ¶¶ 193-207. But this discussion generates more questions than answers.
对此,原告的主要回应是,鉴于他们的前两个理论,这一点无关紧要。原告稍稍提到了他们的一位专家的报告,该专家简要讨论了间接替代的概念,并提到了讨论人工智能创建的书籍如何开始充斥亚马逊的文章。见 Pls. Reply Ex. 126 ¶¶ 193-207。但这一讨论产生的问题多于答案。
First, is Llama capable of generating such books? If it isn’t currently, will it be capable of doing so in the near future? Presumably the answer is yes, but that’s not a foregone conclusion. An LLM could, for instance, be configured to be unable to produce book-length or book-style outputs. So the fact that books are being created by some LLM does not automatically mean that Llama can create them or will be able to do so soon.
首先,Llama 是否能够生成此类书籍?如果目前还不能,那么在不久的将来是否能做到?答案应该是肯定的,但这并不是一个必然的结论。例如,LLM 可以被配置为无法生成书籍长度或书籍风格的输出。因此,某些 LLM 正在制作书籍这一事实并不自动意味着 Llama 可以创建它们,或者很快就能创建。

Second, what are these AI-generated books? Do they compete with Sarah Silverman’s memoir? With plaintiff Matthew Klam’s book of short stories? With Rachel Louise Snyder’s nonfiction work on domestic violence? The plaintiffs provide no analysis of the markets for their books, no discussion of whether these markets are or could be affected by AI-generated books, and no explanation of whether the existing AI-generated books referenced in the expert report compete in these markets.
第二,这些人工智能生成的书是什么?它们与莎拉-西尔弗曼的回忆录竞争吗?与原告马修-克拉姆(Matthew Klam)的短篇小说集竞争吗?与雷切尔-路易斯-斯奈德(Rachel Louise Snyder)关于家庭暴力的非虚构作品竞争吗?原告没有对其书籍的市场进行分析,没有讨论这些市场是否受到或可能受到人工智能生成书籍的影响,也没有解释专家报告中提到的现有人工智能生成书籍是否在这些市场中存在竞争。
Third, what impact does this competition actually have on sales of the books it competes with? Does it drown out those books entirely? Does it just chisel at their sales at the margins? Or, as discussed above and seems likely, does it depend on the book: are readers of romance novels happy to buy AI-generated ones, while all the people who want to read Sarah Silverman’s memoir still want to read it over AI-generated comic memoirs? Whatever the effects have been thus far, are they likely to increase in the future, as more and more AI-generated books are written, and as LLMs get better and better at writing human-like text?
第三,这种竞争对与之竞争的图书的销售究竟有什么影响?是否完全淹没了这些图书?是否只是在边缘上蚕食了它们的销量?或者,正如上文所讨论的,而且似乎很有可能,这是否取决于图书--爱情小说的读者是否乐于购买人工智能生成的爱情小说,而所有想读莎拉-西尔弗曼回忆录的人是否仍然想读它而不是人工智能生成的喜剧回忆录?无论目前的影响如何,随着人工智能生成的书籍越来越多,以及 LLM 在撰写类人文本方面的能力越来越强,未来的影响是否会越来越大?
Fourth, how does the threat to the market for the plaintiffs’ books in a world where LLM developers can copy those books compare to the threat to the market for the plaintiffs’ books in a world where the developers can’t copy them? There is no hint of that in the briefs or evidence presented by the plaintiffs.
第四,在 LLM 开发者可以复制原告书籍的情况下,原告书籍的市场所面临的威胁,与在开发者不能复制这些书籍的情况下原告书籍的市场所面临的威胁相比如何?在原告提交的辩护状或证据中没有任何这方面的暗示。
The analysis is complicated somewhat by the fact that fair use is an affirmative defense and that Meta moved for summary judgment on it. For those reasons, Meta had the burden of presenting evidence that its copying doesn’t threaten to substantially harm the market for the plaintiffs’ books. It didn’t conclusively establish that its copying couldn’t do so in the future, potentially because its copying did in fact make Llama better able to generate countless works that will dilute the market for the plaintiffs’ books. But where a defendant introduces evidence of a lack of market harm, “and the plaintiff fails to introduce empirical evidence countering such a showing, the fourth factor should be weighed in the defendant’s favor.” Patry on Fair Use § 6:13; see also Seltzer v. Green Day, Inc., 725 F.3d 1170, 1179 (9th Cir. 2013); cf. Perfect 10, 508 F.3d at 1168. That is exactly what happened here. Meta introduced evidence that its copying hasn’t caused market harm. The plaintiffs presented no empirical evidence to the contrary: no evidence that the copying has already caused market harm, and no evidence that the copying is likely to cause market harm in the future. All the plaintiffs presented is speculation, and speculation is insufficient to raise a genuine issue of fact and defeat summary judgment. E.g., Anheuser-Busch, Inc. v. Natural Beverage Distributors, 69 F.3d 337, 345 (9th Cir. 1995).
由于合理使用是一项肯定性抗辩,而 Meta 公司又提出了简易判决,因此分析变得有些复杂。因此,Meta 有责任提供证据,证明其复制行为不会对原告图书的市场造成实质性损害。它没有确凿证据证明其复制行为将来不会造成损害,这可能是因为它的复制行为确实使 Llama 能够更好地创作出无数作品,从而稀释了原告图书的市场。但是,如果被告提出证据证明不存在市场损害,"而原告未能提出经验证据反驳这种证明,则第四个因素应当对被告有利"。Patry on Fair Use § 6:13; see also Seltzer v. Green Day, Inc., 725 F.3d 1170, 1179 (9th Cir. 2013); cf. Perfect 10, 508 F.3d at 1168。这正是本案的情况。Meta 公司提出的证据表明,其复制行为并未造成市场损害。原告没有提供相反的经验证据--没有证据表明复制已经造成了市场损害,也没有证据表明复制将来可能造成市场损害。原告提出的只是推测,而推测不足以引起真正的事实问题并使简易判决失败。例如,Anheuser-Busch, Inc. v. Natural Beverage Distributors, 69 F.3d 337, 345 (9th Cir. 1995)。
The plaintiffs argue that they didn’t need to present empirical evidence because market harm can be inferred. For this argument, they cite to Hachette, in which the Second Circuit inferred market harm (even though the plaintiffs had not provided “empirical data” showing any and the secondary user presented expert testimony that there was none) because it was “self-evident” that the secondary use would cause such harm if widespread. 115 F.4th at 192-93. In Hachette, the secondary user maintained a database that let internet users “download an identical copy of” the plaintiffs’ books for free. Id. at 194. The secondary use therefore offered a directly “competing substitute” for the original books. Id. at 195.
原告认为,他们不需要提供经验证据,因为市场损害是可以推断出来的。为此,他们引用了 Hachette 案,在该案中,第二巡回法院推断出了市场损害--尽管原告没有提供 "经验数据 "表明存在市场损害,而二次使用方也提供了专家证词证明不存在市场损害--因为 "不言而喻",如果二次使用广泛存在,就会造成这种损害。115 F.4th at 192-93。在 Hachette 案中,二级用户维护了一个数据库,让互联网用户免费 "下载 "原告书籍的 "相同副本"。同上,第 194 页。因此,二次使用为原书提供了直接的 "竞争替代品"。同上,第 195 页。
While it made sense to infer market harm in Hachette, it doesn’t make sense to do so here. First, the Supreme Court has stated that no “inference of market harm . . . is applicable to a case involving something beyond mere duplication for commercial purposes.” Campbell, 510 U.S. at 591. In Hachette, the secondary use was basically “mere duplication.” Here, by contrast, Meta’s use is highly transformative and has a purpose well beyond that. Second, unlike in Hachette, Meta’s use does not let users access any significant portion of the plaintiffs’ books, so it isn’t self-evident that Meta’s use would create harm via direct substitution. Nor is it self-evident that Llama will harm the book sale market by enabling users to create a flood of competing books. It’s possible, even likely, that Llama will harm the book sale market. But to conclude that it will requires inferring that Llama (and not just any LLM) can be used to create such books, that it will be used to create such books, that consumers will purchase those books instead of books written by human authors, that consumers will buy those books instead of the plaintiffs’ books in particular, and that Llama is meaningfully better at creating those books because it was trained on copyrighted material. In Hachette, on the other hand, the only necessary inference was that readers might choose to download the plaintiffs’ books for free instead of paying for them, a much shorter (and more obvious) inferential leap. Cf. American Society for Testing & Materials v. Public.Resource.Org, 82 F.4th 1262, 1271-72 (D.C. Cir. 2023).
虽然在 Hachette 案中推断市场损害是合理的,但在此案中这样做并不合理。首先,最高法院曾指出,"市场损害推论......不适用于涉及商业目的而非单纯复制的案件"。Campbell, 510 U.S. at 591。在 Hachette 案中,二次使用基本上是 "单纯的复制"。而在本案中,Meta 的使用具有高度的转换性,其目的远远超出了这一范围。其次,与 Hachette 案不同的是,Meta 的使用并没有让用户访问原告书籍的任何重要部分,因此 Meta 的使用会通过直接替代造成损害并不是不言而喻的。Llama 会使用户制造大量竞争图书,从而损害图书销售市场,这一点也并非不言自明。Llama 有可能,甚至很有可能损害图书销售市场。但是,要得出这样的结论,就必须推断出 Llama(而不仅仅是任何 LLM)可以被用来创建这样的书籍,它将被用来创建这样的书籍,消费者将购买这些书籍而不是由人类作者撰写的书籍,消费者将购买这些书籍而不是特别是原告的书籍,以及 Llama 因为使用受版权保护的材料进行训练而在创建这些书籍方面明显更胜一筹。另一方面,在 Hachette 案中,唯一必要的推论是读者可能会选择免费下载原告的书籍,而不是付费购买--这是一个短得多(也明显得多)的推论跳跃。参见 American Society for Testing & Materials v. Public.Resource.Org, 82 F.4th 1262, 1271-72 (D.C. Cir. 2023)。
On this record, then, Meta has defeated the plaintiffs’ half-hearted argument that its copying causes or threatens significant market harm. That conclusion may be in significant tension with reality, but it’s dictated by the choice the plaintiffs made to put forward two flawed theories of market harm while failing to present meaningful evidence on the effect of training LLMs like Llama with their books on the market for those books. [14]
因此,根据这一记录,Meta 公司已经击败了原告关于其复制行为造成或可能造成重大市场损害的敷衍论点。这一结论可能与现实有很大的矛盾,但这是由原告的选择所决定的,他们提出了两个有缺陷的市场损害理论,但却没有提出有意义的证据来证明用他们的书籍训练 Llama 这样的 LLM 对这些书籍市场的影响。 [14]

D

Two other issues are relevant to the fourth factor. First, as noted above, is whether Meta’s use of shadow libraries benefited those libraries or their other users. If it did, then this would be relevant to the fourth factor. It would mean that Meta’s copying helped others acquire copyrighted works, potentially including the plaintiffs’ works, without paying for them (and without any indication that those other people were acquiring the works for fair use purposes). But although the plaintiffs discussed Meta’s use of shadow libraries at length, they did not argue that it had these effects or was relevant to the fourth factor beyond allowing Meta to get the books without paying. At the hearing, the plaintiffs’ counsel did suggest that, by using shadow libraries, Meta (and other companies like it) would reduce the stigma associated with shadow libraries and encourage more people to use them. May 1 Hr’g Tr. at 92-93. It’s not clear whether this would matter in the overall analysis. But in any event, counsel conceded that the record contains no evidence of this dynamic playing out. Id. at 93-94. [15]
与第四个因素相关的还有两个问题。首先,如上所述,Meta 对影子图书馆的使用是否有利于这些图书馆或其其他用户。如果是,那么这就与第四个因素有关。这意味着 Meta 的复制行为帮助其他人获得了版权作品,可能包括原告的作品,而无需付费(而且没有任何迹象表明这些其他人是出于合理使用的目的获得作品)。但是,尽管原告详细讨论了 Meta 使用影子图书馆的行为,但他们并没有辩称这种行为产生了这些影响,也没有辩称除了允许 Meta 在不付费的情况下获得图书之外,这种行为还与第四个因素有关。在听证会上,原告律师确实提出,通过使用影子图书馆,Meta 公司(以及其他类似公司)将减少与影子图书馆相关的污名,并鼓励更多人使用这些图书馆。5 月 1 日庭审记录,第 92-93 页。目前还不清楚这在总体分析中是否重要。但无论如何,律师承认记录中没有证据显示这种动态效果。同上,第 93-94 页。 [15]
Second is the public benefit associated with Meta’s copying. Neither side’s presentation on this front does much to move the needle. The plaintiffs say that sanctioning Meta’s conduct would encourage piracy by incentivizing other LLM companies to pirate and to “support and defend” shadow libraries “that make stolen works available for free.” There is no evidence in the record that Meta (or any other LLM developer) is actively supporting or otherwise encouraging widespread use of shadow libraries. As for incentivizing other LLM developers to use shadow libraries, the plaintiffs again beg the question: whether LLM developers should have to pay for the books they use as training data is the issue addressed in this opinion (and, obviously, a fact-specific one that can’t be answered uniformly across the board). Meta, for its part, mostly discusses the various ways that LLMs can be useful. But the public benefits most relevant to the fourth factor are those “related to copyright’s concern for the creative production of new expression.” Oracle, 593 U.S. at 35. So the fact that Llama can help someone do their taxes, for example, is not especially relevant here. Nevertheless, Meta’s use of copyrighted works as training data will likely help Llama create new expression, whether by making it better at helping users generate creative text or by improving its “memory” and thereby making it more useful to the researchers who use it to develop software. Public benefit considerations thus slightly favor Meta, confirming that it wins on factor four.
其次是与 Meta 复制有关的公共利益。双方在这方面的陈述都没有起到多大作用。原告称,认可 Meta 的行为将鼓励其他 LLM 公司盗版,并 "支持和维护""免费提供被盗作品 "的影子库,从而助长盗版行为。记录中没有证据表明 Meta(或任何其他 LLM 开发商)积极支持或以其他方式鼓励影子库的广泛使用。至于鼓励其他 LLM 开发者使用影子库,原告再次回避了问题的实质--LLM 开发者是否应该为他们用作训练数据的书籍付费,这才是本意见书所要解决的问题(显然,这是一个针对具体事实的问题,不能一概而论)。Meta 则主要讨论了 LLM 的各种用途。但与第四个因素最相关的公共利益是那些 "与版权对新表达的创造性生产的关注有关的利益"。Oracle, 593 U.S. at 35。因此,例如 Llama 可以帮助他人报税这一事实在此并不特别相关。尽管如此,Meta 使用受版权保护的作品作为训练数据很可能有助于 Llama 创造新的表达方式,无论是通过使其更好地帮助用户生成创造性文本,还是通过改善其 "记忆",从而使其对使用其开发软件的研究人员更有用。因此,公共利益方面的考虑略微倾向于 Meta,从而确认其在第四个因素上胜出。

E

Relatedly, Meta argues that the “public interest” would be “badly disserved” by preventing Meta (and other AI developers) from using copyrighted text as training data without paying to do so. Meta seems to imply that such a ruling would stop the development of LLMs and other generative AI technologies in its tracks. This is nonsense.
与此相关的是,Meta 公司辩称,如果阻止 Meta 公司(以及其他人工智能开发者)使用受版权保护的文本作为训练数据而不付费,将 "严重损害""公共利益"。Meta 似乎在暗示,这样的裁决将阻止 LLM 和其他生成式人工智能技术的发展。这是无稽之谈。
As mentioned earlier, a ruling that certain copying isn’t fair use doesn’t necessarily mean the copier has to stop their copying; it means that they have to get permission for it. So where copying for LLM training isn’t fair use, LLM developers (including Meta) won’t need to stop using copyrighted works to train their models. They will need only to pay rightsholders for licenses for that training.
如前所述,裁定某些复制不属于合理使用并不一定意味着复制者必须停止复制,而是意味着他们必须获得许可。因此,如果用于 LLM 训练的复制不属于合理使用,LLM 开发者(包括 Meta)无需停止使用受版权保护的作品来训练他们的模型。他们只需要向权利人支付培训许可费即可。
Presumably, where copying for AI training isn’t fair use, AI developers will simply figure out a way to license the works they wish to use as training data. Meta’s contention that markets for this licensing can’t or won’t develop is hard to believe. If books are as good for LLM training as Meta says they are, then it seems nearly certain that LLM developers would be willing to pay for licenses. (Indeed, Meta itself was willing to pay to license books; it just found licensing too logistically difficult.) Even if the value of any particular book as training data is too low to justify negotiating licensing deals book by book, LLM developers would still presumably be interested in licensing large numbers of books at once. Publishers may not currently hold the subsidiary rights necessary to make group licensing possible. But it’s hard to believe that they won’t soon start negotiating those rights with their authors so that they can engage in large-scale negotiation and licensing with LLM developers, assuming they haven’t already started to do so. It seems especially likely that these licensing markets will arise if LLM developers’ only choices are to get licenses or forgo the use of copyrighted books as training data. If they instead choose to use only public domain works as training data (instead of licensing copyrighted works), that would indicate that they don’t actually need the copyrighted works as badly as they say they do.
据推测,在为人工智能训练而复制作品不属于合理使用的情况下,人工智能开发者只需想办法许可他们希望用作训练数据的作品即可。Meta 公司认为这种许可市场不可能或不会发展起来,这种说法很难令人信服。如果图书真如 Meta 所说的那样适合用于 LLM 训练,那么 LLM 开发者几乎肯定会愿意为许可证付费。(事实上,Meta 公司本身也愿意为图书的授权付费,只是发现授权在实际操作上过于困难。)即使任何一本书作为训练数据的价值太低,无法证明逐本谈判许可协议是合理的,LLM 开发者仍然可能对同时许可大量书籍感兴趣。出版商目前可能并不拥有必要的附属权利来实现集体授权。但很难相信,他们不会很快开始与作者谈判这些权利,以便与 LLM 开发商进行大规模的谈判和授权--假设他们还没有开始这样做的话。如果 LLM 开发者的唯一选择是获得许可或放弃使用受版权保护的图书作为训练数据,那么这些许可市场似乎就更有可能出现。如果他们选择只使用公有领域的作品作为训练数据(而不是授权使用受版权保护的作品),那就表明他们实际上并不像他们所说的那样迫切需要受版权保护的作品。
So if it isn’t fair use for Meta and other LLM developers to use copyrighted books as training data without permission, they won’t have to stop working on their LLMs altogether. They’ll just have to pay for licenses or use books that aren’t copyrighted. Either way, it may be that LLM companies move somewhat more slowly or make somewhat less money. But the suggestion that the growth of LLM technology would come to a halt (or anything close) doesn’t pass the straight face test.
因此,如果 Meta 和其他 LLM 开发者未经许可将受版权保护的书籍用作训练数据不属于合理使用,他们也不必完全停止 LLM 的开发。他们只需支付许可费用或使用不受版权保护的书籍。无论如何,LLM 公司的发展速度可能会慢一些,或者赚的钱会少一些。但是,认为 LLM 技术的发展会停滞不前(或接近停滞不前)的说法根本站不住脚。

VII. CONCLUSION  VII. 结论

Fair use is a fact-specific doctrine that requires case-by-case analysis that is sensitive to new technologies and their potential consequences. No previous case has involved a use that is both as transformative and as capable of diluting the market for the original works as LLM training is. So no previous case answers the question whether Meta’s copying was fair use. That question must be answered by flexibly applying the fair use factors and considering Meta’s copying in light of the purpose of copyright and fair use: protecting the incentive to create by preventing copiers from creating works that substitute for the originals in the marketplace.
合理使用是一项针对具体事实的理论,需要对新技术及其潜在后果进行逐案分析。此前没有任何案例涉及像 LLM 培训这样既具有变革性又能稀释原作品市场的使用。因此,Meta 公司的复制行为是否属于合理使用,以前的案例都无法回答这个问题。要回答这个问题,必须灵活运用合理使用的因素,并根据版权和合理使用的目的来考虑 Meta 的复制行为:通过防止复制者创作出在市场上替代原作的作品来保护创作的积极性。
In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books. And some cases might present even stronger arguments against fair use. For instance, as discussed above, it seems that markets for certain types of works (like news articles) might be even more vulnerable to indirect competition from AI outputs. On the other hand, though, tweak some facts and defendants might win. For example, using copyrighted books to train an LLM for nonprofit purposes, like national security or medical research, might be fair use even in the face of some amount of market dilution. See Oracle, 593 U.S. at 32 ("[A] finding that copying was not commercial in nature tips the scales in favor of fair use."). Or plaintiffs whose works are unlikely to face meaningful competition from AI-generated ones may be unable to defeat a fair use defense.
在涉及类似 Meta 的使用的案件中,原告似乎往往会胜诉,至少在这些案件中,被告的使用对市场的影响有更好的记录。无论 LLM 训练如何具有变革性,很难想象使用受版权保护的书籍来开发一种工具,从而赚取数十亿或数万亿美元,同时又能创作出可能无穷无尽的竞争作品,从而严重损害这些书籍的市场,这样的使用会是合理的。有些情况下,反对合理使用的理由可能更充分。例如,如上所述,某些类型作品(如新闻报道)的市场似乎更容易受到人工智能产出的间接竞争。但另一方面,如果对一些事实进行调整,被告可能会胜诉。例如,为非营利目的(如国家安全或医学研究)使用受版权保护的书籍来训练 LLM,即使面临一定程度的市场稀释,也可能是合理使用。参见 Oracle, 593 U.S. at 32("复制不具有商业性质的结论使合理使用的天平倾向于合理使用")。或者,如果原告的作品不太可能面临人工智能生成的作品的实质性竞争,那么原告可能无法击败合理使用抗辩。
In this case, because Meta’s use of the works of these thirteen authors is highly transformative, the plaintiffs needed to win decisively on the fourth factor to win on fair use. See, e.g., Perfect 10, 508 F.3d at 1168 (fair use where secondary use was “significant[ly] transformative” and fourth factor “favor[ed] neither party”). And to stave off summary judgment, they needed to create a genuine issue of material fact as to that factor. Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury. Or perhaps the plaintiffs could even have made a strong enough showing to win on the fair use issue at summary judgment. But the plaintiffs presented no meaningful evidence on market dilution at all. Absent such evidence and in light of Meta’s evidence, the fourth factor can only favor Meta. Therefore, on this record, Meta is entitled to summary judgment on its fair use defense to the claim that copying these plaintiffs’ books for use as LLM training data was infringement.
在本案中,由于 Meta 对这 13 位作家作品的使用具有很强的转换性,原告需要在第四个因素上取得决定性的胜利,才能在合理使用上获胜。参见,例如,Perfect 10, 508 F.3d at 1168(在二次使用具有 "重大改变性 "且第四个因素 "对任何一方都不利 "的情况下的合理使用)。为了避免即决判决,他们需要就该因素提出真正的重大事实问题。由于市场稀释问题在这种情况下非常重要,如果原告提供了任何证据,陪审团可以在这个问题上做出对他们有利的裁决,那么第四个因素就需要提交给陪审团。或许原告甚至可以提出足够有力的证据,在简易判决中就合理使用问题胜诉。但原告根本没有就市场稀释问题提交任何有意义的证据。如果没有这样的证据,根据 Meta 的证据,第四个因素只能有利于 Meta。因此,根据这一记录,Meta 公司有权就其合理使用抗辩进行简易判决,即复制这些原告的书籍用作 LLM 培训数据是侵权行为。
As previously noted, summary judgment will be granted for Meta in a separate ruling on the plaintiffs’ DMCA claim. A Zoom case management conference is scheduled for July 11, 2025, at 10:00 a.m. to discuss how to proceed on the plaintiffs’ separate claim that Meta unlawfully distributed their protected works during the torrenting process.
如前所述,在对原告的 DMCA 诉讼请求的单独裁决中,Meta 将获得简易判决。Zoom 案件管理会议定于 2025 年 7 月 11 日上午 10:00,讨论如何继续审理原告关于 Meta 在下载过程中非法传播其受保护作品的单独诉求。

IT IS SO ORDERED.
特此裁定。

Dated: June 25, 2025  日期: 2025 年 6 月 25 日
VINCE CHHABRIA  文斯-查布里亚
United States District Judge
美国地区法官

  1. Except as noted, the parties do not dispute the facts described in this section.
    除已指出的内容外,双方对本节所述事实没有争议。
  2. According to Meta, GitHub is “a leading cloud-based platform where coders store and share code, frequently on an open-source basis.” ArXiv is “a free online archive of math, science, and economics papers.” Stack Exchange is “a network of question-and-answer websites for sharing technical knowledge, geared toward the programming community.”
    根据 Meta 的介绍,GitHub 是 "一个领先的基于云的平台,程序员在此存储和共享代码,而且经常是在开源的基础上共享代码"。ArXiv 是 "一个免费的数学、科学和经济学论文在线档案库"。Stack Exchange 是 "一个共享技术知识的问答网站网络,面向编程社区"。
  3. See generally David Gerwitz, What Is Torrenting?, ZDNET (Aug. 6, 2024), https://www.zdnet.com/article/what-is-torrenting-and-how-does-it-work [https://perma.cc/8PG5H7UW].
    一般参见 David Gerwitz,《什么是 Torrenting?》,ZDNET(2024 年 8 月 6 日),https://www.zdnet.com/article/what-is-torrenting-and-how-does-it-work [https://perma.cc/8PG5H7UW]。
  4. The plaintiffs moved for summary judgment only on the grounds that “Meta copied [their] copyrighted books without permission” and that its “reproduction . . . without permission . . . is not fair use.” Pls. MSJ at vii, 19. The plaintiffs did at times suggest that their motion encompassed their distribution claim. See, e.g., id. at 22 (“Meta’s initial reproduction” was not fair use because it “result[ed] in distributing copyrighted material.”). But reproduction and distribution are separate rights that must be considered separately. See 17 U.S.C. § 106(1), (3); Columbia Pictures Industries, Inc. v. Fung, 710 F.3d 1020, 1034 (9th Cir. 2013) (“Both uploading and downloading copyrighted material are infringing acts. The former violates the copyright holder’s right to distribution, the latter the right to reproduction.”).
    原告仅以 "Meta 公司未经许可复制了受版权保护的书籍 "以及 "未经许可的复制......不是合理使用 "为由,请求法院做出简易判决。Pls. MSJ at vii, 19。原告有时确实暗示他们的动议包括他们的发行索赔。例如,参见同上,第 22 页("Meta 的初次复制 "不是合理使用,因为它 "导致了受版权保护材料的传播")。但复制和传播是不同的权利,必须分别考虑。参见 17 U.S.C. § 106(1)、(3);Columbia Pictures Industries, Inc. v. Fung, 710 F.3d 1020, 1034 (9th Cir. 2013)("上传和下载版权材料都是侵权行为。前者侵犯了版权持有者的发行权,后者侵犯了复制权")。
    As discussed below, the specific manner of Meta’s reproduction (that is, torrenting the plaintiffs’ books from shadow libraries) is still relevant to whether that reproduction was fair use. But Meta’s alleged distribution must be addressed independently (unless, maybe, its acquisition necessarily involved distribution, which does not appear to be the case, see Pls. Ex. 67 at 106:14-108:25, 246:13-248:23, 270:12-16 (explaining that default settings could be changed such that leeching was not always occurring)). Even if the plaintiffs had moved for summary judgment as to whether any distribution was fair use, the record on Meta’s alleged distribution is incomplete, making summary judgment on that issue improper at this point in the case. See Order Granting as Modified Meta’s Request for Leave to File a Rebuttal Expert Report, Dkt. No. 499 (giving Meta leave to serve supplemental expert report on distribution, with deadline after Meta’s deadline to oppose the plaintiffs’ motion for summary judgment).
    正如下文所讨论的,Meta 复制的具体方式(即从影子图书馆下载原告的书籍)仍然与该复制是否属于合理使用有关。但 Meta 所称的传播必须单独处理(除非,也许其获取必然涉及传播,但情况似乎并非如此,见 Pls.Ex.67 at 106:14-108:25、246:13-248:23、270:12-16(解释默认设置可以更改,因此并不总是发生泄密))。即使原告就任何传播是否属于合理使用提出了即决判决动议,但有关 Meta 所称传播的记录并不完整,因此在本案的这一阶段对该问题进行即决判决是不恰当的。参见经修改后批准 Meta 提交反驳专家报告请求的命令,Dkt. No. 499(允许 Meta 就传播问题提交补充专家报告,截止日期在 Meta 反对原告即决判决动议的截止日期之后)。
  5. This sort of competition (from AI-generated books that are like the plaintiffs’ but not similar enough to be infringing) is also discussed at length with respect to the fourth factor.
    关于第四个因素,我们还详细讨论了这种竞争--来自人工智能生成的图书,这些图书与原告的图书相似,但相似程度不足以构成侵权。

  6. By contrast, consider an LLM that was designed to be used to create works substantially similar to those on which it was trained, or to create works that competed with the originals without being substantially similar. Using copyrighted works to train such an LLM could be less transformative than using them to train a general-purpose LLM, because that use would have the purpose and character of enabling an LLM to develop substitute works. That said, even then, training the LLM would still likely be at least somewhat transformative; transformativeness isn’t an on-off switch.
    相比之下,我们可以考虑这样一种 LLM,它被设计用来创作与其训练所用作品基本相似的作品,或者创作与原作竞争但不基本相似的作品。使用受版权保护的作品来训练这样的 LLM,其转换性可能小于使用这些作品来训练通用 LLM,因为这种使用的目的和特点是使 LLM 能够开发替代作品。尽管如此,即便如此,训练该 LLM 仍然可能至少具有一定的转换性;转换性并不是一个开关。
  7. It may seem inconsistent to say that commercialism is relevant but bad faith shouldn’t be. After all, commercial uses can still entail the creation of new expression for public consumption. The difference between commercialism and good faith, however, is that the former has to do with the secondary use while the latter mostly has to do with the secondary user. The goal of copyright is to encourage “activity that is useful to the public education.” Pierre N. Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1126 (1990). Although not dispositive, commercialism is relevant because nonprofit uses are (at least theoretically) more likely to be aimed at benefiting the public than are for-profit uses. Cf. 17 U.S.C. § 107(1) (juxtaposing “commercial nature” with “nonprofit educational purposes”); Sony, 464 U.S. at 448-51. Good faith, by contrast, focuses on “the morality of the secondary user,” not on “whether her creation . . . is of the type” that benefits the public and thus should be protected by copyright law. Leval, 103 Harv. L. Rev. at 1126.
    说商业性是相关的,但恶意不应该是相关的,这似乎前后矛盾。毕竟,商业性使用仍可能涉及为公众消费创造新的表达方式。不过,商业性与善意的区别在于,前者与二次使用有关,而后者主要与二次使用者有关。版权的目标是鼓励 "有益于公众教育的活动"。Pierre N. Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105, 1126 (1990)。尽管不是决定性的,但商业性是相关的,因为非营利性使用(至少在理论上)比营利性使用更有可能以造福公众为目的。参见 17 U.S.C. § 107(1)(将 "商业性质 "与 "非营利教育目的 "并列);Sony, 464 U.S. at 448-51。相比之下,善意侧重于 "二次使用者的道德"--而不是 "她的创作......是否属于 "有益于公众从而应受版权法保护的类型。Leval, 103 Harv. L. Rev. at 1126。
  8. Meta’s use of shadow libraries is also clearly relevant to the plaintiffs’ distribution claim. But as discussed above, reproduction and distribution present separate issues. So even if Meta’s torrenting from shadow libraries did entail distribution, that wouldn’t be dispositive of whether its reproduction was fair use.
    Meta 对影子库的使用显然也与原告的发行索赔有关。但如上所述,复制和传播是两个不同的问题。因此,即使 Meta 从影子库中下载文件确实导致了传播,这也不能决定其复制是否属于合理使用。
    Separately, if Meta’s downloading materially contributed to the shadow libraries’ own infringement, Meta could potentially be liable as a contributory infringer. See Perfect 10, 508 F.3d at 1170-72. But the plaintiffs did not bring a contributory infringement claim or develop any evidence in support of one.
    另外,如果 Meta 的下载实质上促成了影子图书馆自身的侵权行为,Meta 有可能作为共同侵权人承担责任。参见 Perfect 10, 508 F.3d at 1170-72。但原告并没有提出共同侵权的主张,也没有提供任何支持共同侵权的证据。
  9. The plaintiffs’ objections to these declarations are overruled because a party may attach to a reply brief declarations that are a “reasonable response to the opposition,” and these declarations were not inconsistent with the declarants’ deposition testimony. Hodges v. Hertz Corp., 351 F. Supp. 3d 1227, 1249 (N.D. Cal. 2018); see also Civil L.R. 7-3(c).
    原告对这些声明的反对意见被驳回,因为一方当事人可以在答辩状中附上 "对反对意见的合理回应 "的声明,而且这些声明与声明人的证词并不矛盾。Hodges v. Hertz Corp., 351 F. Supp. 3d 1227, 1249 (N.D. Cal. 2018); see also Civil L.R. 7-3(c)。
  10. Meta’s argument here does cut against it on the fourth factor, however. As discussed below, an LLM trained on copyrighted books is more likely to be capable of generating books that can compete with the ones on which it was trained.
    然而,Meta 在此处的论点在第四个因素上对其不利。正如下文所讨论的,使用受版权保护的图书训练出来的 LLM 更有可能生成能与其训练所用图书竞争的图书。
  12. To be clear, the point is not that authors are entitled to more or less copyright protection based on how famous or popular they are. Cf. Warhol, 598 U.S. at 544 & n.19. The point is that different works may have different markets that will be affected differently by floods of AI-generated competitors. See, e.g., Cariou v. Prince, 714 F.3d 694, 709 (2d Cir. 2013) (“Prince’s work appeals to an entirely different sort of collector than Cariou’s.”); Andy Warhol Foundation for Visual Arts, Inc. v. Goldsmith, 11 F.4th 26, 48 (2d Cir. 2021) (“We cannot . . . endorse the district court’s implicit rationale that the market for Warhol’s works is the market for ‘Warhols,’ . . . [but] we see no reason to disturb the district court’s overall conclusion that the two works occupy distinct markets[.]”), aff’d, Warhol 598 U.S. 508.
    要明确的是,这并不是说作者有权根据其知名度或受欢迎程度获得更多或更少的版权保护。参见 Warhol, 598 U.S. at 544 & n.19。问题的关键在于,不同的作品可能有不同的市场,这些市场会受到大量人工智能生成的竞争作品的不同影响。例如,参见 Cariou v. Prince, 714 F.3d 694, 709 (2d Cir. 2013)("Prince 的作品与 Cariou 的作品吸引的收藏家完全不同");Andy Warhol Foundation for Visual Arts, Inc. v. Goldsmith, 11 F.4th 26, 48 (2d Cir. 2021)("我们不能......赞同地区法院隐含的理由,即沃霍尔作品的市场就是'沃霍尔'的市场,......[但]我们认为没有理由干扰地区法院关于这两件作品占据不同市场的总体结论[。]"),维持原判,Warhol 598 U.S. 508。
  11. ^13 This is not to suggest that news articles or other works that may be less dependent on their author’s creativity are thus less deserving of protection, or that it would therefore be more appropriate to use those works to train an LLM. To the contrary, as noted with respect to the second factor, nonfiction works are still protected by copyright because the law protects their authors’ choices as to how to express facts. See Google Books, 804 F.3d at 220.
  12. ^14 The plaintiffs also assert that the market for their works was harmed in the more narrow sense that, if Meta had not downloaded the books from a shadow library, it would have been required to buy the books. But as already discussed, even though that downloading is a separate use, it must be considered in light of its overall purpose. For instance, imagine a researcher who downloaded books from a shadow library in the process of writing an article on shadow libraries, and only did so for their research. That downloading would almost certainly be a fair use. Of course, in that example, the downloader has less ability to procure the books elsewhere than Meta did. But the point is that downloading from a shadow library, which the plaintiffs refer to as “unmitigated piracy,” must be viewed in light of its ultimate end. Because Meta’s purpose of LLM training is so transformative, the plaintiffs needed to win decisively on the fourth factor. The loss of isolated sales to AI developers is not the kind of market harm that could tip the scales for the plaintiffs.
  13. ^15 One of Meta’s expert witnesses did testify at her deposition that, depending on how it configured its torrenting software, it was more likely than not that Meta contributed to the BitTorrent network’s “bandwidth, content, storage, and processing power.” Pls. MSJ Ex. 67 at 103:3-104:5. But there is no evidence of whether Meta actually had the right settings to do so or of how much it might have contributed to this network. More importantly, there is no indication that any computing power Meta contributed to the BitTorrent network would have assisted the shadow libraries from which Meta torrented (or otherwise contributed to infringement of the plaintiffs’ copyrights). To the contrary, the plaintiffs cite a source indicating that the vast majority of torrented files are movies, TV shows, video games, and music (which are generally copyrighted but are not at issue in this case), and that books comprise less than one percent of torrented material. Jacqui Cheng, BitTorrent Census: About 99% of Files Copyright Infringing, Ars Technica (Jan. 29, 2010), https://arstechnica.com/information-technology/2010/01/bittorrent-census-about-99-of-files-copyright-infringing [https://perma.cc/KZ7N-R9BN].