Pooled analysis of 3,741 stool metagenomes from 18 cohorts for cross-stage and strain-level reproducible microbial biomarkers of colorectal cancer
Received: 10 April 2024
Accepted: 2 April 2025
Published online: 03 June 2025
A list of authors and their affiliations appears at the end of the paper
Associations between the gut microbiome and colorectal cancer (CRC) have been uncovered, but larger and more diverse studies are needed to assess their potential clinical use. We expanded upon 12 metagenomic datasets of patients with CRC ($n=930$), adenomas ($n=210$) and healthy control individuals ($n=976$; total $n=2{,}116$) with 6 new cohorts ($n=1{,}625$) providing granular information on cancer stage and the anatomic location of tumors. We improved CRC prediction accuracy based solely on gut metagenomics (average area under the curve (AUC) = 0.85) and highlighted the contribution of 19 newly profiled species and distinct Fusobacterium nucleatum clades. Specific gut species distinguish left-sided versus right-sided CRC (AUC = 0.66), with an enrichment of oral-typical microbes. We identified strain-specific CRC signatures, with the commensals Ruminococcus bicirculans and Faecalibacterium prausnitzii showing subclades associated with late-stage CRC. Our analysis confirms that the microbiome can be a clinical target for CRC screening and characterizes it as a biomarker for CRC progression.
CRC is the third most frequent and the second most lethal tumor type worldwide$^{1}$. It has a 30% higher incidence in men$^{2}$, and 60-65% of all CRC cases occur in individuals with no previous family history (sporadic cancers)$^{3}$. Only 40% of cases are diagnosed before metastasis$^{2}$; survival is highest when the tumor is diagnosed at an early stage, whereas the 5-year survival rates for stage IV colon and rectal cancer are 11% and 15%, respectively$^{4}$. CRC originates in the epithelial layer of either the proximal colon or the distal colon plus rectum$^{5}$, usually referred to as right- and left-sided CRC, respectively. Progression from a benign precursor lesion (adenoma) to a malignant tumor (carcinoma), termed the adenoma-carcinoma sequence, may take several years$^{6}$ and is characterized by an accumulation of mutations in tumor cells$^{5}$, impairment of the gut mucosal barrier and intestinal inflammation$^{7,8}$.
Interest in the tumor microenvironment has increased alongside advances in distinguishing the histological features and expression patterns of CRC$^{9}$, with the gut microbiome suggested as another important hallmark of cancer$^{9}$. Specific microbes have been proposed as major contributors to carcinogenesis, particularly pks$^{+}$ Escherichia coli and Fusobacterium nucleatum$^{10,11}$. Several individual cohort studies and earlier meta-analyses have observed distinct microbiome signatures in patients with CRC compared with patients with adenomas or healthy controls$^{12-17}$, consistently across different countries and cohorts$^{18-20}$. A few noteworthy metagenomic studies have also examined microbiome changes along the adenoma-carcinoma sequence and according to primary neoplasia location$^{15,21}$, and links between CRC and oral species have been suggested$^{15}$. Further evidence points toward the enrichment of oral-typical microbes (at the genus level$^{21}$) and of oral biofilm-forming species$^{22}$ in the gut metagenomes of patients with proximal CRC. However, no metagenomic study has gone beyond characterizing already well-known strain-specific factors influencing CRC risk (for example, the pks island and fragilysin), and no untargeted searches for subspecies- and strain-level genomic associations with CRC phenotypes are available. These gaps currently limit the potential of the microbiome to be used as a screening tool in clinical settings.
Here, we investigated gut microbiome composition along the adenoma-carcinoma sequence and across different primary tumor locations using a meta-analytical approach comprising an unprecedented number of cohorts (12 public studies and 6 new cohorts generated in this study) and samples (2,116 from public studies and 1,625 from our new CRC cohorts). We also used new computational, statistical and machine learning (ML) strategies to achieve higher profiling resolution, extending it to previously unknown species and to distinct clades of F. nucleatum$^{23}$.
Results
An expanded metagenomic study population for CRC
We established a large and diverse set of gut metagenomic cohorts associated with sporadic CRC, with information on CRC stage (stages 0-IV) and primary tumor location (right-sided or left-sided). To this end, we sequenced 1,625 new stool metagenomes from 6 previously unpublished CRC cohorts (Methods) and integrated them with 2,116 stool metagenomes from 12 public studies. In total, we leveraged 1,471 samples from patients with CRC (1,191 with staging information and 989 with primary tumor location information), 702 from patients with colorectal adenoma and 1,568 from control participants, drawn from 16 case-control and 2 CRC-only studies (Supplementary Tables 1 and 2). Four of the six newly sequenced cohorts (cohorts 1-4, $n=671$) are part of the European ONCOBIOME initiative (Methods and 'Data Availability'), whereas the fifth (cohort 5) is part of the Micro-N Nurses' Health Study II (NHSII) ($n=897$)$^{24}$. Cohort 6 included stool samples from CRC cases and controls ($n=18$ and 39, respectively) from the Umraniye Training and Research Hospital and the Department of Medical Biology, Yeditepe University (Istanbul, Turkey). Across the 3,741 metagenomes in the 18 integrated datasets, we gathered 94 stool metagenomes from patients with stage 0 CRC or carcinoma in situ, and more than 250 for each stage from stage I to stage IV. In total, 344 samples were from individuals whose primary tumors originated in the right colon (cecum, ascending and transverse colon; 10 cohorts) and 645 samples were from patients whose primary tumors originated in the left colon and rectum (11 cohorts) (Fig. 1a,b and Supplementary Tables 1 and 2a). In addition, cohort 1 includes patients with stage IV CRC with either a resected primary tumor ($n=68$) or an in situ primary tumor ($n=95$).
Samples were profiled using MetaPhlAn 4 (ref. 25), which leverages species-level genome bins (SGBs)$^{26}$ to enumerate and quantify both characterized species (known SGBs (kSGBs), having at least one cultivated reference) and uncharacterized species (unknown SGBs (uSGBs), lacking cultured representatives). In total, we detected 3,866 bacterial, 15 eukaryotic and 23 archaeal SGBs. Some bacterial species spanned multiple SGBs. This was the case for the CRC-associated species F. nucleatum, for which MetaPhlAn 4 found five SGBs describing known and unknown subspecies (SGB6001, SGB6007, SGB6011, SGB6013 and SGB6014); SGB6007 and SGB6013 were recently investigated independently$^{23}$ and correspond to F. nucleatum subspecies animalis (Fna) clade 2 (C2) and Fna clade 1 (C1), respectively$^{23}$. To test the relevance of the presence and overall abundance of oral microbial species in the CRC gut ecosystem, we defined a panel of typically oral SGBs. These were defined on an independent set of 990 matched oral and stool samples from 495 healthy individuals in 5 public microbiome studies$^{27}$ (Methods). Specifically, we considered oral SGBs to be those that are prevalent (>20%) in the oral microbiome but not in the gut microbiome (<5%) (Methods and Supplementary Table 3).
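The oral-SGB definition above reduces to two prevalence thresholds. The Python sketch below assumes a hypothetical data layout (each sample is a dict mapping SGB identifiers to MetaPhlAn-style relative abundances) and ignores the pairing of oral and stool samples from the same individual that the study used; `define_oral_sgbs`, `oral_to_gut_score` and `oral_to_gut_richness` are illustrative names, not the authors' code. The two per-sample summaries follow the Fig. 1e definitions: the cumulative relative abundance of oral SGBs and the number of oral SGBs detected.

```python
from typing import Dict, List, Set

Sample = Dict[str, float]  # hypothetical layout: SGB id -> relative abundance

# Thresholds from the text: oral SGBs are prevalent (>20%) in oral samples
# but rare (<5%) in stool samples.
ORAL_MIN_PREVALENCE = 0.20
GUT_MAX_PREVALENCE = 0.05


def prevalence(samples: List[Sample], sgb: str) -> float:
    """Fraction of samples in which the SGB has a nonzero abundance."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s.get(sgb, 0.0) > 0) / len(samples)


def define_oral_sgbs(oral: List[Sample], stool: List[Sample]) -> Set[str]:
    """SGBs common in the mouth (>20% prevalence) but rare in the gut (<5%)."""
    candidates = {sgb for s in oral + stool for sgb in s}
    return {
        sgb
        for sgb in candidates
        if prevalence(oral, sgb) > ORAL_MIN_PREVALENCE
        and prevalence(stool, sgb) < GUT_MAX_PREVALENCE
    }


def oral_to_gut_score(sample: Sample, oral_sgbs: Set[str]) -> float:
    """Cumulative relative abundance of oral SGBs in one stool sample."""
    return sum(ab for sgb, ab in sample.items() if sgb in oral_sgbs)


def oral_to_gut_richness(sample: Sample, oral_sgbs: Set[str]) -> int:
    """Number of oral SGBs detected (nonzero abundance) in one stool sample."""
    return sum(1 for sgb, ab in sample.items() if sgb in oral_sgbs and ab > 0)


oral = [{"SGB_A": 1.0, "SGB_B": 2.0} for _ in range(5)]
stool = [{"SGB_B": 3.0, "SGB_C": 1.0} for _ in range(20)]
oral_set = define_oral_sgbs(oral, stool)
print(sorted(oral_set))  # ['SGB_A']
```

Here SGB_B is excluded because it is also common in stool, and SGB_C because it never appears in oral samples. Note that both thresholds are strict, so an SGB seen in exactly 5% of stool samples is not classified as oral.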
Functional profiles were also generated with HUMAnN 3.6 (ref. 28) and used for a comprehensive analysis of UniRef90 (UR90) gene profiles and the corresponding functional groupings according to MetaCyc pathways, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. In addition, we investigated the within-species phylogenetic structure of uSGBs using StrainPhlAn 4 (ref. 25) and evaluated the resulting 112 within-SGB phylogenies to assess differential strain carriage across CRC phenotypes and subclade associations with CRC-related microbial genes.
CRC gut microbiome signatures are stage- and location-specific
Consistent with previous reports$^{18}$, gut microbial alpha-diversity was higher in CRC than in controls in 9 of 16 cohorts (SMD > 0; only two with $P<0.05$), but the effect was weak: in a meta-analysis of standardized mean differences (SMDs), it was not statistically significant ($P \geq 0.05$) (Fig. 1c,d, Extended Data Fig. 1a and Supplementary Table 4). We observed no clear relationship between richness and clinical stage compared with controls. By contrast, the estimated oral-to-gut microbiome score (Methods and Extended Data Fig. 2a-e) was higher both in CRC cases (Hedges' SMD = 0.47, $P<0.001$) (Fig. 1e) and in later CRC stages (Hedges' SMD = 0.14, $P=0.003$). In addition, CRC originating from the right colon presented lower richness (Hedges' SMD = 0.25, $P=0.07$) (Fig. 1d and Supplementary Table 4) and a higher presence of orally derived SGBs (Hedges' SMD = -0.23, $P=0.003$) (Fig. 1e and Supplementary Table 4) than CRC originating from the left colon and rectum.
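The per-cohort effect sizes above are Hedges' SMDs, that is, Cohen's d with a small-sample correction, which are then pooled across cohorts. The sketch below shows one common way to do this. The DerSimonian-Laird random-effects model and the normal-approximation P value are our assumptions, since the text does not specify the pooling model, and the age-, sex- and BMI-adjusted SMDs are not reproduced. Function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def hedges_g(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float]:
    """Bias-corrected SMD (cases minus controls) and its approximate variance."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    d = (mean(cases) - mean(controls)) / math.sqrt(pooled_var)
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # Hedges' small-sample correction factor
    g = j * d
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def random_effects(effects: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """DerSimonian-Laird pooling of (g, variance) pairs -> (SMD, SE, two-sided P)."""
    gs = [g for g, _ in effects]
    w = [1 / v for _, v in effects]
    fixed = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, gs))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-cohort variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * gi for wi, gi in zip(w_re, gs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # normal approximation
    return pooled, se, p
```

For example, two cohorts with g = 0.8 and variance 0.72 each pool to an SMD of 0.8 with a standard error of 0.6.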
Control and CRC microbiomes were significantly compositionally distinct, confirming previous findings (proportion of the sum of squares $R^{2}=0.014$; permutational multivariate analysis of variance (PERMANOVA) $P \leq 0.01$) (Fig. 1f, Extended Data Fig. 1b and Supplementary Table 5). Stage 0-III microbiomes were not different from stage IV ones ($R^{2}=0.01$, $P \geq 0.05$), and stages 0-II (early) were not different from stages III-IV (late) ($R^{2}=0.01$, Bray-Curtis; PERMANOVA $P \geq 0.05$) (Fig. 1f and Supplementary Table 5). In addition, the microbiome of patients with adenoma did not differ significantly from that of controls (Fig. 1f and Supplementary Table 5), suggesting a more crucial role for the gut microbiome in the adenoma-carcinoma transition than in earlier phases. Primary tumor locations (right versus left) showed microbiome differences ($R^{2}=0.017$, $P=0.002$) (Fig. 1f and Supplementary Table 5), with no strain-level contribution to the separation (Fig. 1f and Methods). Altogether, the combined data support enriched oral microbial infiltration into the gut microbiome as a potential differentiator of CRC stages and locations (Fig. 1e,f).
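The PERMANOVA $R^{2}$ values above are the fraction of the total sum of squared Bray-Curtis distances explained by the grouping. The study computed them with adonis2, stratified by dataset; the pure-Python sketch below only illustrates the same idea for a single grouping factor, with label permutations restricted within strata (for example, within each dataset). It is not a replacement for adonis2, and all names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles."""
    den = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / den if den > 0 else 0.0


def _groups(labels: Sequence[str]) -> Dict[str, List[int]]:
    out: Dict[str, List[int]] = {}
    for i, lab in enumerate(labels):
        out.setdefault(lab, []).append(i)
    return out


def permanova(samples: Sequence[Sequence[float]], labels: Sequence[str],
              strata: Optional[Sequence[str]] = None, permutations: int = 999,
              seed: int = 0) -> Tuple[float, float, float]:
    """One-factor PERMANOVA on Bray-Curtis distances -> (R^2, pseudo-F, P)."""
    n, k = len(samples), len(set(labels))
    d2 = [[bray_curtis(samples[i], samples[j]) ** 2 for j in range(n)] for i in range(n)]
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n

    def r2_and_f(labs: Sequence[str]) -> Tuple[float, float]:
        # Within-group sum of squares: squared distances inside each group / group size.
        ss_within = sum(
            sum(d2[i][j] for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
            for idx in _groups(labs).values()
        )
        ss_between = ss_total - ss_within
        if ss_within == 0:
            return ss_between / ss_total, float("inf")
        return ss_between / ss_total, (ss_between / (k - 1)) / (ss_within / (n - k))

    r2, f_obs = r2_and_f(labels)
    # Permute labels only within each stratum (for example, within each dataset).
    blocks = _groups(strata if strata is not None else ["all"] * n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(labels)
        for idx in blocks.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, lab in zip(idx, shuffled):
                perm[i] = lab
        if r2_and_f(perm)[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (permutations + 1)
```

Restricting permutations to strata keeps between-dataset differences from inflating the apparent case-control signal, which is the purpose of stratifying by dataset.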
Improved CRC screening potential of gut metagenomics
ML applied to stool metagenomics is a potential option for noninvasive CRC screening$^{18,19,28}$. Here, we tested whether larger sample sizes and improved methods could further improve the prediction of CRC cases. To do so, we applied ML models$^{18,28,29}$ in three different ways: (1) 10-fold cross-validation (CV) repeated 20 times on each dataset separately (per-dataset CV); (2) a training-testing approach applied to pairs of distinct datasets (between-dataset CV); and (3) a leave-one-dataset-out (LODO) setting, in which the classifier was trained on all but one dataset and tested on the left-out dataset, iterating over each dataset (Methods and Fig. 2a).
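The three validation schemes differ only in how samples are assigned to training and test sets. The following sketch generates those index splits from a per-sample list of dataset names; the classifier itself, class stratification within folds and any feature preprocessing are left out, and all function names are hypothetical rather than taken from the authors' pipeline.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def per_dataset_cv(datasets: Sequence[str], dataset: str, folds: int = 10,
                   repeats: int = 20, seed: int = 0) -> Iterator[Split]:
    """(1) k-fold CV repeated `repeats` times within a single dataset."""
    idx = [i for i, d in enumerate(datasets) if d == dataset]
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for f in range(folds):
            test = shuffled[f::folds]  # every k-th sample forms one fold
            test_set = set(test)
            yield [i for i in shuffled if i not in test_set], test


def between_dataset(datasets: Sequence[str], train_on: str, test_on: str) -> Split:
    """(2) Train on one dataset and test on a different one."""
    train = [i for i, d in enumerate(datasets) if d == train_on]
    test = [i for i, d in enumerate(datasets) if d == test_on]
    return train, test


def leave_one_dataset_out(datasets: Sequence[str]) -> Iterator[Tuple[str, Split]]:
    """(3) LODO: train on all datasets but one, test on the held-out one."""
    for held_out in sorted(set(datasets)):
        train = [i for i, d in enumerate(datasets) if d != held_out]
        test = [i for i, d in enumerate(datasets) if d == held_out]
        yield held_out, (train, test)
```

With 18 datasets, LODO yields 18 train/test splits, while per-dataset CV with 10 folds and 20 repeats yields 200 splits per dataset. Only LODO always evaluates on a cohort the model has never seen.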
Predictions of CRC status using the LODO approach achieved the highest and most stable area under the curve (AUC) values (average AUC = 0.85, ranging from 0.71 to 0.97) (Fig. 2a), an improvement over previous studies (average LODO AUC = 0.81)$^{18}$. As expected, predictions based on per-dataset CV were generally high but variable across datasets (average AUC $\pm$ s.d. = $0.87 \pm 0.09$, ranging from 0.68 to 0.96), with similar results for between-dataset CV (average AUC > $0.72 \pm 0.11$) (Fig. 2a).
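The AUC values above summarize how well classifier scores rank cases above controls. To make the statistic concrete, the sketch below computes the AUC directly as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half, which equals the normalized Mann-Whitney U statistic), and then summarizes per-dataset AUCs as mean, s.d. and range. It is a quadratic-time illustration with hypothetical function names, not the evaluation code used in the study.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def summarize_aucs(aucs: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation and range of AUCs across datasets (needs >= 2)."""
    return {"mean": mean(aucs), "sd": stdev(aucs), "min": min(aucs), "max": max(aucs)}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.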
We then tested the use of only oral or only nonoral SGBs for CRC case versus control classification and obtained AUC values similar to those of the model considering all SGBs (average LODO AUC = 0.83 compared with 0.85 when considering only oral SGBs and 0.79 when considering nonoral SGBs) (Fig. 2a), confirming that a large part, but not all, of the predictive power of the microbiome for CRC lies in the presence of oral-typical taxa in the stool. By contrast, ML models using different sets of microbiome functional features were less predictive (average LODO AUC of 0.68 to 0.72). These results reinforce the potential of predictive tools applied to stool metagenomics to be useful for CRC screening when trained on large and diverse datasets, and they highlight the predictive importance of oral species present in the gut during CRC.

[Figure 1 graphic. Panel a: CRC gut microbiome samples from 18 studies in 11 countries (3,741 samples): 4 ONCOBIOME cohorts (509 CRC, 57 adenomas, 105 controls), NHSII (435 adenomas, 448 controls), the IIGM TU cohort (18 CRC, 39 controls) and 12 public studies; oral SGBs defined on 5 cohorts of 495 healthy individuals with oral and stool samples from the same individual. Legend for panels d and e: meta-analysis and single studies; symbols for P < 0.05 and P ≥ 0.05; crude model and adjusted model (star).]

Fig. 1 | Overall and oral taxa-specific gut microbial diversity were significantly different according to CRC status, stage and primary tumor location. a, Overview of the cohorts ($n=18$) and sample sizes ($n=3{,}741$) according to case-control status, cancer stage and primary tumor location, along with the cohorts used to define oral-typical species. b, Number of samples available from each CRC stage and the two primary tumor locations. Symbols indicate the cohort. c, Sample microbiome richness at each stage and primary tumor location. Box plots represent the within-category microbial richness distribution, summarized by the first and third quartiles (hinges of the box) and the median, with whiskers extending to the largest or smallest value not exceeding 1.5× the interquartile range from the two ends of the box; data beyond these values are plotted individually as outliers. d, Meta-analyzed SMDs of the associations between alpha-diversity (Shannon diversity (upper) and SGB richness (lower)) and all paired comparisons. The 95% CIs for each meta-analysis model are indicated by a horizontal line. $P$ values were computed via two-tailed $t$-test. Significant associations ($P<0.05$) are indicated by a light blue diamond. SMD values corrected for age, sex and BMI (Methods) are indicated by a star (blue when $P<0.05$). No correction for multiple hypothesis testing was performed. e, Meta-analysis of the association between the cumulative relative abundance of oral species (oral-to-gut score) and all paired comparisons (left) and between the number of oral species (oral-to-gut richness) and all paired comparisons (right). Symbols and axes are similar to d. $P$ values were computed via two-tailed $t$-test. No correction for multiple hypothesis testing was performed. SMD values corrected for age, sex and BMI are indicated by a star. f, PERMANOVA-derived $R^{2}$ (stratified by dataset) according to CRC stage and primary tumor location, computed via adonis2 (Methods) on Bray-Curtis distances. Comparisons with $P<0.01$ are highlighted in dark blue. Circles indicate the $R^{2}$ explained by strain-level microbial features, and comparisons with $P<0.01$ are marked with an asterisk. The text f0.5 denotes the feature set defined in the Methods section 'Strain-level feature identification'. A, adenoma; C, control; L, left-sided; R, right-sided; 0, stage 0; I, stage I; II, stage II; III, stage III; IV, stage IV; U, stage not available.
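The Fig. 1c caption describes the standard 1.5 × interquartile range box-plot rule. The sketch below computes the box hinges, median, whisker ends and outliers for one group of richness values. Quartile conventions differ between plotting libraries; using Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics) is our assumption, and `boxplot_stats` is a hypothetical name.

```python
from statistics import median, quantiles
from typing import Dict, List, Sequence, Union


def boxplot_stats(values: Sequence[float], k: float = 1.5) -> Dict[str, Union[float, List[float]]]:
    """Hinges (Q1, Q3), median, whisker ends and outliers under the k x IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        # Whiskers reach the most extreme observations still within the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Everything beyond the fences is drawn individually as an outlier.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

For the values 1 to 9 plus 100, this gives hinges at 3.25 and 7.75, whiskers ending at 1 and 9, and a single outlier (100).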
Oral and newly associated SGBs enriched in the CRC microbiome
We next aimed to pinpoint specific microbiome biomarkers associated with CRC using the increased power of our multicohort framework (Methods). We identified 125 SGBs with increased relative abundance in CRC ($q<0.1$; 106 kSGBs and 19 uSGBs) and 83 SGBs more abundant