13. Pseudotemporal ordering
13.1. Motivation
Single-cell sequencing assays provide high resolution measurements of biological tissues [Islam et al., 2011], [Hwang et al., 2018].
Consequently, such technologies can help decipher and understand cellular heterogeneity [Briggs et al., 2018], [Sikkema et al., 2022] and the dynamics of a biological process [Jardine et al., 2021], [He et al., 2022].
Corresponding studies include quantifying cellular fates as well as identifying genes driving the process.
However, as cells are destroyed when sequenced in classical single-cell RNA sequencing (scRNA-seq) protocols, it is impossible to track their development and, for example, gene expression profile over time.
Although recent technological advances allow recording the transcriptome sequentially [Chen et al., 2022], they are experimentally challenging and currently fail to scale to larger datasets.
Consequently, the underlying dynamic process needs to be estimated from the measured snapshot data, instead.
Even though samples are, traditionally, taken from a single experimental time point, a multitude of cell types can be observed. This diversity stems from the asynchronous nature of biological processes. As such, a range of stages of the developmental process can be observed. Reconstructing the developmental landscape is the goal of the field known as trajectory inference (TI). This task is achieved by ordering the observed cellular states according to the developmental process. States are aligned along the developmental direction by mapping discrete annotations to a continuous domain, the so-called pseudotime.
Pseudotimes rank cells relative to each other according to their respective stage in the developmental process. Less mature cells are assigned small values, and more mature cells large values. In a bone marrow sample, for example, hematopoietic stem cells are assigned a low pseudotime and erythroid cells a high pseudotime. In the case of single-cell RNA sequencing data, the assignment is based on the transcriptomic profile of a cell. Additionally, the construction usually requires the specification of an initial cell, or, equivalently, root cell, where the overall process starts.
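To make the idea of ranking cells relative to a root cell concrete, the following minimal sketch assigns a toy pseudotime as the shortest-path distance from the root on a k-nearest-neighbor graph. This is an illustrative stand-in and not the algorithm of any method discussed in this chapter; the function name `toy_pseudotime`, the parameter `k`, and the synthetic half-circle data are assumptions made for the example.

```python
import heapq

import numpy as np


def toy_pseudotime(X, root, k=3):
    """Shortest-path distance from `root` on a symmetric kNN graph, scaled to [0, 1]."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Indices of the k nearest neighbors of every cell (column 0 is the cell itself).
    knn = np.argsort(dist, axis=1)[:, 1 : k + 1]
    neighbors = [dict() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            neighbors[i][int(j)] = float(dist[i, j])
            neighbors[int(j)][i] = float(dist[i, j])

    # Dijkstra's algorithm starting at the root cell.
    geodesic = np.full(n, np.inf)
    geodesic[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > geodesic[i]:
            continue
        for j, w in neighbors[i].items():
            if d + w < geodesic[j]:
                geodesic[j] = d + w
                heapq.heappush(heap, (d + w, j))
    return geodesic / geodesic.max()


# Synthetic "cells" placed along a half circle; cell 0 plays the role of the root.
angles = np.linspace(0.0, np.pi, 30)
cells = np.column_stack([np.cos(angles), np.sin(angles)])
pseudotime = toy_pseudotime(cells, root=0)
```

The root receives pseudotime 0 and the cell furthest along the arc receives 1. Choosing a different root changes every value, which is why the root cell has to be specified with care.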
13.2. Pseudotime construction
Pseudotime construction generally follows a common workflow: As a first step, the ultra high-dimensional single-cell data is projected onto a lower-dimensional representation. This procedure is justified by the observation that dynamical processes progress on a low-dimensional manifold [Wagner et al., 2016]. In practice, pseudotime methods may rely on principal components (for example, Palantir [Setty et al., 2019]) or diffusion components (for example, diffusion pseudotime (DPT) [Haghverdi et al., 2016]). Subsequently, pseudotimes are constructed based on one of the following principles.
Observations are first clustered and, subsequently, connections between these clusters are identified. The clusters can be ordered and, thereby, a pseudotime constructed. Henceforth, we will refer to this approach as the cluster approach. Classical clustering algorithms include k-means [Lloyd, 1982], [MacQueen, 1967], Leiden [Traag et al., 2019], or hierarchical clustering [Müllner, 2011]. Clusters may be connected based on similarity, or by constructing a minimum spanning tree (MST) [Pettie and Ramachandran, 2002].
The graph approach first finds connections between the lower-dimensional representations of the observations. This procedure defines a graph, based on which clusters, and thus an ordering, are defined. PAGA [Wolf et al., 2019], for example, partitions the graph into Leiden clusters and estimates connections between them. Intuitively, this approach preserves the global topology of the data while analyzing it at a lower resolution. Consequently, the computational efficiency is increased.
Manifold-learning based approaches proceed similarly to the cluster approach. However, connections between clusters are defined by using principal curves or graphs to estimate the underlying trajectories. Principal curves find a one-dimensional curve that connects cellular observations in the higher-dimensional space. A notable representative of this approach is Slingshot [Street et al., 2018].
Probabilistic frameworks assign transition probabilities to ordered cell-cell pairs. Each transition probability quantifies how likely it is that the reference cell is the ancestor of the other cell. These probabilities define random processes that are used to define a pseudotime. DPT, for example, is defined as the difference between consecutive states of a random walk. In contrast, Palantir [Setty et al., 2019] models trajectories themselves as Markov chains. While both approaches rely on a probabilistic framework, they require a root cell to be specified. The pseudotime itself is computed with respect to this cell.
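As an illustration of the cluster approach described above, the following minimal sketch connects cluster centroids with a minimum spanning tree and orders the clusters by their distance to a root cluster along that tree. It assumes that cluster labels are already available and uses Prim's algorithm written out in NumPy; the function name `order_clusters_by_mst` and the synthetic data are hypothetical and not part of any TI package.

```python
import numpy as np


def order_clusters_by_mst(X, labels, root_cluster):
    """Order clusters by their distance to the root along an MST over cluster centroids."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)

    # Prim's algorithm, grown from the root cluster. Alongside the tree we track
    # the path length from the root to every cluster along the tree.
    root = int(np.flatnonzero(clusters == root_cluster)[0])
    tree_distance = {root: 0.0}
    while len(tree_distance) < len(clusters):
        best = None
        for i in tree_distance:
            for j in range(len(clusters)):
                if j not in tree_distance and (best is None or dist[i, j] < best[0]):
                    best = (float(dist[i, j]), i, j)
        weight, i, j = best
        tree_distance[j] = tree_distance[i] + weight

    ordered = sorted(tree_distance, key=tree_distance.get)
    cluster_order = [str(clusters[c]) for c in ordered]
    # Every cell inherits the (coarse) pseudotime of its cluster.
    lookup = {str(clusters[c]): tree_distance[c] for c in tree_distance}
    cell_pseudotime = np.array([lookup[str(label)] for label in labels])
    return cluster_order, cell_pseudotime


# Synthetic example: four well-separated clusters arranged along a line.
rng = np.random.default_rng(0)
centers = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (4.0, 0.0), "D": (6.0, 0.0)}
labels = np.repeat(list(centers), 20)
X = np.array([centers[str(label)] for label in labels])
X = X + rng.normal(scale=0.1, size=X.shape)
cluster_order, cell_pseudotime = order_clusters_by_mst(X, labels, root_cluster="A")
```

The resulting pseudotime is constant within a cluster. This coarseness is the price of the lower resolution at which cluster-based approaches operate.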
TI is a well-studied field providing a rich set of methods. To apply the appropriate method to analyze a single-cell dataset, the biological process itself needs to be understood first. This understanding especially includes the nature of the process, i.e., whether it is, for example, linear, cyclic, or branching. Similarly, orthogonal processes within one and the same dataset limit the TI methods applicable. To help identify appropriate tools, dynguidelines [Deconinck et al., 2021] provides an exhaustive overview of algorithms and their characteristics.
13.3. Down-stream tasks and outlook
Even though TI and pseudotime can already provide valuable insight, they usually act as a stepping stone for more fine-grained analyses. Identifying terminal states, for example, is a classical biological question that can be studied. Similarly, lineage bifurcations and genes driving fate decisions can be identified based on TI and pseudotime. Which questions can be answered, and how the answers are found, is usually method specific. Palantir, for example, identifies terminal states as absorbing states of its constructed Markov chain.
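The notion of terminal states as absorbing states can be illustrated with standard Markov chain theory. The sketch below is a textbook construction and not Palantir's actual implementation: for a small, hypothetical transition matrix `P`, it finds the absorbing states and computes, for every transient state, the probability of ending up in each of them by solving (I - Q) B = R, where Q holds the transient-to-transient and R the transient-to-absorbing transition probabilities.

```python
import numpy as np


def absorption_probabilities(P):
    """Probability of ending in each absorbing state, for every transient state."""
    P = np.asarray(P, dtype=float)
    absorbing = np.flatnonzero(np.isclose(np.diag(P), 1.0))
    transient = np.setdiff1d(np.arange(P.shape[0]), absorbing)
    Q = P[np.ix_(transient, transient)]  # transient -> transient
    R = P[np.ix_(transient, absorbing)]  # transient -> absorbing
    # Solve (I - Q) B = R instead of forming the inverse explicitly.
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)
    return transient, absorbing, B


# Hypothetical 4-state chain: 0 = stem-like, 1 = intermediate, 2 and 3 = terminal.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.4, 0.4, 0.2],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
transient, absorbing, fate_probabilities = absorption_probabilities(P)
```

In this toy chain, both transient states reach terminal state 2 with probability 2/3 and terminal state 3 with probability 1/3. Each row of `fate_probabilities` sums to one and can be read as a fate distribution of the corresponding state.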
The success of trajectory inference is well documented and, consequently, many methods have been proposed. However, with the advances of sequencing technologies, new sources of information become available. ATAC-seq [Buenrostro et al., 2015], CITE-seq [Stoeckius et al., 2017], and DOGMA-seq [Mimitou et al., 2021], for example, measure additional modalities beyond the transcriptome. Lineage tracing [] and metabolic labeling [Erhard et al., 2019], [Battich et al., 2020], [Qiu et al., 2020], [Erhard et al., 2022] even provide the (likely) future state of a given cell. Consequently, future TI tools will be able to include more information to estimate trajectories and pseudotime more accurately and robustly, and allow answering novel questions. For example, RNA velocity [Manno et al., 2018], [Bergen et al., 2020], [Bergen et al., 2021] is one technique that uses unspliced and spliced mRNA to infer directed, dynamic information beyond classical, static snapshot data.
13.4. Inferring pseudotime for adult human bone marrow
To show how a pseudotime can be constructed and different pseudotimes compared, we study a dataset of adult human bone marrow [Setty et al., 2019].
13.4.1. Environment setup
from pathlib import Path
import scanpy as sc
13.4.2. General settings
DATA_DIR = Path("../../data/")
DATA_DIR.mkdir(parents=True, exist_ok=True)
FILE_NAME = DATA_DIR / "bone_marrow.h5ad"
13.4.3. Data loading
adata = sc.read(
    filename=FILE_NAME,
    backup_url="https://figshare.com/ndownloader/files/35826944",
)
adata
AnnData object with n_obs × n_vars = 5780 × 27876
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    var: 'palantir'
    uns: 'clusters_colors', 'palantir_branch_probs_cell_types'
    obsm: 'MAGIC_imputed_data', 'X_tsne', 'palantir_branch_probs'
    layers: 'spliced', 'unspliced'
To construct pseudotimes, the data must be preprocessed. Here, we filter out lowly expressed genes (here, genes with fewer than 20 counts in total). Notably, the later construction of the pseudotime is robust to the exact choice of this threshold. Following this first gene filtering, the cell size is normalized and the counts are log1p-transformed to reduce the effect of outliers. As usual, we also identify and annotate highly variable genes. Finally, a nearest-neighbor graph is constructed, based on which we will define the pseudotime. The number of principal components is chosen based on the explained variance.
sc.pp.filter_genes(adata, min_counts=20)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=10)
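The choice of the number of principal components from the explained variance can be sketched as follows. The example uses synthetic data and plain NumPy rather than the bone marrow dataset, and the 95% threshold is an assumption made for illustration, not the criterion behind the value `n_pcs=10` used above.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2 / (X.shape[0] - 1)
    return variances / variances.sum()


# Synthetic data: 200 "cells", 50 "genes", driven by three latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3)) * np.array([5.0, 3.0, 2.0])
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 50))

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
# Smallest number of components whose cumulative ratio reaches the assumed 95% threshold.
n_pcs = int(np.searchsorted(cumulative, 0.95)) + 1
```

For an AnnData object processed with sc.tl.pca, the corresponding ratios should be available in adata.uns["pca"]["variance_ratio"] and can be inspected visually with sc.pl.pca_variance_ratio(adata).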
The two-dimensional t-SNE representation colored by cell type annotations shows that the cell types cluster together well. Additionally, the developmental hierarchy is visible.
sc.pl.scatter(adata, basis="tsne", color="clusters")
13.4.4. Pseudotime construction
To calculate diffusion pseudotime (DPT), first, the corresponding diffusion maps need to be calculated.
sc.tl.diffmap(adata)
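To give an intuition for what a diffusion map computes, the following sketch builds one from scratch with NumPy. It is a simplified illustration under stated assumptions: a Gaussian kernel with a fixed bandwidth `sigma` on all pairs of cells, without the neighbor graph and density normalization that a production implementation such as scanpy's may use. The function name `diffusion_map` and the toy data are hypothetical.

```python
import numpy as np


def diffusion_map(X, sigma=1.0, n_comps=3):
    """Leading eigenvalues and eigenvectors of a Gaussian-kernel random walk."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * sigma**2))
    degree = K.sum(axis=1)
    # The random-walk matrix T = D^-1 K is not symmetric. We decompose the
    # symmetric matrix S = D^-1/2 K D^-1/2 instead, which has the same eigenvalues.
    S = K / np.sqrt(np.outer(degree, degree))
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Map eigenvectors of S back to right eigenvectors of T.
    components = eigenvectors / np.sqrt(degree)[:, None]
    return eigenvalues[: n_comps + 1], components[:, : n_comps + 1]


# Toy data: 20 "cells" evenly spaced along a one-dimensional trajectory.
X = np.linspace(0.0, 9.5, 20)[:, None]
eigenvalues, components = diffusion_map(X, sigma=1.0, n_comps=3)
```

The first eigenvalue equals one and its eigenvector is constant across cells, so component 0 carries no information about the data. As far as we are aware, scanpy likewise keeps this trivial component in column 0 of adata.obsm["X_diffmap"], which is consistent with the informative components being indexed from 1 in the code of this section.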
The differentiation hierarchy in bone marrow is well understood. However, we only know that the developmental process starts with hematopoietic stem cells, not at which exact cell of the corresponding cluster in our dataset. To identify a putative initial cell, we study the individual diffusion components. We identify the stem cell with the most extreme diffusion component in one dimension (in our case, dimension 3).
# Setting root cell as described above
root_ixs = adata.obsm["X_diffmap"][:, 3].argmin()
sc.pl.scatter(
    adata,
    basis="diffmap",
    color=["clusters"],
    components=[2, 3],
)
adata.uns["iroot"] = root_ixs
sc.tl.dpt(adata)
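How a pseudotime follows from a root cell and a random walk can be sketched with the accumulated-transition-matrix formulation of DPT, as we understand it from Haghverdi et al. (2016): with T the symmetrically normalized transition matrix and its stationary component removed, M = (I - T)^-1 - I sums all powers of T, and the DPT distance between two cells is the Euclidean distance between their rows of M. The code below is a simplified illustration with a fixed-bandwidth Gaussian kernel and dense matrices; scanpy's sc.tl.dpt differs in its details, and the function name `dpt_from_root` is hypothetical.

```python
import numpy as np


def dpt_from_root(X, root, sigma=1.0):
    """Toy diffusion pseudotime of every cell with respect to the cell `root`."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * sigma**2))
    degree = K.sum(axis=1)
    # Symmetrically normalized transition matrix.
    T = K / np.sqrt(np.outer(degree, degree))
    # Remove the stationary component, i.e. the eigenvector with eigenvalue 1.
    stationary = np.sqrt(degree / degree.sum())
    T_tilde = T - np.outer(stationary, stationary)
    # Accumulated transition matrix: sum of T_tilde^t over all t >= 1.
    identity = np.eye(len(degree))
    M = np.linalg.inv(identity - T_tilde) - identity
    # DPT distance to the root: Euclidean distance between rows of M.
    distance = np.linalg.norm(M - M[root], axis=1)
    return distance / distance.max()


# Toy data: 20 "cells" evenly spaced along a one-dimensional trajectory.
X = np.linspace(0.0, 9.5, 20)[:, None]
pseudotime = dpt_from_root(X, root=0)
```

The root cell receives pseudotime 0 by construction, and all other values are measured relative to it. A poorly chosen root therefore shifts the entire ordering.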
Different pseudotime methods give different results. Sometimes, one pseudotime captures the underlying developmental process more accurately than another. Here, we compare the DPT computed above with the pre-computed Palantir pseudotime (see the corresponding Palantir tutorial). One option to compare different pseudotimes is to color the low-dimensional embedding of the data (here, t-SNE). Here, DPT is extremely high in the cluster of CLPs compared to all other cell types. In contrast, the Palantir pseudotime increases continuously with developmental maturity.
sc.pl.scatter(
    adata,
    basis="tsne",
    color=["dpt_pseudotime", "palantir_pseudotime"],
    color_map="gnuplot2",
)
Instead of coloring the lower-dimensional representation of the data, we can study the distribution of pseudotime values assigned to each cell type cluster. This representation again shows that the CLP cluster forms an outlier in the case of DPT. Additionally, clusters such as HSC_1 and HSC_2 include several cells with increased pseudotime. These inflated values contradict our prior biological knowledge that these clusters form the beginning of the developmental process.
sc.pl.violin(
    adata,
    keys=["dpt_pseudotime", "palantir_pseudotime"],
    groupby="clusters",
    rotation=45,
    order=[
        "HSC_1",
        "HSC_2",
        "Precursors",
        "Ery_1",
        "Ery_2",
        "Mono_1",
        "Mono_2",
        "CLP",
        "DCs",
        "Mega",
    ],
)
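The visual comparisons above can be complemented by a single number that measures how similarly two pseudotimes order the cells. The sketch below computes the Spearman rank correlation with NumPy. This summary is our addition and not part of the original analysis; the pseudotime values are made up for illustration, and the simple ranking used here assumes that no values are tied.

```python
import numpy as np


def spearman_correlation(a, b):
    """Spearman rank correlation of two vectors without tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Made-up pseudotime values for five cells (not taken from the dataset).
pseudotime_a = np.array([0.00, 0.10, 0.35, 0.60, 1.00])
pseudotime_b = np.array([0.00, 0.05, 0.50, 0.40, 1.00])
rho = spearman_correlation(pseudotime_a, pseudotime_b)
```

The two toy pseudotimes disagree only on the order of the third and fourth cell, which gives a correlation of 0.9. For the dataset at hand, the analogous computation could be done with pandas via adata.obs[["dpt_pseudotime", "palantir_pseudotime"]].corr(method="spearman"). A high correlation alone does not tell which pseudotime is biologically more plausible, so it does not replace the per-cluster inspection above.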
Considering these observations and prior knowledge about development in bone marrow, we would continue working with the Palantir pseudotime.
13.5. Key takeaways
Trajectory inference methods require the start of the biological process to be known (approximately).
The nature of the biological process defines which TI algorithms can be used.
dynguidelines helps select the appropriate TI method.
13.6. References
Nico Battich, Joep Beumer, Buys de Barbanson, Lenno Krenning, Chloé S. Baron, Marvin E. Tanenbaum, Hans Clevers, and Alexander van Oudenaarden. Sequencing metabolically labeled transcripts in single cells reveals mRNA turnover strategies. Science, 367(6482):1151–1156, March 2020. URL: https://doi.org/10.1126/science.aax3072, doi:10.1126/science.aax3072.
Volker Bergen, Marius Lange, Stefan Peidli, F. Alexander Wolf, and Fabian J. Theis. Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology, 38(12):1408–1414, August 2020. URL: https://doi.org/10.1038/s41587-020-0591-3, doi:10.1038/s41587-020-0591-3.
Volker Bergen, Ruslan A Soldatov, Peter V Kharchenko, and Fabian J Theis. RNA velocity—current challenges and future perspectives. Molecular Systems Biology, August 2021. URL: https://doi.org/10.15252/msb.202110282, doi:10.15252/msb.202110282.
James A. Briggs, Caleb Weinreb, Daniel E. Wagner, Sean Megason, Leonid Peshkin, Marc W. Kirschner, and Allon M. Klein. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science, June 2018. URL: https://doi.org/10.1126/science.aar5780, doi:10.1126/science.aar5780.
Jason D. Buenrostro, Beijing Wu, Ulrike M. Litzenburger, Dave Ruff, Michael L. Gonzales, Michael P. Snyder, Howard Y. Chang, and William J. Greenleaf. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523(7561):486–490, June 2015. URL: https://doi.org/10.1038/nature14590, doi:10.1038/nature14590.
Wanze Chen, Orane Guillaume-Gentil, Pernille Yde Rainer, Christoph G. Gäbelein, Wouter Saelens, Vincent Gardeux, Amanda Klaeger, Riccardo Dainese, Magda Zachara, Tomaso Zambelli, Julia A. Vorholt, and Bart Deplancke. Live-seq enables temporal transcriptomic recording of single cells. Nature, 608(7924):733–740, August 2022. URL: https://doi.org/10.1038/s41586-022-05046-9, doi:10.1038/s41586-022-05046-9.
Louise Deconinck, Robrecht Cannoodt, Wouter Saelens, Bart Deplancke, and Yvan Saeys. Recent advances in trajectory inference from single-cell omics data. Current Opinion in Systems Biology, 27:100344, September 2021. URL: https://doi.org/10.1016/j.coisb.2021.05.005, doi:10.1016/j.coisb.2021.05.005.
Florian Erhard, Marisa A. P. Baptista, Tobias Krammer, Thomas Hennig, Marius Lange, Panagiota Arampatzi, Christopher S. Jürges, Fabian J. Theis, Antoine-Emmanuel Saliba, and Lars Dölken. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature, 571(7765):419–423, July 2019. URL: https://doi.org/10.1038/s41586-019-1369-y, doi:10.1038/s41586-019-1369-y.
Florian Erhard, Antoine-Emmanuel Saliba, Alexandra Lusser, Christophe Toussaint, Thomas Hennig, Bhupesh K. Prusty, Daniel Kirschenbaum, Kathleen Abadie, Eric A. Miska, Caroline C. Friedel, Ido Amit, Ronald Micura, and Lars Dölken. Time-resolved single-cell RNA-seq using metabolic RNA labelling. Nature Reviews Methods Primers, September 2022. URL: https://doi.org/10.1038/s43586-022-00157-z, doi:10.1038/s43586-022-00157-z.
Laleh Haghverdi, Maren Büttner, F Alexander Wolf, Florian Buettner, and Fabian J Theis. Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods, 13(10):845–848, August 2016. URL: https://doi.org/10.1038/nmeth.3971, doi:10.1038/nmeth.3971.
Peng He, Kyungtae Lim, Dawei Sun, Jan Patrick Pett, Quitz Jeng, Krzysztof Polanski, Ziqi Dong, Liam Bolt, Laura Richardson, Lira Mamanova, Monika Dabrowska, Anna Wilbrey-Clark, Elo Madissoon, Zewen Kelvin Tuong, Emma Dann, Chenqu Suo, Isaac Goh, Masahiro Yoshida, Marko Z Nikolić, Sam M Janes, Xiaoling He, Roger A Barker, Sarah A Teichmann, John C. Marioni, Kerstin B Meyer, and Emma L Rawlins. A human fetal lung cell atlas uncovers proximal-distal gradients of differentiation and key regulators of epithelial fates. bioRxiv, 2022. URL: https://www.biorxiv.org/content/early/2022/09/30/2022.01.11.474933, arXiv:https://www.biorxiv.org/content/early/2022/09/30/2022.01.11.474933.full.pdf, doi:10.1101/2022.01.11.474933.
Byungjin Hwang, Ji Hyun Lee, and Duhee Bang. Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental & Molecular Medicine, 50(8):1–14, August 2018. URL: https://doi.org/10.1038/s12276-018-0071-8, doi:10.1038/s12276-018-0071-8.
Saiful Islam, Una Kjällquist, Annalena Moliner, Pawel Zajac, Jian-Bing Fan, Peter Lönnerberg, and Sten Linnarsson. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, 21(7):1160–1167, May 2011. URL: https://doi.org/10.1101/gr.110882.110, doi:10.1101/gr.110882.110.
Laura Jardine, Simone Webb, Issac Goh, Mariana Quiroga Londoño, Gary Reynolds, Michael Mather, Bayanne Olabi, Emily Stephenson, Rachel A. Botting, Dave Horsfall, Justin Engelbert, Daniel Maunder, Nicole Mende, Caitlin Murnane, Emma Dann, Jim McGrath, Hamish King, Iwo Kucinski, Rachel Queen, Christopher D. Carey, Caroline Shrubsole, Elizabeth Poyner, Meghan Acres, Claire Jones, Thomas Ness, Rowen Coulthard, Natalina Elliott, Sorcha O'Byrne, Myriam L. R. Haltalli, John E. Lawrence, Steven Lisgo, Petra Balogh, Kerstin B. Meyer, Elena Prigmore, Kirsty Ambridge, Mika Sarkin Jain, Mirjana Efremova, Keir Pickard, Thomas Creasey, Jaume Bacardit, Deborah Henderson, Jonathan Coxhead, Andrew Filby, Rafiqul Hussain, David Dixon, David McDonald, Dorin-Mirel Popescu, Monika S. Kowalczyk, Bo Li, Orr Ashenberg, Marcin Tabaka, Danielle Dionne, Timothy L. Tickle, Michal Slyper, Orit Rozenblatt-Rosen, Aviv Regev, Sam Behjati, Elisa Laurenti, Nicola K. Wilson, Anindita Roy, Berthold Göttgens, Irene Roberts, Sarah A. Teichmann, and Muzlifah Haniffa. Blood and immune development in human fetal bone marrow and down syndrome. Nature, 598(7880):327–331, September 2021. URL: https://doi.org/10.1038/s41586-021-03929-x, doi:10.1038/s41586-021-03929-x.
S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, March 1982. URL: https://doi.org/10.1109/tit.1982.1056489, doi:10.1109/tit.1982.1056489.
Daniel Müllner. Modern hierarchical, agglomerative clustering algorithms. 2011. URL: https://arxiv.org/abs/1109.2378, doi:10.48550/ARXIV.1109.2378.
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman, editors, Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281–297. University of California Press, 1967.
Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E. Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E. Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson, and Peter V. Kharchenko. RNA velocity of single cells. Nature, 560(7719):494–498, August 2018. URL: https://doi.org/10.1038/s41586-018-0414-6, doi:10.1038/s41586-018-0414-6.
Eleni P. Mimitou, Caleb A. Lareau, Kelvin Y. Chen, Andre L. Zorzetto-Fernandes, Yuhan Hao, Yusuke Takeshima, Wendy Luo, Tse-Shun Huang, Bertrand Z. Yeung, Efthymia Papalexi, Pratiksha I. Thakore, Tatsuya Kibayashi, James Badger Wing, Mayu Hata, Rahul Satija, Kristopher L. Nazor, Shimon Sakaguchi, Leif S. Ludwig, Vijay G. Sankaran, Aviv Regev, and Peter Smibert. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nature Biotechnology, 39(10):1246–1258, June 2021. URL: https://doi.org/10.1038/s41587-021-00927-2, doi:10.1038/s41587-021-00927-2.
Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. Journal of the ACM, 49(1):16–34, January 2002. URL: https://doi.org/10.1145/505241.505243, doi:10.1145/505241.505243.
Qi Qiu, Peng Hu, Xiaojie Qiu, Kiya W. Govek, Pablo G. Cámara, and Hao Wu. Massively parallel and time-resolved RNA sequencing in single cells with scNT-seq. Nature Methods, 17(10):991–1001, August 2020. URL: https://doi.org/10.1038/s41592-020-0935-4, doi:10.1038/s41592-020-0935-4.
Manu Setty, Vaidotas Kiseliovas, Jacob Levine, Adam Gayoso, Linas Mazutis, and Dana Pe'er. Characterization of cell fate probabilities in single-cell data with palantir. Nature Biotechnology, 37(4):451–460, March 2019. URL: https://doi.org/10.1038/s41587-019-0068-4, doi:10.1038/s41587-019-0068-4.
Lisa Sikkema, Daniel C Strobl, Luke Zappia, Elo Madissoon, Nikolay S Markov, Laure-Emmanuelle Zaragosi, Meshal Ansari, Marie-Jeanne Arguel, Leonie Apperloo, Christophe Becavin, Marijn Berg, Evgeny Chichelnitskiy, Mei-I Chung, Antoine Collin, Aurore C A Gay, Baharak Hooshiar Kashani, Manu Jain, Theodore Kapellos, Tessa M Kole, Christoph H Mayr, Von Michael Papen, Lance Peter, Ciro Ramirez-Suastegui, Janine Schniering, Chase J Taylor, Thomas Walzthoeni, Chuan Xu, Linh T Bui, Carlo de Donno, Leander Dony, Minzhe Guo, Austin J Gutierrez, Lukas Heumos, Ni Huang, Ignacio L Ibarra, Nathan D Jackson, Preetish Kadur Lakshminarasimha Murthy, Mohammad Lotfollahi, Tracy Tabib, Carlos Talavera-Lopez, Kyle J Travaglini, Anna Wilbrey-Clark, Kaylee B Worlock, Masahiro Yoshida, Tushar J Desai, Oliver Eickelberg, Christine Falk, Naftali Kaminski, Mark A Krasnow, Robert Lafyatis, Marko Z Nikoli, Joseph E Powell, Jayaraj Rajagopal, Orit Rozenblatt-Rosen, Max A Seibold, Dean Sheppard, Douglas P Shepherd, Sarah A Teichmann, Alexander M Tsankov, Jeffrey Whitsett, Yan Xu, Nicholas E Banovich, Pascal Barbry, Thu E Duong, Kerstin B Meyer, Jonathan A Kropski, Dana Pe'er, Herbert B Schiller, Purushothama Rao Tata, Joachim L Schultze, Alexander V Misharin, Martijn C Nawijn, Malte D Luecken, and Fabian J Theis. An integrated cell atlas of the human lung in health and disease. bioRxiv, pages 2022.03.10.483747, March 2022. doi:10.1101/2022.03.10.483747.
Marlon Stoeckius, Christoph Hafemeister, William Stephenson, Brian Houck-Loomis, Pratip K Chattopadhyay, Harold Swerdlow, Rahul Satija, and Peter Smibert. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods, 14(9):865–868, July 2017. URL: https://doi.org/10.1038/nmeth.4380, doi:10.1038/nmeth.4380.
Kelly Street, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics, June 2018. URL: https://doi.org/10.1186/s12864-018-4772-0, doi:10.1186/s12864-018-4772-0.
V. A. Traag, L. Waltman, and N. J. van Eck. From louvain to leiden: guaranteeing well-connected communities. Scientific Reports, March 2019. URL: https://doi.org/10.1038/s41598-019-41695-z, doi:10.1038/s41598-019-41695-z.
Allon Wagner, Aviv Regev, and Nir Yosef. Revealing the vectors of cellular identity with single-cell genomics. Nature Biotechnology, 34(11):1145–1160, November 2016. URL: https://doi.org/10.1038/nbt.3711, doi:10.1038/nbt.3711.
F. Alexander Wolf, Fiona K. Hamey, Mireya Plass, Jordi Solana, Joakim S. Dahlin, Berthold Göttgens, Nikolaus Rajewsky, Lukas Simon, and Fabian J. Theis. Paga: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biology, 20(1):59, Mar 2019. URL: https://doi.org/10.1186/s13059-019-1663-x, doi:10.1186/s13059-019-1663-x.
13.7. Contributors
We gratefully acknowledge the contributions of:
13.7.2. Reviewers
Lukas Heumos