Pushing the Boundaries: Why your scRNA-seq project deserves more cells and samples
Single cell RNA sequencing (scRNA-seq) is rapidly advancing our understanding of complex biological phenomena, from the organization and function of cells within a tissue, to the cellular functions that go awry in disease, to the diversity of our immune systems, by analyzing the unique transcriptomic signatures of individual cells. Yet despite its transformative potential, we've only scratched the surface of what scRNA-seq can reveal. The cells and samples analyzed to date represent just a fraction of the cellular diversity still to be explored. In this article, we look at what becomes possible when single cell projects scale up dramatically, both in the number of cells analyzed and in the diversity of samples studied.
More power to identify rare or novel cell types
Larger cell numbers increase the probability of detecting and characterizing rare cell populations. Consider the Human Lung Cell Atlas (HLCA), a single cell transcriptomic atlas generated by integrating 49 datasets that together cover 486 individuals and 2.4 million cells (Sikkema et al 2023). Only after integrating the single cell data across all 49 datasets were the researchers able to identify six cell types that had never before been recovered in lung, demonstrating the power of more cells to reveal new cell types.
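The link between cell number and rare-population detection can be illustrated with a simple binomial calculation. The sketch below is not taken from the HLCA study. It assumes cells are sampled independently, that a rare cell type makes up a fixed fraction of the tissue, and that some minimum number of its cells must be captured before the type can be recognized as a distinct cluster. The function name, the 0.01% frequency, and the 50-cell threshold are hypothetical choices for illustration only.

```python
from math import exp, lgamma, log, log1p


def prob_at_least(n_cells: int, frequency: float, min_cells: int) -> float:
    """P(X >= min_cells) for X ~ Binomial(n_cells, frequency).

    Assumes independent sampling of cells (a simplification).
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be strictly between 0 and 1")
    p_fewer = 0.0
    # Sum P(X = k) for every k below the threshold, working in log space
    # so that very large cell counts do not overflow.
    for k in range(min(min_cells, n_cells + 1)):
        log_pmf = (
            lgamma(n_cells + 1) - lgamma(k + 1) - lgamma(n_cells - k + 1)
            + k * log(frequency)
            + (n_cells - k) * log1p(-frequency)
        )
        p_fewer += exp(log_pmf)
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, 1.0 - p_fewer))


if __name__ == "__main__":
    # Hypothetical scenario: a cell type at 0.01% abundance, and at least
    # 50 captured cells needed to call it a cluster.
    for n in (10_000, 100_000, 1_000_000, 2_400_000):
        print(f"{n:>9,} cells -> P(detect) = {prob_at_least(n, 1e-4, 50):.3f}")
```

Under these assumptions the detection probability stays near zero for small experiments and approaches one only once the expected number of rare cells comfortably exceeds the threshold, which is the intuition behind profiling more cells.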
Enhanced resolution of cellular states
Increasing the number of cells sequenced can reveal finer gradations of cellular states and subtypes. This higher resolution view of the cellular landscape can uncover previously unappreciated diversity within known cell types. For example, researchers interested in Alzheimer’s disease (AD) generated one of the largest single cell transcriptomic atlases of aged human brains: 283 post-mortem brain sections from both healthy and AD individuals, totaling over 1.3 million single cell transcriptomes (Mathys et al 2024). Using this data-rich atlas, they were able to identify regionally restricted excitatory neuron and astrocyte subtypes, enabling a more refined view of cellular and molecular differences in Alzheimer’s disease brains compared to healthy controls. This level of resolution was only possible because of the distinct regional sampling and the number of cells profiled in the study.
More accurate characterization of populations
Individuals can be very diverse, differing in ancestral background, gender, and age, for example. The more of this diversity a research project captures, the more opportunities there are to identify the underlying biological signal in a population-based study. Consider the selection of potential drug targets. A retrospective analysis of clinical trials suggested that single-cell transcriptomic data could potentially triple the chances of a drug reaching phase III (Dann et al 2024). To identify potential drug targets, researchers compared publicly available single cell data from healthy individuals with data from patients with a particular disease, and treated the genes differentially expressed between the two groups as candidate targets. The more individuals profiled, the more potential drug targets are identified, and more targets mean more opportunities to design potentially life-saving drugs.
Enhanced machine learning
Larger datasets are invaluable for training machine learning models. Researchers developing one of the first single cell foundation models, scGPT, observed that larger pre-training datasets can result in improved performance in downstream tasks (Cui et al 2024). For example, the accuracy of cell type annotation on a multiple sclerosis dataset improved dramatically when the healthy immune reference was scaled from 30,000 cells to 33 million cells. Cell type annotation is only one of the downstream tasks that single cell foundation models can perform. Others include multi-omics data integration, prediction of genetic perturbation responses, and gene regulatory network inference. Larger single cell datasets can help power more accurate AI tools going forward.
Improved statistical power
Analyzing more cells and samples provides greater statistical power, allowing for more robust and reliable conclusions. This is particularly important when studying subtle gene expression differences or identifying cell state transitions. Researchers studying MYT1L, a gene implicated in neurodevelopmental diseases, expected the mutation’s effects to be very subtle (Yen et al 2024). They used the Scale Bio Single Cell RNA Sequencing Kit to profile many biological replicates and over 300,000 cells in total. Although MYT1L is expressed in all neurons, they found it predominantly affected the proportion of cortical excitatory neurons during specific developmental stages. This discovery was only possible with the inclusion of multiple developmental timepoints and replicates across homozygous, heterozygous, and knockout mouse strains.
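To make the statistical power argument concrete, the sketch below uses the textbook normal approximation for a two-sided, two-sample comparison of means. It is not the analysis used by Yen et al. It assumes the unit of analysis is the biological replicate (cells from the same animal are correlated, so counting cells as independent observations would overstate power), that both groups have the same number of replicates, and that the effect is expressed as a standardized difference (Cohen's d). The function name and the example effect sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d);
    n_per_group is the number of independent replicates in each group.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical subtle (d = 0.3) versus moderate (d = 0.8) effects.
    for d in (0.3, 0.8):
        for n in (4, 16, 64, 256):
            print(f"d = {d}, {n:>3} replicates per group -> power = {two_sample_power(d, n):.2f}")
```

In this approximation a subtle effect needs many more replicates per group than a moderate one to reach the same power, which is why studies of subtle phenotypes benefit from profiling more samples as well as more cells.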
Increasing the scale of single cell RNA sequencing projects, both in terms of cell number and sample diversity, unlocks powerful new insights across biological research. To learn more about how Scale Bio's cutting-edge single cell technology is poised to drive these large-scale studies, click here.
References
Cui H, Wang C, Maan H, et al. scGPT: Towards building a foundation model for single-cell multi-omics using generative AI. Nature Methods 2024. https://doi.org/10.1038/s41592-024-02201-0
Dann E, Teeple E, Elmentaite R, et al. Single-cell RNA sequencing of human tissue supports successful drug targets. medRxiv 2024. https://doi.org/10.1101/2024.04.04.24305313
Mathys H, Boix CA, Akay LA, et al. Single-cell multiregion dissection of Alzheimer’s disease. Nature 2024. https://doi.org/10.1038/s41586-024-07606-7
Sikkema L, Ramirez-Suastegui C, Strobl DC, et al. An integrated cell atlas of the lung in health and disease. Nature Medicine 2023. https://doi.org/10.1038/s41591-023-02327-2
Yen A, Chen X, Skinner DD, et al. MYT1L deficiency impairs excitatory neuron trajectory during cortical development. bioRxiv 2024. https://doi.org/10.1101/2024.03.06.583632