Abstract
Background: Low-input and single-cell RNA-Seq are widely used today, but two technical questions remain: 1) in technical replicates, what proportion of the noise comes from input RNA quantity rather than from variation among bioinformatics tools? 2) in single neurons, is variation in gene expression attributable to biological heterogeneity or merely to random noise? To examine the sources of variability, we generated RNA-Seq data from low-input (10/100/1000 pg) reference RNA samples and from 38 single neurons from human brains. Results: For technical replicates, the quantity of input RNA is negatively correlated with expression variation. For genes in the medium- and high-expression groups, input RNA amount explains most of the variation, whereas bioinformatics pipelines explain some of the variation in the low-expression group. The t-distributed stochastic neighbour embedding (t-SNE) method reveals data-inherent aggregation of the low-input replicate data and suggests heterogeneity among single pyramidal neuron transcriptomes. Interestingly, expression variation in single neurons is biologically relevant. Conclusions: We found that differences in bioinformatics pipelines are not a major source of variation.
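As an illustration of the first result (more input RNA, less expression variation across technical replicates), the following minimal sketch computes a per-gene coefficient of variation (CV) at each input amount. The abstract does not state which variability measure was used, so the choice of CV is an assumption, and the expression values and the names `coefficient_of_variation`, `toy_replicates`, and `cv_by_input` are hypothetical and invented purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean (CV)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical expression values for ONE gene, three technical replicates
# per input amount (in pg). These numbers are invented, not from the study.
toy_replicates = {
    10: [80.0, 120.0, 100.0],
    100: [90.0, 110.0, 100.0],
    1000: [98.0, 102.0, 100.0],
}

# CV per input amount; a lower CV means less replicate-to-replicate variation.
cv_by_input = {pg: coefficient_of_variation(v) for pg, v in toy_replicates.items()}

for pg in sorted(cv_by_input):
    print(f"{pg:>5} pg  CV = {cv_by_input[pg]:.3f}")
```

In this toy data the CV shrinks as the input amount grows, which is the kind of negative relationship the Results describe; real data would require computing this over many genes, grouped by expression level.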
Original language | English (US) |
---|---|
Journal | International Journal of Computational Biology and Drug Design |
Volume | 11 |
Issue number | 1-2 |
State | Published - 2018 |
Keywords
- ANNOVAR
- Annotate variation
- Bioinformatics
- PCA
- Principal component analysis
- RNA-Seq
- RNA-Seq by expectation maximisation
- RSEM
- Single-cell sequencing
- t-SNE
- t-distributed stochastic neighbour embedding
- TopHat
- Variance
ASJC Scopus subject areas
- Drug Discovery
- Computer Science Applications