5 Normalization
5.1 Data normalization
Normalization is a relatively simple step. The “logNormalize” method normalizes the feature expression measurements of each cell by that cell's total expression, multiplies the result by a scale factor (10,000 by default), and log-transforms the result.
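The normalization described above can be sketched for a single cell as follows. This is a minimal illustration and not the report's actual pipeline: the function name `log_normalize`, the gene names, and the counts are hypothetical, and the use of the natural logarithm of (1 + x) is an assumption made so that zero counts stay at zero.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Normalize one cell's raw counts (dict: gene -> count).

    Each count is divided by the cell's total count, multiplied by
    scale_factor, and log-transformed with log(1 + x) (assumed here).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no expression to normalize
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }

# Hypothetical raw counts for one cell
cell = {"GeneA": 3, "GeneB": 0, "GeneC": 7}
normalized = log_normalize(cell)
print(normalized)
```

Dividing by the per-cell total makes cells with different sequencing depths comparable, and the log transform compresses the range of highly expressed genes.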
5.2 Highly variable genes
This step identifies genes whose expression varies strongly between cells (that is, genes that are highly expressed in some cells and lowly expressed in others). Focusing on these genes in downstream analysis helps highlight the biological signal in single-cell datasets.
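The text does not state which variability measure was used. As a rough sketch of the idea only, the example below ranks genes by dispersion (variance divided by mean), a simple stand-in chosen for illustration. The function name `rank_variable_genes`, the `top_n` parameter, and the expression values are all hypothetical.

```python
from statistics import mean, pvariance

def rank_variable_genes(expr, top_n=2):
    """Return the top_n genes ranked by dispersion (variance / mean).

    expr maps each gene to its expression values across cells.
    """
    scores = {}
    for gene, values in expr.items():
        mu = mean(values)
        # Genes with zero mean expression get a score of 0
        scores[gene] = pvariance(values) / mu if mu > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical expression values for three genes across four cells
expr = {
    "GeneA": [0.0, 0.0, 5.0, 6.0],  # high in some cells, low in others
    "GeneB": [2.0, 2.1, 1.9, 2.0],  # nearly constant across cells
    "GeneC": [0.0, 4.0, 0.0, 4.0],
}
top = rank_variable_genes(expr)
print(top)
```

A nearly constant gene such as the hypothetical GeneB scores close to zero and is excluded, which matches the intent of keeping only genes that differ between cells.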
P

Figure 5.1: Gene expression, unlabeled

Figure 5.2: Gene expression, labeled
PDF file: P-variableGene.jpg
T

Figure 5.3: Gene expression, unlabeled

Figure 5.4: Gene expression, labeled
PDF file: T-variableGene.jpg
5.3 Second normalization [Remove unwanted sources of variation]
Shift the expression of each gene so that its mean across cells is 0.
Scale the expression of each gene so that its variance across cells is 1.
This step gives every gene equal weight in downstream analysis, so that highly expressed genes do not dominate.
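The centering and scaling steps above amount to a per-gene z-score. The following is a minimal sketch under stated assumptions: the function name `scale_gene` and the input values are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed, since the text does not say which estimator was used.

```python
from statistics import mean, stdev, variance

def scale_gene(values):
    """Center one gene's expression to mean 0 and scale to variance 1."""
    mu = mean(values)
    # Sample standard deviation (assumed); needs at least two cells
    sd = stdev(values) if len(values) > 1 else 0.0
    if sd == 0:
        # A constant gene carries no variation; return zeros
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical normalized expression of one gene across four cells
scaled = scale_gene([1.0, 2.0, 3.0, 4.0])
print(scaled)
print(mean(scaled), variance(scaled))
```

After this transformation a gene with large absolute expression and a gene with small absolute expression contribute on the same scale to downstream steps.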