Stranded and non-stranded RNA-seq
A comparison of stranded and non-stranded RNA-seq for expression profiling and gene overlap studies
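Background
The problem with non-stranded RNA-seq library preparation: it cannot tell which strand each transcript came from. Without strand-specific information, gene expression levels are hard to quantify accurately, especially for genes whose genomic loci overlap but which are transcribed in opposite directions.
Strand-specific (stranded) RNA-seq library preparation preserves the strand information.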
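The overlap problem described in the background can be made concrete with a small sketch that lists gene pairs sharing genomic positions while lying on opposite strands. This is only an illustration: the `Gene` record, the function name `opposite_strand_overlaps`, and the toy coordinates are all hypothetical and do not come from the study.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def opposite_strand_overlaps(genes):
    """Return (name, name) pairs of genes that overlap on opposite strands."""
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Genes are sorted by start, so once b begins after a ends
            # (or sits on another chromosome) no later gene overlaps a.
            if b.chrom != a.chrom or b.start > a.end:
                break
            if a.strand != b.strand:
                pairs.append((a.name, b.name))
    return pairs


# Toy annotation with made-up coordinates.
toy_genes = [
    Gene("geneA", "chr1", 100, 500, "+"),
    Gene("geneB", "chr1", 400, 900, "-"),
    Gene("geneC", "chr1", 450, 600, "+"),
    Gene("geneD", "chr2", 100, 500, "-"),
]
print(opposite_strand_overlaps(toy_genes))
```

In the toy data, geneA and geneC overlap too, but they lie on the same strand, so strand information would not help separate them; only the opposite-strand pairs are reported.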
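Experimental results
Material: whole blood RNA samples
Treatment: stranded and non-stranded RNA-seq
Results: gene expression analysis of the stranded and non-stranded RNA-seq data shows that stranded RNA-seq identifies more differentially expressed genes. The main reason is that it estimates expression more accurately for genes whose genomic loci overlap but which are transcribed in opposite directions.
Recommendation: prepare libraries with stranded RNA-seq.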
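Comparison of the library preparation steps for non-stranded and stranded RNA-seq
Differences:
(1) during cDNA synthesis, whether dUTPs are used in place of dTTPs for second-strand synthesis;
(2) after the library is constructed, whether the second strand is degraded.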
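Because the dUTP-marked second strand is degraded, the sequenced molecule derives from the first-strand cDNA, so read 1 is reverse complementary to the original transcript (and read 2 of a pair has the transcript's own orientation). A minimal sketch of how the transcript strand could be inferred from an alignment follows. The function name `transcript_strand` and the library labels `"reverse"`, `"forward"`, and `"unstranded"` are choices made for this illustration, not part of the protocol.

```python
def transcript_strand(read_strand, is_read2=False, library="reverse"):
    """Infer the strand of the originating transcript from a read alignment.

    read_strand: "+" or "-", the strand the read aligned to.
    is_read2:    True for the second mate of a paired-end fragment.
    library:     "reverse" (dUTP-style), "forward", or "unstranded".
    Returns "+", "-", or None when the library carries no strand information.
    """
    if read_strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {read_strand!r}")
    if library == "unstranded":
        return None
    if library not in ("forward", "reverse"):
        raise ValueError(f"unknown library type: {library!r}")

    # In a dUTP-style library read 1 is antisense to the transcript,
    # while read 2 has the same orientation as the transcript.
    # A forward-stranded library is the mirror image.
    flip = (library == "reverse") != is_read2
    if flip:
        return "-" if read_strand == "+" else "+"
    return read_strand


# A read 1 aligned to "-" in a dUTP-style library points to a "+" transcript.
print(transcript_strand("-"))
```

With a non-stranded library the function returns `None`, which mirrors the point of the comparison: the alignment strand alone says nothing about the transcript's strand.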
Example 1:
The mapping profiles for IL24 in Replicate PFE1. In non-stranded RNA-seq, all reads mapped to IL24 are counted regardless of whether they are on the forward or reverse strand. In stranded RNA-seq, however, nearly all reads map to the “+” strand and are therefore not counted, because they are not reverse complementary to IL24, which is on the “+” strand. Moreover, the coverage pattern does not support the conclusion that the reads mapped to the IL24 genomic region truly originate from this gene. All genes, transcripts, and sequence reads are colored blue if they are on the “+” strand and green if they are on the “−” strand.
Example 2:
The mapping profiles for ICAM4 (intercellular adhesion molecule 4) in Replicate PFE1. ICAM4 is on the “+” strand and is 100% contained within CTD-2369P2.8 on the “−” strand. In non-stranded RNA-seq, the ambiguous reads in overlapping regions are excluded from counting, which explains why no expression is reported for ICAM4. In stranded RNA-seq, however, the ambiguous reads can be resolved perfectly: taking the read direction into account, all reads can be assigned to ICAM4, because they are reverse complementary to ICAM4 but not to CTD-2369P2.8.
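The counting behavior in the two examples can be reproduced with a toy read counter. This is a simplified sketch, not the pipeline used in the study: it assumes single-end reads from a dUTP-style library on a single chromosome, and the function `count_reads`, the gene names, the coordinates, and the `__ambiguous` / `__no_feature` labels are all hypothetical.

```python
from collections import Counter


def count_reads(reads, genes, stranded):
    """Assign each read to a gene, with or without strand awareness.

    reads: list of (start, end, strand) alignments.
    genes: dict mapping gene name -> (start, end, strand).
    stranded: if True, only genes on the strand opposite to the read are
              compatible (reads are reverse complementary to the transcript).
    """
    counts = Counter()
    for r_start, r_end, r_strand in reads:
        hits = []
        for name, (g_start, g_end, g_strand) in genes.items():
            overlaps = r_start <= g_end and g_start <= r_end
            compatible = (not stranded) or (g_strand != r_strand)
            if overlaps and compatible:
                hits.append(name)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1   # excluded from gene counts
        else:
            counts["__no_feature"] += 1  # no compatible gene
    return counts


# inner_plus is fully contained in outer_minus (as in Example 2);
# lone_plus only has reads on its own strand (as in Example 1).
toy_genes = {
    "inner_plus": (200, 300, "+"),
    "outer_minus": (100, 500, "-"),
    "lone_plus": (1000, 1200, "+"),
}
toy_reads = [
    (210, 260, "-"), (240, 290, "-"),      # antisense to inner_plus
    (1010, 1060, "+"), (1100, 1150, "+"),  # same strand as lone_plus
]

print(dict(count_reads(toy_reads, toy_genes, stranded=False)))
print(dict(count_reads(toy_reads, toy_genes, stranded=True)))
```

In non-stranded mode the two reads in the nested region are ambiguous and the inner gene gets no counts, while the reads over the isolated gene are counted even though their orientation is incompatible with it. In stranded mode the picture reverses: the nested reads go to the inner "+" gene, and the same-strand reads over the isolated gene are left unassigned.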