测序的PCR duplicates及用samtools的rmdup去除PCR重复reads

 

建库中有一步是:

PCR扩增加了接头的DNA片段。

 

理想情况下,对打碎的基因组DNA,每个DNA片段测且仅测到一次。

 

但这一步扩增了6个cycle,那么每个DNA片段有了64份拷贝。将扩增后所有产物“洒”到flowcell,来自一个DNA片段的两个拷贝,可能会锚定在两个bead上,经过测序得到的这两条read,就是PCR duplication

一般来说,如果PCR duplication rate过高,那么同样总数目的reads,所提供的关于基因组的信息就大大减少了


 

samtools的rmdup如何去除PCR重复reads
 
随机打断测序需要去除PCR重复reads,特异性捕获不需要
samtools rmdup 的官方说明书见: http://www.htslib.org/doc/samtools.html
Remove potential PCR duplicates: if multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality. In the paired-end mode, this command ONLY works with FR orientation and requires ISIZE is correctly set. It does not work for unpaired reads (e.g. two ends mapped to different chromosomes or orphan reads).
 
拿一个小的双端测序数据来测试一下:
samtools rmdup tmp.sorted.bam tmp.rmdup.bam
[bam_rmdup_core] processing reference chr10...
[bam_rmdup_core] 2 / 12 = 0.1667 in library
 
双端测序数据用samtools rmdup效果很差,很多人建议用picard工具的MarkDuplicates 功能
 
 

samtools 去除PCR冗余

ref: samtools 使用说明

samtools markdup [-l length] [-r] [-s] [-T] [-Sin.algsort.bam out.bam

-l INT Expected maximum read length of INT bases. [300]

-r Remove duplicate reads.

-s Print some basic stats.

-T PREFIX Write temporary files to PREFIX.samtools.nnnn.mmmm.tmp

-S Mark supplementary reads of duplicates as duplicates.

需要四步:

samtools sort -n  xxx.bam -o xxx.sort.bam

samtools fixmate -m xxx.sort.bam xxx.fixmate.bam #注意这里samtools 1.2 的fixmate没有-m参数

samtools sort  xxx.fixmate.bam -o xxx.positionsort.bam

samtools markdup -r xxx.positionsort.bam xxx.markdup.bam #注意这里samtools 1.2 去冗余参数为rmdup,且1.2版本会报错,实际用1.3的rmdup参数

all:

samtools sort -n  xxx.bam | samtools fixmate -m | samtools sort | samtools markdup -r > xxx.markdup.bam


在sam/bam水平:

picard

ref网站:Picard Tools - By Broad Institute

使用:

java -jar picard.jar MarkDuplicates \

I=xxx.sorted.bam \

O=xxx.sorted.markdup.bam \

M=xxx.markdup.txt

直接删除冗余:

java -jar picard.jar MarkDuplicates \

REMOVE_DUPLICATES =true \

I=xxx.sorted.bam \

O=xxx.sorted.markdup.bam \

M=xxx.markdup.txt


参考来源:
https://www.jianshu.com/p/73483070379b

 
https://www.jianshu.com/p/879c5e9ed56e
 
https://www.jianshu.com/p/879c5e9ed56e

posted on 2019-12-17 11:11  BPSO_mynotes  阅读(11929)  评论(0编辑  收藏  举报

导航