SNPsnap | 筛选最佳匹配的SNP | 富集分析 | CP loci

SNPsnap已经可以被vsampler取代了


 

一个矛盾:

GWAS得到的SNP做富集分析的话,通常都会有强的偏向性。

co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function

 

数据库使用注意:

  • 一次最多只能输入200-300个SNP
  • SNP必须以rs id格式输入,否则基本不识别

 

SNPsnap: a Web-based tool for identification and annotation of matched SNPs 

providing matched sets of SNPs that can be used to calibrate background expectations.

基于:allele frequency, number of SNPs in LD, distance to nearest gene and gene density

 

根据条件,选出类似的SNP:

  1. Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
  2. LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
  3. Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
  4. Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.

 

这里我们就要根据这个工具来筛选T0的SNP。

a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);

b) the length distribution of T0 loci was set to be the same as that of the T1 loci;

c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and

d) they should be randomly selected from autosomal regions.

 

 

画这个图的脚本:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
head=T2
bedfile=../sort.CP.region.T2.bed
 
# cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed
# cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed
# cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed
 
bedtools intersect -a ../../UCSC.anno/CDS.bed -b $bedfile -wa | bedtools merge > $head.CDS.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.CDS.bed -wa > $head.CDS.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/UTR3.bed -b $bedfile -wa | bedtools merge > $head.UTR3.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR3.bed -wa > $head.UTR3.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/UTR5.bed -b $bedfile -wa | bedtools merge > $head.UTR5.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR5.bed -wa > $head.UTR5.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/Down2K.bed -b $bedfile -wa | bedtools merge > $head.Down2K.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Down2K.bed -wa > $head.Down2K.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/Up2K.bed -b $bedfile -wa | bedtools merge > $head.Up2K.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Up2K.bed -wa > $head.Up2K.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/Intron.bed -b $bedfile -wa | bedtools merge > $head.Intron.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Intron.bed -wa > $head.Intron.cons.bed &&\
 
bedtools intersect -a ../../UCSC.anno/intergenic.bed -b $bedfile -wa | bedtools merge > $head.intergenic.bed &&\
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.intergenic.bed -wa > $head.intergenic.cons.bed &&\
 
echo done!
 
 
# awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed

  

批量求均值

1
2
3
4
5
6
7
awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed

 

按CP loci来分别统计平均分,bedtools的特殊功能

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic
do
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4
echo $i
#
# echo $i > CPmerge/$i.T0.score
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score
#
echo $i > CPmerge/$i.T1.score
bedtools map -a sort.CP.region.T1.bed -b T1/T1.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score
#
echo $i > CPmerge/$i.T2.score
bedtools map -a sort.CP.region.T2.bed -b T2/T2.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score
#
echo $i > CPmerge/$i.T00.score
bedtools map -a sort.SNPsnap.bed -b SNPsnap/SNPsnap.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score
#
done
 
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score

  

 

 

待续

posted @   Life·Intelligence  阅读(2400)  评论(0编辑  收藏  举报
(评论功能已被禁用)
编辑推荐:
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
阅读排行:
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)
TOP
点击右上角即可分享
微信分享提示