SNPsnap | 筛选最佳匹配的SNP | 富集分析 | CP loci
SNPsnap已经可以被vsampler取代了
一个矛盾:
GWAS得到的SNP做富集分析的话,通常都会有强的偏向性。
co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function
数据库使用注意:
- 一次最多只能输入200-300个SNP
- SNP必须以rs id格式输入,否则基本不识别
SNPsnap: a Web-based tool for identification and annotation of matched SNPs
providing matched sets of SNPs that can be used to calibrate background expectations.
基于:allele frequency, number of SNPs in LD, distance to nearest gene and gene density
根据条件,选出类似的SNP:
- Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
- LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
- Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
- Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.
这里我们就要根据这个工具来筛选T0的SNP。
a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);
b) the length distribution of T0 loci was set to be the same as that of the T1 loci;
c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and
d) they should be randomly selected from autosomal regions.
画这个图的脚本:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | head =T2 bedfile=.. /sort .CP.region.T2.bed # cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed # cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed # cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed bedtools intersect -a ../.. /UCSC .anno /CDS .bed -b $bedfile -wa | bedtools merge > $ head .CDS.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .CDS.bed -wa > $ head .CDS.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /UTR3 .bed -b $bedfile -wa | bedtools merge > $ head .UTR3.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .UTR3.bed -wa > $ head .UTR3.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /UTR5 .bed -b $bedfile -wa | bedtools merge > $ head .UTR5.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .UTR5.bed -wa > $ head .UTR5.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /Down2K .bed -b $bedfile -wa | bedtools merge > $ head .Down2K.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .Down2K.bed -wa > $ head .Down2K.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /Up2K .bed -b $bedfile -wa | bedtools merge > $ head .Up2K.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .Up2K.bed -wa > $ head .Up2K.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /Intron .bed -b $bedfile -wa | bedtools merge > $ head .Intron.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .Intron.bed -wa > $ head .Intron.cons.bed &&\ bedtools intersect -a ../.. /UCSC .anno /intergenic .bed -b $bedfile -wa | bedtools merge > $ head .intergenic.bed &&\ bedtools intersect -a ../.. /PhastCons .bed /all .chr.phastCons46way.primates.bed -b $ head .intergenic.bed -wa > $ head .intergenic.cons.bed &&\ echo done ! # awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed |
批量求均值
1 2 3 4 5 6 7 | awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed |
按CP loci来分别统计平均分,bedtools的特殊功能
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic do # bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4 echo $i # # echo $i > CPmerge/$i.T0.score # bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score # echo $i > CPmerge/$i.T1.score bedtools map -a sort .CP.region.T1.bed -b T1 /T1 .$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score # echo $i > CPmerge/$i.T2.score bedtools map -a sort .CP.region.T2.bed -b T2 /T2 .$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score # echo $i > CPmerge/$i.T00.score bedtools map -a sort .SNPsnap.bed -b SNPsnap /SNPsnap .$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score # done #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score |
待续
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)