What does the heterozygosity computed by --site-pi in vcftools refer to?
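--site-pi: the per-site nucleotide diversity, which for a biallelic site corresponds to the expected heterozygosity. (Compute the allele frequencies p and q; under Hardy-Weinberg equilibrium the probability of a heterozygote is 2pq.) In the output shown later, the PI values are marginally larger than 2pq, which is consistent with a finite-sample correction factor n/(n-1), where n is the number of sampled chromosomes.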
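As a worked check, take the first site in the PLINK output further down (genotype counts 9/105/293 over 407 individuals, so n = 814 chromosomes). The n/(n-1) factor is an assumption inferred from the reported numbers, not something stated in the source:

```latex
p = \frac{2 \cdot 9 + 105}{2 \cdot 407} = \frac{123}{814} \approx 0.1511,
\qquad q = 1 - p \approx 0.8489

H_e = 2pq \approx 0.2565

\pi = \frac{n}{n-1}\, 2pq = \frac{814}{813} \times 0.25655 \approx 0.25686
```

These agree with the E(HET) value of 0.2565 from plink and the PI value of 0.256861 from vcftools for that site.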
001. Computing the per-site expected heterozygosity with plink
root@DESKTOP-1N42TVH:/home/test3# ls
result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# plink --file result --hardy
PLINK v1.90b6.26 64-bit (2 Apr 2022)           www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file result
  --hardy

16007 MB RAM detected; reserving 8003 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (442957 variants, 407 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written.
442957 variants loaded from .bim file.
407 people (0 males, 0 females, 407 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 407 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
--hardy: Writing Hardy-Weinberg report (founders only) to plink.hwe ... done.
root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# head plink.hwe    ## expected heterozygosity is the E(HET) column
 CHR              SNP     TEST   A1   A2           GENO   O(HET)   E(HET)         P
   1  oar3_OAR1_17218  ALL(NP)    G    A      9/105/293    0.258   0.2565         1
   1  oar3_OAR1_20658  ALL(NP)    C    A     25/123/259   0.3022   0.3347   0.05401
   1  oar3_OAR1_28296  ALL(NP)    A    G     17/140/250    0.344   0.3361    0.7679
   1  oar3_OAR1_31152  ALL(NP)    G    A    103/185/119   0.4545   0.4992   0.07405
   1  oar3_OAR1_38175  ALL(NP)    A    G     14/119/274   0.2924    0.296    0.8667
   1  oar3_OAR1_38264  ALL(NP)    A    G     39/191/177   0.4693   0.4425    0.2626
   1         s64199.1  ALL(NP)    A    G      6/101/300   0.2482   0.2391    0.5385
   1  oar3_OAR1_52919  ALL(NP)    G    A     98/198/111   0.4865   0.4995      0.62
   1  oar3_OAR1_55363  ALL(NP)    A    G     70/166/171   0.4079   0.4692   0.00837
002. Computing heterozygosity with --site-pi in vcftools
root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# plink --file result --recode vcf-iid --out result    ## convert to VCF format
PLINK v1.90b6.26 64-bit (2 Apr 2022)           www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result.log.
Options in effect:
  --file result
  --out result
  --recode vcf-iid

16007 MB RAM detected; reserving 8003 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (442957 variants, 407 people).
--file: result-temporary.bed + result-temporary.bim + result-temporary.fam written.
442957 variants loaded from .bim file.
407 people (0 males, 0 females, 407 ambiguous) loaded from .fam.
Ambiguous sex IDs written to result.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 407 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
442957 variants and 407 people pass filters and QC.
Note: No phenotypes present.
--recode vcf-iid to result.vcf ... done.
root@DESKTOP-1N42TVH:/home/test3# ls    ## conversion result
plink.hwe  plink.log  plink.nosex  result.log  result.map  result.nosex  result.ped  result.vcf
root@DESKTOP-1N42TVH:/home/test3# vcftools --vcf result.vcf --site-pi --out vcf_pi    ## compute the heterozygosity (pi)

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf result.vcf
        --out vcf_pi
        --site-pi

Warning: Expected at least 2 parts in INFO entry: ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
After filtering, kept 407 out of 407 Individuals
Outputting Per-Site Nucleotide Diversity Statistics...
After filtering, kept 442957 out of a possible 442957 Sites
Run Time = 12.00 seconds
Comparison and verification:
root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.log  result.map  result.nosex  result.ped  result.vcf  vcf_pi.log  vcf_pi.sites.pi
root@DESKTOP-1N42TVH:/home/test3# head vcf_pi.sites.pi
CHROM   POS       PI
1       17218     0.256861
1       20658     0.335135
1       28296     0.336546
1       31152     0.499841
1       38175     0.296318
1       38264     0.443061
1       52854     0.239393
1       52919     0.500104
1       55363     0.469786
root@DESKTOP-1N42TVH:/home/test3# head plink.hwe
 CHR              SNP     TEST   A1   A2           GENO   O(HET)   E(HET)         P
   1  oar3_OAR1_17218  ALL(NP)    G    A      9/105/293    0.258   0.2565         1
   1  oar3_OAR1_20658  ALL(NP)    C    A     25/123/259   0.3022   0.3347   0.05401
   1  oar3_OAR1_28296  ALL(NP)    A    G     17/140/250    0.344   0.3361    0.7679
   1  oar3_OAR1_31152  ALL(NP)    G    A    103/185/119   0.4545   0.4992   0.07405
   1  oar3_OAR1_38175  ALL(NP)    A    G     14/119/274   0.2924    0.296    0.8667
   1  oar3_OAR1_38264  ALL(NP)    A    G     39/191/177   0.4693   0.4425    0.2626
   1         s64199.1  ALL(NP)    A    G      6/101/300   0.2482   0.2391    0.5385
   1  oar3_OAR1_52919  ALL(NP)    G    A     98/198/111   0.4865   0.4995      0.62
   1  oar3_OAR1_55363  ALL(NP)    A    G     70/166/171   0.4079   0.4692   0.00837
root@DESKTOP-1N42TVH:/home/test3# tail vcf_pi.sites.pi
20      51060804  0.499742
20      51081995  0.438815
20      51083803  0.454147
20      51104501  0.50018
20      51110559  0.50037
20      51114511  0.498073
20      51119355  0.42517
20      51137793  0.437948
20      51138395  0.454147
20      51139507  0.498073
root@DESKTOP-1N42TVH:/home/test3# tail plink.hwe
  20  oar3_OAR20_51060804  ALL(NP)    G    A     96/198/113   0.4865   0.4991      0.62
  20  oar3_OAR20_51081995  ALL(NP)    A    G     39/186/182    0.457   0.4383    0.4296
  20  oar3_OAR20_51083803  ALL(NP)    A    G     46/191/170   0.4693   0.4536    0.5139
  20  oar3_OAR20_51104501  ALL(NP)    G    A    103/189/115   0.4644   0.4996    0.1648
  20  oar3_OAR20_51110559  ALL(NP)    G    A    101/196/110   0.4816   0.4998    0.4875
  20  oar3_OAR20_51114511  ALL(NP)    A    G     92/194/121   0.4767   0.4975    0.4253
  20  oar3_OAR20_51119355  ALL(NP)    A    G     34/181/192   0.4447   0.4246    0.4135
  20  oar3_OAR20_51137793  ALL(NP)    G    A     40/183/184   0.4496   0.4374    0.6504
  20  oar3_OAR20_51138395  ALL(NP)    G    A     48/187/172   0.4595   0.4536    0.8278
  20  oar3_OAR20_51139507  ALL(NP)    A    G     92/194/121   0.4767   0.4975    0.4253
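The two columns can also be reproduced by hand from the genotype counts. The following is a minimal Python sketch, not the tools' actual code: the function names `expected_het` and `site_pi` are made up for illustration, and the pairwise-difference formula in `site_pi` is an assumption that happens to match the vcftools numbers above. The three sites and their values are copied from the output shown.

```python
# Genotype counts (A1A1, A1A2, A2A2) copied from the GENO column of plink.hwe,
# together with plink's E(HET) and the vcftools PI value for the same site.
SITES = [
    ("oar3_OAR1_17218", (9, 105, 293), 0.2565, 0.256861),
    ("oar3_OAR1_20658", (25, 123, 259), 0.3347, 0.335135),
    ("s64199.1", (6, 101, 300), 0.2391, 0.239393),
]


def expected_het(hom1, het, hom2):
    """Expected heterozygosity 2pq computed from genotype counts."""
    n_chrom = 2 * (hom1 + het + hom2)   # number of sampled chromosomes
    p = (2 * hom1 + het) / n_chrom      # frequency of allele A1
    return 2 * p * (1 - p)


def site_pi(hom1, het, hom2):
    """Assumed estimator: differing allele pairs / all ordered allele pairs.

    For a biallelic site this equals 2pq * n / (n - 1).
    """
    n_chrom = 2 * (hom1 + het + hom2)
    a1 = 2 * hom1 + het                 # copies of allele A1
    a2 = n_chrom - a1                   # copies of allele A2
    return 2 * a1 * a2 / (n_chrom * (n_chrom - 1))


for name, geno, plink_ehet, vcftools_pi in SITES:
    print(f"{name}: 2pq={expected_het(*geno):.4f} (plink {plink_ehet}), "
          f"pi={site_pi(*geno):.6f} (vcftools {vcftools_pi})")
```

With 407 individuals the correction factor is 814/813, so the two statistics differ only from the fourth decimal place onward, which is why the PI and E(HET) columns look nearly identical.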