What does the heterozygosity computed by --site-pi in vcftools actually mean?

 

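--site-pi: essentially the expected heterozygosity of a site. (Compute the allele frequencies p and q; under Hardy-Weinberg equilibrium the probability of a heterozygote is 2pq.) Strictly speaking, vcftools labels this statistic per-site nucleotide diversity, and the values in this post are consistent with 2pq multiplied by the finite-sample factor 2n/(2n-1), where n is the number of diploid individuals. With n = 407 the factor is 814/813, so PI comes out very slightly larger than plink's E(HET) (0.256861 versus 0.2565 at the first site).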
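
As a worked check of the definition above, here is a minimal Python sketch that computes both 2pq and the finite-sample-scaled value from genotype counts. The function names are illustrative and not part of plink or vcftools. The counts 9/105/293 are those of the first site in the plink.hwe output further down, and the N/(N-1) scaling is an assumption inferred from the reported numbers, for a biallelic site with no missing genotypes.

```python
def allele_counts(hom_a1, het, hom_a2):
    """Return (A1 allele count, total chromosome count) for diploid genotype counts."""
    n_chrom = 2 * (hom_a1 + het + hom_a2)
    a1_count = 2 * hom_a1 + het
    return a1_count, n_chrom


def expected_het(hom_a1, het, hom_a2):
    """Expected heterozygosity under Hardy-Weinberg equilibrium: 2pq."""
    a1_count, n_chrom = allele_counts(hom_a1, het, hom_a2)
    p = a1_count / n_chrom
    return 2 * p * (1 - p)


def site_pi(hom_a1, het, hom_a2):
    """2pq scaled by N/(N-1), N = number of chromosomes (assumed correction)."""
    _, n_chrom = allele_counts(hom_a1, het, hom_a2)
    return expected_het(hom_a1, het, hom_a2) * n_chrom / (n_chrom - 1)


print(f"{expected_het(9, 105, 293):.4f}")  # plink reports E(HET) = 0.2565 for this site
print(f"{site_pi(9, 105, 293):.6f}")       # vcftools reports PI = 0.256861 for this site
```

The scaling explains why a PI value can exceed 0.5 (for example 0.500104 at position 52919), which plain 2pq never can.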

 

001、

Computing per-site expected heterozygosity in plink

root@DESKTOP-1N42TVH:/home/test3# ls
result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# plink --file result --hardy
PLINK v1.90b6.26 64-bit (2 Apr 2022)           www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file result
  --hardy

16007 MB RAM detected; reserving 8003 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (442957 variants, 407 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
442957 variants loaded from .bim file.
407 people (0 males, 0 females, 407 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 407 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
--hardy: Writing Hardy-Weinberg report (founders only) to plink.hwe ... done.
root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# head plink.hwe                                      ## expected heterozygosity is the E(HET) column
 CHR                           SNP     TEST   A1   A2                 GENO   O(HET)   E(HET)            P
   1               oar3_OAR1_17218  ALL(NP)    G    A            9/105/293    0.258   0.2565            1
   1               oar3_OAR1_20658  ALL(NP)    C    A           25/123/259   0.3022   0.3347      0.05401
   1               oar3_OAR1_28296  ALL(NP)    A    G           17/140/250    0.344   0.3361       0.7679
   1               oar3_OAR1_31152  ALL(NP)    G    A          103/185/119   0.4545   0.4992      0.07405
   1               oar3_OAR1_38175  ALL(NP)    A    G           14/119/274   0.2924    0.296       0.8667
   1               oar3_OAR1_38264  ALL(NP)    A    G           39/191/177   0.4693   0.4425       0.2626
   1                      s64199.1  ALL(NP)    A    G            6/101/300   0.2482   0.2391       0.5385
   1               oar3_OAR1_52919  ALL(NP)    G    A           98/198/111   0.4865   0.4995         0.62
   1               oar3_OAR1_55363  ALL(NP)    A    G           70/166/171   0.4079   0.4692      0.00837
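
To make the columns of the report concrete, the sketch below parses a few rows copied from the output above and recomputes O(HET) and E(HET) from the GENO column (counts of A1/A1, heterozygous, and A2/A2 genotypes). The variable and function names are hypothetical, and the column spacing in the embedded sample is collapsed for readability; only whitespace-separated fields are assumed.

```python
# Three rows copied from the plink.hwe output above (spacing collapsed).
HWE_ROWS = """\
CHR SNP TEST A1 A2 GENO O(HET) E(HET) P
1 oar3_OAR1_17218 ALL(NP) G A 9/105/293 0.258 0.2565 1
1 oar3_OAR1_31152 ALL(NP) G A 103/185/119 0.4545 0.4992 0.07405
1 oar3_OAR1_55363 ALL(NP) A G 70/166/171 0.4079 0.4692 0.00837
"""


def recompute(line):
    """Return (snp, observed het, expected het, reported O(HET), reported E(HET))."""
    fields = line.split()
    hom_a1, het, hom_a2 = (int(x) for x in fields[5].split("/"))
    n = hom_a1 + het + hom_a2              # number of genotyped individuals
    p = (2 * hom_a1 + het) / (2 * n)       # frequency of allele A1
    return fields[1], het / n, 2 * p * (1 - p), float(fields[6]), float(fields[7])


# Skip the header line, then compare recomputed and reported values.
results = [recompute(line) for line in HWE_ROWS.splitlines()[1:]]
for snp, o_het, e_het, o_rep, e_rep in results:
    print(f"{snp}\tO(HET) {o_het:.4f} vs {o_rep}\tE(HET) {e_het:.4f} vs {e_rep}")
```

The recomputed values agree with the reported ones to the four significant digits that plink prints, which supports reading E(HET) as plain 2pq.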

 

002、Computing heterozygosity with --site-pi in vcftools

root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.map  result.ped
root@DESKTOP-1N42TVH:/home/test3# plink --file result --recode vcf-iid --out result  ## convert to VCF format
PLINK v1.90b6.26 64-bit (2 Apr 2022)           www.cog-genomics.org/plink/1.9/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result.log.
Options in effect:
  --file result
  --out result
  --recode vcf-iid

16007 MB RAM detected; reserving 8003 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (442957 variants, 407 people).
--file: result-temporary.bed + result-temporary.bim + result-temporary.fam
written.
442957 variants loaded from .bim file.
407 people (0 males, 0 females, 407 ambiguous) loaded from .fam.
Ambiguous sex IDs written to result.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 407 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
442957 variants and 407 people pass filters and QC.
Note: No phenotypes present.
--recode vcf-iid to result.vcf ... done.
root@DESKTOP-1N42TVH:/home/test3# ls                       ## files after conversion
plink.hwe  plink.log  plink.nosex  result.log  result.map  result.nosex  result.ped  result.vcf
root@DESKTOP-1N42TVH:/home/test3# vcftools --vcf result.vcf --site-pi --out vcf_pi  ## compute per-site pi

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf result.vcf
        --out vcf_pi
        --site-pi

Warning: Expected at least 2 parts in INFO entry: ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
After filtering, kept 407 out of 407 Individuals
Outputting Per-Site Nucleotide Diversity Statistics...
After filtering, kept 442957 out of a possible 442957 Sites
Run Time = 12.00 seconds

 

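Comparison and verification: the PI column of vcf_pi.sites.pi is set against the E(HET) column of plink.hwe. The sites are listed in the same order (the seventh site appears as s64199.1 in plink.hwe and as position 52854 in vcf_pi.sites.pi). The two columns agree to about three decimal places, with PI consistently a little larger, as expected from a factor of 814/813 for 407 diploid individuals.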

root@DESKTOP-1N42TVH:/home/test3# ls
plink.hwe  plink.log  plink.nosex  result.log  result.map  result.nosex  result.ped  result.vcf  vcf_pi.log  vcf_pi.sites.pi
root@DESKTOP-1N42TVH:/home/test3# head vcf_pi.sites.pi
CHROM   POS     PI
1       17218   0.256861
1       20658   0.335135
1       28296   0.336546
1       31152   0.499841
1       38175   0.296318
1       38264   0.443061
1       52854   0.239393
1       52919   0.500104
1       55363   0.469786
root@DESKTOP-1N42TVH:/home/test3# head plink.hwe
 CHR                           SNP     TEST   A1   A2                 GENO   O(HET)   E(HET)            P
   1               oar3_OAR1_17218  ALL(NP)    G    A            9/105/293    0.258   0.2565            1
   1               oar3_OAR1_20658  ALL(NP)    C    A           25/123/259   0.3022   0.3347      0.05401
   1               oar3_OAR1_28296  ALL(NP)    A    G           17/140/250    0.344   0.3361       0.7679
   1               oar3_OAR1_31152  ALL(NP)    G    A          103/185/119   0.4545   0.4992      0.07405
   1               oar3_OAR1_38175  ALL(NP)    A    G           14/119/274   0.2924    0.296       0.8667
   1               oar3_OAR1_38264  ALL(NP)    A    G           39/191/177   0.4693   0.4425       0.2626
   1                      s64199.1  ALL(NP)    A    G            6/101/300   0.2482   0.2391       0.5385
   1               oar3_OAR1_52919  ALL(NP)    G    A           98/198/111   0.4865   0.4995         0.62
   1               oar3_OAR1_55363  ALL(NP)    A    G           70/166/171   0.4079   0.4692      0.00837
root@DESKTOP-1N42TVH:/home/test3# tail vcf_pi.sites.pi
20      51060804        0.499742
20      51081995        0.438815
20      51083803        0.454147
20      51104501        0.50018
20      51110559        0.50037
20      51114511        0.498073
20      51119355        0.42517
20      51137793        0.437948
20      51138395        0.454147
20      51139507        0.498073
root@DESKTOP-1N42TVH:/home/test3# tail plink.hwe
  20           oar3_OAR20_51060804  ALL(NP)    G    A           96/198/113   0.4865   0.4991         0.62
  20           oar3_OAR20_51081995  ALL(NP)    A    G           39/186/182    0.457   0.4383       0.4296
  20           oar3_OAR20_51083803  ALL(NP)    A    G           46/191/170   0.4693   0.4536       0.5139
  20           oar3_OAR20_51104501  ALL(NP)    G    A          103/189/115   0.4644   0.4996       0.1648
  20           oar3_OAR20_51110559  ALL(NP)    G    A          101/196/110   0.4816   0.4998       0.4875
  20           oar3_OAR20_51114511  ALL(NP)    A    G           92/194/121   0.4767   0.4975       0.4253
  20           oar3_OAR20_51119355  ALL(NP)    A    G           34/181/192   0.4447   0.4246       0.4135
  20           oar3_OAR20_51137793  ALL(NP)    G    A           40/183/184   0.4496   0.4374       0.6504
  20           oar3_OAR20_51138395  ALL(NP)    G    A           48/187/172   0.4595   0.4536       0.8278
  20           oar3_OAR20_51139507  ALL(NP)    A    G           92/194/121   0.4767   0.4975       0.4253
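
One way to read the small, systematic excess of PI over E(HET) is that PI is the fraction of distinct chromosome pairs that carry different alleles, rather than 2pq computed from frequencies. The Python sketch below checks that reading against four sites copied from the outputs above. It is an interpretation tested only on these numbers, assuming biallelic sites and no missing genotypes; the names `SITES` and `pairwise_pi` are made up for this illustration.

```python
from math import comb

# (chromosome, position, plink GENO counts, vcftools PI), copied from the outputs above.
SITES = [
    ("1", 17218, "9/105/293", 0.256861),
    ("1", 52919, "98/198/111", 0.500104),
    ("20", 51060804, "96/198/113", 0.499742),
    ("20", 51104501, "103/189/115", 0.50018),
]


def allele_totals(geno):
    """Turn 'homA1/het/homA2' genotype counts into (A1 count, A2 count)."""
    hom_a1, het, hom_a2 = (int(x) for x in geno.split("/"))
    return 2 * hom_a1 + het, 2 * hom_a2 + het


def pairwise_pi(geno):
    """Mismatching chromosome pairs divided by all distinct chromosome pairs."""
    a1, a2 = allele_totals(geno)
    return a1 * a2 / comb(a1 + a2, 2)


def plain_2pq(geno):
    """Expected heterozygosity 2pq from the same allele counts."""
    a1, a2 = allele_totals(geno)
    return 2 * a1 * a2 / (a1 + a2) ** 2


for chrom, pos, geno, pi_reported in SITES:
    print(chrom, pos, f"pairwise={pairwise_pi(geno):.6f}",
          f"2pq={plain_2pq(geno):.6f}", f"vcftools={pi_reported}")
```

On these four sites the pairwise value matches the vcftools column to the printed precision, while 2pq matches plink's E(HET); the ratio between the two is 814/813.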

 

posted @ 2022-07-14 10:21  小鲨鱼2018