R 脚本实现 xx.hmp.txt格式数据转换为plink格式

 

001、

dir()
dat <- fread("mdp_genotype_test.hmp.txt", header = F, data.table = F)   ## 读入文件

genotype <- t(dat[,12:ncol(dat)])[,-1]
genotype <- apply(genotype,2,function(x){gsub(pattern = "(^.)", replacement = "\\1 ", x)})
genotype <- apply(genotype,2,function(x){gsub(pattern = "N", replacement = "0", x)})

id <- t(dat[1,][,12:ncol(dat)])

ped <- data.frame(fid = id, id = id, pid = 0, mid = 0, gender = 0, phy = -9, genotype)
map <- data.frame(dat$V3, dat$V1, 0, dat$V4)[-1,]

write.table(ped, "result.ped", row.names = F, col.names = F, sep = "\t", quote = F)
write.table(map, "result.map", row.names = F, col.names = F, sep = "\t", quote = F)
dir()

 

 

002、plink验证

root@PC1:/home/test# ls
result.map  result.ped
root@PC1:/home/test# plink --file result --recode
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file result
  --recode

15969 MB RAM detected; reserving 7984 MB for main workspace.
.ped scan complete (for binary autoconversion).
Warning: Variant 424 quadallelic; setting rarest alleles missing.
Performing single-pass .bed write (3093 variants, 281 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
3093 variants loaded from .bim file.
281 people (0 males, 0 females, 281 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 281 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.963828.
3093 variants and 281 people pass filters and QC.
Note: No phenotypes present.
--recode ped to plink.ped + plink.map ... done.

 

posted @ 2022-07-29 18:03  小鲨鱼2018  阅读(357)  评论(0编辑  收藏  举报