21 、GPD-PSL-VCF

https://genome.ucsc.edu/FAQ/FAQformat.html#format9

1、Variant Call Format(VCF)

Example

##fileformat=VCFv4.0
##fileDate=20110705
##reference=1000GenomesPilot-NCBI37
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS    ID        REF  ALT     QUAL FILTER INFO                              FORMAT      Sample1        Sample2        Sample3
2      4370   rs6057    G    A       29   .      NS=2;DP=13;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:52,51 1|0:48:8:51,51 1/1:43:5:.,.
2      7330   .         T    A       3    q10    NS=5;DP=12;AF=0.017               GT:GQ:DP:HQ 0|0:46:3:58,50 0|1:3:5:65,3   0/0:41:3
2      110696 rs6055    A    G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4
2      130237 .         T    .       47   .      NS=2;DP=16;AA=T                   GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:56,51 0/0:61:2
2      134567 microsat1 GTCT G,GTACT 50   PASS   NS=2;DP=9;AA=G                    GT:GQ:DP    0/1:35:4       0/2:17:2       1/1:40:3
chr1    45796269        .       G       C
chr1    45797505        .       C       G
chr1    45798555        .       T       C
chr1    45798901        .       C       T
chr1    45805566        .       G       C
chr2    47703379        .       C       T
chr2    48010488        .       G       A
chr2    48030838        .       A       T
chr2    48032875        .       CTAT    -
chr2    48032937        .       T       C
chr2    48033273        .       TTTTTGTTTTAATTCCT       -
chr2    48033551        .       C       G
chr2    48033910        .       A       T
chr2    215632048       .       G       T
chr2    215632125       .       TT      -
chr2    215632155       .       T       C
chr2    215632192       .       G       A
chr2    215632255       .       CA      TG
chr2    215634055       .       C       T

2、Gene Predictions (Extended)（GPD）

The following definition is used for extended gene prediction tables. In alternative-splicing situations, each transcript has a row in this table. The refGene table is an example of the genePredExt format.

table genePredExt
"A gene prediction with some additional info."
    (
    string name;        	"Name of gene (usually transcript_id from GTF)"
    string chrom;       	"Chromosome name"
    char[1] strand;     	"+ or - for strand"
    uint txStart;       	"Transcription start position"
    uint txEnd;         	"Transcription end position"
    uint cdsStart;      	"Coding region start"
    uint cdsEnd;        	"Coding region end"
    uint exonCount;     	"Number of exons"
    uint[exonCount] exonStarts; "Exon start positions"
    uint[exonCount] exonEnds;   "Exon end positions"
    int score;            	"Score"
    string name2;       	"Alternate name (e.g. gene_id from GTF)"
    string cdsStartStat; 	"enum('none','unk','incmpl','cmpl')"
    string cdsEndStat;   	"enum('none','unk','incmpl','cmpl')"
    lstring exonFrames; 	"Exon frame offsets {0,1,2}"
    )

3、PSL format

PSL lines represent alignments, and are typically taken from files generated by BLAT or psLayout. See the BLAT documentation for more details. All of the following fields are required on each data line within a PSL file:

matches - Number of bases that match that aren't repeats
misMatches - Number of bases that don't match
repMatches - Number of bases that match but are part of repeats
nCount - Number of "N" bases
qNumInsert - Number of inserts in query
qBaseInsert - Number of bases inserted in query
tNumInsert - Number of inserts in target
tBaseInsert - Number of bases inserted in target
strand - "+" or "-" for query strand. For translated alignments, second "+"or "-" is for genomic strand
qName - Query sequence name
qSize - Query sequence size
qStart - Alignment start position in query
qEnd - Alignment end position in query
tName - Target sequence name
tSize - Target sequence size
tStart - Alignment start position in target
tEnd - Alignment end position in target
blockCount - Number of blocks in the alignment (a block contains no gaps)
blockSizes - Comma-separated list of sizes of each block
qStarts - Comma-separated list of starting positions of each block in query
tStarts - Comma-separated list of starting positions of each block in target

posted @ 2017-08-17 16:37 风中之铃阅读(402) 评论(0) 收藏举报

刷新页面返回顶部

风中之铃

21 、GPD-PSL-VCF

1、Variant Call Format(VCF)

Example

2、Gene Predictions (Extended)（GPD）

3、PSL format

公告