PSL format

PSL lines represent alignments, and are typically taken from files generated by BLAT or psLayout. See the BLAT documentation for more details. All of the following fields are required on each data line within a PSL file:

  1. matches - Number of matching bases that aren't repeats
  2. misMatches - Number of bases that don't match
  3. repMatches - Number of bases that match but are part of repeats
  4. nCount - Number of 'N' bases
  5. qNumInsert - Number of inserts in query
  6. qBaseInsert - Number of bases inserted in query
  7. tNumInsert - Number of inserts in target
  8. tBaseInsert - Number of bases inserted in target
  9. strand - '+' or '-' for query strand. For translated alignments, the second '+' or '-' is for the genomic strand
  10. qName - Query sequence name
  11. qSize - Query sequence size
  12. qStart - Alignment start position in query
  13. qEnd - Alignment end position in query
  14. tName - Target sequence name
  15. tSize - Target sequence size
  16. tStart - Alignment start position in target
  17. tEnd - Alignment end position in target
  18. blockCount - Number of blocks in the alignment (a block contains no gaps)
  19. blockSizes - Comma-separated list of sizes of each block
  20. qStarts - Comma-separated list of starting positions of each block in query
  21. tStarts - Comma-separated list of starting positions of each block in target
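
As an illustration of the field list above, here is a minimal Python sketch that splits one PSL data line into named fields. The function name `parse_psl_line` and the `PSL_FIELDS` list are hypothetical helpers, not part of BLAT or any UCSC tool. The sketch assumes the fields are separated by whitespace, that sequence names contain no spaces, and that any display line breaks have already been removed.

```python
# The 21 PSL field names, in the order listed above.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
STRING_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl_line(line):
    """Split one PSL data line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError(
            f"expected {len(PSL_FIELDS)} fields, got {len(values)}"
        )
    record = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in STRING_FIELDS:
            record[name] = value
        elif name in LIST_FIELDS:
            # Lists end with a trailing comma, e.g. "48,20,"
            record[name] = [int(x) for x in value.split(",") if x]
        else:
            record[name] = int(value)
    # Every block list must have exactly blockCount entries.
    for name in LIST_FIELDS:
        if len(record[name]) != record["blockCount"]:
            raise ValueError(f"{name} does not match blockCount")
    return record


# First data line of the fishBlats example, joined back into one line.
example = (
    "59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22 "
    "47748585 13073589 13073753 2 48,20, 171,1042, 34674832,34674976,"
)
rec = parse_psl_line(example)
print(rec["qName"], rec["strand"], rec["blockSizes"])
```

The block lists are kept as Python lists of integers so that each block's size, query start, and target start can be read at the same index.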

Example:
Here is an example of an annotation track in PSL format. Note that line breaks have been inserted into the PSL lines in this example for documentation display purposes. This example can be pasted into the browser without editing.

browser position chr22:13073000-13074000
browser hide all
track name=fishBlats description="Fish BLAT" visibility=2
useScore=1
59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22
    47748585 13073589 13073753 2 48,20,  171,1042,  34674832,34674976,
59 7 0 0 1 55 1 55 +- FS_CONTIG_26780_1 2825 2456 2577 chr22
    47748585 13073626 13073747 2 21,45,  2456,2532,  34674838,34674914,
59 7 0 0 1 55 1 55 -+ FS_CONTIG_26780_1 2825 2455 2676 chr22
    47748585 13073727 13073848 2 45,21,  249,349,  13073727,13073827,


Be aware that the coordinates for a negative-strand alignment are handled in a special way in a PSL line. The qStart and qEnd fields give the position where the query matches from the point of view of the forward strand, even when the match is on the reverse strand. The qStarts list, however, uses reversed coordinates: each block start is counted from the end of the query, as if the query had been reverse-complemented.

 

Example:
Here is a 61-mer containing 2 blocks that align on the minus strand and 2 blocks that align on the plus strand (this sometimes happens due to assembly errors):

0         1         2         3         4         5         6 tens position in query  
0123456789012345678901234567890123456789012345678901234567890 ones position in query   
                      ++++++++++++++                    +++++ plus strand alignment on query   
    ------------------              --------------------      minus strand alignment on query   
0987654321098765432109876543210987654321098765432109876543210 ones position in query negative strand coordinates
6         5         4         3         2         1         0 tens position in query negative strand coordinates

Plus strand:   
     qStart=22
     qEnd=61 
     blockSizes=14,5 
     qStarts=22,56 
                  
Minus strand:   
     qStart=4 
     qEnd=56 
     blockSizes=20,18 
     qStarts=5,39   

Essentially, the minus strand blockSizes and qStarts are what you would get if you reverse-complemented the query. However, the qStart and qEnd are not reversed. Use the following formulas to convert one to the other:

     Negative-strand-coordinate-qStart = qSize - qEnd   = 61 - 56 =  5
     Negative-strand-coordinate-qEnd   = qSize - qStart = 61 -  4 = 57
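
The conversion above can be sketched in Python as follows. The function names `to_negative_strand` and `blocks_on_forward_strand` are hypothetical helpers for illustration only. The second function applies the same `qSize - x` arithmetic to each block, on the assumption that a minus-strand block covers the half-open range from its qStarts entry to that entry plus its blockSizes entry.

```python
def to_negative_strand(q_size, q_start, q_end):
    """Convert forward-strand qStart/qEnd to negative-strand coordinates."""
    return q_size - q_end, q_size - q_start


def blocks_on_forward_strand(q_size, block_sizes, q_starts):
    """Map minus-strand blocks back to forward-strand (start, end) pairs.

    Each pair is 0-based and half-open, like qStart and qEnd.
    """
    forward = [
        (q_size - (start + size), q_size - start)
        for start, size in zip(q_starts, block_sizes)
    ]
    # Reversing the strand also reverses block order, so sort by start.
    return sorted(forward)


# Values from the minus-strand part of the 61-mer example.
print(to_negative_strand(61, 4, 56))                    # (5, 57)
print(blocks_on_forward_strand(61, [20, 18], [5, 39]))  # [(4, 22), (36, 56)]
```

The two forward-strand ranges, 4-22 and 36-56, are the runs of '-' characters in the diagram above.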

BLAT this actual sequence against hg19 for a real-world example:


CCCC
GGGTAAAATGAGTTTTTT
GGTCCAATCTTTTA
ATCCACTCCCTACCCTCCTA
GCAAG


Look for the alignment on the negative strand (-) of chr21, which conveniently aligns to the window chr21:10,000,001-10,000,061.

Browser window coordinates are 1-based [start,end] while PSL coordinates are 0-based [start,end), so a start of 10,000,001 in the browser corresponds to a start of 10,000,000 in the PSL line. Subtracting 10,000,000 from the target (chromosome) position in the PSL line gives the query negative-strand coordinate above.
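
A small Python sketch of this conversion, assuming only the two conventions just described (1-based closed in the browser, 0-based half-open in PSL). The function names `psl_to_browser` and `browser_to_psl` are hypothetical.

```python
def psl_to_browser(chrom, start, end):
    """Format a 0-based half-open PSL range as a 1-based browser position."""
    # Only the start shifts by one; the end value is the same in both systems.
    return f"{chrom}:{start + 1:,}-{end:,}"


def browser_to_psl(start, end):
    """Convert a 1-based closed browser range to 0-based half-open."""
    return start - 1, end


print(psl_to_browser("chr21", 10000000, 10000061))  # chr21:10,000,001-10,000,061
print(browser_to_psl(10000001, 10000061))           # (10000000, 10000061)
```

Either way the range length is 61 bases: end - start in PSL, and end - start + 1 in the browser.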

The 4, 14, and 5 bases at the beginning, middle, and end were chosen so that they do not match the genome at the corresponding positions.

posted @ 2015-01-29 21:12  凉皮子