[TIPS]BEDtools -f -split options produce empty results
Questions:
could you give an example of intersectBed -split syntax working with the current version of bedtools (v2.16.2)?
My input A is a BED12 containing only reads that have been aligned to *exon-exon junctions*.
My input B is a gtf annotation file.
Since input A contains only reads that have been mapped to splice junctions, I need to be strict about the degree of overlap, therefore I use -f 1.
With a previous version of bedtools (2.11), I was using the following:
$ intersectBed -a myAlignment.bed12 -b myAnnotation.gtf -wa -f 1 -wb -bed -split > myOutput.hit
The same syntax unfortunately now returns an empty output file.
Only when being less strict with the -f option I'm getting some hits (same as above, but with -f 0.55 or just skipping the -f restriction).But since my alignment file contains only reads falling into splice junctions, I would like to be as strict as possible.
Answers:
The basic issue is that the behavior of the -f argument has changed in the latest version when dealing with -split records. Let's consider a toy example where we have split record with 3 "blocks" that are each 10 base pairs. Previously, if any of the 3 blocks overlapped something in the B file that exceeded the -f parameter, then the record was reported. Currently, the _total_ number of base pairs intersecting _each_ block must be greater than -f percent of the _total_ length of the 3 blocks. In other words, whereas a single feature that overlapped 100% of one block would now only represent a 33% overlap with the blocked feature.
I recognize that this is the right way to look at things in some cases and not in others. I will give this some thought and am welcome to input from the list. Blocked/split records are difficult to handle with the existing infrastructure...
In tracking this down, I did uncover a bug that reduced the number of B intervals that were considered for potential -split overlaps. This has been addressed and corrected in the github repository.