[bio-tips] genomeCoverageBed introduction
After sequencing and alignment you have a BAM/SAM file, and the next step is usually to visualize the coverage as a histogram. Both the WIG and BedGraph formats work for this, so what we need is a tool that converts a BAM/SAM file into a BedGraph or WIG file.
Before converting, prepare two things.
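1. The BAM file must be sorted by position, for example with SAMtools. The input is `mapped.bam`, and the sorted output file is `sort.mapped.bam`. (samtools releases before 1.0 instead take an output prefix as the second argument, as in `samtools sort mapped.bam sort.mapped`.)
samtools sort -o sort.mapped.bam mapped.bam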
2. A chromInfo file is needed.
The chromInfo file records the length of each chromosome, one tab-separated line per chromosome. It can be downloaded from the UCSC Genome Browser.
#chrom	size
chr1	197195432
chr2	181748087
chr3	159599783
chr4	155630120
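If you already have the reference FASTA on disk, the chromInfo file can also be derived locally rather than downloaded. The sketch below assumes a FASTA index (`.fai`, as written by `samtools faidx`), whose first two tab-separated columns are the sequence name and its length. The helper name `fai_to_chrominfo` and the file names are illustrative, not part of any tool.

```shell
# fai_to_chrominfo: hypothetical helper, not part of samtools or bedtools.
# Keeps the first two columns (sequence name, length) of a FASTA index.
fai_to_chrominfo() {
    cut -f1,2 "$1"
}

# Example usage (assumes genome.fa is the reference the reads were aligned to):
#   samtools faidx genome.fa                          # writes genome.fa.fai
#   fai_to_chrominfo genome.fa.fai > genome.chromInfo
```

The chromosome names must match those in the BAM header, so build the index from the same reference that was used for alignment.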
Now let's run the conversion.
1. BAM2Bedgraph
genomeCoverageBed -bg -ibam sort.mapped.bam -g genome.chromInfo >genomewide.bedgraph
2. BAM2Wig
genomeCoverageBed -d -strand + -ibam sort.mapped.bam -g genome.chromInfo >genomewide.wig
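This calculates coverage for the forward strand only (-strand +). WIG uses 1-based coordinates, so we use the -d option, which reports the depth at every position with 1-based coordinates. Note that the -d output is a plain three-column table (chromosome, position, depth); a strict WIG file additionally needs declaration lines such as variableStep before the data.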
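To load the -d output as a WIG track, the table can be wrapped with declaration lines. This is a minimal sketch, assuming the tab-separated chromosome/position/depth layout that -d produces and the UCSC `variableStep` flavor of WIG. `depth_to_wig` is a hypothetical helper name, and dropping zero-depth positions is a choice made here to keep the file small.

```shell
# depth_to_wig: hypothetical helper that reads "chrom<TAB>pos<TAB>depth"
# lines on stdin and writes variableStep WIG on stdout.
depth_to_wig() {
    awk -F'\t' '
        # Start a new variableStep section whenever the chromosome changes.
        $1 != chrom { chrom = $1; print "variableStep chrom=" chrom }
        # Emit "position value"; positions with zero depth are skipped.
        $3 > 0 { print $2, $3 }
    '
}

# Example usage with the command from this post:
#   genomeCoverageBed -d -strand + -ibam sort.mapped.bam -g genome.chromInfo \
#       | depth_to_wig > genomewide.wig
```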
====
The full help text for the genomeCoverageBed tool is reproduced at the end of this post. The algorithm is simple; I once wrote a Perl version of it.
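To give a feel for why the algorithm is simple, here is one way to sketch a coverage sweep in shell and awk. It is neither the bedtools implementation nor the Perl version mentioned above. It assumes tab-separated BED input (chromosome, 0-based start, exclusive end) on stdin, ignores strand and split alignments, and does not merge neighbouring intervals that happen to have the same depth. `bed_to_bedgraph` is a hypothetical name.

```shell
# bed_to_bedgraph: hypothetical sketch of an event-sweep coverage count.
# Step 1: turn each interval into two events, +1 at start and -1 at end.
# Step 2: order events by chromosome, then numerically by position.
# Step 3: sweep the events, printing each stretch with non-zero depth.
bed_to_bedgraph() {
    awk -F'\t' -v OFS='\t' '{ print $1, $2, 1; print $1, $3, -1 }' |
    LC_ALL=C sort -k1,1 -k2,2n |
    awk -F'\t' -v OFS='\t' '
        # New chromosome: reset the running depth and the stretch start.
        $1 != chrom { chrom = $1; depth = 0; prev = $2 + 0 }
        # Position advanced: close the stretch [prev, current) if covered.
        $2 + 0 > prev {
            if (depth > 0) print chrom, prev, $2, depth
            prev = $2 + 0
        }
        # Apply this event to the running depth.
        { depth += $3 }
    '
}

# Example usage (reads.bed is an assumed file name):
#   bed_to_bedgraph < reads.bed > reads.bedgraph
```

Sorting the events up front means the sweep only has to keep one running counter per chromosome, instead of an array with one cell per base.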
Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome>
Options:
    -ibam       The input file is in BAM format.
                Note: BAM _must_ be sorted by position
    -d          Report the depth at each genome position (with one-based coordinates).
                Default behavior is to report a histogram.
    -dz         Report the depth at each genome position (with zero-based coordinates).
                Reports only non-zero positions.
                Default behavior is to report a histogram.
    -bg         Report depth in BedGraph format. For details, see:
                genome.ucsc.edu/goldenPath/help/bedgraph.html
    -bga        Report depth in BedGraph format, as above (-bg).
                However with this option, regions with zero
                coverage are also reported. This allows one to
                quickly extract all regions of a genome with 0
                coverage by applying: "grep -w 0$" to the output.
    -split      Treat "split" BAM or BED12 entries as distinct BED intervals.
                when computing coverage.
                For BAM files, this uses the CIGAR "N" and "D" operations
                to infer the blocks for computing coverage.
                For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
                fields (i.e., columns 10,11,12).
    -strand     Calculate coverage of intervals from a specific strand.
                With BED files, requires at least 6 columns (strand is column 6).
                - (STRING): can be + or -
    -5          Calculate coverage of 5" positions (instead of entire interval).
    -3          Calculate coverage of 3" positions (instead of entire interval).
    -max        Combine all positions with a depth >= max into
                a single bin in the histogram. Irrelevant
                for -d and -bedGraph
                - (INTEGER)
    -scale      Scale the coverage by a constant factor.
                Each coverage value is multiplied by this factor before being reported.
                Useful for normalizing coverage by, e.g., reads per million (RPM).
                - Default is 1.0; i.e., unscaled.
                - (FLOAT)
    -trackline  Adds a UCSC/Genome-Browser track line definition in the first line of the output.
                - See here for more details about track line definition:
                  http://genome.ucsc.edu/goldenPath/help/bedgraph.html
                - NOTE: When adding a trackline definition, the output BedGraph can be easily
                  uploaded to the Genome Browser as a custom track,
                  BUT CAN NOT be converted into a BigWig file (w/o removing the first line).
    -trackopts  Writes additional track line definition parameters in the first line.
                - Example:
                  -trackopts 'name="My Track" visibility=2 color=255,30,30'
                  Note the use of single-quotes if you have spaces in your parameters.
                - (TEXT)
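The -scale option in the help text takes a plain multiplication factor, so for reads-per-million normalization the factor is 1,000,000 divided by the number of mapped reads. A small sketch of that arithmetic follows. `rpm_scale` is a hypothetical helper, and counting mapped reads with `samtools view -c -F 4` is one common choice that this post itself does not prescribe.

```shell
# rpm_scale: hypothetical helper that prints 1e6 / N for a read count N.
rpm_scale() {
    awk -v n="$1" 'BEGIN { printf "%.6f\n", 1000000 / n }'
}

# Example usage (assumes the sorted BAM from the preparation step):
#   mapped=$(samtools view -c -F 4 sort.mapped.bam)   # count mapped alignments
#   genomeCoverageBed -bg -scale "$(rpm_scale "$mapped")" \
#       -ibam sort.mapped.bam -g genome.chromInfo > genomewide.rpm.bedgraph
```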