pybedtools 提取序列
前言
pybedtools 是封装了BEDTools 所有可用的程序。下文学习下pybedtools 如何通过bed文件的坐标提取对应序列
正文
对pybedtools还不了解的参考下这篇文章 在Python中使用BEDTools。
提取序列的方法在BEDTools 中的命令是bedtools getfasta
, 在pybedtools中是BedTool.sequence
方法。
第一步是创建BEDTool
实例, 作为例子,这里用字符串来生成:
import pybedtools
a = pybedtools.BedTool("""
chr1 1 10 feature1
chr1 50 55 feature2""", from_string=True)
# 若从bed文件中生成则
# bed = pybedtools.BedTool(bed_file_path)
指定fasta文件地址
## fasta 是一个软件自带的测试fasta文件地址
fasta = pybedtools.example_filename('test.fa')
a = a.sequence(fi=fasta)
print(open(a.seqfn).read())
输出
>chr1:1-10
GATGAGTCT
>chr1:50-55
CCATC
保存文件,可以通过设置fo
参数设置保存地址
a = a.sequence(fi=fasta, fo="a1.fa")
也可以使用save_seqs()
方法
a.save_seqs("a2.fa")
还有个参数nameOnly
,值得一说。之前提取的"a1.fa"是这样的:
>chr1:1-10
GATGAGTCT
>chr1:50-55
CCATC
nameOnly
参数可以让fasta的header变为bed 的name 一列
a.sequence(fi=fasta, fo="a3.fa", nameOnly=True)
a3序列:
>feature1
GATGAGTCT
>feature2
CCATC
而name
参数,是name + 坐标,见下面参数文档
Original BEDTools help::
Tool: bedtools getfasta (aka fastaFromBed)
Version: v2.30.0
Summary: Extract DNA sequences from a fasta file based on feature coordinates.
Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
-fi Input FASTA file
-fo Output file (opt., default is STDOUT
-bed BED/GFF/VCF file of ranges to extract from -fi
-name Use the name field and coordinates for the FASTA header
-name+ (deprecated) Use the name field and coordinates for the FASTA header
-nameOnly Use the name field for the FASTA header
-split Given BED12 fmt., extract and concatenate the sequences
from the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
-bedOut Report extract sequences in a tab-delimited BED format instead of in FASTA format.
- Default is FASTA format.
-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab
is used.
-rna The FASTA is RNA not DNA. Reverse complementation handled accordingly.