BLAST is a widely used local similarity search tool for identifying homologous sequences. When a gene sequence (either a protein or a nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed that extracts the relevant HSPs representing candidate homologous genes from the entire HSP report. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionalities.
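After tblastn finishes, the output consists of many sequence fragments rather than full-length sequences; the genes these fragments correspond to are the candidate genes. All of the fragments are collected in a single report (here, all-opsin.pep.gba.report). The report contains both relevant HSPs (high-scoring pairs) and random HSPs (hits that arise by chance but are still reported as HSPs by tblastn; these spurious hits are the noise). genBlastA is the tool that filters out this noise.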
genBlastA release v1.0.1
SYNOPSIS:
Given a list of query protein or DNA sequences and a target database that
consists of DNA sequences, this program runs a BLAST search (tblastn by
default for protein queries; see -P and -pg) on the list of sequences
provided. For each query, it then groups the resulting HSPs so that each
group corresponds to a potential target gene that is homologous to the
query. The groups in the output are ranked according to their homology to
the query.
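The grouping idea can be illustrated with a deliberately simplified sketch. It is not genBlastA's algorithm (which, per the abstract, uses a shortest-path search with its own edge length metric); it only shows how a maximum-distance cutoff like the `-d` option could split HSPs on the same target and strand into candidate-gene groups, and how groups might then be ranked. The `Hsp` class, its field names, and ranking by summed score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hsp:
    """One high-scoring pair; field names are hypothetical, not genBlastA's."""
    target_id: str   # target sequence (e.g. a chromosome) the HSP lies on
    strand: str      # "+" or "-"
    t_start: int     # start coordinate on the target
    t_end: int       # end coordinate on the target
    score: float     # alignment score reported for this HSP


def group_by_distance(hsps: List[Hsp], max_gap: int = 100000) -> List[List[Hsp]]:
    """Split HSPs into groups whose neighbouring members lie within max_gap.

    Simplified stand-in: HSPs are sorted by target, strand and start, and a
    new group begins whenever the target or strand changes or the gap to the
    current group exceeds max_gap.
    """
    groups: List[List[Hsp]] = []
    ordered = sorted(hsps, key=lambda h: (h.target_id, h.strand, h.t_start))
    group_end = 0  # right-most target coordinate of the current group
    for hsp in ordered:
        last = groups[-1][-1] if groups else None
        same_locus = (
            last is not None
            and last.target_id == hsp.target_id
            and last.strand == hsp.strand
            and hsp.t_start - group_end <= max_gap
        )
        if same_locus:
            groups[-1].append(hsp)
            group_end = max(group_end, hsp.t_end)
        else:
            groups.append([hsp])
            group_end = hsp.t_end
    return groups


def rank_groups(groups: List[List[Hsp]]) -> List[List[Hsp]]:
    """Order groups by total HSP score, best first (an assumed ranking rule)."""
    return sorted(groups, key=lambda g: sum(h.score for h in g), reverse=True)


example_hsps = [
    Hsp("chrI", "+", 1000, 1300, 50.0),
    Hsp("chrI", "+", 2000, 2400, 80.0),
    Hsp("chrI", "+", 500000, 500300, 40.0),  # too far away: separate group
    Hsp("chrI", "-", 1500, 1800, 30.0),      # other strand: separate group
]
ranked = rank_groups(group_by_distance(example_hsps, max_gap=100000))
```

With these four made-up HSPs, the two nearby plus-strand hits end up in one group and the other two hits each form a group of their own.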
Command line options:
    -P      Search program used to produce blast-format sequence alignments,
            can be either "blast" or "wublast", default is "blast",
            optional
    -q      List of query sequences to blast, must be in fasta format,
            required
    -t      The target database of genomic sequences in fasta format,
            required
    -p      Whether query sequences are protein sequences (T/F)
            [default: T], optional
    -pg     Specify which blast/wublast program to run. If not specified,
            the default behaviour is to run tblastn (for blast/wublast protein
            sequence) / blastn (for blast nucleotide sequence) or tblastx
            (for wublast nucleotide sequence).
    -e      parameter for blast: The e-value, [default: 1e-2],
            optional
    -g      parameter for blast: Perform gapped alignment (T/F)
            [default: T], optional
    -f      parameter for blast: Perform filtering (T/F) [default: F],
            optional
    -a      parameter for genBlast: weight of penalty for skipping HSPs,
            between 0 and 1 [default: 0.5], optional
    -d      parameter for genBlast: maximum allowed distance between HSPs
            within the same gene, a non-negative integer [default: 100000],
            optional
    -r      parameter for genBlast: number of ranks in the output,
            a positive integer, optional
    -c      parameter for genBlast: minimum percentage of query gene
            coverage in the output, between 0 and 1 (e.g. for 50%
            gene coverage, use "0.5"), optional
    -s      parameter for genBlast: minimum score of the HSP group in
            the output, a real number, optional
    -o      output filename, optional. If not specified, the output
            will be the same as the query filename with ".gblast"
            extension.
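
To make the `-c` and `-s` thresholds more concrete, here is a small sketch of one way query coverage could be computed for an HSP group and then checked against both cutoffs. The README does not spell out how genBlastA defines coverage or group score, so the definition used here (union of the HSPs' query intervals divided by query length) and the function names are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def query_coverage(intervals: List[Tuple[int, int]], query_length: int) -> float:
    """Fraction of the query covered by the union of the given intervals.

    Intervals are 1-based and inclusive (start, end) positions on the query.
    Overlapping regions are counted only once.
    """
    covered = 0
    current_end = 0  # last query position that has already been counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip the already-counted part
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


def passes_filters(coverage: float, score: float,
                   min_coverage: Optional[float] = None,
                   min_score: Optional[float] = None) -> bool:
    """Apply optional cutoffs in the spirit of -c and -s; None means no cutoff."""
    if min_coverage is not None and coverage < min_coverage:
        return False
    if min_score is not None and score < min_score:
        return False
    return True


# Three made-up HSPs on a 100-residue query; the first two overlap.
coverage = query_coverage([(1, 50), (40, 80), (90, 100)], query_length=100)
keep = passes_filters(coverage, score=120.0, min_coverage=0.5, min_score=0.0)
```

In this made-up case the three HSPs cover 91 of 100 query positions, so the group would survive a `-c 0.5` cutoff.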
Example:
genblasta -P blast -pg tblastn -q myquery -t mytarget -p T -e 1e-2 -g T -f F -a 0.5 -d 100000 -r 10 -c 0.5 -s 0 -o myoutput
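
When genBlastA is driven from a script, the same command line can be assembled programmatically. The sketch below only builds and prints the argument list; it uses the flags listed in the README above and nothing else. The helper name `build_genblasta_command` is hypothetical, and the executable name `genblasta` is taken from the example line and assumed to be on the `PATH`.

```python
import shlex
from typing import Dict, List

# Flags documented in the README above; anything else is rejected.
KNOWN_FLAGS = {"P", "q", "t", "p", "pg", "e", "g", "f", "a", "d", "r", "c", "s", "o"}


def build_genblasta_command(options: Dict[str, str],
                            executable: str = "genblasta") -> List[str]:
    """Turn a {flag: value} mapping into an argument list for subprocess."""
    for required in ("q", "t"):  # -q and -t are marked "required"
        if required not in options:
            raise ValueError(f"missing required option -{required}")
    argv = [executable]
    for flag, value in options.items():
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown option -{flag}")
        argv.extend([f"-{flag}", str(value)])
    return argv


options = {
    "P": "blast", "pg": "tblastn", "q": "myquery", "t": "mytarget",
    "p": "T", "e": "1e-2", "g": "T", "f": "F", "a": "0.5",
    "d": "100000", "r": "10", "c": "0.5", "s": "0", "o": "myoutput",
}
command = build_genblasta_command(options)
print(shlex.join(command))

# To actually run the search (requires genblasta and BLAST to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when query or target paths contain spaces.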
(Rong She (rshe@cs.sfu.ca) May 2010)