BLAST is an extensively used local similarity search tool for identifying homologous sequences. When a gene sequence (either protein sequence or nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed for extracting relevant HSPs that represent candidate homologous genes from the entire HSP report. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionalities.

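After running tblastn, the output consists of many sequence fragments rather than full-length genes; the genes these fragments correspond to are the candidate genes. The fragments are collected in a single report (here, all-opsin.pep.gba.report). That report contains both the relevant HSPs (high-scoring pairs) and random HSPs, which arise by chance but are still reported as HSPs by tblastn; these spurious hits are the noise. genBlastA is the tool that filters this noise out.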
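To make the idea of grouping HSPs into candidate genes concrete, here is a minimal Python sketch. It is not genBlastA's algorithm: genBlastA uses a graph with a shortest-path edge length metric, whereas this toy version only chains HSPs that lie on the same contig and strand and are no further apart than a maximum distance (the help text below documents such a limit, default 100000). The `HSP` class, its fields, and `group_hsps` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HSP:
    """One high-scoring pair on the target genome (hypothetical minimal record)."""
    contig: str
    strand: str
    start: int    # target start coordinate
    end: int      # target end coordinate
    score: float


def group_hsps(hsps: List[HSP], max_distance: int = 100000) -> List[List[HSP]]:
    """Chain HSPs into groups by contig, strand, and distance.

    Simplified stand-in for genBlastA's grouping step: an HSP joins the
    current group when it is on the same contig and strand and starts no
    more than max_distance past the furthest end seen in that group.
    """
    groups: List[List[HSP]] = []
    ordered = sorted(hsps, key=lambda h: (h.contig, h.strand, h.start))
    for hsp in ordered:
        if groups:
            current = groups[-1]
            furthest_end = max(h.end for h in current)
            same_locus = (
                current[-1].contig == hsp.contig
                and current[-1].strand == hsp.strand
                and hsp.start - furthest_end <= max_distance
            )
            if same_locus:
                current.append(hsp)
                continue
        groups.append([hsp])
    return groups


example_hsps = [
    HSP("chr1", "+", 1000, 1300, 80.0),
    HSP("chr1", "+", 5000, 5200, 60.0),
    HSP("chr1", "+", 900000, 900100, 25.0),  # too far away: separate group
    HSP("chr2", "-", 200, 500, 70.0),
]
example_groups = group_hsps(example_hsps)
```

Each resulting group plays the role of one candidate gene. In this simplification an isolated low-scoring HSP still forms its own group; ranking and filtering groups by score or coverage would be a separate step.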

genBlastA release v1.0.1

SYNOPSIS:
Given a list of query protein or DNA sequences and a target database that
consists of DNA sequences, this program runs wu-blast tblastn on the list
of sequences provided, then for each query, it groups the resulted HSPs
into sensible groups so that each group of HSPs corresponds to a potential
target gene that is homologous to the query. The output is ranked according
to their homology to the query.

Command line options:
-P  Search program used to produce blast-format sequence alignments,
    can be either "blast" or "wublast", default is "blast",
    optional
-q  List of query sequences to blast, must be in fasta format,
    required
-t  The target database of genomic sequences in fasta format,
    required
-p  Whether query sequences are protein sequences (T/F)
    [default: T], optional
-pg Specify which blast/wublast program to run. If not specified,
    the default behaviour is to run tblastn (for blast/wublast protein
    sequence) / blastn (for blast nucleotide sequence) or tblastx
    (for wublast nucleotide sequence).
-e  parameter for blast: The e-value, [default: 1e-2],
    optional
-g  parameter for blast: Perform gapped alignment (T/F)
    [default: T], optional
-f  parameter for blast: Perform filtering (T/F) [default: F],
    optional
-a  parameter for genBlast: weight of penalty for skipping HSPs,
    between 0 and 1 [default: 0.5], optional
-d  parameter for genBlast: maximum allowed distance between HSPs
    within the same gene, a non-negative integer [default: 100000],
    optional
-r  parameter for genBlast: number of ranks in the output,
    a positive integer, optional
-c  parameter for genBlast: minimum percentage of query gene
    coverage in the output, between 0 and 1 (e.g. for 50%
    gene coverage, use "0.5"), optional
-s  parameter for genBlast: minimum score of the HSP group in
    the output, a real number, optional
-o  output filename, optional. If not specified, the output
    will be the same as the query filename with ".gblast"
    extension.
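
As a rough illustration of how the options above fit together, the following Python sketch assembles a genBlastA argument list from the documented flags. The executable name `genblasta`, the function `build_genblasta_command`, and the file names are assumptions made for this example; the real binary name depends on the release you downloaded.

```python
from typing import List, Optional


def build_genblasta_command(
    query_fasta: str,
    target_fasta: str,
    executable: str = "genblasta",   # assumed name; check your installation
    program: str = "blast",          # -P: "blast" or "wublast"
    evalue: str = "1e-2",            # -e: documented default
    min_coverage: Optional[float] = None,  # -c: between 0 and 1
    num_ranks: Optional[int] = None,       # -r: positive integer
    output: Optional[str] = None,          # -o: output filename
) -> List[str]:
    """Return an argument list using only flags listed in the help text."""
    if program not in ("blast", "wublast"):
        raise ValueError('program must be "blast" or "wublast"')
    cmd = [executable, "-P", program, "-q", query_fasta,
           "-t", target_fasta, "-e", evalue]
    if min_coverage is not None:
        if not 0 <= min_coverage <= 1:
            raise ValueError("min_coverage must be between 0 and 1")
        cmd += ["-c", str(min_coverage)]
    if num_ranks is not None:
        if num_ranks < 1:
            raise ValueError("num_ranks must be a positive integer")
        cmd += ["-r", str(num_ranks)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# Hypothetical file names, for illustration only.
example_cmd = build_genblasta_command(
    "query.pep.fa", "genome.fa", min_coverage=0.5, num_ranks=3
)
```

Once genBlastA and its BLAST dependency are installed, a list like `example_cmd` could be passed to `subprocess.run(example_cmd, check=True)`. Building the list separately makes it easy to inspect the exact command before running it.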