生信软件的安装和使用—— 建树前非保守序列的过滤

所有的事情，都是问题导向的，这里遇到的问题，便是怎么将多序列比对进行有效的过滤。

我们关注到了一个软件，bmge，这是一个不算很老的软件，怎么说呢，引用量还可以，文中对当前比较热的几个软件之间进行了对比，包括Gblocks，trimAI，和Noisy。

文章标题如下：

BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments

安装过程很简单：

conda install -c bioconda bmge

conda是怎么安装的不再赘述，前面的博客和公众号中讲过。

本软件调用的是java，如果之前conda已有Java，需要注意版本信息，防止影响其他工具的工作。

软件用法如下：

$ bmge -?

BMGE comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to
redistribute it under certain conditions. See the file COPYING.txt for details.

BMGE (version 1.12) arguments :
-i <infile> : input file in fasta or phylip sequential format # 输入文件，建议用fasta格式，若为phylip格式，可以用 abyss phy2fa 做格式转换
-t [AA,DNA,CODON] : sequence coding in the input file (Amino Acid-, DNA-, RNA-, or CODON-coding sequences, respectively) # 输入文件格式，必填
-m BLOSUM<n> : for Amino Acid or CODON sequence alignment; name of the BLOSUM matrix used to estimate the entropy-like value for each character (n = 30, 35, 40, ..., 60, 62, 65, ..., 90, 95; default: BLOSUM62) # BLOSUM矩阵，不支持DNA模式，推荐使用30，详情看文献。
-m DNAPAM<n:r> : for DNA or RNA sequence alignment; name of the PAM matrix (n ranges from 1 to 10,000) and transition/transvertion ratio r value (r ranges from 0 to 10,000) used to estimate the entropy-like value for each character (default: DNAPAM100:2) # PAM 矩阵算法，其实我没看懂，就看到有人说选错了会造成误差放大，建议不用。。。用的话自己试试吧，默认的行最好。
-m DNAPAM<n> : same as previous option but with r = 1 # 同上
-m [ID,PAM0] : for all sequence coding; identity matrix used to estimate entropy-like values for each characters
-g <rate_max> : real number corresponding to the maximum gap rate allowed per character (ranges from 0 to 1; default: 0.2)
-g <col_rate:row_rate> : real numbers corresponding to the maximum gap rates allowed per sequence and character, respectively (range from 0 to 1; default: 0:0.2)

-h <thr_max> : real number corresponding to the maximum entropy threshold (ranges from 0 to 1; default: 0.5)

-h <thr_min:thr_max> : real numbers corresponding to the minimum and maximum entropy threshold, respectively (range from 0 to 1; default: 0:0.5)
-b <min_size> : integer number corresponding to the minimum length of selected region(s) (ranges from 1 to alignment length; default: 5)
-w <size> : sliding window size (must be odd; ranges from 1 to alignment length; if set to 1, then entropy-like values are not smoothed; default: 3)
-s [NO,YES] : if set to YES, performs a stationarity-based trimming of the multiple sequence alignement (default: NO)
-o<x> <outfile> : output file in phylip sequential (-o, -op, -opp, -oppp), fasta (-of), nexus (-on, -onn, -onnn) or html (-oh) format; for phylip and nexus format, options -opp and -onn allow NCBI-formatted sequence names to be renamed onto their taxon name only; options -oppp and -onnn allow renaming onto 'taxon name'_____'accession number' (default: -oppp) # 输出文件
-c<x> <outfile> : same as previous option but for the complementary alignment, except for html output (i.e. -ch do not exist)
-o<y> <outfile> : converts the trimmed alignment in Amino Acid- (-oaa), DNA- (-odna), codon- (-oco) or RY- (-ory) coding sequences (can be combined with -o<x> options; default: no conversion) (see documentation for more details and more output options)

上面密密麻麻的参数，去相关文献里面看，要么用了blosum参数，要么直接默认，真实效果还不知，因为我没有测试数据，tools的文献怎么说，当然说自己好啊，不信你看：

当然，文中毫不吝啬地说，其实，在分分歧度较低的时候，gblocks会好一些。

分歧度较低的时候，怎么样乱来都能建好树的好吧。

具体用着怎么样，我吃完了这个瓜会写在公众号里面的，希望这个瓜是甜的。

gblock的引用量很高，这里给出传送门，就不说怎么安装了：

Gblocks Documentation (csic.es)

Gblocks nad3.pir -t=p -e=-gb1 -b4=5 -d=y # 这是个例子，蛋白的。

参数直接投喂在这里了：

请自行食用。

以上，

abysw

有个做比对的链接，有兴趣的同学可以试试Gblocks (vardb.org)

posted on 2020-11-30 19:43 Yuan-SW-F(abysw) 阅读(1228) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

生信软件的安装和使用—— 建树前非保守序列的过滤

BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments

导航

公告