I wrote two pairs of programs; the two programs in each pair do the same job:
testv.pl vs testv.cpp
testreg.pl vs testreg.cpp
The code is as follows:
////////testreg.cpp/////////
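#include <fstream>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

// argc is the argument count and argv the argument vector
// (the two names were swapped in the first version).
int main(int argc, char **argv)
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <fastq file>" << endl;
        return 1;
    }
    ifstream in(argv[1]);
    int line_count = 0;  // must start at 0; an uninitialized counter is undefined behaviour
    string line_content;
    regex reg("[ATCG]");
    while (getline(in, line_content))
    {
        line_count++;
        // line 2 of every 4-line record is the sequence
        if (line_count % 4 == 2)
        {
            if (regex_search(line_content, reg))
            {
                cout << 1 << endl;
            }
        }
    }
    return 0;
}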
////////testreg.pl/////////
#!/usr/bin/perl
use strict;
use 5.010;
my $file = shift;
open SEQ, '<', $file or die "$!";
while(<SEQ>) {
    chomp;
    if($. % 4 == 2) {
        if(/[ATCG]/) {
            say 1;
        }
    }
}
////////testv.cpp/////////
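#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// argc is the argument count and argv the argument vector
// (the two names were swapped in the first version).
int main(int argc, char **argv)
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <fastq file>" << endl;
        return 1;
    }
    ifstream in(argv[1]);
    int line_count = 0;  // must start at 0; an uninitialized counter is undefined behaviour
    string line_content;
    typedef unordered_map<string, int> mapdef;
    mapdef mymap;
    while (getline(in, line_content))
    {
        line_count++;
        // line 2 of every 4-line record is the sequence
        if (line_count % 4 == 2)
        {
            mymap[line_content]++;
        }
    }
    // number of distinct sequences
    cout << mymap.size() << endl;
    return 0;
}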
////////testv.pl/////////
#!/usr/bin/perl
use strict;
use 5.010;
my $file = shift;
open SEQ, '<', $file or die "$!";
my %hash;
while(<SEQ>) {
    chomp;
    if($. % 4 == 2) {
        $hash{$_}++;
    }
}
say scalar(keys %hash);
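I timed each program with the shell's time command (a.out is the binary built from testv.cpp in the second command and from testreg.cpp in the fourth). The results: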
time perl testv.pl Input
time ./a.out Input
time perl testreg.pl Input | wc -l
time ./a.out Input | wc -l
program     | real     | user     | sys
testv.pl    | 0m0.141s | 0m0.121s | 0m0.011s
testv.cpp   | 0m0.077s | 0m0.054s | 0m0.012s
testreg.pl  | 0m0.142s | 0m0.122s | 0m0.006s
testreg.cpp | 0m0.251s | 0m0.104s | 0m0.137s
Here Input is a FASTQ file containing 54914 DNA sequences.
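As the timings show, C++ has no advantage once regular expressions are involved: testreg.cpp needs 0.251s of real time against 0.142s for testreg.pl, more than half of it system time (0.137s sys), and it visibly stalls for a moment before printing its results. In the hash-table test, by contrast, testv.cpp (0.077s) is nearly twice as fast as testv.pl (0.141s).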