Perl: counting how often each word occurs in a text (NVIDIA 2019 written test)

1. Original problem

 

 

2. Perl script

use strict;
use warnings;

my $file = 'anna-karenina.txt';

print "================ Method 1=====================\n";
open(my $in, '<', $file) or die "Cannot open $file: $!";
my %counts;
while (my $line = <$in>) {
        chomp $line;
        $line =~ s/[ \. , ? ! ; : ' " ( ) { }  \[ \]]/ /g; # replace periods, commas, etc. with spaces
        # split ' ' skips leading whitespace, so no empty "word" is counted
        foreach my $word (split ' ', $line) {
                $counts{lc($word)}++;  # store each lowercased word in the hash
        }
}
close $in;

foreach my $word (sort keys %counts) {
        print "$word,$counts{$word}\n";  # print each word with its count
}


print "================ Method 2=====================\n";
open($in, '<', $file) or die "Cannot open $file: $!";
my %words;
while (my $line = <$in>) {
        # equivalent one-liner: $words{lc($_)}++ for $line =~ /(\w+)/g;
        foreach ($line =~ /(\w+)/g) {   # match every word on the line
                $words{lc($_)}++;
        }
}
close $in;
for (sort keys %words) {
    print "$_: $words{$_}\n";
}
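
To check the counts without a file on disk, the Method 2 loop can be wrapped in a small subroutine and run on the test text directly. This is only a sketch: count_words is a hypothetical helper name that does not appear in the original script, and the here-document simply repeats the two test lines from section 3.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): count the words
# in a string, case-insensitively, with the same \w+ match as Method 2.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{lc($_)}++ for $text =~ /(\w+)/g;
    return \%count;
}

# The two test lines from section 3, inlined as a here-document.
my $sample = <<'END';
All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'
END

my $count = count_words($sample);
print "$_: $count->{$_}\n" for sort keys %$count;
```

Run on this text, the sketch should print the same word/count pairs as the Method 2 output in section 3, for example happy: 7.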

 

3. Results

1) Test text

All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

2) Output

================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1

4. Key points

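1) To replace any one of several characters, list them in a bracketed character class. Without the /x modifier the spaces inside the brackets are literal members of the class; they are harmless here because a space is simply replaced by a space:

  $line =~ s/[ \. , ? ! ; : ' " ( ) { }  \[ \]]/ /g; # replace periods, commas, etc. with spaces

2) Lowercase each word with lc and count it in a hash:

  $counts{lc($word)}++;  # store each lowercased word in the hash

3) %counts refers to the hash as a whole, keys %counts returns its keys, and sort puts them in order:

  sort keys %counts

4) Method 2 evaluates  $line =~ /(\w+)/g  in list context, which returns every word on the line as a list. Note that \w also matches digits and underscores.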
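
The two methods agree on the test text, but they are not equivalent in general: Method 1 only removes the punctuation listed in the character class, while Method 2 treats every non-word character as a separator. The sketch below shows the difference on a made-up sentence (the sentence and the variable names are illustrative assumptions, not taken from the original test).

```perl
use strict;
use warnings;

# Illustrative input containing a hyphen, which is not in Method 1's class.
my $line = "A well-known rule: don't panic.";

# Method 1 style: blank out the listed punctuation, then split on whitespace.
(my $cleaned = $line) =~ s/[ \. , ? ! ; : ' " ( ) { }  \[ \]]/ /g;
my @method1 = map { lc($_) } split ' ', $cleaned;

# Method 2 style: collect every run of word characters.
my @method2 = map { lc($_) } $line =~ /(\w+)/g;

print "Method 1: @method1\n";   # a well-known rule don t panic
print "Method 2: @method2\n";   # a well known rule don t panic
```

Method 1 keeps well-known as a single word because the hyphen is not in its character class, whereas Method 2 splits it in two. Both split don't at the apostrophe.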

 

posted @ 2020-02-26 20:41 by 笑着刻印在那一张泛黄