Perl: counting how often each word occurs in a text (NVIDIA 2019 written test)

1. Original problem

2. Perl script

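print "================ Method 1=====================\n";
open(my $in, '<', 'anna-karenina.txt') or die "Cannot open anna-karenina.txt: $!";
my %counts;
while (my $line = <$in>) {
        chomp $line;
        $line =~ s/[.,?!;:'"(){}\[\]]/ /g;  # replace periods, commas and other punctuation with spaces
        # split ' ' skips leading whitespace, so a line that starts with
        # punctuation does not add an empty "word" to the hash
        foreach my $word (split ' ', $line) {
                $counts{lc($word)}++;  # store each lowercased word in the hash
        }
}
close $in;

foreach my $word (sort keys %counts) {
        print "$word,$counts{$word}\n";  # print each word with its count
}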


print "================ Method 2=====================\n";
open(my $fh, '<', 'anna-karenina.txt') or die "Cannot open anna-karenina.txt: $!";
my %words;
while (my $line = <$fh>)
{
        # equivalent to the loop below:
        # map { $words{lc($_)}++ } $line =~ /(\w+)/g;

        foreach ($line =~ /(\w+)/g){   # match every word in the line
                $words{lc($_)}++;
        }
}
close $fh;
for (sort keys(%words))
{
    print "$_: $words{$_}\n";
}

3. Results

1) Test text

All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

2) Output

================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1

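4. Key points

1) To replace any of several characters at once, list them in a bracketed character class:

　　$line =~ s/[.,?!;:'"(){}\[\]]/ /g; # replace periods, commas and other punctuation with spaces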
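
As a small illustration of the character-class substitution, here is a minimal sketch. The helper name strip_punctuation is hypothetical and not part of the original script; it only wraps the same substitution so that it can be tried on a short string.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): replace each of the
# listed punctuation characters with a single space.
sub strip_punctuation {
    my ($text) = @_;
    # Inside [...] only the brackets themselves need a backslash here.
    $text =~ s/[.,?!;:'"(){}\[\]]/ /g;
    return $text;
}

my $sample = 'happy? happy: [happy] {happy}';
print strip_punctuation($sample), "\n";
```

Each punctuation character becomes one space, so runs of punctuation leave runs of spaces behind; that is why the words are split on whitespace afterwards rather than on a single space.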

2) Lowercase each word with lc and count it in a hash:

　　$counts{lc($word)}++; # store each lowercased word in the hash
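
The counting idiom can be sketched on its own as follows. The helper name count_words is an assumption made for illustration; the point is only that a hash entry that does not exist yet is treated as 0 by ++, so no initialisation is needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count the given words
# case-insensitively and return a reference to the resulting hash.
sub count_words {
    my (@words) = @_;
    my %counts;
    # A key that is not in the hash yet starts as undef; ++ turns it into 1.
    $counts{lc $_}++ for @words;
    return \%counts;
}

my $counts = count_words(qw(All all ALL happy));
print "all => $counts->{all}, happy => $counts->{happy}\n";
```

Because the key is lowercased before it is used, "All", "all" and "ALL" all land in the same hash entry.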

3) % refers to the hash as a whole, keys % returns the list of its keys, and sort puts them in order:

　　sort keys %counts
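
A short sketch of the two orderings that are usually wanted for word counts. The helper names keys_by_name and keys_by_count are hypothetical, and sorting by count is a variation that the original script does not use; it prints in alphabetical order only.

```perl
use strict;
use warnings;

# Hypothetical helpers (not in the original script). Both take a hash
# reference and are meant to be called in list context.

# Alphabetical order, as in the script above.
sub keys_by_name {
    my ($counts) = @_;
    my @sorted = sort keys %$counts;
    return @sorted;
}

# Highest count first; ties fall back to alphabetical order.
sub keys_by_count {
    my ($counts) = @_;
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    return @sorted;
}

my %sample = (happy => 7, all => 2, in => 2, way => 1);
print join(' ', keys_by_name(\%sample)), "\n";
print join(' ', keys_by_count(\%sample)), "\n";
```

Without sort, keys returns the keys in no particular order, so the explicit sort is what makes the output reproducible.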

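4) Method 2 uses $line =~ /(\w+)/g, which in list context returns every word in the line as a list. Note that \w also matches digits and underscores, and that an apostrophe splits a word in two.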
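
The list-context match can be tried in isolation with a sketch like the one below. The helper name extract_words and the sample string are assumptions chosen to show what \w+ does and does not treat as part of a word.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return every run of
# \w characters in the string as a list.
sub extract_words {
    my ($text) = @_;
    # In list context, a match with /g returns all captured groups at once.
    my @words = $text =~ /(\w+)/g;
    return @words;
}

my @found = extract_words(q{"happy" 'happy' don't snake_case 2019});
print join('|', @found), "\n";
```

Quotes and other punctuation never match \w, so no separate substitution step is needed, but "don't" comes back as the two items "don" and "t", while "snake_case" and "2019" are each kept whole.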

posted @ 2020-02-26 20:41 笑着刻印在那一张泛黄