20180918 Blog Assignment

For the assignment requirements, see https://edu.cnblogs.com/campus/nenu/2018fall/homework/2126

The code for this assignment is at https://git.coding.net/zsy1996/text.git, in the file wf.py.

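1. Strip special characters based on the ASCII table, such as ,。!“” and similar punctuation

   Define a word-frequency dictionary

   Loop over the words and count how often each one occurs

   Sort the words by the dictionary values (each word's frequency)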

 

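import re

def getFrequency(testtext):
    # Replace every character that is not an ASCII letter or digit (e.g. ,。!“”) with a space
    testtext = re.sub('[^a-zA-Z0-9]', ' ', testtext)
    frequency = {}  # word-frequency dictionary: word -> count
    for word in testtext.split():  # count each word
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    # Sort the (word, count) pairs by count, highest first
    frequency = sorted(frequency.items(), key=lambda x: x[1], reverse=True)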
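
The count-then-sort step can also be sketched with collections.Counter from the standard library. This is an alternative illustration rather than the code in wf.py, and the name count_words is hypothetical. Like the original, it treats "the" and "The" as different words.

```python
import re
from collections import Counter

def count_words(text):
    """Return (word, count) pairs sorted by count, highest first."""
    # Non-alphanumeric characters become spaces, as in getFrequency
    cleaned = re.sub('[^a-zA-Z0-9]', ' ', text)
    # Counter tallies the words; most_common() sorts by descending count
    return Counter(cleaned.split()).most_common()

pairs = count_words("the cat, the dog. The end")
for word, count in pairs:
    print(word, count)
```

Counter is a dict subclass, so the result can be used the same way as the hand-built dictionary.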

 

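 2. Report "The text contains ___ distinct words" and the number of occurrences of each word

The size of the vocabulary decides how many entries are shown: with more than 100 distinct words only the 10 most frequent are printed, otherwise all of them are.

    print('The text contains', len(frequency), 'distinct words')

    if len(frequency) > 100:
        for x in range(0, 10):
            a = frequency[x][0]  # the word
            b = frequency[x][1]  # its count
            print('Word', a, 'occurs', b, 'times')
    else:
        for x in range(0, len(frequency)):
            a = frequency[x][0]
            b = frequency[x][1]
            print('Word', a, 'occurs', b, 'times')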
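
The display rule above can be isolated into a small helper so that the two branches share one printing loop. This is only a sketch: the name select_rows and the parameters threshold and top are hypothetical, with defaults taken from the values 100 and 10 used above.

```python
def select_rows(frequency, threshold=100, top=10):
    """Pick which (word, count) pairs to print.

    frequency is a list of (word, count) pairs already sorted by count.
    """
    if len(frequency) > threshold:
        return frequency[:top]  # large vocabulary: only the most frequent words
    return frequency            # small vocabulary: everything

small = [('a', 3), ('b', 1)]
large = [('w%d' % i, 200 - i) for i in range(150)]

for word, count in select_rows(large):
    print('Word', word, 'occurs', count, 'times')
```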

 

 

 

3. Output for a single text file

 

def inputfc(inputtxt):      
    with open(inputtxt,encoding = 'UTF-8') as wf: 
        getFrequency(wf.read())     

 

 

 4. Function that takes a folder and counts word frequencies for the text files inside it

 

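import os
import re

def inputfilefc(inputfile):
    # Capture the file name without its .txt extension
    name_delete = r'([\s\S]*?)\.txt'
    txtlist = os.listdir(inputfile)  # names of the entries in the folder
    for i in range(0, len(txtlist)):
        a = re.findall(name_delete, txtlist[i])
        print(a)
        # os.listdir returns bare names, so join each one with the folder path
        inputfc(os.path.join(inputfile, txtlist[i]))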
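
Since os.listdir returns every entry in the folder, including subfolders and non-text files, it may help to filter the list before counting. A minimal sketch, assuming only files directly inside the folder should be processed; the name list_txt_files is hypothetical and not part of wf.py.

```python
import os

def list_txt_files(folder):
    """Return full paths of the .txt files directly inside folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # os.listdir returns bare names, so join them with the folder
        full = os.path.join(folder, name)
        # Skip subfolders and anything that does not end in .txt
        if os.path.isfile(full) and name.lower().endswith('.txt'):
            paths.append(full)
    return paths
```

Each returned path could then be passed to inputfc in turn.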

 

 

Feature implementation

  1. Test text; no particular difficulties

Code

    for word in testtext.split():  # count each word
        if word in frequency:
            frequency[word] += 1
        else:
            frequency[word] = 1
    # Sort by the dictionary values (each word's frequency), highest first
    frequency = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    print('')  # blank line for readability
    print('The text contains', len(frequency), 'distinct words')

 

 

 2. Key point: the file name entered must end with .txt

   The function inputfc (defined with def) reads the text into Python

 

 

 

Enter the path of the text file

inputtxt = input()  # path of the text file
inputfc(inputtxt)

 

 

Definition of inputfc, which reads the text into Python

def inputfc(inputtxt):
    with open(inputtxt, encoding='UTF-8') as wf:
        getFrequency(wf.read())
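
Because a path typed without the .txt extension makes open fail, the input could be normalized before it reaches inputfc. The helper below is a hypothetical convenience, not something wf.py does, and it assumes that every input file is a .txt file.

```python
import os

def ensure_txt(path):
    """Append '.txt' when the last path component has no extension."""
    _, ext = os.path.splitext(path)
    return path if ext else path + '.txt'

print(ensure_txt('test'))
print(ensure_txt('test.txt'))
```

A path that already carries any extension is returned unchanged, so the helper never rewrites a name the user typed in full.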

 

 

 

3. Take a folder as input and count word frequencies for every text file in it

 

 

 

 

 

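Key point: importing the texts

def inputfilefc(inputfile)

txtlist = os.listdir(inputfile) lists the contents of the folder

import os
import re

def inputfilefc(inputfile):
    # Capture the file name without its .txt extension
    name_delete = r'([\s\S]*?)\.txt'
    txtlist = os.listdir(inputfile)  # names of the entries in the folder
    for i in range(0, len(txtlist)):
        a = re.findall(name_delete, txtlist[i])
        print(a)
        # os.listdir returns bare names, so join each one with the folder path
        inputfc(os.path.join(inputfile, txtlist[i]))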

 

 

Results

 

 

 

4.

Key point: command = input() reads a command from the keyboard

Export the data to a CSV file

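import csv
import re

command = 1
while command:
    print("Enter 1 to import a document, anything else to quit, then press Enter!")
    command = input()
    if command == '1':
        print("Enter the path of the document to import, then press Enter!")
        inputcommand = input()
        with open(inputcommand, encoding='UTF-8') as wf:
            outtxt = wf.read()
            with open("1.csv", "w", newline='') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow(["word", "frequency"])  # header row
                # Replace every character that is not an ASCII letter or digit (e.g. ,。!“”) with a space
                outtxt = re.sub('[^a-zA-Z0-9]', ' ', outtxt)
                frequency = {}  # word-frequency dictionary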
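
The excerpt stops after the dictionary is created. One way the remaining steps might look, assuming the same counting and sorting as in getFrequency followed by one CSV row per word, is sketched below. The function name export_frequency and its parameters are hypothetical, not taken from wf.py.

```python
import csv
import re

def export_frequency(text, csv_path):
    """Write word/frequency rows for text to csv_path, most frequent first."""
    cleaned = re.sub('[^a-zA-Z0-9]', ' ', text)
    frequency = {}
    for word in cleaned.split():
        frequency[word] = frequency.get(word, 0) + 1
    rows = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    # newline='' lets the csv module control line endings itself
    with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['word', 'frequency'])  # header row
        writer.writerows(rows)                  # one row per (word, count) pair
    return rows
```

Separating the export from the input loop keeps the keyboard handling independent of the file format.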

 

 

 

PSP stages

 

 

posted @ 2018-09-24 15:40  朱珅莹