Python JSON parsing + a Python programming exercise

Encoding is the process of converting a Python object into JSON. The two functions commonly used for it are dumps and dump.

import json

dic1 = {'type': 'dic1', 'username': 'loleina', 'age': 16}


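The only difference between the two functions is the destination: dump serializes the Python object as JSON and writes it to a file stream fp, while dumps returns the JSON as a string: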
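json_dic2 = json.dumps(dic1, sort_keys=True, indent=4, separators=(',', ': '), ensure_ascii=True)

(The original call also passed encoding="gbk". That argument only existed in Python 2's json.dumps; Python 3 rejects it with a TypeError, so it is left out here.)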
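A minimal sketch of the dump/dumps difference, assuming Python 3. Here io.StringIO stands in for a real file opened with open(..., 'w'); the variable names are only for illustration.

```python
import io
import json

dic1 = {'type': 'dic1', 'username': 'loleina', 'age': 16}

# dumps returns the JSON text as a str
json_str = json.dumps(dic1, sort_keys=True)
print(json_str)  # {"age": 16, "type": "dic1", "username": "loleina"}

# dump writes the same text into a file-like object instead of returning it
buf = io.StringIO()
json.dump(dic1, buf, sort_keys=True)
print(buf.getvalue() == json_str)  # True
```

With a real file, json.dump(dic1, f) inside a `with open(...) as f:` block behaves the same way as writing to buf.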

ensure_ascii: defaults to True. If the dict contains non-ASCII characters, they are escaped as \uXXXX sequences; set it to False to write those characters as-is.
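A small illustration of ensure_ascii, assuming Python 3; the sample dict with Chinese text is hypothetical data chosen only to contain non-ASCII characters.

```python
import json

data = {'name': '中文'}

# default ensure_ascii=True: non-ASCII characters are escaped as \uXXXX
escaped = json.dumps(data)
# ensure_ascii=False: the characters are written as-is
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # {"name": "\u4e2d\u6587"}
print(readable)  # {"name": "中文"}
```

Both strings decode back to the same Python dict; only the textual form differs.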

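indent: a non-negative integer (or None). With None, the default, all the data is printed on a single line; with 0, line breaks are inserted but no indentation; with a positive number, each nesting level is indented by that many spaces. Indented output like this is called pretty-printed JSON.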
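A sketch comparing the three indent cases, assuming Python 3; the nested sample dict is hypothetical.

```python
import json

data = {'a': 1, 'b': [1, 2]}

compact = json.dumps(data)              # indent=None (default): one line
newlines = json.dumps(data, indent=0)   # indent=0: line breaks, no indentation
pretty = json.dumps(data, indent=4)     # indent=4: line breaks plus 4 spaces per level

print(compact)
print(newlines)
print(pretty)
```

Note that indent=0 is not the same as leaving indent unset: it still breaks lines.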

sort_keys: if True, dictionaries are output sorted by key.
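A short check of sort_keys, assuming Python 3.7 or later, where dicts keep insertion order (so the unsorted output follows the order the keys were written in).

```python
import json

data = {'username': 'loleina', 'age': 16, 'type': 'dic1'}

unsorted_json = json.dumps(data)                # keys in insertion order
sorted_json = json.dumps(data, sort_keys=True)  # keys in ascending order

print(unsorted_json)  # {"username": "loleina", "age": 16, "type": "dic1"}
print(sorted_json)    # {"age": 16, "type": "dic1", "username": "loleina"}
```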

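Decoding is the reverse process: converting JSON into a Python object. The two functions commonly used are loads and load, and they differ in the same way as dumps and dump: load reads from a file stream, while loads parses a string.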
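A minimal decoding sketch, assuming Python 3; io.StringIO again stands in for a file opened with open(..., 'r').

```python
import io
import json

text = '{"type": "dic1", "username": "loleina", "age": 16}'

# loads parses a str into Python objects (here a dict with an int value)
obj = json.loads(text)
print(obj['username'], obj['age'] + 1)  # loleina 17

# load reads the JSON text from a file-like object instead
obj_from_file = json.load(io.StringIO(text))
print(obj_from_file == obj)  # True
```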

Now let's do a simple NLP (natural language processing) task.

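First, be clear about the basic steps of this NLP task, which are the four below:
1. Read the file.
2. Remove all punctuation and newlines, and convert all uppercase letters to lowercase.
3. Merge identical words, count how often each word appears, and sort by frequency from highest to lowest.
4. Write the results, one per line, to the file out.txt.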
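As an alternative to counting with a hand-written dictionary, the same four steps can be sketched with collections.Counter from the standard library. This is a swapped-in technique, not the original script; the function names count_words and run are hypothetical, and reading/writing the files as UTF-8 is an assumption.

```python
import re
from collections import Counter


def count_words(text):
    # step 2: replace punctuation/newlines (any non-word character) with spaces, then lowercase
    text = re.sub(r'[^\w]', ' ', text).lower()
    # step 3: split() with no argument splits on runs of whitespace and drops empty strings
    counts = Counter(text.split())
    # most_common() returns (word, count) pairs, highest count first
    return counts.most_common()


def run(in_path, out_path):
    # step 1: read the input file (UTF-8 assumed)
    with open(in_path, 'r', encoding='utf-8') as f:
        text = f.read()
    # step 4: write one "word count" line per word, overwriting any old output
    with open(out_path, 'w', encoding='utf-8') as f:
        for word, count in count_words(text):
            f.write('{} {}\n'.format(word, count))


result = count_words('The cat sat.\nThe CAT ran!')
print(result)  # [('the', 2), ('cat', 2), ('sat', 1), ('ran', 1)]
```

Counter.most_common() keeps words with equal counts in the order they were first seen, which matches what the dictionary-based version produces.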

[root@localhost opt]# cat   mynlp.py
#!/usr/bin/python
import re

def parse(text):
    # replace every non-word character (punctuation, newlines, ...) with a space
    text = re.sub(r'[^\w]', ' ', text)
    # convert to lowercase
    text = text.lower()
    # split the text into a word list
    word_list = text.split(' ')
    # drop the empty strings left by consecutive spaces
    word_list = filter(None, word_list)
    # count how many times each word appears
    count_word = {}
    for word in word_list:
        if word not in count_word:
            count_word[word] = 0
        count_word[word] += 1
    # sort once, after counting, by frequency from highest to lowest
    sorted_count_word = sorted(count_word.items(), key=lambda kv: kv[1], reverse=True)
    return sorted_count_word

# read in.txt
with open('/opt/in.txt', 'r') as f:
    my_text = f.read()
# write one "word count" pair per line; 'w' overwrites any previous out.txt
with open('/opt/out.txt', 'w') as f:
    for word, count in parse(my_text):
        f.write('{} {}\n'.format(word, count))
[root@localhost opt]#
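Words with the same frequency come out of sorted() in the order they were first counted, because Python's sort is stable and dicts keep insertion order (Python 3.7+). If an alphabetical order within each frequency is preferred, a possible variant of the sort key, not part of the original script and shown here on a hypothetical count_word dict, is:

```python
count_word = {'the': 2, 'cat': 2, 'ran': 1, 'sat': 1}

# as in the script: by count only, ties keep dict insertion order
by_count = sorted(count_word.items(), key=lambda kv: kv[1], reverse=True)

# variant: negate the count for descending frequency, then sort words ascending
by_count_then_word = sorted(count_word.items(), key=lambda kv: (-kv[1], kv[0]))

print(by_count)            # [('the', 2), ('cat', 2), ('ran', 1), ('sat', 1)]
print(by_count_then_word)  # [('cat', 2), ('the', 2), ('ran', 1), ('sat', 1)]
```

The tuple key avoids reverse=True, which would otherwise also reverse the alphabetical tie-break.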

posted @ 2018-03-06 22:48 littlevigra
