Study Notes for "Can Machines Learn?"

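import pandas as pd
# Jupyter magic: render matplotlib figures inline in the notebook
%matplotlib inline
# Load the training and test samples (UTF-8 encoded CSV files)
raw_train = pd.read_csv("./input/train_sample_utf8.csv", encoding="utf8")
raw_test = pd.read_csv("./input/test_sample_utf8.csv", encoding="utf8")
# Preview the first five rows of each set
raw_train.head(5)
raw_test.head(5)
# Number of rows and columns in each set
raw_train.shape
raw_test.shape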

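import matplotlib.pyplot as plt
# Side-by-side horizontal bar charts of the news topic distribution.
# "分类" is the category (topic label) column in both CSV files.
plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1)
raw_train["分类"].value_counts().sort_index().plot(kind="barh", title="News topic distribution (training set)")
plt.subplot(1, 2, 2)
raw_test["分类"].value_counts().sort_index().plot(kind="barh", title="News topic distribution (test set)")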
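
The bar charts show raw counts, which are hard to compare when the two sets differ in size. As a rough sketch, the per-class proportions can be put side by side in one table. The toy DataFrames, their labels, and the helper name `class_proportions` below are all made up for illustration; only the column name "分类" comes from the code above.

```python
import pandas as pd

# Toy stand-ins for raw_train / raw_test; the labels are invented for illustration
toy_train = pd.DataFrame({"分类": ["体育", "体育", "财经", "科技"]})
toy_test = pd.DataFrame({"分类": ["体育", "财经", "财经", "科技"]})

def class_proportions(train, test, label_col="分类"):
    """Return per-class proportions of the two splits, aligned by class label."""
    table = pd.DataFrame({
        "train": train[label_col].value_counts(normalize=True),
        "test": test[label_col].value_counts(normalize=True),
    })
    # A class missing from one split shows up as NaN after alignment; treat it as 0
    return table.fillna(0.0).sort_index()

proportions = class_proportions(toy_train, toy_test)
print(proportions)
```

With the real data, calling `class_proportions(raw_train, raw_test)` should give the same kind of table, assuming both files contain the "分类" column as the plotting code implies.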

Segment the news content into words:
import jieba
def news_cut(text):
    # Segment the text with jieba and join the tokens with single spaces
    return " ".join(jieba.cut(text))
# Quick test of the segmentation
test_content = "六月初的一天，来自深圳的中国旅游团游客纷纷拿起相机拍摄新奇刺激的好莱坞环球影城主题公园场景。"
print(news_cut(test_content))

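This task is done with jieba, a well-known Chinese word segmentation library for Python. We wrap it in a news_cut function that takes the news content as input and returns the segmented result, with words separated by spaces.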
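
One reason to separate the words with spaces is that later steps can recover the tokens with a plain `str.split()`. The sketch below counts token frequencies that way. The strings in `segmented` are hand-written in the same space-separated format, not actual jieba output, and `count_tokens` is a hypothetical helper name.

```python
from collections import Counter

def count_tokens(segmented_texts):
    """Count token frequencies across space-separated, pre-segmented texts."""
    counts = Counter()
    for text in segmented_texts:
        # split() with no arguments splits on any run of whitespace
        counts.update(text.split())
    return counts

# Hand-written examples in the format news_cut returns (not real jieba output)
segmented = ["中国 旅游团 游客", "游客 拍摄 主题 公园"]
token_counts = count_tokens(segmented)
print(token_counts.most_common(3))
```

In the notebook itself, the same function could presumably be fed the result of applying `news_cut` to each news text.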

posted @ 2021-01-26 11:30 大米粒o
