Training a hotel review sentiment classifier (without a neural network)

hotel.txt

1,距离川沙公路较近,但是公交指示不对,如果是蔡陆线的话,会非常麻烦
1,商务大床房，房间很大，床有2M宽，整体感觉经济实惠不错!
1,酒店比较新，装潢和设施还不错，只是房间有些油漆味。
0,房间设施还可以，但酒店内非常的冷，冬天不推荐入住。
0,太令人失望了。太差劲了。
0,什么电力宾馆呀？！根本就象私人的“大车店”！
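
Each line of hotel.txt holds a label, a comma, and the review text, where 1 marks a positive review and 0 a negative one. Because the review itself can contain commas, only the first comma should be treated as the separator. A minimal sketch of that parsing step follows; `parse_line` is a hypothetical helper name used for illustration, not part of the original script.

```python
def parse_line(line):
    """Split one 'label,text' record into (int label, review text).

    Only the first comma separates the label from the review, so any
    commas inside the review text are preserved.
    """
    label, text = line.strip().split(',', 1)
    return int(label), text


# One record taken from the hotel.txt sample above
sample = '1,距离川沙公路较近,但是公交指示不对,如果是蔡陆线的话,会非常麻烦'
label, text = parse_line(sample)
print(label)
print(text)
```

Splitting with `split(',')` and no limit would cut this review into four pieces, which is why the limit of 1 matters.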

import jieba
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

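# Keep only Chinese characters, ASCII letters, and digits; strip everything else
pattern = re.compile(r'[^\u4e00-\u9fa5a-zA-Z0-9]+')

# Read the dataset file and parse each line as "label,text"
labels = []
texts = []
with open('hotel.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which would otherwise break the split
        # Split on the first comma only; the review itself may contain commas
        label, text = line.split(',', 1)
        labels.append(int(label))
        texts.append(pattern.sub('', text))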

# Preprocess the Chinese text: segment it with jieba, then remove stop words
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    stop_words = {line.strip() for line in f}  # a set gives fast membership tests
texts_tokenized = []
for text in texts:
    words = jieba.cut(text)
    words_cleaned = [word for word in words if word not in stop_words]
    texts_tokenized.append(' '.join(words_cleaned))


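# Convert the cleaned, tokenized text into the vector format the model needs.
# TfidfVectorizer produces TF-IDF-weighted bag-of-words features.
# Note: its default token_pattern ignores single-character tokens such as 好 or 差;
# passing token_pattern=r'(?u)\b\w+\b' would keep them.
vectorizer = TfidfVectorizer()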
vectors = vectorizer.fit_transform(texts_tokenized)
vectors = vectors.toarray()

# print(vectors)
# Train a classification model on the data, using a naive Bayes classifier
clf = MultinomialNB()
clf.fit(vectors, labels)

# Predict the sentiment of new reviews
test_texts = [
    '这间房间很干净，床也很舒服',
    '很早就到了，但是服务员半天才给打开房间，真是没法说',
    '虽然来晚了，但是房间一直留着呢，还安排了汽车从机场接回',
    '早饭有两个鸡蛋，一个香肠。午饭也安排了好吃的自助，唯一不足的就是肉有点少，哈哈哈',
    '洗澡水温度不错，但是没有浴巾，这可怎么办，总不能让我拿衣服擦吧',
    '临走的时候互相加了微信，我会推荐给好友的',
]

for test_text in test_texts:
    # Apply the same character cleaning as the training data before segmenting
    test_text_tokenized = jieba.cut(pattern.sub('', test_text))
    test_text_cleaned = [word for word in test_text_tokenized if word not in stop_words]
    test_text_processed = ' '.join(test_text_cleaned)
    test_vector = vectorizer.transform([test_text_processed]).toarray()
    test_prediction = clf.predict(test_vector)

    # Label 0 is negative, label 1 is positive
    if test_prediction[0] == 0:
        print('[negative] ' + test_text)
    else:
        print('[positive] ' + test_text)
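
To make the classifier step less of a black box, here is a rough pure-Python sketch of multinomial naive Bayes with additive (Laplace) smoothing. It is a simplification, not what the script above runs: it works on raw word counts rather than TF-IDF weights, the names `train_nb` and `predict_nb` are hypothetical, and the toy documents are hand-tokenized examples made up for illustration rather than lines from hotel.txt.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on whitespace-tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.split()
        word_counts[label].update(words)
        vocab.update(words)

    total_docs = len(labels)
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        # Additive smoothing keeps unseen (class, word) pairs from getting zero probability
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy, hand-tokenized reviews (assumed data): 1 = positive, 0 = negative
docs = ['房间 干净 舒服', '服务 热情 房间 大', '房间 脏 失望', '服务 差 失望']
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
print(predict_nb(model, '房间 干净'))
print(predict_nb(model, '失望 差'))
```

On this toy data the first review scores higher for class 1 and the second for class 0. With only a handful of training lines, as in the hotel.txt sample, predictions on real reviews will be unreliable, so a larger labeled dataset is needed for meaningful results.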

posted @ 2024-03-15 15:39 从雍和宫走到电影学院
