Chinese Sentiment Recognition 3
Sequence Classification: IMDB Movie Review Classification
Sequence classification is the task of predicting a category label from an input spatial or temporal sequence. Its main difficulties are that sequences vary in length, the input symbols are drawn from a very large vocabulary, and the model may need to learn the context of the input sequence or the dependencies between its symbols. This chapter shows how to solve a sequence classification problem with an LSTM.
Problem description
We use the IMDB dataset to study the sequence classification problem, training an LSTM to classify whether a movie review expresses a positive or negative opinion of the film.
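For reference, the IMDB reviews come pre-encoded as lists of integer word indices, which is why the code below only needs to pad them. A minimal inspection sketch (assuming the same keras.datasets.imdb API used in the code below; the decoding offset of 3 is the library's default index_from):
from keras.datasets import imdb
top_words = 5000
(x_train, y_train), _ = imdb.load_data(num_words=top_words)
# Each review is a list of integer word indices; labels are 0 (negative) / 1 (positive)
print(len(x_train), 'training reviews')
print(x_train[0][:10], '->', y_train[0])
# Decode the first review back to words (indices 0-2 are reserved for padding/start/unknown)
word_index = imdb.get_word_index()
index_word = {i + 3: w for w, i in word_index.items()}
print(' '.join(index_word.get(i, '?') for i in x_train[0][:20]))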
Simple LSTM
Word embedding layer + LSTM + output layer
Key issues
- The embedding layer is not yet well understood (see the sketch after this list)
Consider reproducing: http://frankchen.xyz/2017/12/18/How-to-Use-Word-Embedding-Layers-for-Deep-Learning-with-Keras/
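To build intuition for the embedding layer, here is a minimal sketch in the spirit of the tutorial linked above (the toy documents, labels, and hyperparameters are made up for illustration): an Embedding layer maps each integer word index to a trainable dense vector, so a padded batch of shape (batch, max_len) becomes (batch, max_len, dim).
import numpy as np
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import Flatten, Dense
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
# Toy corpus: 1 = positive, 0 = negative (made-up example data)
docs = ['well done', 'good work', 'nice effort', 'poor work', 'could have done better']
labels = np.array([1, 1, 1, 0, 0])
vocab_size = 50
max_len = 4
# Hash each word to an integer index, then pad every document to the same length
encoded = [one_hot(d, vocab_size) for d in docs]
padded = pad_sequences(encoded, maxlen=max_len, padding='post')
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_len))  # (None, 4) -> (None, 4, 8)
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(padded, labels, epochs=50, verbose=0)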
Code
'''
Sequence classification: IMDB movie review classification with LSTM
'''
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from keras.datasets import imdb
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import LSTM
from keras.layers import Dense
seed = 7
top_words = 5000     # keep only the 5000 most frequent words
max_words = 500      # pad/truncate each review to 500 words
out_dimension = 32   # embedding vector dimension
batch_size = 128
epochs = 2
def build_model():
    model = Sequential()
    model.add(Embedding(top_words, out_dimension, input_length=max_words))
    model.add(LSTM(units=100))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Print the model summary
    model.summary()
    return model
np.random.seed(seed=seed)
# Load the data
(x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
x_train = sequence.pad_sequences(x_train, maxlen=max_words)
x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)
# Build and train the model
model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)
scores = model.evaluate(x_validation, y_validation, verbose=2)
print('Accuracy: %.2f%%' % (scores[1] * 100))
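After training, the fitted model can also be used directly for prediction; a small usage sketch on the already-padded validation data (reusing the variables defined above):
# Predict the sentiment of the first few validation reviews
probabilities = model.predict(x_validation[:3])
for prob, label in zip(probabilities, y_validation[:3]):
    print('predicted: %.3f  actual: %d' % (prob[0], label))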
Results
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, 500, 32) 160000
_________________________________________________________________
lstm_6 (LSTM) (None, 100) 53200
_________________________________________________________________
dense_3 (Dense) (None, 1) 101
=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
M:\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/2
- 468s - loss: 0.5198 - accuracy: 0.7326
Epoch 2/2
- 412s - loss: 0.2807 - accuracy: 0.8871
Accuracy: 85.58%