Sentiment Analysis and Opinion Mining (3) - Document Sentiment Classification - LakeLight



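Document-level classification treats a document as a whole and determines its overall sentiment orientation. It can be viewed as sentiment classification of the GENERAL aspect of an entity. Basic assumption: the document expresses the sentiment of a single opinion holder toward a single entity. This assumption fits product reviews well, but it is less suitable for forum posts, blogs, and similar text.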

Supervised sentiment classification:

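Sentiment classification can be formulated simply as a binary classification problem: Positive and Negative. Product reviews can serve as training and test data. Reviews usually carry an overall rating, so 4-5 stars can be treated as Positive and 1-2 stars as Negative. The Neutral class is usually not considered.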
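The rating-to-label rule above can be sketched in a few lines. This is only an illustration: the function name `rating_to_label` and the toy reviews are made up for the example, and dropping 3-star reviews is one way to read "Neutral is not considered".

```python
from typing import Optional

def rating_to_label(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment label."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3-star (neutral) reviews are left out of the binary task

# Hypothetical toy reviews as (text, stars) pairs.
reviews = [
    ("Great phone, amazing screen", 5),
    ("Horrible battery", 1),
    ("It is okay", 3),
    ("Excellent value", 4),
]

# Keep only the reviews that received a label.
labeled = [
    (text, rating_to_label(stars))
    for text, stars in reviews
    if rating_to_label(stars) is not None
]
```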

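Sentiment classification is also a text classification problem. In traditional text classification the main features are topic-related words, whereas in sentiment classification the main features are sentiment and opinion words, such as great, amazing, excellent, horrible, bad, worse, etc. Pang and Lee (2002) applied Naive Bayes and SVM to movie reviews.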
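To make the "sentiment words as features" idea concrete, here is a minimal multinomial Naive Bayes sketch over bag-of-words counts with add-one smoothing. It is not the setup of any cited paper: the class name `NaiveBayes`, the whitespace tokenizer, and the four training sentences are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a toy example.
    return text.lower().split()

class NaiveBayes:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training data.
train_docs = [
    "a great and amazing movie",
    "excellent acting great plot",
    "a horrible bad movie",
    "bad plot and worse acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
```

Words such as "plot" or "acting" appear in both classes and contribute little, while "great" or "horrible" dominate the decision, which is the behavior the paragraph describes.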

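Common features:

Terms and their frequency: terms, n-grams, TF, IDF, term position

Part of speech: POS tagging; POS tags and their n-grams. For example, adjectives are especially important for expressing opinions

Sentiment words and phrases: sentiment lexicons, i.e. words commonly used to express sentiment, mainly adjectives, verbs, and nouns

Rules of opinions: opinion rules; besides sentiment words, rules that commonly express implied sentiment

Sentiment shifters: words that change the sentiment orientation, such as negation words, e.g. not

Syntactic dependency: dependency parse trees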
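Two of the feature types listed above, n-gram counts and sentiment shifters, can be combined in a small sketch. The negation handling here (tagging only the single token after a negator with `_NEG`) and the tiny `NEGATORS` set are simplifying assumptions for illustration, not a rule taken from the cited literature.

```python
from collections import Counter

NEGATORS = {"not", "no", "never"}  # hypothetical, deliberately tiny shifter list

def mark_negation(tokens):
    """Append _NEG to the token that directly follows a negation word."""
    marked, negate = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negate else tok)
            negate = False
    return marked

def ngram_features(text, n_max=2):
    """Count all 1..n_max-grams of the negation-marked token sequence."""
    tokens = mark_negation(text.lower().split())
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

feats = ngram_features("the plot is not good")
```

With this marking, "good" after "not" becomes the separate feature `good_NEG`, so a classifier can learn a different weight for the shifted word.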

Besides traditional machine learning methods, there are also some methods designed specifically for sentiment classification:

Dave 2003 score function

Tong 2001 domain-specific aggregation

Existing work:

(Pang and Lee, 2004), the minimum cut algorithm working on a graph
(Mullen and Collier, 2004; Xia and Zong, 2010), syntactic relations were used together with traditional features.
(Kennedy and Inkpen, 2006; Li et al., 2010), the contextual valence and sentiment shifters were employed for classification
(Cui, Mittal and Datar, 2006), an evaluation was reported with several sentiment classification algorithms
(Abbasi, Chen and Salem, 2008), a genetic algorithm based feature selection
(Li, Zhang and Sindhwani, 2009), a non-negative matrix factorization method
(Dasgupta and Ng, 2009; Li et al., 2011; Zhou, Chen and Wang, 2010), semi-supervised learning and/or active learning were experimented
(Kim, Li and Lee, 2009) and (Paltoglou and Thelwall, 2010), different IR term weighting schemes were studied and compared for sentiment classification
(Martineau and Finin, 2009), a new term weighting scheme called Delta TFIDF was proposed
(Mejova and Srinivasan, 2011), the authors explored various feature definition and selection strategies
(Nakagawa, Inui and Kurohashi, 2010), a dependency tree-based classification method was proposed, which used conditional random fields
(Yessenalina, Yue and Cardie, 2010), multilevel structured models were proposed
(Wang et al., 2011), the authors proposed a graph-based hashtag approach to classifying Twitter post sentiment
(Liu et al., 2010), different linguistic features were compared for both blog and review sentiment classification

Unsupervised sentiment classification:

(Turney, 2002) performs classification based on some fixed syntactic patterns that are likely to be used to express opinions

(Taboada et al., 2011) the lexicon-based method, which uses a dictionary of sentiment words and phrases with their associated orientations and strength, and incorporates intensification and negation to compute a sentiment score for each document

(Ding, Liu and Yu, 2008; Hu and Liu, 2004; Kim and Hovy, 2004) This method was originally used in sentence and aspect-level sentiment classification
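The lexicon-based idea described above (sentiment words with orientation and strength, adjusted by intensification and negation) can be sketched as follows. The lexicon entries, intensifier weights, and function names are all hypothetical, and negation is modeled as a plain sign flip, which is a simplification rather than the scoring rule of any cited system.

```python
# Hypothetical mini-lexicon: word -> orientation and strength.
LEXICON = {"great": 3, "good": 2, "excellent": 4, "bad": -2, "horrible": -4}
# Hypothetical intensifiers: word -> multiplier applied to the next sentiment word.
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}

def lexicon_score(text):
    """Sum the adjusted strengths of all sentiment words in the text."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = float(LEXICON[tok])
        j = i - 1
        # An intensifier directly before the word scales its strength.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator before the (possibly intensified) word flips the sign.
        if j >= 0 and tokens[j] in NEGATORS:
            value = -value
        score += value
    return score

def classify(text):
    s = lexicon_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

No labeled training data is needed here: the document score comes entirely from the dictionary, which is why the approach counts as unsupervised.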

Sentiment rating prediction:

Unlike classification, this problem is formulated as a regression problem.

Pang and Lee (2005) experimented with SVM regression, SVM multiclass classification using the one-vs-all (OVA) strategy, and a meta-learning method called metric labeling.
Goldberg and Zhu (2006) improved this approach by modeling rating prediction as a graph-based semi-supervised learning problem
Qu, Ifrim and Weikum (2010) introduced a bag-of-opinions representation of documents to capture the strength of n-grams with opinions, which is different from the traditional bag-of-words representation
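To illustrate rating prediction as regression in the simplest possible form, the sketch below fits a one-feature least-squares line from a net sentiment-word count to the star rating. It is far simpler than the SVM regression or bag-of-opinions models cited above; the word lists, the training reviews, and the function names are assumptions made for the example.

```python
# Hypothetical word lists used to build a single numeric feature.
POSITIVE = {"great", "excellent", "amazing", "good"}
NEGATIVE = {"bad", "horrible", "worse", "poor"}

def net_sentiment(text):
    """Number of positive words minus number of negative words."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training reviews with 1-5 star ratings.
train = [
    ("horrible and bad", 1),
    ("bad plot", 2),
    ("it is a movie", 3),
    ("good acting", 4),
    ("great and excellent", 5),
]
slope, intercept = fit_line(
    [net_sentiment(text) for text, _ in train],
    [stars for _, stars in train],
)

def predict_rating(text):
    """Predict a real-valued rating, clipped to the 1-5 scale."""
    raw = slope * net_sentiment(text) + intercept
    return max(1.0, min(5.0, raw))
```

The output is a real number rather than a class label, so errors can be measured by distance from the true rating, which is the practical difference from the classification setting.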

Cross-domain sentiment classification:

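Sentiment classification is strongly domain-dependent: a model trained in one domain tends to perform poorly in other domains. Extending a model to a new domain, for example from hotels to furniture, requires domain adaptation or transfer learning.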

Two approaches:

(Aue and Gamon, 2005) the new domain needs only a small amount of labeled data

(Blitzer, Dredze and Pereira, 2007; Tan et al., 2007) the new domain needs no labeled data

(Yang, Si and Callan, 2006), a simple strategy based on feature selection was proposed for transfer learning for sentence level classification.
(Blitzer, Dredze and Pereira, 2007), the authors used a method called structural correspondence learning (SCL) for domain adaptation, which was proposed earlier in (Blitzer, McDonald and Pereira, 2006).
(Pan et al., 2010) proposed a method similar to SCL at the high level.
He, Lin and Alani (2011) used joint topic modeling to identify opinion topics (which are similar to clusters in the above work) from both domains to bridge them.
(Gao and Li, 2011), topic modeling was used too to find a common semantic space based on domain term correspondences and term co-occurrences in the two domains
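Several of the methods above bridge domains through features that behave the same way in both. As a rough illustration of that starting point only, the sketch below lists terms that occur frequently in both a source and a target domain. It is not an implementation of SCL or of any cited method; the function names, the `min_df` threshold, and the toy hotel and furniture reviews are assumptions.

```python
from collections import Counter

def doc_frequency(docs):
    """Number of documents each term appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df

def candidate_pivots(source_docs, target_docs, min_df=2):
    """Terms that appear in at least min_df documents of both domains."""
    src, tgt = doc_frequency(source_docs), doc_frequency(target_docs)
    shared = [w for w in src if src[w] >= min_df and tgt[w] >= min_df]
    # Most frequent shared terms first; ties broken alphabetically.
    return sorted(shared, key=lambda w: (-min(src[w], tgt[w]), w))

# Hypothetical unlabeled reviews from two domains.
hotel_docs = [
    "excellent room and friendly staff",
    "horrible room service",
    "excellent location",
    "horrible noisy room",
]
furniture_docs = [
    "excellent sturdy table",
    "horrible wobbly chair",
    "excellent finish on the table",
    "horrible smell",
]

pivots = candidate_pivots(hotel_docs, furniture_docs)
```

Domain-specific words such as "room" or "table" are filtered out, while general sentiment words survive; those shared words are the kind of evidence a cross-domain method can use to relate the two vocabularies.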

Cross-language sentiment classification:

(Wan, 2008), the author exploited sentiment resources in English to perform classification of Chinese reviews
(Wan, 2009), a co-training method was proposed which made use of an annotated English corpus for classification of Chinese reviews in a supervised manner
Wei and Pal (2010) proposed to use a transfer learning method for cross-language sentiment classification
Boyd-Graber and Resnik (2010) extended the topic modeling method supervised latent Dirichlet allocation (SLDA) (Blei and McAuliffe, 2007) to work on reviews in multiple languages for review rating prediction
(Guo et al., 2010), a topic model based method was proposed to group a set of given aspect expressions in different languages into aspect clusters (categories) for aspect-based sentiment comparison of opinions from different countries

posted on 2013-07-04 11:32 LakeLight
