Mining Text Data, Chapter One: An Introduction to Text Mining

Introduction

Data mining discovers interesting patterns in data in a dynamic and scalable way. Information retrieval has traditionally focused on facilitating access to information rather than on analyzing it to discover patterns, which is the primary goal of text mining.

The most important characteristic of text data is that it is sparse and high-dimensional, because the input is strings of words. Furthermore, in most applications it would be desirable to represent text information semantically; however, natural language processing techniques are still not robust enough to do this reliably. Usually, text data is treated as a bag of words or as a string of words.
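As a rough illustration of the bag-of-words view and why it is sparse, here is a minimal sketch in Python. The function names `bag_of_words` and `sparsity`, the whitespace tokenizer, and the three toy documents are assumptions made for this example and are not from the chapter.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, sparse count vectors) for a list of documents.

    Each vector is a dict {term_index: count} that stores only non-zero
    entries. Tokenization is a plain lowercase whitespace split, which is
    an assumption of this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({index[t]: c for t, c in counts.items()})
    return vocab, vectors


def sparsity(vocab, vectors):
    """Fraction of zero entries in the implied document-term matrix."""
    total = len(vocab) * len(vectors)
    nonzero = sum(len(v) for v in vectors)
    return 1.0 - nonzero / total


# Toy corpus, invented for illustration only.
docs = [
    "text mining finds patterns in text",
    "social media produces text streams",
    "topic models cluster documents",
]
vocab, vectors = bag_of_words(docs)
print(len(vocab), "distinct terms")
print("sparsity:", round(sparsity(vocab, vectors), 3))
```

Even with three short documents, most entries of the document-term matrix are zero, since each document uses only a small part of the vocabulary. Word order is discarded, which is the main difference from the string-of-words view.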

Recently, there has been rapid growth of text data in the context of different web-based applications such as social media.

Algorithms for text mining

1, information extraction from text data

2, text summarization

3, unsupervised learning methods from text data: clustering and topic modeling

4, LSI and dimensionality reduction for text mining

5, supervised learning methods from text data: classification and transfer learning

6, transfer learning with text data: for example, cross-lingual mining of web sources

7, probabilistic techniques for text mining

8, mining text streams: for example, news streams such as Reuters

9, cross-lingual mining of text data

10, text mining in multimedia networks

11, text mining in social media

12, opinion mining from text data

13, text mining from biomedical data
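To make items 5 and 7 in the list above a little more concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, a standard probabilistic technique for supervised learning on bag-of-words data. It is offered as one possible illustration, not as the method the chapter prescribes; the function names and the tiny labeled corpus are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score for the text."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / n_docs)  # log prior of the class
        total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for illustration only.
training = [
    ("great phone love the screen", "positive"),
    ("love this great camera", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the terrible screen", "negative"),
]
model = train_naive_bayes(training)
print(classify(model, "love this great camera"))
print(classify(model, "hate terrible battery"))
```

The same pattern, with a sentiment-labeled corpus, is one simple starting point for the opinion mining task in item 12.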

Future Directions

1, Scalable and robust methods for natural language understanding: It is important to develop effective and robust information extraction and other natural language processing methods that can scale to multiple domains.

2, Domain adaptation and transfer learning

3, Contextual analysis of text data: Text data is usually accompanied by rich context information, such as authors, sources, and time, or by more complicated information networks.

4, Parallel text mining: In particular, how to parallelize all kinds of text mining algorithms, including both unsupervised and supervised learning methods, is a major future challenge.
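As a small illustration of the parallelization idea in item 4, the sketch below splits a corpus into chunks, counts terms in each chunk independently, and merges the partial counts, in the style of a map-reduce job. The function names and the toy corpus are assumptions of this example. Threads are used only to keep the sketch self-contained; for CPU-bound work on a real corpus, processes or a cluster framework would more likely be needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_terms(chunk):
    """Map step: count terms in one chunk of documents."""
    counts = Counter()
    for doc in chunk:
        counts.update(doc.lower().split())
    return counts


def parallel_term_counts(docs, workers=2):
    """Split docs into `workers` chunks, count each, then merge the results."""
    chunks = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_terms, chunks))
    total = Counter()
    for partial in partials:  # reduce step: merge partial counts
        total.update(partial)
    return total


# Toy corpus, invented for illustration only.
docs = [
    "text mining finds patterns in text",
    "social media produces text streams",
    "topic models cluster documents",
    "news streams carry text",
]
print(parallel_term_counts(docs, workers=2).most_common(3))
```

Term counting parallelizes easily because the partial results merge by simple addition. Many learning algorithms are harder to parallelize because their steps share state across documents.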

posted @ 2014-05-17 15:34  LeonCrash