Mining Text Data Chapter Two: Information Extraction from Text (2)

Relation Extraction

A set of major relation types and their subtypes is defined by ACE (Automatic Content Extraction). ACE distinguishes between relation extraction and relation mention extraction. The former refers to identifying the semantic relation between a pair of entities based on all the evidence we can gather from the corpus, whereas the latter refers to identifying individual mentions of entity relations. Because corpus-level relation extraction to a large extent still relies on accurate mention-level relation extraction, most work has focused on the mention level.

Feature-based classification

A two-stage classification can be performed: at the first stage, we determine whether two entities are related at all; at the second stage, we determine the relation type for each related entity pair.

Some of the most commonly used features are as follows:

1, entity features: the words and entity types of the two arguments themselves

2, lexical contextual features: words surrounding and between the two arguments

3, syntactic contextual features: Syntactic relations between the two arguments or between an argument and another word can often be useful.

4, background knowledge: ontology, Wikipedia or DBpedia
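
The two-stage scheme and the feature groups above can be sketched in pure Python. This is a minimal illustration: the feature names, the toy sentence, and the stage-2 rule are made-up stand-ins for real, trained classifiers.

```python
def extract_features(tokens, arg1_span, arg2_span, entity_types, dep_path):
    """Build a feature dict for one candidate entity pair."""
    arg1 = " ".join(tokens[arg1_span[0]:arg1_span[1]])
    arg2 = " ".join(tokens[arg2_span[0]:arg2_span[1]])
    between = tuple(tokens[arg1_span[1]:arg2_span[0]])
    return {
        "arg1": arg1,                        # 1. entity features
        "arg2": arg2,
        "type_pair": "-".join(entity_types),
        "words_between": between,            # 2. lexical contextual features
        "dep_path": dep_path,                # 3. syntactic contextual features
    }

def stage1_related(feats):
    """Stage 1: binary decision -- are the two entities related at all?"""
    return bool(feats["dep_path"]) and len(feats["words_between"]) < 5

def stage2_type(feats):
    """Stage 2: pick a relation type (toy rule standing in for a classifier)."""
    if feats["type_pair"] == "PER-ORG" and "works" in feats["words_between"]:
        return "Employment"
    return "Other"

tokens = ["John", "Smith", "works", "for", "Acme", "Corp", "."]
feats = extract_features(tokens, (0, 2), (4, 6), ("PER", "ORG"),
                         "Smith <-nsubj- works -prep_for-> Corp")
```

In a real system, both stages would be learned models over sparse feature vectors; the point here is only the division of labor between the two stages.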

A framework to organize the features used for relation extraction has been proposed. A relation instance is represented as a labeled, directed graph G = (V, E, A, B), where V is the set of nodes in the graph, E is the set of directed edges in the graph, and A and B are functions that assign labels to the nodes: A maps each node to its feature values (e.g., word, POS tag), and B marks the node's role, i.e., whether it belongs to the first argument, the second argument, or neither. It was found that a combination of features at different levels of complexity and from different sentence representations, coupled with task-oriented feature pruning, gave the best performance.
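
A minimal sketch of this graph representation in plain Python (the sentence, dependency arcs, and label values are illustrative, not taken from the original paper):

```python
# Nodes are tokens of "John works for Acme"; edges are hand-written
# dependency arcs (head -> dependent).
V = ["John", "works", "for", "Acme"]
E = {(1, 0), (1, 2), (2, 3)}

# A: assigns feature values (word, POS tag) to each node.
A = {0: {"word": "John",  "pos": "NNP"},
     1: {"word": "works", "pos": "VBZ"},
     2: {"word": "for",   "pos": "IN"},
     3: {"word": "Acme",  "pos": "NNP"}}

# B: marks each node's role relative to the two relation arguments.
B = {0: "ARG1", 1: "NONE", 2: "NONE", 3: "ARG2"}
```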

Kernel methods

In machine learning, a kernel or kernel function defines the inner product of two observed instances represented in some underlying vector space. It can also be seen as a similarity measure for the observations. The major advantage of using kernels is that observed instances do not need to be explicitly mapped to the underlying vector space in order for their inner products defined by the kernel to be computed.
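
The "no explicit mapping needed" property can be checked concretely. Below, a degree-2 polynomial kernel on 2-D inputs is compared against the inner product of the explicit feature map it implicitly uses; the two values agree:

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: computed without any feature mapping."""
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2

def phi(x):
    """The explicit 6-D feature map the kernel implicitly corresponds to."""
    r2 = math.sqrt(2)
    return [x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1], r2 * x[0], r2 * x[1], 1.0]

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, y)                              # kernel value directly
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))     # inner product in feature space
```

The kernel evaluates in the original 2-D space, yet equals an inner product in a 6-D space; for string and tree kernels the implicit space is far larger, which is why the trick matters.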

1, sequence-based kernel: for example, a kernel defined on the shortest path between the two entities in the dependency tree. A limitation is that two paths of different lengths end up with zero similarity.

2, tree-based kernel: The main motivation is that if two parse trees share many common subtree structures then the two relation instances are similar to each other.

3, composite kernel: combines the sequence-based and tree-based kernels to exploit both representations
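
A toy sketch of the three kernel types above, with made-up similarity functions standing in for real sequence and subtree kernels; note how the toy sequence kernel exhibits exactly the zero-similarity problem for paths of different lengths:

```python
def seq_kernel(p1, p2):
    """Toy path kernel: position-wise matches; paths of different length
    get similarity 0 -- the brittleness noted for sequence-based kernels."""
    if len(p1) != len(p2):
        return 0.0
    return sum(a == b for a, b in zip(p1, p2)) / len(p1)

def tree_kernel(t1, t2):
    """Toy stand-in for a subtree kernel: overlap of subtree labels."""
    s1, s2 = set(t1), set(t2)
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def composite_kernel(i1, i2, alpha=0.5):
    """Weighted combination of the two kernels."""
    return alpha * seq_kernel(i1["path"], i2["path"]) + \
           (1 - alpha) * tree_kernel(i1["subtrees"], i2["subtrees"])

inst = {"path": ("nsubj", "works", "prep_for"), "subtrees": ("NP", "VP", "PP")}
```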

Weakly supervised (semi-supervised) learning method

Weakly supervised learning methods work with much less training data.

1, bootstrapping: Starting from a small set of seed entity pairs known to have the target relation, we look for co-occurrences of these entity pairs within close proximity in a large corpus. The assumption is that if two entities related through the target relation co-occur closely, the context in which they co-occur is likely to be a pattern for the target relation. An important step in bootstrapping methods is to evaluate the quality of extraction patterns (often with heuristic measures) so as not to include many noisy patterns during the extraction process.
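
One bootstrapping round can be sketched as follows; the corpus triples, seed pairs, and the "at least two distinct seeds" pattern-quality heuristic are all illustrative choices:

```python
from collections import defaultdict

# toy corpus of (entity1, context, entity2) triples
corpus = [
    ("Paris", "is the capital of", "France"),
    ("Rome", "is the capital of", "Italy"),
    ("Berlin", "is the capital of", "Germany"),
    ("Paris", "is a city in", "France"),
    ("apple", "is a kind of", "fruit"),
]
seeds = {("Paris", "France"), ("Rome", "Italy")}

# collect contexts in which seed pairs co-occur, as candidate patterns
pattern_hits = defaultdict(set)
for e1, context, e2 in corpus:
    if (e1, e2) in seeds:
        pattern_hits[context].add((e1, e2))

# heuristic pattern evaluation: keep patterns supported by >= 2 distinct seeds
good_patterns = {p for p, hits in pattern_hits.items() if len(hits) >= 2}

# one extraction round: harvest new pairs matched by the surviving patterns
new_pairs = {(e1, e2) for e1, c, e2 in corpus
             if c in good_patterns and (e1, e2) not in seeds}
```

The harvested pairs would then be added to the seed set and the process repeated, which is why filtering out noisy patterns early is so important.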

2, distant supervision: uses an existing knowledge base to automatically label entity pairs in text, producing large but noisy training data
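
The distant-supervision assumption can be shown in a few lines: every sentence mentioning a pair found in the knowledge base inherits that pair's relation label, which yields training data cheaply but noisily (the KB entries and sentences here are made up):

```python
# toy knowledge base of (entity1, entity2) -> relation
kb = {("Obama", "Honolulu"): "born_in",
      ("Einstein", "Ulm"): "born_in"}

sentences = [
    ("Obama", "Honolulu", "Obama was born in Honolulu."),
    ("Obama", "Honolulu", "Obama visited Honolulu last week."),  # noisy label!
    ("Einstein", "Zurich", "Einstein studied in Zurich."),       # pair not in KB
]

# label every sentence whose entity pair appears in the KB
training_data = [(text, kb[(e1, e2)])
                 for e1, e2, text in sentences if (e1, e2) in kb]
```

The second sentence illustrates the noise: it mentions the pair but does not express the relation, yet it still receives the `born_in` label.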

Unsupervised Information Extraction

The key idea is to cluster entities or entity pairs based on their lexical-syntactic contextual features.
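
A minimal sketch of this idea, clustering entity pairs by the overlap of their contextual features (the data, the Jaccard measure, and the single-link threshold are illustrative choices, not a specific published method):

```python
from collections import defaultdict
from itertools import combinations

# contextual features observed for each entity pair (made-up data)
pair_contexts = {
    ("Google", "YouTube"): {"acquired", "bought"},
    ("Microsoft", "Skype"): {"acquired", "purchased"},
    ("Paris", "France"): {"capital", "located"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# single-link clustering via union-find with a similarity threshold
pairs = list(pair_contexts)
parent = {p: p for p in pairs}

def find(p):
    while parent[p] != p:
        p = parent[p]
    return p

for p, q in combinations(pairs, 2):
    if jaccard(pair_contexts[p], pair_contexts[q]) >= 0.3:
        parent[find(p)] = find(q)

clusters = defaultdict(list)
for p in pairs:
    clusters[find(p)].append(p)
```

The two acquisition pairs share the feature "acquired" and fall into one cluster, while the location pair ends up alone, mirroring how unsupervised relation discovery groups pairs by context.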

Relation discovery and template induction

A representative example: They started by collecting a large number of news articles from different news sources on the Web. They then used simple clustering based on lexical similarity to find articles talking about the same event. Next they performed syntactic parsing and extracted named entities from these articles. Each named entity could then be represented by a set of syntactic patterns as its features. Finally, they clustered pairs of entities co-occurring in the same article using their feature representations. The end results were tables in which rows corresponded to different articles and columns corresponded to different roles in a relation.

With relation discovery, the most straightforward solution is to identify candidate role fillers first and then group these candidates into clusters. However, candidates with the same features may belong to different clusters, so later work assigns labels probabilistically rather than deterministically.

The aforementioned studies cannot label the discovered slots. One method addresses this problem with two clustering steps: the first groups lexical patterns that are likely to describe the same type of event, and the second groups candidate role fillers into slots for each event type. A slot can then be labeled using the syntactic patterns of its corresponding slot fillers.
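
The two clustering steps can be sketched with toy data; here a dictionary lookup stands in for the actual pattern clustering, and the patterns, triggers, and fillers are invented for illustration:

```python
from collections import defaultdict

# step 1: group lexical patterns into event types
patterns = ["<X> bombed <Y>", "<X> attacked <Y>", "<X> acquired <Y>"]
event_of = {"bombed": "attack", "attacked": "attack", "acquired": "merger"}

events = defaultdict(list)
for p in patterns:
    trigger = p.split()[1]
    events[event_of[trigger]].append(p)

# step 2: within an event type, group candidate role fillers into slots,
# keyed by the syntactic position they fill in the patterns
fillers = [("attack", "<X>", "militants"),
           ("attack", "<Y>", "embassy"),
           ("attack", "<X>", "rebels")]
slots = defaultdict(set)
for event, position, filler in fillers:
    slots[(event, position)].add(filler)
```

Each slot, such as the `<X>` position of the attack patterns, could then be labeled "perpetrator" by inspecting the syntactic patterns its fillers occur in.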

Open Information Extraction

Because relation discovery and template induction usually work on a corpus from a single domain, open information extraction methods are needed to handle heterogeneous text.

Open information extraction does not assume any specific target relation type. Recent work on open information extraction introduced more heuristics to improve the quality of the extracted relations, such as: (1) a multi-word relation phrase must begin with a verb, end with a preposition, and be a contiguous sequence of words in the sentence; (2) a binary relation phrase ought to appear with at least a minimal number of distinct argument pairs in a large corpus.
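
The two heuristics can be sketched as filters over candidate extractions; the toy POS lookup, the example triples, and the support threshold are all illustrative:

```python
from collections import Counter

# toy POS dictionary standing in for a real tagger
POS = {"was": "V", "born": "V", "in": "P", "of": "P", "capital": "N"}

def syntactic_ok(phrase):
    """Heuristic 1: relation phrase starts with a verb, ends with a preposition."""
    toks = phrase.split()
    return POS.get(toks[0]) == "V" and POS.get(toks[-1]) == "P"

extractions = [
    ("Obama", "was born in", "Honolulu"),
    ("Einstein", "was born in", "Ulm"),
    ("Paris", "capital of", "France"),   # fails the verb-initial check
]

# Heuristic 2: a relation phrase must occur with enough distinct argument pairs
support = Counter(rel for _, rel, _ in extractions)
MIN_SUPPORT = 2

kept = [(a, r, b) for a, r, b in extractions
        if syntactic_ok(r) and support[r] >= MIN_SUPPORT]
```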

Evaluation

For named entity recognition, strictly speaking a correctly identified named entity must satisfy two criteria, namely, correct entity boundary and correct entity type.

For relation extraction, as we have mentioned, there are two levels of extraction, corpus-level and mention-level. While evaluation at the mention level requires annotated relation mention instances, evaluation at the corpus level requires only truly related entity pairs, which may be easier to obtain or annotate than relation mentions.
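
Strict NER evaluation can be computed in a few lines: a prediction counts as a true positive only if both the span boundary and the entity type match exactly (the gold and predicted spans below are made up):

```python
# entities as (start, end, type) spans
gold = {(0, 2, "PER"), (4, 6, "ORG"), (8, 9, "LOC")}
pred = {(0, 2, "PER"), (4, 6, "PER"), (8, 9, "LOC")}  # middle: right span, wrong type

tp = len(gold & pred)                 # both boundary AND type must match
precision = tp / len(pred)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall)
```

The middle prediction has the correct boundary but the wrong type, so under the strict criterion it counts against both precision and recall.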

posted @ 2014-05-17 15:37  LeonCrash