[Language Processing and Python] 7.4 Recursion in Linguistic Structure

Building Nested Structure with Cascaded Chunkers

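For example, the grammar below has patterns for noun phrases, prepositional phrases, verb phrases, and clauses. It is a four-stage chunk grammar.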

import nltk

grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PP: {<IN><NP>}               # Chunk prepositions followed by NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
>>> print(cp.parse(sentence))
(S
  (NP Mary/NN)
  saw/VBD
  (CLAUSE
    (NP the/DT cat/NN)
    (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

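However, this result has a flaw: the VP headed by saw is not recognized. With a more deeply nested sentence the chunker does even worse. We can address this by setting how many times the chunker loops over its patterns.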

 

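# The output below is for a more deeply nested sentence
>>> sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
...     ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
...     ("on", "IN"), ("the", "DT"), ("mat", "NN")]
>>> cp = nltk.RegexpParser(grammar, loop=2)
>>> print(cp.parse(sentence))
(S
  (NP John/NNP)
  thinks/VBZ
  (CLAUSE
    (NP Mary/NN)
    (VP
      saw/VBD
      (CLAUSE
        (NP the/DT cat/NN)
        (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))))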

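The cascading approach has its limits: the grammar is hard to create and debug, it can only produce trees of fixed depth, and that is not enough for full syntactic parsing.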
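To see why the depth is fixed, here is a minimal sketch of a cascade written in plain Python. It is a simplified stand-in and not how NLTK implements RegexpParser: the names STAGES, tag_of, apply_stage, cascade and height are all made up for illustration, and the four patterns are my hand translation of the grammar above into ordinary regular expressions over a string of "<TAG>" units. Each pass runs the four stages once, so each pass can only add a bounded number of levels to the tree.

```python
import re

# Hypothetical mini-cascade, NOT NLTK's implementation: each stage is a
# (label, regex) pair, and the regex runs over a string of "<TAG>" units.
STAGES = [
    ("NP", r"(?:<(?:DT|JJ|NN[^>]*)>)+"),
    ("PP", r"<IN><NP>"),
    ("VP", r"<VB[^>]*>(?:<(?:NP|PP|CLAUSE)>)+$"),
    ("CLAUSE", r"<NP><VP>"),
]

def tag_of(node):
    # A token is (word, tag); a chunk is (label, [children]).
    return node[1] if isinstance(node[1], str) else node[0]

def apply_stage(nodes, label, pattern):
    # Encode the current top-level sequence, e.g. "<NP><VBD><NP>".
    units = ["<%s>" % tag_of(n) for n in nodes]
    text = "".join(units)
    # Map character offsets in `text` back to indices in `nodes`.
    index_at, pos = {}, 0
    for i, unit in enumerate(units):
        index_at[pos] = i
        pos += len(unit)
    index_at[pos] = len(nodes)
    out, last = [], 0
    for m in re.finditer(pattern, text):
        i, j = index_at[m.start()], index_at[m.end()]
        out.extend(nodes[last:i])
        out.append((label, nodes[i:j]))  # wrap the matched nodes in a chunk
        last = j
    out.extend(nodes[last:])
    return out

def cascade(tokens, loop=1):
    # Run all stages in order, `loop` times over the same sequence.
    nodes = list(tokens)
    for _ in range(loop):
        for label, pattern in STAGES:
            nodes = apply_stage(nodes, label, pattern)
    return ("S", nodes)

def height(node):
    # A token has height 1; a chunk is one level above its tallest child.
    if isinstance(node[1], str):
        return 1
    return 1 + max(height(child) for child in node[1])

sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
            ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

for n in (1, 2, 3):
    tree = cascade(sentence, loop=n)
    # Show the top-level labels and the tree height after n passes.
    print(n, [tag_of(c) for c in tree[1]], height(tree))
```

With one pass, saw is left outside any VP because the CLAUSE it needs as an argument is only built by the last stage of that pass. A second pass picks it up, and thinks has to wait for a third. A sentence nested one level deeper would need yet another pass, which is the fixed-depth problem.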

Trees should be familiar to everyone, so I won't go over the definition here.

In NLTK, we can also create trees ourselves.

>>> tree1 = nltk.Tree('NP', ['Alice'])
>>> print(tree1)
(NP Alice)
>>> tree2 = nltk.Tree('NP', ['the', 'rabbit'])
>>> print(tree2)
(NP the rabbit)
# We can also combine trees
>>> tree3 = nltk.Tree('VP', ['chased', tree2])
>>> tree4 = nltk.Tree('S', [tree1, tree3])
>>> print(tree4)
(S (NP Alice) (VP chased (NP the rabbit)))
# Here are some of the tree methods
>>> print(tree4[1])
(VP chased (NP the rabbit))
# NLTK 3 uses label(); older versions used the .node attribute
>>> tree4[1].label()
'VP'
>>> tree4.leaves()
['Alice', 'chased', 'the', 'rabbit']
>>> tree4[1][1][1]
'rabbit'
# Some trees are hard to read as code, so we can draw them
>>> tree3.draw()


Tree Traversal

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # t is a leaf (a plain string), so just print it
        print(t, end=" ")
    else:
        # Now we know that t.label() is defined, so t is a tree
        print('(', t.label(), end=" ")
        for child in t:
            traverse(child)
        print(')', end=" ")
>>> t = nltk.Tree.fromstring('(S (NP Alice) (VP chased (NP the rabbit)))')
>>> traverse(t)
( S ( NP Alice ) ( VP chased ( NP the rabbit ) ) )
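The same recursion can also return a value instead of printing. Below is a minimal sketch on plain nested lists, which is only a hypothetical stand-in for nltk.Tree so that it runs without NLTK: I assume a tree is written as [label, child, ...] and a leaf is a plain string. The function names bracket and leaves are my own.

```python
# Hypothetical plain-list stand-in for nltk.Tree (an assumption, so the
# sketch runs without NLTK): a tree is [label, child, ...], a leaf is a str.
tree = ["S", ["NP", "Alice"], ["VP", "chased", ["NP", "the", "rabbit"]]]

def bracket(t):
    # Base case: a leaf is returned as is.
    if isinstance(t, str):
        return t
    label, children = t[0], t[1:]
    # Recursive case: format each child, then wrap the result in parentheses.
    return "(" + " ".join([label] + [bracket(c) for c in children]) + ")"

def leaves(t):
    # Base case: a leaf contributes itself.
    if isinstance(t, str):
        return [t]
    # Recursive case: concatenate the leaves of all children, left to right.
    result = []
    for child in t[1:]:
        result.extend(leaves(child))
    return result

print(bracket(tree))
print(leaves(tree))
```

Both functions have the same shape as traverse: test whether the node is a leaf, otherwise handle the label and recurse into each child. Here the leaf test is an isinstance check rather than try/except, which is a matter of taste.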
posted @ 2013-05-30 22:50  createMoMo