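[Natural Language Processing with Python] 5.6 Transformation-Based Tagging
Brill tagging is an inductive tagging method. It is a form of transformation-based learning: guess the tag of each word, then go back and fix the mistakes. In this way, a Brill tagger successively transforms a badly tagged text into a better one. It needs already-annotated training data to judge whether a given guess by the tagger is a mistake.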
(1) The President said he will ask Congress to increase grants to states for vocational rehabilitation.
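On this sentence, the Brill tagger works as follows: it first assigns every word an initial tag, then applies its learned correction rules one after another.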
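To make the process concrete, here is a minimal sketch of applying two transformation rules to the tail of sentence (1). The initial tags and the two rules (NN -> VB after TO, and TO -> IN before NNS) follow the walkthrough in the NLTK book; the helper `apply_rules` and its rule representation are made up for illustration and are not part of NLTK.

```python
def apply_rules(tagged, rules):
    """Apply each transformation rule in order over the whole sentence."""
    tags = [tag for _, tag in tagged]
    for from_tag, to_tag, condition in rules:
        # Pick the positions to rewrite based on the tags as they stand
        # before this rule fires, then rewrite them all at once.
        hits = [i for i, t in enumerate(tags)
                if t == from_tag and condition(tags, i)]
        for i in hits:
            tags[i] = to_tag
    return [(word, tag) for (word, _), tag in zip(tagged, tags)]


# Initial (unigram-style) guess for the phrase; two tags are wrong.
initial = [
    ("to", "TO"), ("increase", "NN"), ("grants", "NNS"), ("to", "TO"),
    ("states", "NNS"), ("for", "IN"), ("vocational", "JJ"),
    ("rehabilitation", "NN"),
]

# Hypothetical rule encoding: (from_tag, to_tag, context test).
rules = [
    # Replace NN with VB when the previous tag is TO.
    ("NN", "VB", lambda tags, i: i > 0 and tags[i - 1] == "TO"),
    # Replace TO with IN when the next tag is NNS.
    ("TO", "IN", lambda tags, i: i + 1 < len(tags) and tags[i + 1] == "NNS"),
]

result = apply_rules(initial, rules)
print(" ".join(f"{word}/{tag}" for word, tag in result))
```

The first rule corrects "increase" to a verb, and the second turns the second "to" into a preposition; every other tag is left untouched.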
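The following code demonstrates the Brill tagger (the call shown is the NLTK 2 entry point; in NLTK 3 the demo lives in the nltk.tbl.demo module):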
>>> import nltk
>>> nltk.tag.brill.demo()
Training Brill tagger on 80 sentences...
Finding initial useful rules...
    Found 6555 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  12  13   1   4  | NN -> VB if the tag of the preceding word is 'TO'
   8   9   1  23  | NN -> VBD if the tag of the following word is 'DT'
   8   8   0   9  | NN -> VBD if the tag of the preceding word is 'NNS'
   6   9   3  16  | NN -> NNP if the tag of words i-2...i-1 is '-NONE-'
   5   8   3   6  | NN -> NNP if the tag of the following word is 'NNP'
   5   6   1   0  | NN -> NNP if the text of words i-2...i-1 is 'like'
   5   5   0   3  | NN -> VBN if the text of the following word is '*-1'
   ...
>>> print(open("errors.out").read())
             left context |    word/test->gold     | right context
--------------------------+------------------------+--------------------------
                          |      Then/NN->RB       | ,/, in/IN the/DT guests/N
, in/IN the/DT guests/NNS |       '/VBD->POS       | honor/NN ,/, the/DT speed
'/POS honor/NN ,/, the/DT |    speedway/JJ->NN     | hauled/VBD out/RP four/CD
NN ,/, the/DT speedway/NN |     hauled/NN->VBD     | out/RP four/CD drivers/NN
DT speedway/NN hauled/VBD |      out/NNP->RP       | four/CD drivers/NNS ,/, c
dway/NN hauled/VBD out/RP |      four/NNP->CD      | drivers/NNS ,/, crews/NNS
hauled/VBD out/RP four/CD |    drivers/NNP->NNS    | ,/, crews/NNS and/CC even
P four/CD drivers/NNS ,/, |     crews/NN->NNS      | and/CC even/RB the/DT off
NNS and/CC even/RB the/DT |    official/NNP->JJ    | Indianapolis/NNP 500/CD a
                          |     After/VBD->IN      | the/DT race/NN ,/, Fortun
ter/IN the/DT race/NN ,/, |    Fortune/IN->NNP     | 500/CD executives/NNS dro
s/NNS drooled/VBD like/IN |  schoolboys/NNP->NNS   | over/IN the/DT cars/NNS a
olboys/NNS over/IN the/DT |      cars/NN->NNS      | and/CC drivers/NNS ./.
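The Score, Fixed, Broken and Other columns in the rule table can be reproduced by hand for a single candidate rule. The sketch below is only an illustration of those definitions, not NLTK's own implementation: `score_rule` is a hypothetical helper, and the `current` and `gold` tag sequences are toy data invented for this example rather than taken from the demo's corpus.

```python
def score_rule(current, gold, from_tag, to_tag, condition):
    """Count how a rule would change the current tags, judged against gold."""
    fixed = broken = other = 0
    for i, (cur, ref) in enumerate(zip(current, gold)):
        if cur != from_tag or not condition(current, i):
            continue  # the rule does not fire at this position
        if cur == ref:
            broken += 1   # correct -> incorrect (assumes to_tag != from_tag)
        elif to_tag == ref:
            fixed += 1    # incorrect -> correct
        else:
            other += 1    # incorrect -> still incorrect
    return {"score": fixed - broken, "fixed": fixed,
            "broken": broken, "other": other}


# Toy data (hypothetical): tags produced by an initial tagger vs. gold tags.
current = ["TO", "NN", "TO", "NN", "TO", "NN", "TO", "NN", "DT", "NN"]
gold    = ["TO", "VB", "TO", "VB", "TO", "NN", "TO", "JJ", "DT", "NN"]

# Candidate rule: NN -> VB if the tag of the preceding word is 'TO'.
stats = score_rule(
    current, gold, "NN", "VB",
    lambda tags, i: i > 0 and tags[i - 1] == "TO",
)
print(stats)
```

A trainer built on this idea would score many candidate rules against the annotated data and keep the ones with the highest Score, which is why the table above is sorted by that column.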
For more details on the Brill tagger, please consult the relevant literature.