
Assessing the quality of pronoun translation in machine translation with APT (A reference-based metric to evaluate the accuracy of pronoun translation)

Chapter 1 Running the example

1.1 Runtime environment and dependencies

The code runs under Python 3.6.8. First install the required packages with pip, e.g. ConfigParser.

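1.2 Downloading the code from GitHub

Clone the repository from the command line: git clone https://github.com/LinqingChen/APT.git (the author of this post has updated this code to work with Python 3).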

1.3 Running the demo in the example

Enter the APT directory: cd [path-to-APT]/APT

Run the demo: python APT.py ./data_es-en/config_1

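Chapter 2 Running your own evaluation

2.1 Preparing the evaluation files and setting the evaluation parameters

Prepare your evaluation data and update the relevant settings following the parameters in [path-to-APT]/APT/data_es-en/config_1, shown below.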

The parameters below that must be changed to fit your own evaluation are marked in red.

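Here, source is the source-language text, target is the machine translation output in the target language (the candidate), and reference is the gold-standard translation (a high-quality human translation). The three corpora must already be aligned at the sentence level, i.e. line n of each file corresponds to the same sentence.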
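Assume one sentence per line, as in the example files. Under that assumption, a quick sanity check before running APT is to confirm that the three files have the same number of lines. Below is a minimal Python sketch of such a check. `count_lines` and `check_sentence_alignment` are hypothetical helpers and are not part of APT. Equal line counts are only a necessary condition for correct sentence alignment, not a sufficient one.

```python
def count_lines(path):
    """Return the number of lines (sentences) in a one-sentence-per-line file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_sentence_alignment(source, target, reference):
    """Compare line counts of source, target (MT output) and reference files.

    Returns (all_equal, counts). Equal counts are necessary, but not sufficient,
    for the three files to be sentence-aligned.
    """
    counts = {
        "source": count_lines(source),
        "target": count_lines(target),
        "reference": count_lines(reference),
    }
    return len(set(counts.values())) == 1, counts
```

For the example data, the call would be `check_sentence_alignment("./data_es-en/china.low.es", "./data_es-en/china.low.en", "./data_es-en/china.low.en.ref")`.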

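alignment_source_target and alignment_source_reference hold the word alignments between the source and the target, and between the source and the reference, respectively. Note in particular that these must be symmetrized word alignments, which have to be generated with Moses or GIZA++. The author will describe, in separate posts, how to generate symmetrized word alignments for a parallel corpus with each of these two tools.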
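The following minimal Python sketch makes "symmetrized word alignment" concrete. It rests on one assumption about the file format: each line of a `.symal` file holds the alignment of one sentence pair as space-separated, 0-based `i-j` pairs (Pharaoh style). Check this against your own files. `parse_alignment`, `symmetrize` and `aligned_target_words` are hypothetical helpers. The intersection and union shown here are the two simplest symmetrization heuristics. They are not the grow-diag-final(-and) heuristics that Moses normally applies.

```python
def parse_alignment(line):
    """Parse one alignment line such as "0-0 1-2" into a set of (src, tgt) index pairs."""
    pairs = set()
    for token in line.split():
        # Assumed format: 0-based "source_index-target_index"
        i, j = token.split("-")
        pairs.add((int(i), int(j)))
    return pairs


def symmetrize(src2tgt, tgt2src, method="intersection"):
    """Combine two directional alignments, both expressed as (src, tgt) pairs.

    intersection keeps only links found in both directions (high precision);
    union keeps links found in either direction (high recall).
    """
    if method == "intersection":
        return src2tgt & tgt2src
    if method == "union":
        return src2tgt | tgt2src
    raise ValueError(f"unknown method: {method}")


def aligned_target_words(pairs, src_index, target_tokens):
    """Return the target words aligned to the source word at src_index.

    The result may contain several words (a multiword alignment) or none
    (the source pronoun has no aligned target word).
    """
    return [target_tokens[j] for i, j in sorted(pairs) if i == src_index]
```

As an illustration, take the Spanish "él no lo sabe", the English "he does not know it", and the alignment `0-0 1-2 2-4 3-3`. Under these inputs, `aligned_target_words(parse_alignment("0-0 1-2 2-4 3-3"), 2, "he does not know it".split())` returns `["it"]`, the translation of the pronoun "lo".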

[lang]
source: es
target: en
# Optional character used to split words, e.g. "-" for "est-t-il"
source_word_separator:
target_word_separator:

[files]
source: ./data_es-en/china.low.es
reference: ./data_es-en/china.low.en.ref
target: ./data_es-en/china.low.en
alignment_source_reference: ./data_es-en/china.ref.symal
alignment_source_target: ./data_es-en/china.symal
# The list of source pronouns to evaluate is mandatory
list_source_pronouns: data_es-en/list_es
# Content type of the list of source pronouns to evaluate [word|possition]
input_type: word
# The list of target pronouns to evaluate is optional. Pronouns not in the list will be considered as "OTHER".
list_target_pronouns:

[dictionary]
# Optional dictionaries for considering equal or similar pronouns
equal:
similar:
source_pronouns:
target_pronouns:

[cases]
# Cases: 1 Equal, 2 Similar, 3 Different, 4 Empty target, 5 Empty reference, 6 Empty target and reference. Cases not listed will be dropped.
cases_to_use: 1,2,3,4,5,6
# Weights: in the range [0, 1] (1 correct, 0 incorrect)
weigths_per_case: 1,0.5,0,0,0,0
# Only used if list_target_pronouns is set: whether a reference and a candidate that are both "OTHER" are considered equal.
count_OTHER_as_equal: false

[output]
output_file: ./data_es-en/output1
# Whether to count multiword alignments as separate words in the matrix.
counting_multiword_in_matrix: false
# Length of columns of the output confusion matrix
max_length_matrix = 10
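The [cases] section defines how each pronoun comparison is scored. Below is a minimal Python sketch that reads these weights and turns per-case counts into a single score. It rests on three assumptions:

- The score is the weighted proportion sum(w_c · n_c) / sum(n_c) over the cases used, where n_c is the number of source pronouns falling into case c.
- The weights are listed in the same order as `cases_to_use`.
- `load_case_weights` and `weighted_pronoun_accuracy` are hypothetical names.

The real APT.py may compute its score differently.

```python
import configparser


def load_case_weights(config_path):
    """Read [cases] from an APT-style config and return {case_id: weight}.

    Assumption: weights are listed in the same order as cases_to_use.
    The key "weigths_per_case" is spelled exactly as in the config file.
    """
    config = configparser.ConfigParser()
    with open(config_path, encoding="utf-8") as f:
        config.read_file(f)
    cases = [int(c) for c in config.get("cases", "cases_to_use").split(",")]
    weights = [float(w) for w in config.get("cases", "weigths_per_case").split(",")]
    if len(cases) != len(weights):
        raise ValueError("cases_to_use and weigths_per_case differ in length")
    return dict(zip(cases, weights))


def weighted_pronoun_accuracy(case_counts, case_weights):
    """Weighted accuracy sum(w_c * n_c) / sum(n_c) over the cases in case_weights.

    case_counts: {case_id: number of source pronouns assigned to that case}.
    Cases missing from case_weights are dropped, as the config comment describes.
    """
    used = {c: n for c, n in case_counts.items() if c in case_weights}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(case_weights[c] * n for c, n in used.items()) / total
```

For example, suppose the case counts are {1: 60, 2: 10, 3: 20, 4: 5, 5: 3, 6: 2} and the weights are those of config_1 (1, 0.5, 0, 0, 0, 0). The sketch then gives (60 + 0.5 × 10) / 100 = 0.65.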

posted @ 2020-01-14 10:54 lqchen
