
Translation Agent Source Code Analysis

Posted on 2024-06-15 15:13 by 蝈蝈俊

Andrew Ng has open-sourced an AI agent translation workflow called Translation Agent.
https://github.com/andrewyng/translation-agent/

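The workflow consists of three main steps:

  1. Use a specified large language model (LLM) to translate from one language to another;
  2. Reflect on the translation and propose suggestions for improving it;
  3. Refine the translation based on those suggestions.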

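Many AI tasks can be improved with these same three steps. Let's look at how the workflow is implemented so that we can borrow the approach for our own work.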

The source code is here:
https://github.com/andrewyng/translation-agent/blob/main/src/translation_agent/utils.py

The overall call flow is as follows:

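# Fully translate one chunk of text: initial translation, reflection, and improvement.
def one_chunk_translate_text(
    source_lang: str, target_lang: str, source_text: str, country: str = ""
) -> str:
    # Step 1: produce an initial translation.
    translation_1 = one_chunk_initial_translation(
        source_lang, target_lang, source_text
    )
    # Step 2: ask the model to critique the initial translation.
    reflection = one_chunk_reflect_on_translation(
        source_lang, target_lang, source_text, translation_1, country
    )
    # Step 3: rewrite the translation using the critique.
    translation_2 = one_chunk_improve_translation(
        source_lang, target_lang, source_text, translation_1, reflection
    )
    return translation_2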
Step 1: Initial translation

The function one_chunk_initial_translation treats the entire text as a single chunk and produces an initial translation.

# Produce the initial translation.
def one_chunk_initial_translation(
    source_lang: str, target_lang: str, source_text: str
) -> str:
    system_message = f"You are an expert linguist, specializing in translation from {source_lang} to {target_lang}."
    translation_prompt = f"""This is an {source_lang} to {target_lang} translation, please provide the {target_lang} translation for this text. \
Do not provide any explanations or text apart from the translation.
{source_lang}: {source_text}
{target_lang}:"""
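    # Upstream calls translation_prompt.format(source_text=source_text) here. The
    # f-string above is already interpolated, so that call is redundant and can
    # fail if source_text contains braces; the prompt is used as-is instead.
    prompt = translation_prompt
    # get_completion is defined elsewhere in utils.py and sends the prompt to the LLM.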
    translation = get_completion(prompt, system_message=system_message)
    return translation

In plain language, the prompts say:

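You are a language expert, skilled in translating from {source_lang} to {target_lang}.

This is a {source_lang} to {target_lang} translation; please provide the {target_lang} translation of this text.
Do not provide any explanation or text other than the translation.
{source_lang}: {source_text}
{target_lang}: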
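In the upstream repository, one_chunk_initial_translation builds translation_prompt as an f-string and then also calls `.format(source_text=source_text)` on the result. By then `{source_text}` has already been substituted, so the second pass does nothing for ordinary text, but if the source text itself contains braces (code, templates, JSON), Python tries to parse them as format fields. The sketch below reproduces this with a minimal prompt; `build_prompt` is a hypothetical name, not part of the repo.

```python
def build_prompt(source_text: str, reformat: bool) -> str:
    """Mimic the upstream pattern: an f-string, optionally followed by .format()."""
    prompt = f"English: {source_text}\nChinese:"
    if reformat:
        # Second formatting pass, as in translation_prompt.format(source_text=...).
        prompt = prompt.format(source_text=source_text)
    return prompt


# Plain text: the extra .format() changes nothing.
print(build_prompt("Hello", reformat=True) == build_prompt("Hello", reformat=False))  # True

# Text containing braces: the second pass treats {"name": ...} as a format field.
try:
    build_prompt('Return {"name": "Ada"}', reformat=True)
except (KeyError, ValueError, IndexError) as err:
    print("format failed:", type(err).__name__)

# Skipping the redundant .format() keeps the braces intact.
print(build_prompt('Return {"name": "Ada"}', reformat=False))
```

Since f-strings are evaluated once, at the point where they appear, using the f-string result directly is enough.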

Step 2: Reflection

The function one_chunk_reflect_on_translation reflects on the initial translation and produces suggestions for improving it.

# Get feedback and improvement suggestions for the initial translation.
def one_chunk_reflect_on_translation(
    source_lang: str,
    target_lang: str,
    source_text: str,
    translation_1: str,
    country: str = "",
) -> str:
    system_message = f"You are an expert linguist specializing in translation from {source_lang} to {target_lang}. \
    You will be provided with a source text and its translation and your goal is to improve the translation."
    if country != "":
        reflection_prompt = f"""Your task is to carefully read a source text and a translation from {source_lang} to {target_lang}, and then give constructive criticism and helpful suggestions to improve the translation. \
The final style and tone of the translation should match the style of {target_lang} colloquially spoken in {country}.
The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:
<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>
<TRANSLATION>
{translation_1}
</TRANSLATION>
When writing suggestions, pay attention to whether there are ways to improve the translation's \n\
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),\n\
(ii) fluency (by applying {target_lang} grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),\n\
(iii) style (by ensuring the translations reflect the style of the source text and takes into account any cultural context),\n\
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms {target_lang}).\n\
Write a list of specific, helpful and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else."""
    else:
        reflection_prompt = f"""Your task is to carefully read a source text and a translation from {source_lang} to {target_lang}, and then give constructive criticism and helpful suggestions to improve the translation. \
The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:
<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>
<TRANSLATION>
{translation_1}
</TRANSLATION>
When writing suggestions, pay attention to whether there are ways to improve the translation's \n\
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),\n\
(ii) fluency (by applying {target_lang} grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),\n\
(iii) style (by ensuring the translations reflect the style of the source text and takes into account any cultural context),\n\
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms {target_lang}).\n\
Write a list of specific, helpful and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else."""
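    # Upstream calls reflection_prompt.format(...) with these four variables here.
    # The f-string is already interpolated, so that call is redundant and can fail
    # if the source text or translation contains braces; use the prompt as-is.
    prompt = reflection_prompt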
    reflection = get_completion(prompt, system_message=system_message)
    return reflection

In plain language, the prompts say:

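You are a professional linguist, skilled in translating from {source_lang} into {target_lang}.
You will be given a source text and its translation, and your goal is to improve the translation.

Your task is to carefully read the source text and the translation from {source_lang} to {target_lang}, then offer constructive criticism and helpful suggestions for improving the translation.

(Only when a country is specified:) The final style and tone of the translation should match the style of {target_lang} as colloquially spoken in {country}.

The source text and initial translation are delimited by the XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, as follows:
<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>
<TRANSLATION>
{translation_1}
</TRANSLATION>

When writing suggestions, consider whether the translation can be improved in:
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text);
(ii) fluency (by applying {target_lang} grammar, spelling, and punctuation rules, and ensuring there are no unnecessary repetitions);
(iii) style (by ensuring the translation reflects the style of the source text and takes any cultural context into account);
(iv) terminology (by ensuring terminology is used consistently and reflects the source text's domain, and by using only equivalent {target_lang} idioms);
Write a list of specific, helpful, and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else.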
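In one_chunk_reflect_on_translation, the two branches of reflection_prompt differ only in the sentence about `{country}`. Below is a sketch of building the prompt once and adding that sentence conditionally. `build_reflection_prompt` is a hypothetical helper, the evaluation criteria are abbreviated for length, and unlike upstream the country sentence is placed on its own line.

```python
def build_reflection_prompt(
    source_lang: str,
    target_lang: str,
    source_text: str,
    translation_1: str,
    country: str = "",
) -> str:
    """Assemble the reflection prompt; the country sentence is added only if given."""
    lines = [
        "Your task is to carefully read a source text and a translation from "
        f"{source_lang} to {target_lang}, and then give constructive criticism and "
        "helpful suggestions to improve the translation.",
    ]
    if country:
        # The only difference between the two upstream branches.
        lines.append(
            "The final style and tone of the translation should match the style of "
            f"{target_lang} colloquially spoken in {country}."
        )
    lines += [
        "The source text and initial translation, delimited by XML tags "
        "<SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:",
        f"<SOURCE_TEXT>\n{source_text}\n</SOURCE_TEXT>",
        f"<TRANSLATION>\n{translation_1}\n</TRANSLATION>",
        # The accuracy/fluency/style/terminology criteria are omitted here for brevity.
        "Output only the suggestions and nothing else.",
    ]
    return "\n".join(lines)


with_country = build_reflection_prompt("English", "Spanish", "Hi", "Hola", country="Mexico")
without_country = build_reflection_prompt("English", "Spanish", "Hi", "Hola")
print("Mexico" in with_country, "Mexico" in without_country)  # True False
```

Keeping a single template means the two variants cannot drift apart when the criteria wording is edited later.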

Step 3: Refine based on the suggestions

The function one_chunk_improve_translation improves the initial translation based on the reflection.

# Use the suggestions from the reflection step to improve the initial translation.
def one_chunk_improve_translation(
    source_lang: str,
    target_lang: str,
    source_text: str,
    translation_1: str,
    reflection: str,
) -> str:
    system_message = f"You are an expert linguist, specializing in translation editing from {source_lang} to {target_lang}."
    prompt = f"""Your task is to carefully read, then edit, a translation from {source_lang} to {target_lang}, taking into
account a list of expert suggestions and constructive criticisms.
The source text, the initial translation, and the expert linguist suggestions are delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT>, <TRANSLATION></TRANSLATION> and <EXPERT_SUGGESTIONS></EXPERT_SUGGESTIONS> \
as follows:
<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>
<TRANSLATION>
{translation_1}
</TRANSLATION>
<EXPERT_SUGGESTIONS>
{reflection}
</EXPERT_SUGGESTIONS>
Please take into account the expert suggestions when editing the translation. Edit the translation by ensuring:
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {target_lang} grammar, spelling and punctuation rules and ensuring there are no unnecessary repetitions), \
(iii) style (by ensuring the translations reflect the style of the source text)
(iv) terminology (inappropriate for context, inconsistent use), or
(v) other errors.
Output only the new translation and nothing else."""
    translation_2 = get_completion(prompt, system_message)
    return translation_2

In plain language, the prompts say:

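You are a language expert, skilled in editing translations from {source_lang} to {target_lang}.

Your task is to carefully read, then edit, a translation from {source_lang} to {target_lang}, taking into account the expert suggestions and constructive criticism.

The source text, the initial translation, and the expert linguist's suggestions are delimited by the XML tags <SOURCE_TEXT></SOURCE_TEXT>, <TRANSLATION></TRANSLATION>, and <EXPERT_SUGGESTIONS></EXPERT_SUGGESTIONS>, as follows:
<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>
<TRANSLATION>
{translation_1}
</TRANSLATION>
<EXPERT_SUGGESTIONS>
{reflection}
</EXPERT_SUGGESTIONS>

Take the expert suggestions into account when editing the translation. When editing, ensure:
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text);
(ii) fluency (by applying {target_lang} grammar, spelling, and punctuation rules and ensuring there are no unnecessary repetitions);
(iii) style (by ensuring the translation reflects the style of the source text);
(iv) terminology (terms inappropriate for the context, inconsistent use), or
(v) other errors.
Output only the new translation and nothing else.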
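Steps 2 and 3 both hand the model several pieces of text wrapped in XML-style tags, so it can tell the source, the draft, and the suggestions apart. A small helper keeps that wrapping consistent. This is a sketch: `xml_block` and `build_improve_prompt` are hypothetical names, and the prompt wording is shortened from the upstream version.

```python
def xml_block(tag: str, content: str) -> str:
    """Wrap content in <TAG>...</TAG> on separate lines, as the upstream prompts do."""
    return f"<{tag}>\n{content}\n</{tag}>"


def build_improve_prompt(source_text: str, translation_1: str, reflection: str) -> str:
    """Assemble a shortened improvement prompt from the three delimited inputs."""
    parts = [
        "Your task is to carefully read, then edit, a translation, taking into "
        "account a list of expert suggestions and constructive criticisms.",
        xml_block("SOURCE_TEXT", source_text),
        xml_block("TRANSLATION", translation_1),
        xml_block("EXPERT_SUGGESTIONS", reflection),
        "Output only the new translation and nothing else.",
    ]
    return "\n".join(parts)


print(build_improve_prompt("Good morning", "早上好", "1. Consider a more natural greeting."))
```

The tags make the boundaries explicit, but the content is inserted verbatim. The upstream prompts do not escape text that happens to contain a literal closing tag such as `</TRANSLATION>`.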

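According to the project's own testing, this three-step workflow can at times produce translations that rival those of leading commercial translation tools.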

Summary

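Having a large language model check and improve its own output improves the accuracy, fluency, and stylistic consistency of the translation. The same approach can also be borrowed for other AI tasks.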

Reflection is a general-purpose agent design pattern: having an LLM review and revise its own output can substantially improve quality.
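The reflection pattern is not specific to translation. Below is a minimal, task-agnostic sketch. It assumes the model is any function from `(prompt, system_message)` to text and that the critique and revision prompts are generic. It also assumes the loop stops after a fixed number of rounds or when the critic replies with a hypothetical sentinel, `NO_CHANGES`. None of these names come from the Translation Agent repo.

```python
from typing import Callable, List, Tuple

Completion = Callable[[str, str], str]  # (prompt, system_message) -> text


def reflect_and_refine(
    task: str,
    complete: Completion,
    rounds: int = 1,
    stop_token: str = "NO_CHANGES",
) -> str:
    """Draft an answer, then alternate critique and revision for up to `rounds` rounds."""
    system = "You are a careful expert."
    draft = complete(f"Task:\n{task}\nWrite your best answer.", system)
    for _ in range(rounds):
        critique = complete(
            f"Task:\n{task}\nAnswer:\n{draft}\n"
            "List specific suggestions to improve the answer, "
            f"or reply {stop_token} if none are needed.",
            system,
        )
        if stop_token in critique:
            break  # the critic found nothing left to fix
        draft = complete(
            f"Task:\n{task}\nAnswer:\n{draft}\nSuggestions:\n{critique}\n"
            "Rewrite the answer applying the suggestions. Output only the answer.",
            system,
        )
    return draft


def scripted(outputs: List[str]) -> Tuple[Completion, List[str]]:
    """Fake model that returns canned outputs in order and records every prompt."""
    it = iter(outputs)
    calls: List[str] = []

    def complete(prompt: str, system_message: str) -> str:
        calls.append(prompt)
        return next(it)

    return complete, calls


complete, calls = scripted(["v1", "fix typo", "v2", "NO_CHANGES"])
print(reflect_and_refine("Summarize X", complete, rounds=3))  # v2
print(len(calls))  # 4
```

With `rounds=1` and a critic that never emits the sentinel, this reduces to the same draft, reflect, improve sequence as one_chunk_translate_text. Extra rounds trade more model calls for the chance of further gains.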