Recipes for building an open-domain chatbot

Paper: https://arxiv.org/pdf/2004.13637.pdf

 

Models

The paper presents three models.

1. Retrieval model (Retriever)

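   The retriever picks the most suitable sentence from a candidate set as the bot's reply. During training, the one response given for each context is the positive candidate;

   at inference time, the candidate set consists of all responses in the training set.

   Scoring and ranking are done with a Poly-encoder.

   The Poly-encoder uses two separate Transformer encoder blocks, one for the context utterances and one for the candidate utterance.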

   Suppose the context consists of $m$ history utterances, each written as $x_i=(In_{x_i}1, In_{x_i}2,...,In_{x_i}N_{x_i}),\; i=1,2,...,m$.

   The corresponding reply is written as $y=(In_y1, In_y2,...,In_yN_y)$.

   The model computes the score as follows:

$$\big(out_{x_i}1, out_{x_i}2,...,out_{x_i}N_{x_i}\big)=Context \; Encoder\big(In_{x_i}1, In_{x_i}2,...,In_{x_i}N_{x_i}\big),\;i=1,2,...,m$$

$$Emb_i=Attention\bigg(query,\; (out_{x_i}1, out_{x_i}2,...,out_{x_i}N_{x_i})\bigg),\;i=1,2,...,m$$

$$\big(out_{y}1, out_{y}2,...,out_{y}N_{y}\big)=Candidate \; Encoder\big(In_{y}1, In_{y}2,...,In_{y}N_{y}\big)$$

$$Cand \; emb=Aggregation\big(out_{y}1, out_{y}2,...,out_{y}N_{y}\big)$$

$$Ctx \; Emb = Attention\bigg(concat(query, Cand \; emb),\; (Emb_1, Emb_2, ...,Emb_m)\bigg)$$

$$score = inner\;product\big(Ctx \; Emb, \; Cand \; emb\big)$$

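   The authors' experiments show that a larger $m$ gives better results, though scoring also becomes slower.

   The paper trains scoring models at two sizes, with 256M and 622M parameters.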
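The scoring equations above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the encoder outputs are taken as given lists of vectors (standing in for the Transformer outputs), `Aggregation` is assumed to be mean pooling, and the candidate embedding alone is used as the query of the final attention. The names `attention`, `mean_pool`, and `poly_encoder_score` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys):
    """Dot-product attention: weighted average of `keys`, weighted by query-key similarity."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def mean_pool(vectors):
    """Assumed aggregation: average the candidate encoder outputs."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def poly_encoder_score(context_outputs, candidate_outputs, query):
    """context_outputs: one list of output vectors per history utterance (m lists)."""
    embs = [attention(query, outs) for outs in context_outputs]  # Emb_1 .. Emb_m
    cand_emb = mean_pool(candidate_outputs)                      # Cand emb
    ctx_emb = attention(cand_emb, embs)                          # Ctx Emb (assumed query = Cand emb)
    return dot(ctx_emb, cand_emb)                                # score
```

At inference time, the retriever would compute this score for every candidate response and return the highest-scoring one.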

 

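2. Generative model (Generator)

   The generative model is a standard seq2seq architecture built from standard Transformer layers, designed with few encoder layers and many decoder layers.

   The paper reports three generative model sizes; the largest has 9.4B parameters, nearly 4 times the size of Meena.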

    

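   Decoding usually relies on beam search or sampling. Meena concluded that sampling works better than beam search, so it used sampling.

   Blender argues that when the model is good enough, well-tuned beam search is no worse than sampling, so it uses constrained beam search with the following restrictions:

  • Control the minimum length of the generated reply. The authors tried two methods:
  1. Minimum length: the reply must be longer than a set value; while the length is below that value, the end token is forbidden;
  2. Predictive length: split lengths into four buckets, e.g. <10, <20, <30, and >30 tokens, then use a four-class classifier to predict which bucket the current reply should fall into.
  • Subsequence blocking: do not generate 3-grams that already appear in the current reply or in the preceding dialogue (context).

   Of course, if the model itself is not good enough, these methods help very little.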
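The minimum-length and subsequence-blocking constraints can be illustrated as a filter applied to the next-token scores at each beam search step. This is a sketch only: the function names, the `</s>` end token, and the dictionary of log-probabilities are assumptions made for illustration, and the beam search loop itself is omitted.

```python
import math


def ngrams(tokens, n):
    """Set of all n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def blocked_next_tokens(context_tokens, generated, n=3):
    """Tokens that would complete an n-gram already present in the context
    or in the reply generated so far (assumes n >= 2)."""
    if len(generated) < n - 1:
        return set()
    seen = ngrams(context_tokens, n) | ngrams(generated, n)
    prefix = tuple(generated[-(n - 1):])
    return {gram[-1] for gram in seen if gram[:-1] == prefix}


def constrain_step(log_probs, context_tokens, generated, min_length, eos="</s>", n=3):
    """Set the score of forbidden tokens to -inf for one decoding step."""
    banned = blocked_next_tokens(context_tokens, generated, n)
    if len(generated) < min_length:
        banned.add(eos)  # minimum length: the end token is not allowed yet
    return {tok: (-math.inf if tok in banned else lp) for tok, lp in log_probs.items()}
```

For example, with the context `i like green tea` and the partial reply `i like`, the token `green` is blocked because it would repeat the 3-gram `i like green`.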

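   The standard training objective for seq2seq models is MLE. The authors also tried another loss from their own earlier work, Unlikelihood Loss (UL), which raises the probability of the correct token while lowering the probability of selected negative tokens.

   The key to unlikelihood loss is how to choose the negative tokens to suppress. The authors pick tokens that tend to form overly common n-grams: if the n-grams a token forms occur at a higher rate in model output than in the gold responses, the token is more likely to be chosen as a negative. The aim is to reduce the share of dull, meaningless replies.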
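As a rough illustration, the per-token loss can be written as the MLE term plus a weighted penalty on the negative tokens, $-\log p(\text{target}) - \alpha\sum_{c}\log(1 - p(c))$. The sketch below assumes the next-token distribution is given as a dictionary and that the negative tokens have already been selected; the function name, the `alpha` weight, and the clamping constant are illustrative choices.

```python
import math


def unlikelihood_step_loss(probs, target, negatives, alpha=1.0):
    """Loss for one decoding step.

    probs:     dict mapping token -> model probability at this step
    target:    the gold token (its probability is pushed up)
    negatives: tokens whose probability should be pushed down
    """
    mle_term = -math.log(probs[target])
    ul_term = -sum(
        math.log(max(1.0 - probs[c], 1e-12))  # clamp to avoid log(0)
        for c in negatives
        if c != target  # never penalize the gold token
    )
    return mle_term + alpha * ul_term
```

With no negative tokens the loss reduces to ordinary MLE; each negative token adds a penalty that grows as the model assigns it more probability.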

 

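3. Retrieval + generation (Retrieve and Refine)

   Retrieve and Refine (RetNRef) combines retrieval and generation. RetNRef first uses the retrieval model to fetch a result, appends it to the context behind a special separator token, and feeds the whole sequence to the generator. The aim is for the generative model to learn to copy words or phrases from the retrieved result when appropriate.

   How is the retrieved result obtained? The authors propose two methods: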

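  • Dialogue Retrieval: use the Poly-encoder retrieval model trained in 1 to fetch the highest-scoring response from the training set as the result;
  • Knowledge Retrieval: retrieve from a large external knowledge base such as Wikipedia, as follows:
    • Use the current dialogue topic (given in advance) and each of the last two dialogue turns as queries, retrieving the top-7 articles for each;
    • Split each of the 3*7=21 articles into sentences and prepend the article title to every sentence, producing many candidate sentences;
    • Rank the candidate sentences with a Poly-encoder model and use the top-1 sentence as the retrieved result;
    • A separate classifier is also trained to decide whether knowledge retrieval is needed at all. Some dialogue contexts can be answered without extra knowledge, in which case no retrieved result is appended.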
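The candidate-building and ranking steps of Knowledge Retrieval can be sketched as follows. Everything here is a simplification: the sentence splitter is a naive regular expression, and `overlap_score` is a hypothetical word-overlap stand-in for the Poly-encoder ranker, used only so the example runs on its own.

```python
import re


def build_candidates(articles):
    """articles: dict mapping article title -> article text.
    Returns one candidate per sentence, with the title prepended."""
    candidates = []
    for title, text in articles.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                candidates.append(f"{title} {sentence}")
    return candidates


def overlap_score(query, candidate):
    """Stand-in scorer (assumption): number of shared lowercase words."""
    return len(set(query.lower().split()) & set(candidate.lower().split()))


def retrieve_knowledge(query, articles, score_fn=overlap_score):
    """Rank all candidate sentences and return the top-1."""
    candidates = build_candidates(articles)
    return max(candidates, key=lambda c: score_fn(query, c))
```

In the setup described above, `articles` would hold the 21 retrieved articles and `score_fn` would be the trained ranking model.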

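   For Knowledge Retrieval, the retrieved result is appended to the context and the model is trained with standard MLE. For Dialogue Retrieval, however, the authors found that plain MLE training is problematic: the trained model tends to ignore the appended retrieval entirely, because the retrieved text may be only weakly related to the actual response.

   The authors propose a training strategy called α-blending: during training, the retrieved result is replaced by the actual response with probability α%. This pushes the model to pay attention to the retrieved part.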
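A minimal sketch of how a RetNRef training input might be assembled with α-blending. The separator string `SEP` and the function name are hypothetical, and `alpha` is expressed here as a fraction in [0, 1] rather than a percentage.

```python
import random

SEP = " [RETRIEVED] "  # hypothetical separator between context and retrieval


def retnref_training_input(context, retrieved, gold_response, alpha, rng):
    """Build the generator input for one training example.

    With probability `alpha` the retrieved text is swapped for the gold
    response, so the model learns that the appended part can be useful.
    """
    appended = gold_response if rng.random() < alpha else retrieved
    return context + SEP + appended


# Usage: a seeded generator keeps the blending reproducible.
rng = random.Random(0)
example = retnref_training_input("do you like tea?", "i love coffee.", "yes, green tea!", 0.5, rng)
```

At inference time no blending is applied; the retrieved result is always the one appended.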

 

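Datasets

Commonly accepted criteria for a good conversation are that it be engaging and full of personality, knowledgeable, and empathetic. The authors found that a model trained on data with a given trait also acquires that trait.

This makes it possible to build targeted datasets to improve different traits of the bot.

The datasets can be downloaded through ParlAI by running: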

parlai display_data -t convai2 -dt train
parlai display_data -t empathetic_dialogues -dt train
parlai display_data -t wizard_of_wikipedia -dt train
parlai display_data -t blended_skill_talk -dt train

Or run directly:

parlai display_data -t blended_skill_talk:all -dt train

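The datasets are:

  • Reddit: collected from discussions on Reddit; large enough for pre-training (MLM for the retrieval model, LM for the generative model);

  • ConvAI2: persona-grounded dialogues in which the goal is to get to know the other speaker, so the conversations are engaging and show personality;

    Download: http://parl.ai/downloads/convai2/convai2_fix_723.tgz

  • Empathetic Dialogues (ED): one person vents and the other listens, so the conversations are rich in empathy;

    Download: http://parl.ai/downloads/empatheticdialogues/empatheticdialogues.tar.gz

    It covers 32 emotion labels and provides a situation for each dialogue, with at most 6 turns per dialogue.

    In the dataset, each row has the columns conv_id, utterance_idx, context, prompt, speaker_idx, utterance, selfeval, tags: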

hit:6401_conv:12803,1,proud,My little dog learned to sit!,445,I finally tough my new little puppy his first trick!,5|5|5_5|5|5,
hit:6401_conv:12803,2,proud,My little dog learned to sit!,4,What trick did you teach him?,5|5|5_5|5|5,
hit:6401_conv:12803,3,proud,My little dog learned to sit!,445,I tought him to sit for a treat_comma_ its so cute.,5|5|5_5|5|5,
hit:6401_conv:12803,4,proud,My little dog learned to sit!,4,That is good_comma_ do you plan to teach him more tricks?,5|5|5_5|5|5,
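  • Wizard of Wikipedia (WoW): an open-domain dialogue dataset. One speaker picks an initial topic at random and the two speakers converse from there, although the topic may broaden as the dialogue goes on. The two speakers have different roles, wizard and apprentice.

    wizard: the wizard's goal is to inform the apprentice about background knowledge related to the topic. Before the dialogue starts, the wizard is given relevant Wikipedia passages that the apprentice cannot see. The wizard may not copy sentences from Wikipedia verbatim as a reply, but must compose an answer that incorporates the knowledge.

    apprentice: the apprentice's goal is to ask in-depth questions about the topic, which sets this apart from ordinary chit-chat.

    Download: http://parl.ai/downloads/wizard_of_wikipedia/wizard_of_wikipedia.tgz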

{
    "chosen_topic": "Science fiction",
    "persona": "i enjoy movies about aliens invading the earth.",
    "wizard_eval": 5,
    "dialog": [
        {
            "speaker": "0_Wizard",
            "text": "I think science fiction is an amazing genre for anything. Future science, technology, time travel, FTL travel, they\'re all such interesting concepts.",
            "checked_sentence": {
                "chosen_Science_fiction_0": "Science fiction (often shortened to SF or sci-fi) is a genre of speculative fiction, typically dealing with imaginative concepts such as futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life."
            },
            "checked_passage": {
                "chosen_topic_0_Science_fiction": "Science fiction"
            },
            "retrieved_passages": [
                {
                    "Hyperspace (science fiction)": [
                        "Hyperspace is a faster-than-light (FTL) method of traveling used in science fiction.",
                        "It is typically described as an alternative \\"sub-region\\" of space co-existing with our own universe which may be entered using an energy field or other device.",
                        "As seen in most fiction hyperspace is most succinctly described as a \\"somewhere else\\" within which the laws of general and special relativity decidedly do \\"not\\" apply \\u2013 especially with respect to the speed of light being the cosmic speed limit.",
                        "Entering and exiting said \\"elsewhere\\" thus directly enables travel near or faster than the speed of light \\u2013 almost universally with the aid of extremely advanced technology."
                    ]
                },
                ...
                {
                    "History of US science fiction and fantasy magazines to 1950": [
                        "Science fiction and fantasy magazines began to be published in the US in the 1920s.",
                        "Stories with science fiction themes had been appearing for decades in pulp magazines such as \\"Argosy\\", but there were no magazines that specialized in a single genre until 1915, when Street & Smith, one of the major pulp publishers, brought out \\"Detective Story Magazine\\".",
                        "The first magazine to focus solely on fantasy and horror was \\"Weird Tales\\", which was launched in 1923, and established itself as the leading weird fiction magazine over the next two decades; writers such as H.P.",
                        "Lovecraft, Clark Ashton Smith and Robert E. Howard became regular contributors."
                    ]
                }
            ],
            "retrieved_topics": [
                "Hyperspace (science fiction)",
                "Science fiction",
                "History of science fiction",
                "Science fiction film",
                "Time travel",
                "List of starships in Stargate",
                "History of US science fiction and fantasy magazines to 1950"
            ]
        },
        {
            "speaker": "1_Apprentice",
            "text": "I\'m a huge fan of science fiction myself! ",
            "retrieved_passages": [
                {
                    "Science fiction": [
                        "Science fiction (often shortened to SF or sci-fi) is a genre of speculative fiction, typically dealing with imaginative concepts such as futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life.",
                        "Science fiction often explores the potential consequences of scientific and other innovations, and has been called a \\"literature of ideas\\".",
                        "It usually avoids the supernatural, unlike the related genre of fantasy.",
                        "Historically, science-fiction stories have had a grounding in actual science, but now this is only expected of hard science fiction."
                    ]
                },
                ...
                {
                    "LGBT themes in speculative fiction": [
                        "LGBT themes in speculative fiction refer to the incorporation of lesbian, gay, bisexual, or transgender (LGBT) themes into science fiction, fantasy, horror fiction and related genres.",
                        "Such elements may include an LGBT character as the protagonist or a major character, or explorations of sexuality or gender that deviate from the hetero-normative.",
                        "Science fiction and fantasy have traditionally been puritanical genres aimed at a male readership, and can be more restricted than non-genre literature by their conventions of characterisation and the effect that these conventions have on depictions of sexuality and gender."
                    ]
                }
            ],
            "retrieved_topics": [
                "Science fiction",
                "History of science fiction",
                "Isaac Asimov",
                "U.S. television science fiction",
                "History of US science fiction and fantasy magazines to 1950",
                "Starstruck (comics)",
                "LGBT themes in speculative fiction"
            ]
        },
		...
}
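  • Blended Skill Talk (BST): built on ConvAI2, ED, and WoW to blend their respective strengths, i.e. each dialogue is assembled from the different datasets

     Download: http://parl.ai/downloads/blended_skill_talk/blended_skill_talk.tar.gz

     In the dataset, a dialogue is organized as: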

{
    "personas": [
      [
        "my son plays on the local football team.",
        "i design video games for a living."
      ],
      [
        "my eyes are green.",
        "i wear glasses that are cateye."
      ]
    ],
    "context_dataset": "wizard_of_wikipedia",
    "free_turker_utterance": "What video games do you like to play?",
    "guided_turker_utterance": "all kinds, action, adventure, shooter, platformer, rpg, etc. but video game design requires both artistic and techncal competence AND writing skills. that is one part many people forget",
    "additional_context": "Video game design",
    "dialog": [
      [
        0,
        "Exactly! I think many people fail to notice how beautiful the art of video games can be.e"
      ],
      [
        1,
        "Indeed, Some games games are purposely designed to be a work of a persons creative expression, many though have been challenged as works of art by some critics."
      ],
      [
        0,
        "Video games are undervalued by many and too easily blamed for problems like obesity or violence in kids."
      ],
      [
        1,
        "Indeed, Just last week my son was playing some Tine 2 and it was keeping him so calm. Games are therapeutic to some. "
      ],
      [
        0,
        "I use games to relax after a stressful day, the small escape is relaxing."
      ],
      [
        1,
        "I enjoy a good gaming session after a hard day at work as well. "
      ],
      [
        0,
        "What other hobbies does your son have?"
      ],
      [
        1,
        "Well he likes to fly kites and collect bugs, typical hobbies for an 8 year old, lol. "
      ],
      [
        0,
        "My 12 year old is into sports. Football mostly. I however don;t enjoy watching him play."
      ],
      [
        1,
        "I wish I could play football, But I wear this cateye glasses and they would break if I tried. "
      ],
      [
        0,
        "Sounds nice. Are they new or vintage?"
      ],
      [
        1,
        "They are new, I got them because of my love for cats lol. I have to show off my beautiful green eyes somehow."
      ]
    ],
    "suggestions": [
      {
        
      },
      {
        "convai2": "yes , definitely very talented . what else do you enjoy ?",
        "empathetic_dialogues": "I agree wholly, quite an impeccable spectrum. I read online on 4chan that da Vinci was likely an alien from the future.",
        "wizard_of_wikipedia": "Indeed, Some games games are purposely designed to be a work of a persons creative expression, many though have been challenged as works of art by some critics."
      },
      {
        
      },
      {
        "convai2": "that's true ! and society did not allow for women to have opinions .",
        "empathetic_dialogues": "I've seen videos on the internet where kids are crying to their parents that they were made fun of because their race and health illnesses. Very messed up.",
        "wizard_of_wikipedia": "It's not just the adults that are overweight. I've noticed in shopping malls (people watching of course) that CHILDREN as well are overweight at tender young ages..."
      },
      {
        
      },
      {
        "convai2": "yes i agree my friend . it has been nice chatting with you , too !",
        "empathetic_dialogues": "I enjoy doing that after a hard day at work as well.  I hope it relaxes you!",
        "wizard_of_wikipedia": "It's good you have an outlet, stress can really cause damage and acute changes in certain parts of the brain that can cause long term damage."
      },
      {
        
      },
      {
        "convai2": "play station , and first person shooter games , how about you ?",
        "empathetic_dialogues": "Video games.",
        "wizard_of_wikipedia": "Personally, I enjoy role-playing games. But some of the newest technology is in the development of controllers. There are even kinetic sensor devices so your body can be a controller."
      },
      {
        
      },
      {
        "convai2": "my son was playing games constantly until i grounded him . now he gets as .",
        "empathetic_dialogues": "Ir is almost all kid that do that. He will get used to it.",
        "wizard_of_wikipedia": "You should introduce him to some of the streaming services offered on xbox, so he might stop playing that fortnite game!"
      },
      {
        
      },
      {
        "convai2": "yes , i have progressive lenses that's why they are still broke .",
        "empathetic_dialogues": "NEW! CAn you belive it?",
        "wizard_of_wikipedia": "They have been worn for hundreds of years. Mine are used for vision correction."
      }
    ],
    "suggestion_orders": [
      "",
      "wizard_of_wikipedia,convai2,empathetic_dialogues",
      "",
      "convai2,empathetic_dialogues,wizard_of_wikipedia",
      "",
      "convai2,empathetic_dialogues,wizard_of_wikipedia",
      "",
      "wizard_of_wikipedia,empathetic_dialogues,convai2",
      "",
      "convai2,wizard_of_wikipedia,empathetic_dialogues",
      "",
      "wizard_of_wikipedia,convai2,empathetic_dialogues"
    ],
    "chosen_suggestions": [
      "",
      "wizard_of_wikipedia",
      "",
      "",
      "",
      "empathetic_dialogues",
      "",
      "",
      "",
      "",
      "",
      ""
    ],
    "workers": [
      "A3DZ46U9XRLVBI",
      "A4T4577P6JL6R"
    ],
    "bad_workers": [
      
    ],
    "hit_ids": [
      "3OEWW2KGQL8KSS5QR00AS1UH1NKDO9",
      "30QQTY5GMMHZOOODW99XRFCAXY87UQ"
    ],
    "assignment_ids": [
      "3U5JL4WY5N6W9PYJPIRFWUNSXWNX4R",
      "34Q075JO10A7K0VPZOJ6PITEB5V10Q"
    ]
}
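A small sketch of how a Blended Skill Talk record like the one above might be walked turn by turn. It assumes, based only on the example record, that `chosen_suggestions` is aligned with `dialog` by index and that an empty string means no suggestion was used; the function name is hypothetical.

```python
def annotate_turns(record):
    """Pair each dialog turn with the dataset whose suggestion was chosen, if any."""
    turns = []
    for (speaker, text), chosen in zip(record["dialog"], record["chosen_suggestions"]):
        turns.append({
            "speaker": speaker,
            "text": text.strip(),
            "suggestion_source": chosen or None,  # "" means no suggestion was used
        })
    return turns


# Usage with a record abridged to the two fields this sketch reads.
record = {
    "dialog": [[0, "Exactly!"], [1, "Indeed. "]],
    "chosen_suggestions": ["", "wizard_of_wikipedia"],
}
turns = annotate_turns(record)
```

A full record loaded with `json.load` can be passed in the same way, since only the `dialog` and `chosen_suggestions` fields are read.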

 

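Evaluation

Generative models are evaluated automatically with perplexity (PPL), defined as:

$$PPL(W) = \exp\Big(-\frac{1}{N}\sum_{i=1}^{N}\ln p(w_i \mid w_1, ..., w_{i-1})\Big)$$

where $W=(w_1,...,w_N)$ is the token sequence. The natural logarithm is normally used. The formula shows that the higher the probability of a sentence, the better the language model and the lower the perplexity.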
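As a quick worked example, perplexity can be computed from the per-token probabilities a model assigns to a sentence, using the standard definition (the exponential of the negative mean natural-log probability). The function name is illustrative.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model probability of each token."""
    n = len(token_probs)
    mean_log_prob = sum(math.log(p) for p in token_probs) / n
    return math.exp(-mean_log_prob)
```

If the model assigns probability 0.25 to every token, the perplexity is 4, i.e. the model is as uncertain as a uniform choice among four tokens.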

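Human evaluation of the generative models uses ACUTE-Eval and Self-Chat ACUTE-Eval. Both judge which of two given speakers (e.g. two different chatbots) is the better conversationalist.

1. ACUTE-Eval: each time, show two dialogue sessions (each a record of one speaker talking with someone else) and have a human judge which speaker converses better, i.e. which one they would rather keep talking to and which one sounds more human. ACUTE-Eval then yields a win rate for each of the two speakers.

   In the example, the left side is the record of speaker 1 (light blue) talking with another person, and the right side is the record of speaker 2 (dark blue) talking with another person.

   The judge decides whether speaker 1 on the left or speaker 2 on the right converses better.

2. Self-Chat ACUTE-Eval: same as above, except the sessions come from self-chat. Speaker 1 talks to a bot built from the same model as speaker 1, and speaker 2 talks to a bot built from the same model as speaker 2. Many self-chat sessions of speaker 1 and speaker 2 are then compared pairwise to estimate each one's win rate.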
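The win rates can be tallied from the pairwise judgments in a few lines. This sketch assumes each judgment is recorded simply as the name of the preferred speaker; the labels `speaker1` and `speaker2` and the function name are illustrative.

```python
from collections import Counter


def win_rates(judgments, speakers=("speaker1", "speaker2")):
    """Fraction of pairwise comparisons won by each speaker."""
    counts = Counter(judgments)
    total = sum(counts[s] for s in speakers)
    if total == 0:
        raise ValueError("no judgments for the given speakers")
    return {s: counts[s] / total for s in speakers}
```

For instance, if judges prefer speaker 1 in three of four comparisons, its win rate is 0.75 and speaker 2's is 0.25.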

posted @ 2021-05-14 20:01  _yanghh