Evaluating Theory of Mind in QA


Reasoning about beliefs

  1. Possessing a capacity similar to human reasoning has been argued to be necessary for the success of artificial intelligence systems.

  2. One well-studied domain that requires reasoning is question answering, where simply memorizing and looking up information is often not enough to correctly answer a question.

  3. Facebook bAbi dataset: a set of simple reasoning tasks.

  4. People reason not just about their own observations and beliefs, but also about others' mental states (such as beliefs and intentions). The capacity to recognize that others can have mental states different from one's own, known as theory of mind (ToM), marks an important milestone in the development of children and has been extensively studied by psychologists. ToM review.

  5. A bAbi-style dataset based on the Sally-Anne task is a first step in designing benchmarks to evaluate the mental-state reasoning capacity of question-answering models, but it is still limited in the types of reasoning it probes. Each evaluation consists of only one question, which does not guarantee that a model understands the state of the world; in fact, even in developmental theory-of-mind experiments, children are asked several questions to ensure that a correct answer reflects their understanding and is not simply due to chance.

  6. In this paper, we address these shortcomings by designing a new dataset that enables us to evaluate a model's capacity to reason about different types of beliefs, as well as whether it maintains a correct understanding of the world.


Theory of Mind Experiments

  1. By the age of five, most children have a unified theory of mind and are able to represent and reason about others' desires, perceptions, and beliefs.

  2. Sally-Anne experiment: tests false belief. For evaluation, three questions are asked: a belief question, a reality question, and a memory question. It probes first-order belief.

  3. Ice-cream Van experiment: tests reasoning about higher-order beliefs.
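
The Sally-Anne setup can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not in the notes: the class `SallyAnneWorld`, its methods, and the marble/basket/box story are hypothetical names chosen for illustration, and the rule "an agent updates its belief only while present" is a simplification. It shows how the belief, reality, and memory questions can have different answers for the same story.

```python
from dataclasses import dataclass, field


@dataclass
class SallyAnneWorld:
    """Hypothetical toy world; not the setup of any particular dataset."""
    location: dict = field(default_factory=dict)  # object -> current container
    initial: dict = field(default_factory=dict)   # object -> first container
    present: set = field(default_factory=set)     # agents currently in the room
    belief: dict = field(default_factory=dict)    # agent -> {object: container}

    def enter(self, agent):
        self.present.add(agent)
        self.belief.setdefault(agent, {})

    def leave(self, agent):
        self.present.discard(agent)

    def put(self, obj, container):
        # Record the first placement for the memory question.
        self.initial.setdefault(obj, container)
        self.location[obj] = container
        # Assumption: only agents who are present observe the change.
        for agent in self.present:
            self.belief[agent][obj] = container


world = SallyAnneWorld()
world.enter("Sally")
world.enter("Anne")
world.put("marble", "basket")   # both agents see the marble in the basket
world.leave("Sally")
world.put("marble", "box")      # only Anne sees the move

belief_answer = world.belief["Sally"]["marble"]   # where will Sally look?
reality_answer = world.location["marble"]         # where is the marble really?
memory_answer = world.initial["marble"]           # where was it at the beginning?
print(belief_answer, reality_answer, memory_answer)  # basket box basket
```

Sally's belief ("basket") differs from reality ("box"), which is the false-belief case; a model that answers only the belief question correctly could still be wrong about reality or memory.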


ToM task Dataset

  1. Three tasks: true-belief, false-belief, and second-order false-belief.

  2. container-transparency property

  3. four questions: memory, reality, first-order, second-order
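
As a rough illustration of how the three tasks and four questions relate, the sketch below answers all four question types for a story given as a list of events. Everything here is an assumption made for illustration and is not the dataset's actual format or generation procedure: the event tuples, the `learn` event (an agent privately finding out where the object is), and the function name `answer_questions` are all hypothetical.

```python
def answer_questions(events, agent_a, agent_b):
    """Answer four question types about a single object.

    events: list of (kind, agent, place) tuples; kind is one of
    "enter", "exit", "move", "learn". All of this is a hypothetical encoding.
    """
    present = set()
    first_place = None   # memory: where the object was at the beginning
    current = None       # reality: where the object is now
    first_order = {}     # x -> where x believes the object is
    second_order = {}    # (x, y) -> where x believes y believes the object is

    for kind, agent, place in events:
        if kind == "enter":
            present.add(agent)
        elif kind == "exit":
            present.discard(agent)
        elif kind == "move":
            if first_place is None:
                first_place = place
            current = place
            # Assumption: present agents see the move and see each other see it.
            for x in present:
                first_order[x] = place
                for y in present:
                    if x != y:
                        second_order[(x, y)] = place
        elif kind == "learn":
            # Assumption: a private observation changes only the learner's own
            # first-order belief; nobody else's model of the learner changes.
            first_order[agent] = place

    return {
        "memory": first_place,
        "reality": current,
        "first_order": first_order.get(agent_a),
        "second_order": second_order.get((agent_b, agent_a)),
    }


setup = [("enter", "Sally", None), ("enter", "Anne", None),
         ("move", "Sally", "basket")]

true_belief = setup + [("move", "Anne", "box")]
false_belief = setup + [("exit", "Sally", None), ("move", "Anne", "box")]
second_order_false_belief = false_belief + [("learn", "Sally", "box")]

tb = answer_questions(true_belief, "Sally", "Anne")
fb = answer_questions(false_belief, "Sally", "Anne")
so = answer_questions(second_order_false_belief, "Sally", "Anne")
```

In this toy encoding, `first_order` is where Sally thinks the object is and `second_order` is where Anne thinks Sally thinks it is. In the second-order false-belief story, Sally's own belief matches reality, but Anne's belief about Sally's belief does not, so only the second-order question separates it from the true-belief story.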

posted @ 2018-11-24 15:01  林小奚