From Natural Language Processing to Neural Databases: Paper Notes

This paper introduces neural databases, a class of systems that use NLP transformers as localized answer derivation engines. The authors ground the vision in NeuralDB, a database system in which updates and queries are given as short natural language sentences. Preliminary experiments show that NeuralDB can answer select-project-join-aggregate queries over thousands of natural language sentences with very high accuracy.

By nature, neural databases are not meant to provide the same correctness guarantees as a traditional database system. Hence, to be clear about the scope of the vision, neural databases should not be considered an alternative to traditional databases in applications where such guarantees are required.

Two technical challenges for the vision: (1) finding suitable sets of facts from the database to feed to each transformer instance, and (2) further processing the answers of each transformer instance to produce the answer to the query.

In NeuralDB, data and queries are represented as sentences in natural language, which provides two of the key benefits of neural databases. First, the database has no pre-defined schema: users can mention any relationship of interest. Second, the database is usable by a broader set of users because updates and queries can be specified in whatever linguistic form is most convenient to the user.

The architecture of NeuralDB is based on the following ideas.

  • Running multiple transformers in parallel: In practice, transformers can only take a relatively small input. Hence, to scale to larger data sets, NeuralDB runs multiple copies of a neural SPJ operator in parallel, each outputting structured results. When a query doesn't involve aggregation, the union of the outputs of the neural SPJ operators is the answer to the query. When the query does involve aggregation, these machine-readable outputs are fed into the aggregation operator.
  • Aggregation with a conventional operator: Since the neural SPJ operator was designed to output structured results, the architecture can use a separate conventional aggregation operator. The aggregation operator is selected through a classifier that maps a query to an aggregation function.
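
The two ideas above can be sketched as a small pipeline. This is only an illustration under loud assumptions, not the authors' implementation: `toy_spj` is a hypothetical keyword matcher standing in for the transformer-based neural SPJ operator, `choose_aggregation` is a hard-coded rule standing in for the learned classifier, and fixed-size batching stands in for the step that picks which facts each transformer instance sees.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def toy_spj(keyword: str, facts: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a neural SPJ operator.

    A real operator would be a transformer reading a few facts plus the
    query. This one returns the first token (the subject) of every fact
    that mentions the keyword, as a structured list.
    """
    return [fact.split()[0] for fact in facts if keyword in fact.lower()]


# Conventional aggregation functions applied to the structured outputs.
AGGREGATIONS: Dict[str, Callable[[List[str]], object]] = {
    "count": lambda rows: len(set(rows)),
    "set": lambda rows: sorted(set(rows)),  # no aggregation: plain union
}


def choose_aggregation(query: str) -> str:
    """Assumed rule-based stand-in for the query-to-aggregation classifier."""
    return "count" if query.lower().startswith("how many") else "set"


def answer(query: str, keyword: str, facts: Sequence[str], batch_size: int = 2):
    # Split the facts into small inputs, one per operator instance.
    batches = [facts[i:i + batch_size] for i in range(0, len(facts), batch_size)]
    # Run one operator copy per batch in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda batch: toy_spj(keyword, batch), batches))
    # Union the structured outputs, then apply the selected aggregation.
    rows = [row for part in partials for row in part]
    return AGGREGATIONS[choose_aggregation(query)](rows)


facts = ["Alice works at Acme", "Bob works at Initech", "Carol works at Acme"]
count_answer = answer("How many people work at Acme?", "acme", facts)
set_answer = answer("Who works at Acme?", "acme", facts)
```

Because every operator instance emits a plain list, the final step needs no neural component: a query without aggregation simply returns the union, and a counting query applies `len` to it.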

The experimental results show that for lookup and join queries the model attained near-perfect scores (above 99% exact match) on the template-generated data. However, the model performs poorly on queries that require an aggregation or whose result is a large set. Importantly, the results indicate that the model can be robust to simple linguistic variations when processing queries.
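
For readers unfamiliar with the metric, exact match is the fraction of queries whose predicted answer equals the reference answer. The sketch below is a generic version; the `normalize` step (trimming whitespace and lowercasing) is an assumption, since these notes do not say how the paper normalizes answers.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    # Assumed normalization: ignore case and surrounding whitespace.
    return answer.strip().lower()


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Two of the three toy answers match their reference.
score = exact_match(["Alice", "2", "paris "], ["alice", "3", "Paris"])
```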

 

posted @ 2021-12-30 21:01  bky-16