Natural language to SQL: Where are we today? Paper study notes

To provide a holistic view of natural-language-to-SQL (NL2SQL) translation technologies and assess current advancements, the paper performs extensive experiments under a unified framework, evaluating eleven recent techniques on 10+ benchmarks, including a new benchmark (WTQ) and TPC-H. Existing NL2SQL benchmarks used in recent studies are WikiSQL, ATIS/GeoQuery, MAS, and Spider. No method wins consistently across all benchmarks, since each method addresses only a limited-scope part of the problem.

The paper provides a comprehensive survey of recent NL2SQL methods and introduces a taxonomy of them.

The paper measures the quality of the NL2SQL methods accurately by considering the semantic equivalence of SQL queries. It also provides a practical validation tool built on existing, mature database technologies such as query rewriting and database testing.

The paper proposes a multi-level framework for determining the semantic equivalence of two SQL queries. First, it compares the execution results of the two queries. However, when the given database is small, it is highly likely that two completely different SQL queries return the same empty result. The paper resolves this problem by comparing execution results not only on the given database but also on datasets generated with a database testing technique. Next, it uses an existing prover that exploits automated constraint solving and interactive theorem proving and returns either a counterexample or a proof of equivalence for a limited set of queries. For queries that the prover does not support, it uses the query rewriter of a commercial DBMS and compares the parse trees of the two rewritten SQL queries.
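The first level of this framework (comparing execution results on both the given database and generated datasets) can be illustrated with a minimal sketch. This is not the paper's tool: the table `emp`, the helper names `make_db`, `run`, and `same_results`, and the hand-written "generated" datasets are all assumptions made for illustration, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3
from collections import Counter


def make_db(rows):
    """Build an in-memory database with a hypothetical table emp(name, dept, salary)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    return conn


def run(conn, sql):
    """Execute a query and return its rows as a multiset (row order is ignored)."""
    return Counter(conn.execute(sql).fetchall())


def same_results(sql_a, sql_b, datasets):
    """Return False as soon as one dataset distinguishes the two queries."""
    for rows in datasets:
        conn = make_db(rows)
        try:
            if run(conn, sql_a) != run(conn, sql_b):
                return False
        finally:
            conn.close()
    return True


# The "given" database is small, so unrelated queries can both come back empty.
given = [("ann", "db", 100), ("bob", "ml", 120)]

# Stand-in for datasets produced by a database testing technique (hand-written here).
generated = [
    [("cat", "db", 900), ("dan", "ml", 50)],
    [("eve", "db", 500), ("fay", "db", 700)],
]

gold = "SELECT name FROM emp WHERE salary > 400"
wrong = "SELECT name FROM emp WHERE dept = 'hr'"
rewritten = "SELECT name FROM emp WHERE NOT salary <= 400"

# On the given database alone, the wrong query looks equivalent (both results are empty).
given_only_wrong = same_results(gold, wrong, [given])

# Adding generated datasets exposes the difference.
full_wrong = same_results(gold, wrong, [given] + generated)

# A genuinely equivalent rewrite still matches on every dataset.
full_rewritten = same_results(gold, rewritten, [given] + generated)
```

Matching results on every dataset only fails to find a counterexample; it does not prove equivalence. That is why the framework described above follows this step with a prover and, for unsupported queries, a comparison of rewritten parse trees.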

 

posted @ 2021-12-30 22:07  bky-16