陋室铭 (An Inscription on a Humble Room)
Never stop learning (the greatest truths are the simplest)

PolyBase is a technology that accesses data outside of the database through the T-SQL language. In SQL Server 2016, it lets you run queries on external data in Hadoop, or import data from and export data to Azure Blob Storage. Queries are optimized to push computation to Hadoop. In Azure SQL Data Warehouse, you can import data from and export data to Azure Blob Storage and Azure Data Lake Store.

To use PolyBase, see Get started with PolyBase.

[Figure: PolyBase logical diagram]

Why use PolyBase?

To make good decisions, you want to analyze both relational data and other data that is not structured into tables, notably data in Hadoop. This is difficult to do unless you have a way to transfer data among the different types of data stores. PolyBase bridges this gap by operating on data that is external to SQL Server.

To keep things simple, PolyBase does not require you to install additional software in your Hadoop environment. Querying external data uses the same syntax as querying a database table. All of this happens transparently: PolyBase handles the details behind the scenes, and end users need no knowledge of Hadoop to query external tables.
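
As a rough illustration of "the same syntax as querying a database table", the sketch below defines an external table over a delimited file in Hadoop and then joins it to a local table. It assumes PolyBase is already installed and configured for your Hadoop distribution. Every object name, address, path, and column here (`MyHadoopCluster`, `PipeDelimitedText`, `dbo.CarSensor_Data`, `dbo.Customers`, the HDFS location) is hypothetical and would need to match your own environment.

```sql
-- Hypothetical names, addresses, and columns throughout; adjust for your environment.

-- 1. Point SQL Server at the Hadoop cluster (name node address is an assumption).
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020'
);

-- 2. Describe how the files are laid out (pipe-delimited text is an assumption).
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);

-- 3. Define the external table; only the schema lives in SQL Server,
--    the data stays in Hadoop.
CREATE EXTERNAL TABLE dbo.CarSensor_Data (
    SensorKey    INT   NOT NULL,
    CustomerKey  INT   NOT NULL,
    Speed        FLOAT NOT NULL,
    YearMeasured INT   NOT NULL
)
WITH (
    LOCATION = '/Demo/car_sensordata.tbl',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);

-- 4. Query it like any other table, here joined to a local
--    (hypothetical) dbo.Customers table.
SELECT c.CustomerKey, c.LastName, s.Speed
FROM dbo.Customers AS c
JOIN dbo.CarSensor_Data AS s
    ON c.CustomerKey = s.CustomerKey
WHERE s.Speed > 65;
```

The final SELECT contains nothing Hadoop-specific, which is the point: once the external objects exist, end users work with ordinary T-SQL.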

With PolyBase, you can:

  • Query data stored in Hadoop from SQL Server or PDW (Parallel Data Warehouse). Users store data in cost-effective, distributed, scalable systems such as Hadoop. PolyBase makes it easy to query that data by using T-SQL.

  • Query data stored in Azure Blob Storage. Azure Blob Storage is a convenient place to store data for use by Azure services. PolyBase makes it easy to access that data by using T-SQL.

  • Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store. Take advantage of the speed of Microsoft SQL's columnstore technology and analysis capabilities by importing data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.

  • Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store. Archive data to any of these stores to get cost-effective storage while keeping the data online for easy access.

  • Integrate with BI tools. Use PolyBase with Microsoft's business intelligence and analysis stack, or use any third-party tools that are compatible with SQL Server.
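
The import and export scenarios above can be sketched in T-SQL as follows. This is only an illustration under stated assumptions: it presumes that an external data source `MyHadoopCluster`, an external file format `PipeDelimitedText`, and an external table `dbo.CarSensor_Data` already exist, and all of those names, along with the columns, target tables, and HDFS path, are hypothetical.

```sql
-- Assumes the hypothetical external objects MyHadoopCluster, PipeDelimitedText,
-- and dbo.CarSensor_Data already exist; adjust names for your environment.

-- Import: materialize external rows into a local relational table.
SELECT s.SensorKey, s.CustomerKey, s.Speed, s.YearMeasured
INTO dbo.Fast_Sensor_Readings
FROM dbo.CarSensor_Data AS s
WHERE s.Speed > 65;

-- Optionally store the imported data in columnstore format for analytics.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Fast_Sensor_Readings
    ON dbo.Fast_Sensor_Readings;

-- Export: must be enabled once per instance before inserting into external tables.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;

-- An external table over a (hypothetical) archive directory acts as the export target.
CREATE EXTERNAL TABLE dbo.Sensor_Archive (
    SensorKey    INT   NOT NULL,
    CustomerKey  INT   NOT NULL,
    Speed        FLOAT NOT NULL,
    YearMeasured INT   NOT NULL
)
WITH (
    LOCATION = '/archive/sensor_readings/',
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);

-- Writing to the external table exports the selected rows to Hadoop.
INSERT INTO dbo.Sensor_Archive
SELECT SensorKey, CustomerKey, Speed, YearMeasured
FROM dbo.Fast_Sensor_Readings
WHERE YearMeasured < 2015;
```

Both directions use plain `SELECT ... INTO` and `INSERT ... SELECT` statements, which is why no separate ETL tool is needed for these cases.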

Performance

  • Push computation to Hadoop. The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance, and it uses statistics on external tables to make that decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.

  • Scale compute resources. To improve query performance, you can use SQL Server PolyBase scale-out groups. This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds compute resources for operating on the external data.
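
A hedged sketch of how these two performance levers might look in practice is shown below. It assumes a hypothetical external table `dbo.CarSensor_Data`, an external data source that was created with a `RESOURCE_MANAGER_LOCATION` (which pushdown depends on), and a hypothetical head node named `HEADNODE01` running the default instance. The port 16450 is the value commonly used for the PolyBase data movement control channel, and your installation may differ.

```sql
-- Hypothetical object and machine names; adjust for your environment.

-- Give the optimizer statistics on the external table so that its
-- cost-based pushdown decision has something to work with.
CREATE STATISTICS Stats_CarSensor_Customer_Speed
ON dbo.CarSensor_Data (CustomerKey, Speed)
WITH FULLSCAN;

-- Normally the optimizer decides on its own. For testing, a query hint
-- can force the computation to be pushed to Hadoop...
SELECT CustomerKey, Speed
FROM dbo.CarSensor_Data
WHERE Speed > 65
OPTION (FORCE EXTERNALPUSHDOWN);

-- ...or keep it in SQL Server, to compare the two plans.
SELECT CustomerKey, Speed
FROM dbo.CarSensor_Data
WHERE Speed > 65
OPTION (DISABLE EXTERNALPUSHDOWN);

-- Scale-out: run this on each compute node instance to join it to the
-- head node's group, then restart the PolyBase engine and data movement
-- services on that compute node.
EXEC sp_polybase_join_group 'HEADNODE01', 16450, 'MSSQLSERVER';

-- On the head node, list the nodes that are part of the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```

The hints are mainly useful for comparison; in normal operation, leaving the decision to the optimizer with up-to-date statistics matches the behavior described above.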

PolyBase Guide Topics

This guide includes topics to help you use PolyBase efficiently and effectively.

  
Topic | Description
Get started with PolyBase | Basic steps to install and configure PolyBase. Shows how to create external objects that point to data in Hadoop or Azure Blob Storage, and gives query examples.
PolyBase Versioned Feature Summary | Describes which PolyBase features are supported on SQL Server, SQL Database, and SQL Data Warehouse.
PolyBase scale-out groups | Scale out parallelism between SQL Server and Hadoop by using SQL Server scale-out groups.
PolyBase installation | Reference and steps for installing PolyBase with the installation wizard or with a command-line tool.
PolyBase configuration | Configure SQL Server settings for PolyBase. For example, configure computation pushdown and Kerberos security.
PolyBase T-SQL objects | Create the T-SQL objects that PolyBase uses to define and access external data.
PolyBase Queries | Use T-SQL statements to query, import, or export external data.
PolyBase troubleshooting | Techniques to manage PolyBase queries. Use dynamic management views (DMVs) to monitor PolyBase queries, and learn to read a PolyBase query plan to find performance bottlenecks.
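
As a small taste of the troubleshooting topic, the following sketch uses two PolyBase DMVs to find the slowest distributed queries and then the slowest steps within one of them. The execution ID `'QID4547'` is a made-up placeholder; substitute a value returned by the first query.

```sql
-- Find the longest-running PolyBase (distributed) queries and their text.
SELECT TOP (10)
    r.execution_id,
    r.status,
    r.total_elapsed_time,
    t.text
FROM sys.dm_exec_distributed_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
ORDER BY r.total_elapsed_time DESC;

-- Break one query down into its steps to see where the time goes.
-- 'QID4547' is a hypothetical execution_id taken from the query above.
SELECT
    execution_id,
    step_index,
    operation_type,
    location_type,
    status,
    total_elapsed_time,
    command
FROM sys.dm_exec_distributed_request_steps
WHERE execution_id = 'QID4547'
ORDER BY total_elapsed_time DESC;
```
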
Posted on 2018-06-04 10:18 by 宏宇