Parallel Execution Plans: Understanding and Using Parallelism in SQL Server
SQL Server is able to make implicit use of parallelism to speed SQL queries. Quite how it does it, and how you can be sure that it is doing so, isn't entirely obvious to most of us. Paul White begins a series that makes it all seem simple, starting at the gentle level of counting jelly beans.
Many experienced database professionals have acquired a somewhat jaded view of parallel query execution. Sometimes, this is a consequence of bad experiences with older versions of SQL Server. Just as frequently, however, this view is the result of misconceptions, or an otherwise incomplete mastery of the techniques required to effectively design and tune queries for parallel execution.
This is the first in a series of articles that will provide the reader with the deep knowledge necessary to make advanced use of the parallel query processing features available in Microsoft SQL Server. Part one provides a step-by-step guide to the fundamentals of parallelism in SQL Server, introducing concepts such as parallel scans and seeks, workers, threads, tasks, execution contexts, and the exchange operators that coordinate parallel activity.
Future instalments will provide further insights into the inner workings of the database engine, and show how targeted parallelism can benefit many real-world environments, not just the data warehousing and decision-support systems normally associated with its use. Systems that are often thought of as having a primarily transaction-processing (OLTP) workload often contain queries and procedures that could benefit from the appropriate use of parallelism.
Perhaps inevitably, this and subsequent instalments contain quite deep technical content in places. Making the most effective use of parallelism requires a good understanding of how things like scheduling, query optimization, and the execution engine really work. Nevertheless, it is hoped that even those who are completely new to the topic will find this series informative and useful.
What is Parallelism?
You have probably heard the phrase “many hands make light work”. The idea is that splitting a task among a number of people results in each person doing less. From the individual’s perspective, the job seems much easier, even though a similar amount of work is being done overall. More importantly, if the extra people can perform their allocation of work at the same time, the total time required for the task is reduced.
Counting Jelly Beans
Imagine you are presented with a large glass jar full of assorted jelly beans, and asked to count how many there are. Assuming you are able to count beans at an average rate of five per second, it would take you a little over ten minutes to determine that this particular jar contains 3,027 jelly beans.
If four of your friends offer to help with the task, you could choose from a number of potential strategies, but let’s consider one that closely mirrors the sort of strategy that SQL Server would adopt. You seat your friends around a table with the jar at its centre, and a single scoop to remove beans from the jar. You ask them to help themselves to a scoop of beans whenever they need more to count. Each friend is also given a pen and a piece of paper, to keep a running total of the number of beans they have counted so far.
Once a person finishes counting and finds the jar empty, they pass their individual bean count total to you. As you collect each subtotal, you add it to a grand total. When you have received a subtotal from each of your friends, the task is complete. With four people counting beans simultaneously, the whole task is completed in around two and a half minutes – a four-fold improvement over counting them all yourself. Of course, four people still worked for a total of ten minutes (plus the few seconds it took you to add the last subtotal to the grand total).
This particular task is well-suited to parallel working because each person is able to work concurrently and independently. The desired result is obtained much more quickly, without doing much more work overall.
Counting Beans with SQL Server
SQL Server cannot count jelly beans directly, so we ask it to count the number of rows in a table instead. If the table is small, SQL Server will likely use an execution plan like the one shown in Figure 1.
Figure 1: Serial Counting Plan
This query plan uses a single worker – equivalent to counting all the beans yourself. The plan itself is very simple: the Stream Aggregate operator counts the rows it receives from the Index Scan operator, and returns the result once all rows have been processed. You might have chosen a similar strategy if the jelly bean jar had been almost empty, since you would be unlikely to save much time by splitting such a small number of beans among your friends, and the extra workers might even slow the process down slightly, due to the extra step of adding partial counts together at the end.
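The sort of query that produces this plan is just a simple count. As a minimal sketch (the table name `dbo.JellyBeans` is hypothetical; any small table with an index to scan behaves similarly):

```sql
-- Counting rows in a small table typically produces the serial plan
-- of Figure 1: an Index Scan feeding a Stream Aggregate.
SELECT COUNT(*) AS BeanCount
FROM dbo.JellyBeans;
```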
On the other hand, if the table is large enough, the SQL Server optimizer may choose to enlist additional workers, producing a query plan like the one shown in Figure 2.
The small yellow arrow icons identify operations that involve multiple workers. Each worker is assigned a separate part of the problem, and the partial results are then combined to give a final result. As the manual bean-counting example demonstrated, the parallel plan has the potential to complete much faster than the serial plan, because multiple workers will be actively counting rows, simultaneously.
How Parallelism Works
Imagine for a moment that SQL Server has no built-in support for parallelism. You might try to improve the performance of the original row-counting query by manually splitting the query into equally-sized pieces, and running each one concurrently on a separate connection to the server.
Each query in Figure 3 is written to process a separate range of rows from the table, ensuring that every row from the table is processed exactly once overall. With luck, SQL Server would run each query on a separate processing unit, and you could expect to receive the three partial results in roughly a third of the time. Naturally, you would still need to perform the extra step of adding the three values together to get a correct final result.
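The queries of Figure 3 follow the pattern sketched below. The table and column names, and the range boundaries, are hypothetical – the point is simply that the three WHERE clauses partition the key range so every row is counted exactly once:

```sql
-- Manual parallelism: run each query on a separate connection.
-- Connection 1:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID BETWEEN 1      AND 100000;
-- Connection 2:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID BETWEEN 100001 AND 200000;
-- Connection 3:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID BETWEEN 200001 AND 300000;
-- Final step: add the three partial counts together yourself.
```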
Parallel Execution as Multiple Serial Plans
The ‘manual parallelism’ example is not that far removed from the way SQL Server actually implements its parallel query facility. Recall the parallel query plan from Figure 2, and assume that SQL Server allocates three additional workers to the query at runtime. Conceptually, we can redraw the parallel plan to show SQL Server running three serial plans concurrently (this representation is not strictly accurate, but we will correct that shortly).
Each additional worker is assigned to one of the three plan branches that feed into the Gather Streams operator. Notice that only the Gather Streams operator retains the little yellow parallelism icon; it is now the only operator that interacts with multiple workers. This general strategy suits SQL Server for two main reasons. Firstly, all the SQL Server code necessary to execute serial plans already exists, and has been optimized over many years and product releases. Secondly, this method scales extremely well: if more workers are available at runtime, SQL Server can easily add extra plan branches to split the work more ways.
The number of extra workers SQL Server assigns to each parallel plan region at runtime is known as the degree of parallelism (often abbreviated to DOP). SQL Server chooses the DOP just before the query starts executing, and it can change between executions without requiring a plan recompilation. The maximum DOP for each parallel region is determined by the number of logical processing units visible to SQL Server.
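If you want to experiment with different DOP values yourself, the degree of parallelism can be capped per query with a hint, or instance-wide via a configuration option (the table name below is again hypothetical):

```sql
-- Cap DOP for a single query with a hint:
SELECT COUNT(*) AS BeanCount
FROM dbo.JellyBeans
OPTION (MAXDOP 4);

-- Or cap it instance-wide (0 means let SQL Server decide):
EXEC sys.sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Note that MAXDOP limits the number of workers per parallel region; it does not force a parallel plan.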
Parallel Scan and the Parallel Page Supplier
The problem with the conceptual plan shown in Figure 4 is that each Index Scan operator would count every row in the entire input set. Left uncorrected, the plan would produce incorrect results and probably take longer to execute than the serial version did. The manual parallelism example avoided that issue by using an explicit WHERE clause in each query to split the input rows into three distinct and equally-sized ranges.
SQL Server does not use quite the same approach, because distributing the work evenly makes the implicit assumption that each query will receive an equal share of the available processing resources, and that each data row will require the same amount of effort to process. In a simple example like counting rows in a table (on a server with no other activity) those assumptions may well hold, and the three queries might indeed return their partial results at about the same time.
In general, however, it is easy to think of examples where one or more of those hidden assumptions would not apply in the real world, due to any number of external or internal factors. For example, one of the queries might be scheduled on the same logical processor as a long-running bulk load, while the others run without contention. Alternatively, consider a query that includes a join operation, where the amount of effort required to process a particular row depends heavily on whether it matches the join condition or not. If some queries happen to receive more joining rows than others, the execution times are likely to vary widely, and overall performance will be limited by the speed of the slowest worker.
Instead of allocating a fixed number of rows to each worker, SQL Server uses a Storage Engine feature called the Parallel Page Supplier to distribute rows among the workers, on demand. You will not see the Parallel Page Supplier in a graphical query plan because it is not part of the Query Processor, but we can extend the illustration of Figure 4 to show where it would sit and what its connections would be:
The crucial point is that this is a demand-based scheme; the Parallel Page Supplier responds to requests from workers, providing a batch of rows to any worker that needs more work to do. Referring back to the bean-counting analogy, the Parallel Page Supplier is represented by the scoop used to remove beans from the jar. The single, shared scoop ensures that no two people count the same beans, but there is otherwise nothing to stop the same person collecting more beans, as required. In particular, if one person is slower than the others, that person simply takes fewer scoops from the jar, and the other workers will count more beans to compensate.
In SQL Server, a slow worker makes fewer requests to the Parallel Page Supplier, and so processes fewer rows. Other workers are unaffected, and continue to process rows at their individual maximum rates. In this way, the demand-based scheme provides some measure of resilience to variations in worker throughput. Instead of being bound by the speed of the slowest worker, the performance of the demand scheme degrades gracefully as individual worker throughput declines. Nevertheless, the fact that each worker may process a significantly different number of rows, depending on runtime conditions, can cause other problems (a topic we will return to later in this series).
Note that the use of a Parallel Page Supplier does not prevent SQL Server from using existing optimizations like read-ahead scanning (prefetching data from permanent storage). In fact, it may even be slightly more efficient for the three workers to consume rows from a single, underlying physical scan, rather than from the three separate range scans that we saw in the manual parallelism example.
The Parallel Page Supplier is also not limited to use with index scans; SQL Server uses a Parallel Page Supplier whenever multiple workers cooperatively read a data structure. That data structure may be a heap, clustered table, or an index, and the operation may be either a scan or a seek. If the latter point surprises you, consider that an Index Seek operation is just a partial scan: it seeks to the position of the first qualifying row, and then scans forward to the end of the qualifying range.
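For example, a parallel plan for a range query like the following (hypothetical table, integer index key) would use the Parallel Page Supplier to share out just the qualifying key range among the workers – a seek to the first qualifying row, followed by a cooperative scan to the end of the range:

```sql
-- A seek on a range predicate is a partial scan of the index,
-- and that partial scan can be parallelized just like a full one.
SELECT COUNT(*) AS BeanCount
FROM dbo.JellyBeans
WHERE BeanID BETWEEN 50000 AND 250000;
```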