Oracle 8i and 9i分析函数(转)

http://www.oracle.com.cn/viewthread.php?tid=38374&highlight=OLAP  

Analytic Functions in Oracle 8i and 9i

Oracle 8i and 9i分析函数


Contents目录

Overview and Introduction概述与简介
How Analytic Functions Work分析函数原理
The Syntax句法
Calculate a running Total累计计算
Top-N Queries前N条查询
    Example 1例1
    Example 2例2
Windows窗口
    Range Windows范围窗口
    Compute average salary for defined range计算定义范围的平均工资
    Row Windows行窗口
    Accessing Rows Around Your Current Row访问当前行前后的行
LAG
LEAD
Determine the First Value / Last Value of a Group确定组的首值和末值
Crosstab or Pivot Queries交叉表或Pivot查询
Conclusion结论
Links and Documents链接和文档

Overview概述:

Analytic Functions, which have been available since Oracle 8.1.6, are designed to address such problems as "Calculate a running total", "Find percentages within a group", "Top-N queries", "Compute a moving average" and many more. Most of these problems can be solved using standard PL/SQL, however the performance is often not what it should be. Analytic Functions add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with pure SQL or PL/SQL. These extensions are currently under review by the ANSI SQL committee for inclusion in the SQL specification.

分析函数,最早是从ORACLE8.1.6开始出现的,它的设计目的是为了解决诸如“累计计算”,“找出分组内百分比”,“前-N条查询”,“移动平均数计算”"等问题。其实大部分的问题都可以用PL/SQL解决,但是它的性能并不能达到你所期望的效果。分析函数是SQL言语的一种扩充,它并不是仅仅试代码变得更简单而已,它的速度比纯粹的SQL或者PL/SQL更快。现在这些扩展已经被纳入了美国国家标准化组织SQL委员会的SQL规范说明书中。

How Analytic Functions Work ?分析函数的原理

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time. 分析函数是在一个记录行分组的基础上计算它们的总值。与集合函数不同他们返回各分组的多行记录。行的分组被称窗口,并通过分析语句定义。对于记录行,定义了一个“滑动”窗口。该窗口确定“当前行”计算的范围。窗口的大小可由各行的实际编号或由时间等逻辑间隔确定

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause. 除了ORDER BY(按…排序)语句外,分析函数查询被执行的操作。所有合并WHEREGROUP BYHAVING语句都是分析函数处理之前完成的。因此,分析函数只出现在选择目录或ORDER BY(按…排序)语句中

The Syntax句法

The Syntax of analytic functions is rather straightforward in appearance分析函数的句法非常简单。

Analytic-Function(<Argument>,<Argument>,...)
OVER (
  <Query-Partition-Clause>
  <Order-By-Clause>
  <Windowing-Clause>
)

Analytic-Function分析函数的种类

Specify the name of an analytic function, Oracle actually provides many analytic functions such as AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.

分析函数的名称,ORACLE通常多个分析函数,包括:AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.

Arguments参数

Analytic functions take 0 to 3 arguments. 分析函数通常有03个参数

Query-Partition-Clause查询划分语句

The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic functions are applied to each group independently, they are reset for each group. 根据划分表达式设置的规则,PARTITION BY(按…划分)将一个结果逻辑分成N个分组划分表达式此“划分分组”用作同义词。分析函数独立应用于各个分组,并在应用时重置。

Order-By-Clause排序语句

The ORDER BY clause specifies how the data is sorted within each group (partition). This will definitely affect the outcome of any analytic function. ORDER BY(按…排序)语句规定了个分(划分)的数据如何排序。这必然影响分析函数的结果。

Windowing-Clause生成语句

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group. More information on windows can be found here. 生成语句用以定义滑动或固定数据窗口,分析函数在分组内进行分析。语句能够对分组中任意定义的滑动或固定窗口进行计算。点击此处了解更多。

Example: Calculate a running Total累计计算

This example shows the cumulative salary within a departement row by row, with each row including a summation of the prior rows salary.本例中对某部门的工资进行逐行计算,每行包括之前所有行中工资的合计。

set autotrace traceonly explain
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
column seq format 999

SELECT ename "Ename", deptno "Deptno", sal "Sal",
  SUM(sal)
    OVER (ORDER BY deptno, ename) "Running Total",
  SUM(SAL)
    OVER (PARTITION BY deptno
          ORDER BY ename) "Dept Total",
  ROW_NUMBER()
    OVER (PARTITION BY deptno
          ORDER BY ENAME) "Seq"
FROM emp
ORDER BY deptno, ename
/

Ename  Deptno    Sal Running Total Dept Total  Seq
------ ------ ------ ------------- ---------- ----
CLARK      10   2450          2450       2450    1
KING            5000          7450       7450    2
MILLER          1300          8750       8750    3

ADAMS      20   1100          9850       1100    1
FORD            3000         12850       4100    2
JONES           2975         15825       7075    3
SCOTT           3000         18825      10075    4
SMITH            800         19625      10875    5

ALLEN      30   1600         21225       1600    1
BLAKE           2850         24075       4450    2
JAMES            950         25025       5400    3
MARTIN          1250         26275       6650    4
TURNER          1500         27775       8150    5
WARD            1250         29025       9400    6

Execution Plan
---------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   WINDOW (SORT)
   2    1     TABLE ACCESS (FULL) OF 'EMP'
Statistics
---------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
       1658 bytes sent via SQL*Net to client
        503 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          1 sorts (memory)
          0 sorts (disk)
         14 rows processed

The example shows how to calculate a "Running Total" for the entire query. This is done using the entire ordered result set, via SUM(sal) OVER (ORDER BY deptno, ename).本例指出了如何计算整条查询的“累计”。即使用排序后的整个结果集合,通过SUM(sal) OVER (ORDER BY deptno, ename)函数得到

Further, we were able to compute a running total within each department, a total that would be reset at the beginning of the next department. The PARTITION BY deptno in that SUM(sal) caused this to happen, a partitioning clause was specified in the query in order to break the data up into groups.可以进一步计算各个部门的累计值,该值在开始下一个部门计算时将被重置。由SUM(sal)中的PARTITION BY deptno实现该条查询中指定划分语句将数据进行分组。

The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria (a "Seq" column was added to in order to display this position).根据排序规则(增加了“Seq”列以显示该状态),ROW_NUMBER()函数将每组返回的记录行进行顺序编号,

The execution plan shows, that the whole query is very well performed with only 3 consistent gets, this can never be accomplished with standard SQL or even PL/SQL. 执行计划显示,整条查询仅需3条一致get函数就可以很好的执行。这一点是标准SQL甚至PL/SQL不能都实现的。

Top-N Queries前N条查询

How can we get the Top-N records by some set of fields ?如何通过部分字段得到前N条记录?
Prior to having access to these analytic functions, questions of this nature were extremely difficult to answer.在未使用这些分析函数之前,很难对此类问题做出回答。

There are some problems with Top-N queries however; mostly in the way people phrase them. It is something to be careful about when designing reports. Consider this seemingly sensible request:人们关于前N条查询的说法存在问题。在设计报告时,应留意这一点。

I would like the top three paid sales reps by department我需要知道部门工资为前3名销售代表的谁。

The problem with this question is that it is ambiguous. It is ambiguous because of repeated values, there might be four people who all make the same salary, what should we do then ?这句话的问题在于含混不清。因为存在重复的值,如果有四个人领着同样的工资,该怎么处理?

Let's look at three examples, all use the well known table EMP.以下3个例子均使用EMP表。

Example 1例1

Sort the sales people by salary from greatest to least. Give the first three rows. If there are less then three people in a department, this will return less than three records.从多到少排列销售人员的工资,取前三行。如果该部门少于三人,则返回的记录少于三个。

set autotrace on explain
break on deptno skip 1

SELECT * FROM (
  SELECT deptno, ename, sal, ROW_NUMBER()
  OVER (
    PARTITION BY deptno ORDER BY sal DESC
  ) Top3 FROM emp
)
WHERE Top3 <= 3

/

    DEPTNO ENAME             SAL       TOP3
---------- ---------- ---------- ----------
        10 KING             5000          1
           CLARK            2450          2
           MILLER           1300          3

        20 SCOTT            3000          1
           FORD             3000          2
           JONES            2975          3

        30 BLAKE            2850          1
           ALLEN            1600          2
           TURNER           1500          3

9 rows selected.

Execution Plan
--------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   VIEW
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (FULL) OF 'EMP'

This query works by sorting each partition (or group, which is the deptno), in a descending order, based on the salary column and then assigning a sequential row number to each row in the group as it is processed. The use of a WHERE clause after doing this to get just the first three rows in each partition.该查询根据工资列以降序排列各个划分(或分组,属于该deptno),并在处理过程中为每行分配一个顺序号。然后使用WHERE语句得到各划分的前三行。

Example 2例2

Give me the set of sales people who make the top 3 salaries - that is, find the set of distinct salary amounts, sort them, take the largest three, and give me everyone who makes one of those values.我需要工资为前三位的销售人员名字——即查找工资金额、排序、取最高的三项金额、给我领取这些工资的人员的名字。

SELECT * FROM (
  SELECT deptno, ename, sal,
         DENSE_RANK()
  OVER (
    PARTITION BY deptno ORDER BY sal desc
  ) TopN FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC
/


    DEPTNO ENAME             SAL       TOPN


---------- ---------- ---------- ----------
        10 KING             5000          1
           CLARK            2450          2
           MILLER           1300          3

        20 SCOTT            3000          1  <--- !
           FORD             3000          1  <--- !

           JONES            2975          2
           ADAMS            1100          3

        30 BLAKE            2850          1
           ALLEN            1600          2
        30 TURNER           1500          3


10 rows selected.

Execution Plan
--------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   VIEW
   2    1     WINDOW (SORT PUSHED RANK)
   3    2       TABLE ACCESS (FULL) OF 'EMP'

Here the DENSE_RANK function was used to get the top three salaries. We assigned the dense rank to the salary column and sorted it in a descending order.其中,使用DENSE_RANK函数得出最高的三个工资金额。然后指定Dense rank至工资列,并将其按降序排列。

The DENSE_RANK function computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query. Rank values are not skipped in the event of ties. Rows with equal values for the ranking criteria receive the same rank. DENSE_RANK函数计算排序后分组中各行的序数。序数为从1开始的连续整数。最大的序数就是查询所所返回唯一值的个数。如果出现并列,序数不跳计。具有相同值的列的序数相同。

The DENSE_RANK function does not skip numbers and will assign the same number to those rows with the same value. Hence, after the resultset is built in the inline view, we can simply select all of the rows with a dense rank of three or less, this gives us everyone who makes the top three salaries by department number. DENSE_RANK函数不跳计序数,并为相同值的列赋予相同的序数。结果集合在当前窗口建立后,通过部门编号选择Dense rank为3 或3之前的行,就可以知道工资在该部门前三位的名字。

Windows窗口

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group an continues to the current row.窗口语句用以定义滑动或固定数据窗口。在其上面运行组的分析函数。默认窗口为固定窗口,从第一行开始到当前行。

We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. It can be said, that the existance of an ORDER BY in an analytic function will add a default window clause of RANGE UNBOUNDED PRECEDING. That says to get all rows in our partition that came before us as specified by the ORDER BY clause.可根据两种规则对窗口进行设置:数据值的范围当前行指定区距的。分析函数中的ORDER BY会默认添加一条窗口语句:RANGE UNBOUNDED PRECEDING。即按照ORDER BY语句取得划分中的当前之前的所有行。

Let's look at an example with a sliding window within a group and compute the sum of the current row's SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee's salary with the preceding two salaries within a departement, it would look like this.5。以下例子为一个分组中的滑动窗口,计算该组中当前行与其前两行的SAL列的和。如我们需要计算当前员工的工资和其之前的两人工资的和,如下例所示。

break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999

SELECT deptno "Deptno", ename "Ename", sal "Sal",
  SUM(SAL)
    OVER (PARTITION BY deptno
          ORDER BY ename
          ROWS 2 PRECEDING) "Sliding Total"
FROM emp
ORDER BY deptno, ename
/

Deptno Ename     Sal Sliding Total
------ ------ ------ -------------
    10 CLARK    2450          2450
       KING     5000          7450
       MILLER   1300          8750

    20 ADAMS    1100          1100
       FORD     3000          4100
      
JONES    2975          7075  ^
       SCOTT    3000          8975  |
       SMITH     800          6775  "-- Sliding Window


    30 ALLEN    1600          1600
       BLAKE    2850          4450
       JAMES     950          5400
       MARTIN   1250          5050
       TURNER   1500          3700
       WARD     1250          4000

The partition clause makes the SUM (sal) be computed within each department, independent of the other groups. Tthe SUM (sal) is ' reset ' as the department changes. The ORDER BY ENAME clause sorts the data within each department by ENAME; this allows the window clause: ROWS 2 PRECEDING, to access the 2 rows prior to the current row in a group in order to sum the salaries.划分语句使SUM (sal)在各部门内进行,并独立于其他组。当部门改变时,SUM (sal) 也被“重置”。ORDER BY ENAME语句通过ENAME排列各部门的数据。这使得窗口语句:ROWS 2 PRECEDING获取该分组中当前行之前两行的数据以计算合计工资。

For example, if you note the SLIDING TOTAL value for SMITH is 6 7 7 5, which is the sum of 800, 3000, and 2975. That was simply SMITH's row plus the salary from the preceding two rows in the window.例如,SMITHSLIDING TOTAL(滑动合计)6 7 7 58003000以及2975的和。即窗口中SMITH行及其之前两行工资的简单相加。

Range Windows范围窗口

Range windows collect rows together based on a WHERE clause. If I say ' range 5 preceding ' for example, this will generate a sliding window that has the set of all preceding rows in the group such that they are within 5 units of the current row. These units may either be numeric comparisons or date comparisons and it is not valid to use RANGE with datatypes other than numbers and dates.范围窗口根据WHERE语句进行收集。例如“之前5将会生成一个滑动视窗,包括该分组中当前行之前的5个单位所有行。这些单位可以是数值或日期,使用数字或日期以外的其他数据类型表示范围无效。

Example

Count the employees which where hired within the last 100 days preceding the own hiredate. The range window goes back 100 days from the current row's hiredate and then counts the rows within this range. The solution ist to use the following window specification:计算当前雇佣日期之前100天内雇佣的员工的数量。范围窗口返回当前行雇佣日期100天之前并在这个范围内计算行数。计算使以下窗口规格:

COUNT(*) OVER (ORDER BY hiredate ASC RANGE 100 PRECEDING)

column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column cnt heading "Cnt" format 99

SELECT ename, hiredate, hiredate-100 hiredate_pre,
       COUNT(*)
       OVER (
          ORDER BY hiredate ASC
          RANGE 100 PRECEDING

       ) cnt
  FROM emp
 ORDER BY hiredate ASC
/

Name     Hired      Hired-100  Cnt
-------- ---------- ---------- ---
SMITH    17-DEC-80  08-SEP-80    1
ALLEN    20-FEB-81  12-NOV-80    2
WARD     22-FEB-81  14-NOV-80    3
JONES    02-APR-81  23-DEC-80    3
BLAKE    01-MAY-81  21-JAN-81    4
CLARK    09-JUN-81  01-MAR-81    3
TURNER   08-SEP-81  31-MAY-81    2
MARTIN   28-SEP-81  20-JUN-81    2
KING     17-NOV-81  09-AUG-81    3
JAMES    03-DEC-81  25-AUG-81    5
FORD     03-DEC-81  25-AUG-81    5
MILLER   23-JAN-82  15-OCT-81    4
SCOTT    09-DEC-82  31-AUG-82    1
ADAMS    12-JAN-83  04-OCT-82    2

We ordered the single partition by hiredate ASC. If we look for example at the row for CLARK we can see that his hiredate was 09-JUN-81, and 100 days prior to that is the date 01-MAR-81. If we look who was hired between 01-MAR-81 and 09-JUN-81, we find JONES (hired: 02-APR-81) and BLAKE (hired: 01-MAY-81). This are 3 rows including the current row, this is what we see in the column "Cnt" of CLARK's row.根据雇佣日期ASC每个划分进行排序。例CLARK行可看到其雇佣日期为198169100天之前是198131,看看在这期间雇佣的员工,会发现JONES(雇佣日期:198142)、BLAKE(雇佣日期:198151),共3行,包括当前行,在CLARK行“Cnt”列中。

Compute average salary for defined range计算定义范围平均工资

As an example, compute the average salary of people hired within 100 days before for each employee. The query looks like this:例如,计算每个员工雇佣之前100天内雇佣员工的平均工资。查询如下:

column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column avg_sal heading "Avg-100" format 999999

SELECT ename, hiredate, sal,
       AVG(sal)
       OVER (
          ORDER BY hiredate ASC
          RANGE 100 PRECEDING
       ) avg_sal

  FROM emp
ORDER BY hiredate ASC
/

Name     Hired             SAL Avg-100
-------- ---------- ---------- -------
SMITH    17-DEC-80         800     800
ALLEN    20-FEB-81        1600    1200
WARD     22-FEB-81        1250    1217
JONES    02-APR-81        2975    1942
BLAKE    01-MAY-81        2850    2169
CLARK    09-JUN-81        2450    2758

TURNER   08-SEP-81        1500    1975
MARTIN   28-SEP-81        1250    1375
KING     17-NOV-81        5000    2583
JAMES    03-DEC-81         950    2340
FORD     03-DEC-81        3000    2340
MILLER   23-JAN-82        1300    2563
SCOTT    09-DEC-82        3000    3000
ADAMS    12-JAN-83        1100    2050

Look at CLARK again, since we understand his range window within the group. We can see that the average salary of 2758 is equal to (2975+2850+2450)/3. This is the average of the salaries for CLARK and the rows preceding CLARK, those of JONES and BLAKE. The data must be sorted in ascending order.再看看CLARK我们知道他在本组中的范围窗口,可以看到平均工资2758由(2975+2850+2450/3得来,是CLARK和其之前的JONESBLAKE行工资的平均数。数据必须按由小到大顺序排列。

Row Windows行窗口

Row Windows are physical units; physical number of rows, to include in the window. For example you can calculate the average salary of a given record with the (up to 5) employees hired before them or after them as follows:行窗口实际单位包括在窗口中实际行数。例如可以计算一给定记录的平均工资,该记录包括之前或之后雇佣的员工(至多5名),具体如下:

set numformat 9999
SELECT ename, hiredate, sal,
AVG(sal)
  OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) AvgAsc,
COUNT(*)
  OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
AVG(sal)
  OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AvgDes,
COUNT(*)
  OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
FROM emp
ORDER BY hiredate
/

ENAME      HIREDATE    SAL AVGASC CNTASC AVGDES CNTDES
---------- --------- ----- ------ ------ ------ ------
SMITH      17-DEC-80   800    800      1   1988      6
ALLEN      20-FEB-81  1600   1200      2  
2104      6
WARD       22-FEB-81  1250   1217      3  
2046      6
JONES      02-APR-81  2975   1656      4  
2671      6
BLAKE      01-MAY-81  2850   1895      5  
2675      6
CLARK      09-JUN-81  2450   1988      6  
2358      6
TURNER     08-SEP-81  1500   2104      6   2167      6
MARTIN     28-SEP-81  1250   2046      6   2417      6
KING       17-NOV-81  5000   2671      6   2392      6
JAMES      03-DEC-81   950   2333      6   1588      4
FORD       03-DEC-81  3000   2358      6   1870      5
MILLER     23-JAN-82  1300   2167      6   1800      3
SCOTT      09-DEC-82  3000   2417      6   2050      2
ADAMS      12-JAN-83  1100   2392      6   1100      1

The window consist of up to 6 rows, the current row and five rows " in front of " this row, where " in front of " is defined by the ORDER BY clause. With ROW partitions, we do not have the limitation of RANGE partition - the data may be of any type and the order by may include many columns. Notice, that we selected out a COUNT(*) as well. This is useful just to demonstrate how many rows went into making up a given average. We can see clearly that for ALLEN's record, the average salary computation for people hired before him used only 2 records whereas the computation for salaries of people hired after him used 6.窗口中包括6行,现有行及此行“之前”的5行,其中“之前”由ORDER BY语句定义。对于ROW(行)的划分,不受RANGE(范围)划分的限制——数据可以是任何类型,order by可包括许多列。注意,也要选择COUNT*,可以说明是多少行的平均值。从ALLEN记录可以清楚看到,之前雇佣员工平均工资计算使用了2个记录,他之后雇佣员工平均工资的计算使用了6记录

Accessing Rows Around Your Current Row访问当前行前后的行

Frequently you want to access data not only from the current row but the current row " in front of " or " behind " them. For example, let's say you need a report that shows, by department all of the employees; their hire date; how many days before was the last hire; how many days after was the next hire.我们常常不仅想访问当前行,还想访问“之前”或“之后”行中的数据。例如,某份报告需要表明部门所有员工、员工雇佣日期、距上雇佣的天数、距下雇佣的天数。

Using straight SQL this query would be difficult to write. Not only that but its performance would once again definitely be questionable. The approach I typically took in the past was either to "select a select" or write a PL/SQL function that would take some data from the current row and "find" the previous and next rows data. This workedbut introduce large overhead into both the development of the query and the run-time execution of the query.直接编写SQL会比较困难,其执行性能必然存在问题。过去我常用的方法是“select a select”或编写PL/SQL函数,从当前行得到数据并“找到”之前以及之后行中的数据这样可以达到目的,查询的开发与运行带来很大开销。

Using analytic functions, this is easy and efficient to do.使用分析函数,简单易行且有效。

set echo on

column deptno format 99 heading Dep
column ename format a6 heading Ename
column hiredate heading Hired
column last_hire heading LastHired
column days_last heading DaysLast
column next_hire heading NextHire
column days_next heading NextDays

break on deptno skip 1

SELECT deptno, ename, hiredate,
LAG(hiredate,1,NULL)
  OVER (PARTITION BY deptno
        ORDER BY hiredate, ename) last_hire,
hiredate - LAG(hiredate,1,NULL)
  OVER (PARTITION BY deptno
        ORDER BY hiredate, ename) days_last,
LEAD(hiredate,1,NULL)
  OVER (PARTITION BY deptno
        ORDER BY hiredate, ename) next_hire,
LEAD(hiredate,1,NULL)
  OVER (PARTITION BY deptno
        ORDER BY hiredate, ename) - hiredate days_next
FROM emp
ORDER BY deptno, hiredate
/

Dep Ename  Hired     LastHired DaysLast NextHire  NextDays
--- ------ --------- --------- -------- --------- --------
 10 CLARK 
09-JUN-81                    17-NOV-81      161
    KING   17-NOV-81
09-JUN-81      161 23-JAN-82       67
    MILLER
23-JAN-82 17-NOV-81       67

 20 SMITH  17-DEC-80                    02-APR-81      106
    JONES  02-APR-81 17-DEC-80      106 03-DEC-81      245
    FORD   03-DEC-81 02-APR-81      245 09-DEC-82      371
    SCOTT  09-DEC-82 03-DEC-81      371 12-JAN-83       34
    ADAMS  12-JAN-83 09-DEC-82       34

 30 ALLEN  20-FEB-81                    22-FEB-81        2
    WARD   22-FEB-81 20-FEB-81        2 01-MAY-81       68
    BLAKE  01-MAY-81 22-FEB-81       68 08-SEP-81      130
    TURNER 08-SEP-81 01-MAY-81      130 28-SEP-81       20
    MARTIN 28-SEP-81 08-SEP-81       20 03-DEC-81       66
    JAMES  03-DEC-81 28-SEP-81       66

The LEAD and LAG routines could be considered a way to "index into your partitioned group ". Using these functions you can access any individual row. Notice for example in the above printout, it shows that the record for KING includes the data (in bold red font) from the prior row (LAST HIRE) and the next row (NEXT-HIRE). We can access the fields in records preceding or following the current record in an ordered partition easily. LEADLAG例程可看作对分进行索引。使用这些函数可以访问任何一行。上面例子表明KING记录包括之前一行(上一雇佣)和下一行(下一)中的数据(加粗红色字体)。可以很容易的访问排序后的分组中当前记录之前或之后记录中的字段。

LAG

LAG ( value_expr [, offset] [, default] )
   OVER ( [query_partition_clause] order_by_clause )

LAG provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. LAG无须自合并就可以获取同一个表格中的多行记录。知道来自查询中的多行以及光标的位置,LEAD就可以进入位置指定区距之前的行。

If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the window. If you do not specify default, then its default value is null.如果不指定区距,默认值为1如果区距超出窗口范围,则返回默认值。如不指定默认值,则默认值为null

The following example provides, for each person in the EMP table, the salary of the employee hired just before:下例中,EMP表中的每个人之前雇佣员工的工资:

SELECT ename,hiredate,sal,
LAG(sal, 1, 0) 
  OVER (ORDER BY hiredate) AS PrevSal
FROM emp
WHERE job = 'CLERK';
Ename Hired       SAL PREVSAL
------ --------- ----- -------
SMITH 17-DEC-80   800       0
JAMES 03-DEC-81   950     800
MILLER 23-JAN-82 1300     950
ADAMS 12-JAN-83 1100    1300

LEAD

LEAD ( value_expr [, offset] [, default] )
   OVER ( [query_partition_clause] order_by_clause )

LEAD provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position. LEAD不用自合并就可同时进入一个表格中的多行。知道来自查询中的多行以及光标的位置,LEAD就可以进入位置指定区距之后的行。

If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the table. If you do not specify default, then its default value is null.如不指定区距,则默认值为1。如区距超出窗口范围则返回任意默认值。如不指定默认值,默认值为0

The following example provides, for each employee in the EMP table, the hire date of the employee hired just after:下例,EMP表中的每个员工,他们之后雇佣员工的雇佣日期:

SELECT ename, hiredate,
LEAD(hiredate, 1)
  OVER (ORDER BY hiredate) AS NextHired

FROM emp WHERE deptno = 30;

Ename  Hired     NEXTHIRED
------ --------- ---------
ALLEN  20-FEB-81 22-FEB-81
WARD   22-FEB-81 01-MAY-81
BLAKE  01-MAY-81 08-SEP-81
TURNER 08-SEP-81 28-SEP-81
MARTIN 28-SEP-81 03-DEC-81
JAMES  03-DEC-81

Determine the First Value / Last Value of a Group确定组中的第一个值/最后一个值

The FIRST_VALUE and LAST_VALUE functions allow you to select the first and last rows from a group. These rows are especially valuable because they are often used as the baselines in calculations. FIRST_VALUELAST_VALUE函数可以选择一组中的第一行和最后一行。这些行很有用,因为它们经常用计算基线。

Example

The following example selects, for each employee in each department, the name of the employee with the lowest salary. 下例每个部门的每员工最高工资员工的姓名。

break on deptno skip 1

SELECT deptno, ename, sal,
  FIRST_VALUE(ename)
  OVER (PARTITION BY deptno
        ORDER BY sal ASC) AS MIN_SAL_HAS
FROM emp
ORDER BY deptno, ename;

    DEPTNO ENAME             SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
        10 CLARK            2450 MILLER
           KING             5000 MILLER
          
MILLER           1300 MILLER

        20 ADAMS            1100 SMITH
           FORD             3000 SMITH
           JONES            2975 SMITH
           SCOTT            3000 SMITH
          
SMITH             800 SMITH

        30 ALLEN            1600 JAMES
           BLAKE            2850 JAMES
          
JAMES             950 JAMES
           MARTIN           1250 JAMES
           TURNER           1500 JAMES
           WARD             1250 JAMES

The following example selects, for each employee in each department, the name of the employee with the highest salary.下例中为每个部门的每员工最高工资员工的姓名。

SELECT deptno, ename, sal, 
  FIRST_VALUE(ename)
 OVER (PARTITION BY deptno
        ORDER BY sal DESC) AS MAX_SAL_HAS
FROM emp
ORDER BY deptno, ename;

    DEPTNO ENAME             SAL MAX_SAL_HAS
---------- ---------- ---------- -----_-----
        10 CLARK            2450 KING
          
KING             5000 KING
           MILLER           1300 KING

        20 ADAMS            1100 FORD
          
FORD             3000 FORD
           JONES            2975 FORD
           SCOTT            3000 FORD
           SMITH             800 FORD

        30 ALLEN            1600 BLAKE
          
BLAKE            2850 BLAKE
           JAMES             950 BLAKE
           MARTIN           1250 BLAKE
           TURNER           1500 BLAKE
           WARD             1250 BLAKE

The following example selects, for each employee in department 30 the name of the employee with the lowest salary using an inline view下例30个部门中的每员工工资最低员工的姓名,使用内联视图。

SELECT deptno, ename, sal,
  FIRST_VALUE(ename)
  OVER (ORDER BY sal ASC) AS MIN_SAL_HAS
FROM (SELECT * FROM emp WHERE deptno = 30)

    DEPTNO ENAME             SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
        30 JAMES             950 JAMES
           MARTIN           1250 JAMES
           WARD             1250 JAMES
           TURNER           1500 JAMES
           ALLEN            1600 JAMES
           BLAKE            2850 JAMES

Crosstab or Pivot Queries交叉表或Pivot查询

A crosstab query, sometimes known as a pivot query, groups your data in a slightly different way from those we have seen hitherto. A crosstab query can be used to get a result with three rows (one for each project), with each row having three columns (the first listing the projects and then one column for each year) -- like this:交叉表查询,或者pivot查询,用稍微不同的方法将这些数据分组。交叉表查询可根据三行(一行代表一个项目),每行有三列(第一列列出项目,然后一列代表一年)得出结果——如下:

Project        2001        2002
     ID         CHF         CHF
-------------------------------
    100      123.00      234.50
    200      543.00      230.00
    300      238.00      120.50

Example

Let's say you want to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department and the row would have 4 columns. The DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using analytic functions this almost easy, without analytic functions this was virtually impossible.如果需要以列的形式显示每个部门中工资最多的3个人,查询每个部门返回一行,一行有4列。DEPTNO部门中工资最高的人,工资第二高人的名字,依次类推。使用分析函数容易做到不采用有分析函数这将是不可能的

SELECT deptno,
  MAX(DECODE(seq,1,ename,null)) first,
  MAX(DECODE(seq,2,ename,null)) second,
  MAX(DECODE(seq,3,ename,null)) third
FROM (SELECT deptno, ename,
       row_number()
       OVER (PARTITION BY deptno
             ORDER BY sal desc NULLS LAST) seq
       FROM emp)
WHERE seq <= 3
GROUP BY deptno
/

    DEPTNO FIRST      SECOND     THIRD
---------- ---------- ---------- ----------
        10 KING       CLARK      MILLER
        20 SCOTT      FORD       JONES
        30 BLAKE      ALLEN      TURNER

Note the inner query, that assigned a sequence (RowNr) to each employee by department number in order of salary.注意内查询,它按照工资高低根据部门为每名员工赋予一个序列RowNr)。

SELECT deptno, ename, sal,
 row_number()
 OVER (PARTITION BY deptno
       ORDER BY sal desc NULLS LAST) RowNr
FROM emp;

    DEPTNO ENAME             SAL      ROWNR
---------- ---------- ---------- ----------
        10 KING             5000          1
        10 CLARK            2450          2
        10 MILLER           1300          3
        20 SCOTT            3000          1
        20 FORD             3000          2
        20 JONES            2975          3
        20 ADAMS            1100          4
        20 SMITH             800          5
        30 BLAKE            2850          1
        30 ALLEN            1600          2
        30 TURNER           1500          3
        30 WARD             1250          4
        30 MARTIN           1250          5
        30 JAMES             950          6

The DECODE in the outer query keeps only rows with sequences 1, 2 or 3 and assigns them to the correct "column". The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand if you see the resultset without the aggregate function MAX grouped by deptno. 外查询中DECODE保留含有序列123的记录行,并将它们分派到正确“列”。GROUP BY去掉多余行,只剩下压缩结果。如果没有根据deptno分组的合计函数MAX,结果会更容易理解。

SELECT deptno,
  DECODE(seq,1,ename,null) first,
  DECODE(seq,2,ename,null) second,
  DECODE(seq,3,ename,null) third
FROM (SELECT deptno, ename,
       row_number()
       OVER (PARTITION BY deptno
             ORDER BY sal desc NULLS LAST) seq
       FROM emp)
WHERE seq <= 3
/

    DEPTNO FIRST      SECOND     THIRD
---------- ---------- ---------- ----------
        10 KING
        10            CLARK
        10                       MILLER
        20 SCOTT
        20            FORD
        20                       JONES
        30 BLAKE
        30            ALLEN
        30                       TURNER

The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value for FIRST, the remaining rows in that group will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence, the group by and MAX will collapse our resultset, removing the NULL values from it and giving us what we want.通过GROUP BYDEPTNO(按DEPTNO分组列)应用MAX合计函数。在上面任意给出的DEPTNO中,只有一行将对FIRST有非null,其余行将永远是NULLMAX函数将选出非NULL并保存。因此,group byMAX将会压缩结果集,去掉NULL值并给出我们想要的值。

Conclusion结论

This new set of functionality holds some exiting possibilities. It opens up a whole new way of looking at the data. It will remove a lot of procedural code and complex or inefficient queries that would have taken a long tome to develop, to achieve the same result.新的函数包括了一些现有的可能性,它开辟处理数据的一种全新方法,减少过程代码以及浪费大量时间开发的复杂或低效查询,实现相同结果。

Links and Documents链接和参考文档

Further articles about Analytic Functions can be found in:有关分析函数更多文章请看:

Oracle9i SQL Reference Release 1 (9.0.1)  Oracle9i SQL参考版本19.0.1)。

Oracle9i Database Concepts Release 1 (9.0.1)  Oracle9i数据库概念版本19.0.1)。

o        Oracle9i Data Warehousing Guide Release 1 (9.0.1)  Oracle9i数据仓储指南版本19.0.1)。

posted @ 2008-06-26 10:12  TG.Yang's IT Space  阅读(756)  评论(0编辑  收藏  举报