spark DSL 的开窗

withColum("新增一列的列名", 某个函数 over Window.partitionBy($"按照该字段分区").orderBy($"按照该字段排序".desc))

========================================================================================================================================

// 某个函数:

- SUM() OVER()
- MIN() OVER()
- MAX() OVER()
- AVG() OVER()
- COUNT() OVER()
- ROW_NUMBER() OVER()
- RANK() OVER()
- DENSE_RANK() OVER()

假设我们有一个名为 sales 的表,包含以下数据:

| sale_id | product_id | amount |
| ------- | ---------- | ------ |
| 1 | 101 | 200 |
| 2 | 102 | 150 |
| 3 | 103 | 300 |
| 4 | 101 | 180 |
| 5 | 103 | 250 |

=======================================================================================================================================
### 1. SUM() OVER()

计算每个产品的销售总额:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** SUM(amount) OVER (PARTITION BY product_id) AS total_sales** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | total_sales |
| ------- | ---------- | ------ | ----------- |
| 1 | 101 | 200 | 380 |
| 4 | 101 | 180 | 380 |
| 2 | 102 | 150 | 150 |
| 3 | 103 | 300 | 550 |
| 5 | 103 | 250 | 550 |

=======================================================================================================================================
### 2. MIN() OVER()

找出每个产品的最小销售额:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** MIN(amount) OVER (PARTITION BY product_id) AS min_sales** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | min_sales |
| ------- | ---------- | ------ | --------- |
| 1 | 101 | 200 | 180 |
| 4 | 101 | 180 | 180 |
| 2 | 102 | 150 | 150 |
| 3 | 103 | 300 | 250 |
| 5 | 103 | 250 | 250 |

### 3. MAX() OVER()

找出每个产品的最大销售额:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** MAX(amount) OVER (PARTITION BY product_id) AS max_sales** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | max_sales |
| ------- | ---------- | ------ | --------- |
| 1 | 101 | 200 | 200 |
| 4 | 101 | 180 | 200 |
| 2 | 102 | 150 | 150 |
| 3 | 103 | 300 | 300 |
| 5 | 103 | 250 | 300 |

### 4. AVG() OVER()

计算每个产品的平均销售额:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** AVG(amount) OVER (PARTITION BY product_id) AS avg_sales** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | avg_sales |
| ------- | ---------- | ------ | --------- |
| 1 | 101 | 200 | 190 |
| 4 | 101 | 180 | 190 |
| 2 | 102 | 150 | 150 |
| 3 | 103 | 300 | 275 |
| 5 | 103 | 250 | 275 |

### 5. COUNT() OVER()

计算每个产品的销售次数:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** COUNT(*) OVER (PARTITION BY product_id) AS sales_count** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | sales_count |
| ------- | ---------- | ------ | ----------- |
| 1 | 101 | 200 | 2 |
| 4 | 101 | 180 | 2 |
| 2 | 102 | 150 | 1 |
| 3 | 103 | 300 | 2 |
| 5 | 103 | 250 | 2 |

### 6. ROW_NUMBER() OVER()

为每个产品的销售结果分配行号:

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_id) AS row_num** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | row_num |
| ------- | ---------- | ------ | ------- |
| 1 | 101 | 200 | 1 |
| 4 | 101 | 180 | 2 |
| 2 | 102 | 150 | 1 |
| 3 | 103 | 300 | 1 |
| 5 | 103 | 250 | 2 |

### 7. RANK() OVER()

为每个产品的销售额分配排名(并列排名时有空位):

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS sales_rank** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | sales_rank |
| ------- | ---------- | ------ | ---------- |
| 1 | 101 | 200 | 1 |
| 4 | 101 | 180 | 2 |
| 2 | 102 | 150 | 1 |
| 3 | 103 | 300 | 1 |
| 5 | 103 | 250 | 2 |

### 8. DENSE_RANK() OVER()

为每个产品的销售额分配稠密排名(并列排名时没有空位):

sql** **SELECT ** ** sale_id, ** ** product_id, ** ** amount,** ** DENSE_RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS dense_sales_rank** **FROM sales;** **

****执行结果

| sale_id | product_id | amount | dense_sales_rank |
| ------- | ---------- | ------ | ---------------- |
| 1 | 101 | 200 | 1 |
| 4 | 101 | 180 | 2 |
| 2 | 102 | 150 | 1 |
| 3 | 103 | 300 | 1 |
| 5 | 103 | 250 | 2 |

posted on 2024-07-18 10:00  By远方  阅读(4)  评论(0编辑  收藏  举报