ZhangZhihui's Blog  

 

rank@ZZHUBT:~$ pip install airflowctl

 

复制代码
rank@ZZHUBT:~$ airflowctl init my_airflow_project --build-start
......
ebserver  | [2025-01-18 20:46:08 +0800] [19140] [INFO] Starting gunicorn 23.0.0
webserver  | [2025-01-18 20:46:08 +0800] [19140] [INFO] Listening at: http://0.0.0.0:8080 (19140)
webserver  | [2025-01-18 20:46:08 +0800] [19140] [INFO] Using worker: sync
webserver  | [2025-01-18 20:46:08 +0800] [19152] [INFO] Booting worker with pid: 19152
webserver  | [2025-01-18 20:46:08 +0800] [19153] [INFO] Booting worker with pid: 19153
standalone | Airflow is ready
standalone | Login with username: admin  password: aszEnNbvzC8dPmV4
......
复制代码

 

 

 

 

 

 

复制代码
import json

from airflow.decorators import (
    dag,
    task,
)
from pendulum import datetime


# When using the DAG decorator, The "dag_id" value defaults to the name of the function
# it is decorating if not explicitly set. In this example, the "dag_id" value would be "example_dag_basic".
@dag(
    # This defines how often your DAG will run, or the schedule by which your DAG runs. In this case, this DAG
    # will run daily
    schedule="@daily",
    # This DAG is set to run for the first time on January 1, 2023. Best practice is to use a static
    # start_date. Subsequent DAG runs are instantiated based on the schedule
    start_date=datetime(2023, 1, 1),
    # When catchup=False, your DAG will only run the latest run that would have been scheduled. In this case, this means
    # that tasks will not be run between January 1, 2023 and 30 mins ago. When turned on, this DAG's first
    # run will be for the next 30 mins, per the its schedule
    catchup=False,
    default_args={
        "retries": 2,  # If a task fails, it will retry 2 times.
    },
    tags=["example"],
)  # If set, this tag is shown in the DAG view of the Airflow UI
def example_dag_basic():
    """
    ### Basic ETL Dag
    This is a simple ETL data pipeline example that demonstrates the use of
    the TaskFlow API using three simple tasks for extract, transform, and load.
    For more information on Airflow's TaskFlow API, reference documentation here:
    https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html
    """

    @task()
    def extract():
        """
        #### Extract task
        A simple "extract" task to get data ready for the rest of the
        pipeline. In this case, getting data is simulated by reading from a
        hardcoded JSON string.
        """
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'

        order_data_dict = json.loads(data_string)
        return order_data_dict

    @task(multiple_outputs=True)  # multiple_outputs=True unrolls dictionaries into separate XCom values
    def transform(order_data_dict: dict):
        """
        #### Transform task
        A simple "transform" task which takes in the collection of order data and
        computes the total order value.
        """
        total_order_value = 0

        for value in order_data_dict.values():
            total_order_value += value

        return {"total_order_value": total_order_value}

    @task()
    def load(total_order_value: float):
        """
        #### Load task
        A simple "load" task that takes in the result of the "transform" task and prints it out,
        instead of saving it to end user review
        """

        print(f"Total order value is: {total_order_value:.2f}")

    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])


example_dag_basic()
复制代码

 

Let’s declare the DAG and identify some of the fields that are required, such as schedule:

 

 

 

 

 

 

 

 

 

 

 

 

 

复制代码
import json

from airflow.decorators import (
    dag,
    task,
    task_group
)
from airflow.operators.empty import EmptyOperator
from pendulum import datetime


@dag(
    schedule="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    default_args={
        "retries": 2,
    },
    tags=["zzh"],
)
def zzh_dag():
    @task()
    def extract():
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'

        order_data_dict = json.loads(data_string)
        return order_data_dict

    @task(multiple_outputs=True)  # multiple_outputs=True unrolls dictionaries into separate XCom values
    def transform(order_data_dict: dict):
        total_order_value = 0

        for value in order_data_dict.values():
            total_order_value += value

        return {"total_order_value": total_order_value}

    @task()
    def load(total_order_value: float):
        print(f"Total order value is: {total_order_value:.2f}")

    @task_group(group_id='extraction_task_group')
    def tg1():
        t1 = extract()
        t2 = EmptyOperator(task_id='task_2')
        t1 >> t2
        return t1

    order_data = tg1()
    order_summary = transform(order_data)
    order_data >> order_summary >> load(order_summary["total_order_value"])

zzh_dag()
复制代码

 

 

 

posted on   ZhangZhihuiAAA  阅读(2)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 单元测试从入门到精通
· 上周热点回顾(3.3-3.9)
· winform 绘制太阳,地球,月球 运作规律
 
点击右上角即可分享
微信分享提示