AI Monitoring Platform Product Analysis -- A Hands-On Guide to Evidently AI

The big-data and AI wave is sweeping the IT industry, and in recent months new Gen AI products have been launching one after another. The AI era has clearly arrived. Enterprises typically respond by building an AI platform covering development, training, and release.

That said, an AI model usually serves a specific business domain and carries significant business impact, so the cycle from development to launch is long. Monitoring model performance is an equally critical link: without a monitoring platform, I suspect most teams would not dare to put their AI models into production. This is the core job of an AI monitoring platform:

Monitor model quality so the model can be retrained and adjusted in time.

This article covers the overall AI platform architecture, the AI platform capability chain, and hands-on practice with Evidently AI.

AI platform architecture:

 Overall, it divides into three layers:

  1. Infrastructure layer: CPU, GPU, storage, and other foundational capabilities

  2. Technology platform layer: data processing, model development, model serving, management, monitoring, and related modules

  3. AI application layer: AI models of all kinds

 

AI platform capability chain:

 In terms of capabilities, the platform mainly breaks down into:

  1. Business understanding, served by a visual modeling platform

  2. Data processing, served by a data processing platform and a data labeling platform

  3. Model development, served by a model development IDE (notebook)

  4. Model evaluation and release

  5. Model operation, served by a model serving platform and a monitoring platform

Monitoring surfaces deviations and problems in model quality so the model can be adjusted in time. This closes the loop of model development and ensures the model keeps improving and stays fit for the business.

 

Model monitoring and a model monitoring tool -- Evidently AI

Model monitoring is generally considered at three levels:

 

  1. Operations: request counts, request latency, and system health such as CPU/MEM/IO

  2. Model performance: metrics used to detect concept drift, such as RMSE, AUC-ROC, and the KS statistic

  3. Model stability metrics: the PSI and CSI indices (a hand-rolled PSI sketch follows this list)
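
For reference, here is a minimal hand-rolled PSI sketch; the ten-bin strategy and the usual stability thresholds are industry conventions, not tied to any particular tool:

import numpy as np

def psi(expected, actual, bins=10):
    # Population Stability Index between a reference sample and a current sample.
    # Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    edges = np.histogram_bin_edges(expected, bins=bins)   # bin edges come from the reference
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    expected_pct = np.clip(expected_pct, 1e-6, None)      # avoid log(0) on empty bins
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))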

 

Concretely, we can look at the following aspects:

  1. Data quality and integrity
    -- Validating that input data matches our expectations is essential. Checks may cover range compliance, data distributions, feature statistics, correlations, or any behavior we consider "normal" for the dataset.
    -- Confirm we are feeding the model data it can actually handle.
  2. Data and target drift
    -- When the model receives data it never saw during training, we may face data drift.
    -- When real-world patterns change, we may face concept drift: the model no longer applies (e.g. a global pandemic that changes all customer behavior, or a new influencing factor appearing).
    -- The goal is an early signal that the concept or the data has changed, so we can update the model in time.
  3. Model performance
    -- Compare the model's predictions against actual values using KPIs such as Precision/Recall for classification or RMSE for regression (a small sketch of these KPIs follows this list).
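
As a quick illustration of point 3, a minimal sketch using the standard sklearn implementations of those KPIs (the toy arrays are made up):

import numpy as np
from sklearn.metrics import precision_score, recall_score, mean_squared_error

# classification: compare predicted labels against ground truth (toy data)
y_true_cls = np.array([1, 0, 1, 1, 0])
y_pred_cls = np.array([1, 0, 0, 1, 1])
print("precision:", precision_score(y_true_cls, y_pred_cls))
print("recall:", recall_score(y_true_cls, y_pred_cls))

# regression: RMSE between predicted and actual values (toy data)
y_true_reg = np.array([10.0, 12.5, 9.8])
y_pred_reg = np.array([9.5, 13.0, 10.2])
print("rmse:", np.sqrt(mean_squared_error(y_true_reg, y_pred_reg)))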

 

 Using the three criteria above as the yardstick, we surveyed the AI monitoring platforms and solutions available on the market and compared them.

 The conclusion: Evidently is an open-source product that can meet our needs, so we locked onto Evidently for further study.

 

--------------------------------------------------------------------------------

On the official site, the first thing you see is the product pitch:

The open-source ML observability platform
Evaluate, test, and monitor ML models from validation to production.
From tabular data to NLP and LLM. Built for data scientists and ML engineers.

 I won't belabor the research process here; straight to our solution. We currently deploy models exposed as APIs for callers, and Evidently offers several ways to integrate and use it:

  1. Import it as a Python package to visualize model performance metrics during development and generate HTML reports, as in the minimal sketch below.
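
For example, a minimal sketch against the evidently 0.2 API pinned later in this post; reference.csv and current.csv are hypothetical stand-ins for a training-time sample and a recent production sample with the same schema:

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.read_csv("reference.csv")  # training-time sample (assumed)
current = pd.read_csv("current.csv")      # recent production sample (assumed)

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")  # self-contained HTML report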

  

  2. For batch models, pair it with the airflow scheduler to generate reports in scheduled batches and display them on a dashboard

    

     dashboard:

    

  3. For real-time scenarios, use Grafana + Prometheus + Evidently for live monitoring; a rough sketch follows.
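
The official integration runs Evidently inside a metrics service that Prometheus scrapes and Grafana charts. As a rough, hedged sketch only: prometheus_client is a real library, but the gauge name, port, CSV inputs, and the as_dict() result key are my assumptions and may differ from the official example and across evidently versions:

import time
import pandas as pd
from prometheus_client import Gauge, start_http_server
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# hypothetical gauge for Prometheus to scrape and Grafana to chart
drift_share = Gauge("model_drifted_columns_share", "Share of drifted feature columns")

def compute_drift_share(reference: pd.DataFrame, current: pd.DataFrame) -> float:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    result = report.as_dict()["metrics"][0]["result"]
    return result["share_of_drifted_columns"]  # key checked against evidently 0.2; may vary

if __name__ == "__main__":
    start_http_server(9091)  # port for Prometheus to scrape (assumed)
    reference = pd.read_csv("reference.csv")
    while True:
        current = pd.read_csv("current.csv")  # stand-in for a sliding production window
        drift_share.set(compute_drift_share(reference, current))
        time.sleep(60)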

  

  All three approaches come from the official reference guides; practice guides for the different cases can be found at:

  https://docs.evidentlyai.com/integrations/evidently-integrations

  https://github.com/evidentlyai/evidently/tree/main/examples/integrations

  Overall, option 1 is the foundation; option 2 is ready to implement; option 3's dashboards are not yet comprehensive and would be harder to implement. We chose option 2 for validation.

--------------------------------------------------------------------------------

We had developed a used-car valuation regression model locally, and we use it as the validation example.

1. Clone the evidently repo onto the machine:

  git clone git@github.com:evidentlyai/evidently.git

2. We mainly use two of its example projects:

  

   airflow_drift_detection uses airflow to build a pipeline that triggers generation of model performance and quality reports; streamlit_dashboard displays the reports.

3. Installing airflow_drift_detection

  Modify dockerfiles/Dockerfile:

FROM puckel/docker-airflow:1.10.9
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install evidently==0.2.0

#RUN useradd -d /home/ubuntu -ms /bin/bash -g root -G sudo -p ubuntu ubuntu
#RUN mkdir /opt/myvolume  && chown ubuntu /opt/myvolume
#WORKDIR /home/ubuntu
#VOLUME /opt/myvolume

  Modify docker-compose.yml. The main change is the volume mapping, so reports are written directly into the corresponding directory of streamlit_dashboard:

version: '3.7'

services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"

  webserver:
    build: ./dockerfiles
    user: "airflow:airflow"
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ../streamlit_dashboard/projects:/usr/local/airflow/reports
        #- ./evidently_reports:/usr/local/airflow/reports
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
volumes:
  evidently_reports:

  From the airflow_drift_detection root directory, run: docker compose up --build -d

  If reports fail to be generated, change the permissions of the target directory to 777: chmod 777 ../streamlit_dashboard/projects

   Access URL: ********:8080/admin/

4. Installing streamlit_dashboard

  sudo su
  (switch to root)

  Create a virtual environment:

  cd /home/uradataplatform/
  python3 -m venv .venv
  source .venv/bin/activate
  (the virtual environment is now active; note the activate path must match the .venv just created)
  Enter the project directory:
  cd /home/uradataplatform/sc/streamlit_dashboard

  Run:
  pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

  Start the app:
  cd /home/uradataplatform/sc/streamlit_dashboard/
  cd streamlit-app
  streamlit run app.py &

  Access URL: ********:8501 (streamlit's default port)

5. Pipeline development:

  Evidently currently ships seven preset reports: Data Quality, Data Drift, Target Drift, Classification performance, Regression performance, Text Overview, and NoTargetPerformance.

   Here we demonstrate three of them: Data Drift, Target Drift, and Regression performance. (They can also be combined into one report; see the sketch below.)
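
As a side note, a minimal sketch of combining the three presets into a single report with the evidently 0.2 API, so one HTML file covers all three sections (ref_df, cur_df, and mapping stand for the data and ColumnMapping built in the DAGs below):

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, RegressionPreset

# one Report can carry several presets; each renders as its own section
combined = Report(metrics=[DataDriftPreset(), TargetDriftPreset(), RegressionPreset()])
# combined.run(reference_data=ref_df, current_data=cur_df, column_mapping=mapping)
# combined.save_html("combined_report.html")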

  1> Data Drift report

  

try:
    import os

    from datetime import datetime
    from datetime import timedelta

    import psycopg2  # third-party driver used to talk to PostgreSQL
    import pandas as pd

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    from evidently.metric_preset import DataDriftPreset
    from evidently.pipeline.column_mapping import ColumnMapping
    from evidently.report import Report


except Exception as e:
    print("Error  {} ".format(e))

dir_path = "reports"
file_path = "used_car_valuation_data_drift_report.html"
project_name = "used_car_valuation"
# reports land in a yesterday_today folder, e.g. reports/used_car_valuation/reports/2023-10-17_2023-10-18
now = datetime.now()
format_today = now.strftime("%Y-%m-%d")
yesterday = now - timedelta(days=1)
format_yesterday = yesterday.strftime("%Y-%m-%d")
timestamp_range = format_yesterday + "_" + format_today
full_path = dir_path + '/' + project_name + '/reports/' + timestamp_range

def load_data_execute(**context):

    # connect to the scoring-results database
    conn = psycopg2.connect(database="radarSmartcustoms", user="radarSmartcustoms", password='', host="129.184.13.155", port='5433')
    cursor = conn.cursor()
    # fetch the column names of the results table
    sql1_text = """select string_agg(column_name,',') from information_schema.columns
            where table_schema='public' and table_name='valuation_model_res'
                    """
    cursor.execute(sql1_text)
    data1 = cursor.fetchall()  # list of tuples; the first holds the comma-joined column names
    # today's date, used only by the commented-out date-filtered query below
    format_today = datetime.now().strftime("%Y-%m-%d")
    # current data: the latest 40 scored rows
    sql2_text = "select * from public.valuation_model_res order by id desc limit 40"
    #sql2_text = "select vmr.* from public.valuation_model_res vmr,public.sad_item_basic_info sibi where vmr.uuid =sibi.uuid and sibi.inspect_date ='"+format_today+"'"
    cursor.execute(sql2_text)
    data2 = cursor.fetchall()
    # turn the comma-joined column-name string into a proper list
    columns_name = list(data1[0])[0].split(',')
    df1 = pd.DataFrame(list(data2), columns=columns_name)
    del df1['id']
    del df1['uuid']
    del df1['item_no']
    del df1['cost_insurance_freight']
    del df1['free_on_board']
    # feature-level data drift needs neither prediction nor target
    del df1['predict_price']
    del df1['declared_price']
    df1['threshold'] = df1['threshold'].astype(float)
    df1['ratio'] = df1['ratio'].astype(float)
    # reference data
    sql3_text = "select * from public.valuation_model_reference"
    cursor.execute(sql3_text)
    data3 = cursor.fetchall()
    df2 = pd.DataFrame(list(data3), columns=columns_name)
    del df2['id']
    del df2['uuid']
    del df2['item_no']
    del df2['cost_insurance_freight']
    del df2['free_on_board']
    del df2['predict_price']
    del df2['declared_price']
    df2['threshold'] = df2['threshold'].astype(float)
    df2['ratio'] = df2['ratio'].astype(float)

    cursor.close()
    conn.close()  # release the connection as soon as the data is in memory
    data_columns = ColumnMapping()
    data_columns.numerical_features = [
        "mileage",
        "threshold",
        "ratio",
        "flag"
    ]

    data_columns.categorical_features = ["maker", "country", "drive", "body_type", "model", "fuel"]

    # the DataFrames travel to the next task via XCom (pickled; fine for small samples on Airflow 1.10)
    context["ti"].xcom_push(key="data_frame", value=df1)
    context["ti"].xcom_push(key="data_frame_reference", value=df2)
    context["ti"].xcom_push(key="data_columns", value=data_columns)


def drift_analysis_execute(**context):
    data = context.get("ti").xcom_pull(key="data_frame")
    data_reference = context.get("ti").xcom_pull(key="data_frame_reference")
    data_columns = context.get("ti").xcom_pull(key="data_columns")

    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(reference_data=data_reference[:40], current_data=data[:40], column_mapping=data_columns)

    # create the dated report folder on first run
    try:
        if os.path.exists(full_path):
            print('Report folder already exists')
        else:
            os.makedirs(full_path)
            print("Creation of the directory {} succeeded".format(full_path))
    except OSError:
        print("Creation of the directory {} failed".format(full_path))

    data_drift_report.save_html(os.path.join(full_path, file_path))
    


with DAG(
    dag_id="used_car_valuation_data_drift_report",
    schedule_interval="@daily",
    default_args={
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
        "start_date": datetime(2023, 10, 19),
    },
    catchup=False,
) as dag:

    load_data_execute = PythonOperator(
        task_id="load_data_execute",
        python_callable=load_data_execute,
        provide_context=True,
        op_kwargs={"parameter_variable": "parameter_value"},  # not used now, may be used to specify data
    )

    drift_analysis_execute = PythonOperator(
        task_id="drift_analysis_execute",
        python_callable=drift_analysis_execute,
        provide_context=True,
    )

load_data_execute >> drift_analysis_execute

  Sample report:

   2> Target Drift

  

try:
    import os

    from datetime import datetime
    from datetime import timedelta

    import psycopg2  # third-party driver used to talk to PostgreSQL
    import pandas as pd

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    from evidently.metric_preset import TargetDriftPreset
    from evidently.pipeline.column_mapping import ColumnMapping
    from evidently.report import Report


except Exception as e:
    print("Error  {} ".format(e))

dir_path = "reports"
file_path = "used_car_valuation_target_drift_report.html"
project_name = "used_car_valuation"
# reports land in a yesterday_today folder, e.g. reports/used_car_valuation/reports/2023-10-17_2023-10-18
now = datetime.now()
format_today = now.strftime("%Y-%m-%d")
yesterday = now - timedelta(days=1)
format_yesterday = yesterday.strftime("%Y-%m-%d")
timestamp_range = format_yesterday + "_" + format_today
full_path = dir_path + '/' + project_name + '/reports/' + timestamp_range

def load_data_execute(**context):

    # connect to the scoring-results database
    conn = psycopg2.connect(database="radarSmartcustoms", user="radarSmartcustoms", password='', host="129.184.13.155", port='5433')
    cursor = conn.cursor()
    # fetch the column names of the results table
    sql1_text = """select string_agg(column_name,',') from information_schema.columns
            where table_schema='public' and table_name='valuation_model_res'
                    """
    cursor.execute(sql1_text)
    data1 = cursor.fetchall()  # list of tuples; the first holds the comma-joined column names
    # today's date, used only by the commented-out date-filtered query below
    format_today = datetime.now().strftime("%Y-%m-%d")
    # current data: the latest 40 scored rows
    sql2_text = "select * from public.valuation_model_res order by id desc limit 40"
    #sql2_text = "select vmr.* from public.valuation_model_res vmr,public.sad_item_basic_info sibi where vmr.uuid =sibi.uuid and sibi.inspect_date ='"+format_today+"'"
    cursor.execute(sql2_text)
    data2 = cursor.fetchall()
    # turn the comma-joined column-name string into a proper list
    columns_name = list(data1[0])[0].split(',')
    df1 = pd.DataFrame(list(data2), columns=columns_name)
    del df1['id']
    del df1['uuid']
    del df1['item_no']
    del df1['cost_insurance_freight']
    del df1['free_on_board']
    # evidently expects the columns to be called prediction/target
    df1.rename(columns={"predict_price": "prediction"}, inplace=True)
    df1.rename(columns={"declared_price": "target"}, inplace=True)
    df1['threshold'] = df1['threshold'].astype(float)
    df1['ratio'] = df1['ratio'].astype(float)
    df1['diffrence'] = df1['diffrence'].astype(float)
    df1['prediction'] = df1['prediction'].astype(float)
    df1['target'] = df1['target'].astype(float)
    # reference data
    sql3_text = "select * from public.valuation_model_reference"
    cursor.execute(sql3_text)
    data3 = cursor.fetchall()
    df2 = pd.DataFrame(list(data3), columns=columns_name)
    del df2['id']
    del df2['uuid']
    del df2['item_no']
    del df2['cost_insurance_freight']
    del df2['free_on_board']
    df2.rename(columns={"predict_price": "prediction"}, inplace=True)
    df2.rename(columns={"declared_price": "target"}, inplace=True)
    df2['threshold'] = df2['threshold'].astype(float)
    df2['ratio'] = df2['ratio'].astype(float)
    df2['diffrence'] = df2['diffrence'].astype(float)
    df2['prediction'] = df2['prediction'].astype(float)
    df2['target'] = df2['target'].astype(float)
    cursor.close()
    conn.close()  # release the connection as soon as the data is in memory
    data_columns = ColumnMapping()
    data_columns.numerical_features = [
        "mileage",
        "target",
        "prediction",
        "threshold",
        "ratio",
        "diffrence",  # (sic) column name is misspelled in the source table
        "flag"
    ]

    data_columns.categorical_features = ["maker", "country", "drive", "body_type", "model", "fuel"]

    # the DataFrames travel to the next task via XCom (pickled; fine for small samples on Airflow 1.10)
    context["ti"].xcom_push(key="data_frame", value=df1)
    context["ti"].xcom_push(key="data_frame_reference", value=df2)
    context["ti"].xcom_push(key="data_columns", value=data_columns)


def drift_analysis_execute(**context):
    data = context.get("ti").xcom_pull(key="data_frame")
    data_reference = context.get("ti").xcom_pull(key="data_frame_reference")
    data_columns = context.get("ti").xcom_pull(key="data_columns")

    target_drift_report = Report(metrics=[TargetDriftPreset()])
    target_drift_report.run(reference_data=data_reference[:40], current_data=data[:40], column_mapping=data_columns)

    # create the dated report folder on first run
    try:
        if os.path.exists(full_path):
            print('Report folder already exists')
        else:
            os.makedirs(full_path)
            print("Creation of the directory {} succeeded".format(full_path))
    except OSError:
        print("Creation of the directory {} failed".format(full_path))

    target_drift_report.save_html(os.path.join(full_path, file_path))
    


with DAG(
    dag_id="used_car_valuation_target_drift_report",
    schedule_interval="@daily",
    default_args={
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
        "start_date": datetime(2023, 10, 19),
    },
    catchup=False,
) as dag:

    load_data_execute = PythonOperator(
        task_id="load_data_execute",
        python_callable=load_data_execute,
        provide_context=True,
        op_kwargs={"parameter_variable": "parameter_value"},  # not used now, may be used to specify data
    )

    drift_analysis_execute = PythonOperator(
        task_id="drift_analysis_execute",
        python_callable=drift_analysis_execute,
        provide_context=True,
    )

load_data_execute >> drift_analysis_execute

  Sample report:

  

   3> Regression performance

  

try:
    import os

    from datetime import datetime
    from datetime import timedelta

    import psycopg2  # third-party driver used to talk to PostgreSQL
    import pandas as pd

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    from evidently.metric_preset import RegressionPreset
    from evidently.pipeline.column_mapping import ColumnMapping
    from evidently.report import Report


except Exception as e:
    print("Error  {} ".format(e))

dir_path = "reports"
file_path = "used_car_valuation_performance_report.html"
project_name = "used_car_valuation"
# reports land in a yesterday_today folder, e.g. reports/used_car_valuation/reports/2023-10-17_2023-10-18
now = datetime.now()
format_today = now.strftime("%Y-%m-%d")
yesterday = now - timedelta(days=1)
format_yesterday = yesterday.strftime("%Y-%m-%d")
timestamp_range = format_yesterday + "_" + format_today
full_path = dir_path + '/' + project_name + '/reports/' + timestamp_range

def load_data_execute(**context):

    # connect to the scoring-results database
    conn = psycopg2.connect(database="radarSmartcustoms", user="radarSmartcustoms", password='', host="129.184.13.155", port='5433')
    cursor = conn.cursor()
    # fetch the column names of the results table
    sql1_text = """select string_agg(column_name,',') from information_schema.columns
            where table_schema='public' and table_name='valuation_model_res'
                    """
    cursor.execute(sql1_text)
    data1 = cursor.fetchall()  # list of tuples; the first holds the comma-joined column names
    # today's date, used only by the commented-out date-filtered query below
    format_today = datetime.now().strftime("%Y-%m-%d")
    # current data: the latest 40 scored rows
    sql2_text = "select * from public.valuation_model_res order by id desc limit 40"
    #sql2_text = "select vmr.* from public.valuation_model_res vmr,public.sad_item_basic_info sibi where vmr.uuid =sibi.uuid and sibi.inspect_date ='"+format_today+"'"
    cursor.execute(sql2_text)
    data2 = cursor.fetchall()
    # turn the comma-joined column-name string into a proper list
    columns_name = list(data1[0])[0].split(',')
    df1 = pd.DataFrame(list(data2), columns=columns_name)
    del df1['id']
    del df1['uuid']
    del df1['item_no']
    del df1['cost_insurance_freight']
    del df1['free_on_board']
    # evidently expects the columns to be called prediction/target
    df1.rename(columns={"predict_price": "prediction"}, inplace=True)
    df1.rename(columns={"declared_price": "target"}, inplace=True)
    df1['threshold'] = df1['threshold'].astype(float)
    df1['ratio'] = df1['ratio'].astype(float)
    df1['diffrence'] = df1['diffrence'].astype(float)
    df1['prediction'] = df1['prediction'].astype(float)
    df1['target'] = df1['target'].astype(float)
    # reference data
    sql3_text = "select * from public.valuation_model_reference"
    cursor.execute(sql3_text)
    data3 = cursor.fetchall()
    df2 = pd.DataFrame(list(data3), columns=columns_name)
    del df2['id']
    del df2['uuid']
    del df2['item_no']
    del df2['cost_insurance_freight']
    del df2['free_on_board']
    df2.rename(columns={"predict_price": "prediction"}, inplace=True)
    df2.rename(columns={"declared_price": "target"}, inplace=True)
    df2['threshold'] = df2['threshold'].astype(float)
    df2['ratio'] = df2['ratio'].astype(float)
    df2['diffrence'] = df2['diffrence'].astype(float)
    df2['prediction'] = df2['prediction'].astype(float)
    df2['target'] = df2['target'].astype(float)
    cursor.close()
    conn.close()  # release the connection as soon as the data is in memory
    data_columns = ColumnMapping()
    data_columns.numerical_features = [
        "mileage",
        "target",
        "prediction",
        "threshold",
        "ratio",
        "diffrence",  # (sic) column name is misspelled in the source table
        "flag"
    ]

    data_columns.categorical_features = ["maker", "country", "drive", "body_type", "model", "fuel"]

    # the DataFrames travel to the next task via XCom (pickled; fine for small samples on Airflow 1.10)
    context["ti"].xcom_push(key="data_frame", value=df1)
    context["ti"].xcom_push(key="data_frame_reference", value=df2)
    context["ti"].xcom_push(key="data_columns", value=data_columns)


def drift_analysis_execute(**context):
    data = context.get("ti").xcom_pull(key="data_frame")
    data_reference = context.get("ti").xcom_pull(key="data_frame_reference")
    data_columns = context.get("ti").xcom_pull(key="data_columns")

    performance_report = Report(metrics=[RegressionPreset()])
    performance_report.run(reference_data=data_reference[:40], current_data=data[:40], column_mapping=data_columns)

    # create the dated report folder on first run
    try:
        if os.path.exists(full_path):
            print('Report folder already exists')
        else:
            os.makedirs(full_path)
            print("Creation of the directory {} succeeded".format(full_path))
    except OSError:
        print("Creation of the directory {} failed".format(full_path))

    performance_report.save_html(os.path.join(full_path, file_path))
    


with DAG(
    dag_id="used_car_valuation_performance_report",
    schedule_interval="@daily",
    default_args={
        "owner": "airflow",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
        "start_date": datetime(2023, 10, 19),
    },
    catchup=False,
) as dag:

    load_data_execute = PythonOperator(
        task_id="load_data_execute",
        python_callable=load_data_execute,
        provide_context=True,
        op_kwargs={"parameter_variable": "parameter_value"},  # not used now, may be used to specify data
    )

    drift_analysis_execute = PythonOperator(
        task_id="drift_analysis_execute",
        python_callable=drift_analysis_execute,
        provide_context=True,
    )

load_data_execute >> drift_analysis_execute

  Sample report:

 

  For a detailed reading of the metrics in these three reports, see the official docs: https://docs.evidentlyai.com/presets/data-drift

 

A closing note: this look at AI monitoring platforms and the hands-on Evidently practice are both distilled from material I studied on the internet; I share them here as a simple reference for fellow readers.
