A quick trial of kedro
This is mainly a simple hands-on learning exercise.
Environment setup
- Install kedro
python -m venv venv
source venv/bin/activate
pip install kedro
- MinIO S3 storage
For easier testing, S3 (via MinIO) is used for data storage; note that the s3fs package also needs to be installed (see below). A docker-compose file for MinIO:
version: "3"
services:
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ACCESS_KEY=minio
      - MINIO_SECRET_KEY=minio123
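To bring MinIO up and create the bucket referenced by the data catalog below (s3://kedro/...), something like the following should work; the mc commands are my own addition and assume the MinIO client is installed (the alias name "local" is arbitrary):
docker-compose up -d
# point the MinIO client at the local instance
mc alias set local http://localhost:9000 minio minio123
# create the bucket used by the data catalog below
mc mb local/kedro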
Initialize the project
A project can be created either with kedro new or with a starter.
- Quick mode
kedro new --name=spaceflights --tools=viz --example=y
- Project structure
The project structure and the code will be explained in a later post.
./spaceflights
├── README.md
├── conf
│   ├── README.md
│   ├── base
│   │   ├── catalog.yml
│   │   ├── parameters.yml
│   │   ├── parameters_data_processing.yml
│   │   ├── parameters_data_science.yml
│   │   └── parameters_reporting.yml
│   └── local
│       └── credentials.yml
├── data
│   ├── 01_raw
│   │   ├── companies.csv
│   │   ├── reviews.csv
│   │   └── shuttles.xlsx
│   ├── 02_intermediate
│   ├── 03_primary
│   ├── 04_feature
│   ├── 05_model_input
│   ├── 06_models
│   ├── 07_model_output
│   └── 08_reporting
├── notebooks
├── pyproject.toml
├── requirements.txt
└── src
    └── spaceflights
        ├── __init__.py
        ├── __main__.py
        ├── pipeline_registry.py
        ├── pipelines
        └── settings.py
Install dependencies
cd spaceflights
pip install -r requirements.txt
Also upload the test data to S3 (this is just the data under spaceflights/data/01_raw in the template project). Note that s3fs also needs to be installed: pip install s3fs
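One possible way to upload it, reusing the mc alias created earlier (paths are illustrative, adjust to your layout):
# copy the raw data from the project into the kedro bucket
mc cp --recursive data/01_raw/ local/kedro/01_raw/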
Modify the data catalog to use S3 paths in conf/base/catalog.yml:
companies:
  type: pandas.CSVDataset
  filepath: s3://kedro/01_raw/companies.csv
  credentials: dev_s3

reviews:
  type: pandas.CSVDataset
  filepath: s3://kedro/01_raw/reviews.csv
  credentials: dev_s3

shuttles:
  type: pandas.ExcelDataset
  filepath: s3://kedro/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
  credentials: dev_s3

preprocessed_companies:
  type: pandas.ParquetDataset
  filepath: s3://kedro/02_intermediate/preprocessed_companies.pq
  credentials: dev_s3

preprocessed_shuttles:
  type: pandas.ParquetDataset
  filepath: s3://kedro/02_intermediate/preprocessed_shuttles.pq
  credentials: dev_s3

model_input_table:
  type: pandas.ParquetDataset
  filepath: s3://kedro/03_primary/model_input_table.pq
  credentials: dev_s3

regressor:
  type: pickle.PickleDataset
  filepath: s3://kedro/06_models/regressor.pickle
  versioned: true
  credentials: dev_s3

metrics:
  type: tracking.MetricsDataset
  filepath: s3://kedro/09_tracking/metrics.json
  credentials: dev_s3

companies_columns:
  type: tracking.JSONDataset
  filepath: s3://kedro/09_tracking/companies_columns.json
  credentials: dev_s3

shuttle_passenger_capacity_plot_exp:
  type: plotly.PlotlyDataset
  filepath: s3://kedro/08_reporting/shuttle_passenger_capacity_plot_exp.json
  versioned: true
  credentials: dev_s3
  plotly_args:
    type: bar
    fig:
      x: shuttle_type
      y: passenger_capacity
      orientation: h
    layout:
      xaxis_title: Shuttles
      yaxis_title: Average passenger capacity
      title: Shuttle Passenger capacity

shuttle_passenger_capacity_plot_go:
  type: plotly.JSONDataset
  filepath: s3://kedro/08_reporting/shuttle_passenger_capacity_plot_go.json
  credentials: dev_s3
  versioned: true

dummy_confusion_matrix:
  type: matplotlib.MatplotlibWriter
  filepath: s3://kedro/08_reporting/dummy_confusion_matrix.png
  credentials: dev_s3
  versioned: true
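After editing the catalog, running kedro catalog list from the project root is a quick way to check that the entries still resolve (the exact output format depends on the Kedro version):
kedro catalog list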
Configure the access credentials in conf/local/credentials.yml:
dev_s3:
  key: minio
  secret: minio123
  client_kwargs:
    endpoint_url: http://localhost:9000
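A quick sanity check that these credentials and the endpoint work: a minimal sketch (my own addition) using s3fs directly, which is the same fsspec-based access the pandas datasets use under the hood:
import s3fs

# mirror the dev_s3 entry from conf/local/credentials.yml
fs = s3fs.S3FileSystem(
    key="minio",
    secret="minio123",
    client_kwargs={"endpoint_url": "http://localhost:9000"},
)
# should list the raw files uploaded earlier
print(fs.ls("kedro/01_raw"))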
- Run
kedro run -p data_processing
Result
Data in MinIO (a quick read-back check is sketched below)
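As such a check, the intermediate output can be read back from MinIO with pandas; a minimal sketch, assuming the data_processing run above produced preprocessed_companies.pq:
import pandas as pd

# read an intermediate dataset straight from MinIO; storage_options are
# the same s3fs keyword arguments used in conf/local/credentials.yml
df = pd.read_parquet(
    "s3://kedro/02_intermediate/preprocessed_companies.pq",
    storage_options={
        "key": "minio",
        "secret": "minio123",
        "client_kwargs": {"endpoint_url": "http://localhost:9000"},
    },
)
print(df.head())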
- Packaging
kedro packages the project as a standard Python .whl package, containing the code (pipelines) and the configuration; later use only needs the conf and data directories. Detailed usage will be covered in a later post (a short sketch follows below).
kedro package
Result
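A hedged sketch of using the packaged wheel: the wheel lands in dist/, but the file name and version below are illustrative and the exact flags depend on the Kedro version. As noted above, it is run from a directory that contains the conf and data directories:
# wheel name/version is illustrative
pip install dist/spaceflights-0.1-py3-none-any.whl
# run the packaged pipelines from a directory containing conf/ and data/
python -m spaceflights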
Notes
From hands-on experience, kedro is quite convenient to use; the example project covers both data_processing and data_science pipelines, and the project engineering is solid. It is well worth trying for data-processing projects.
References
https://docs.kedro.org/en/stable/get_started/install.html#installation-prerequisites
https://docs.kedro.org/en/stable/get_started/kedro_concepts.html
https://docs.kedro.org/en/stable/tutorial/package_a_project.html
https://github.com/kedro-org/kedro
https://github.com/kedro-org/kedro-plugins