[Kubeflow] 03 - Create One Pipeline
These are mainly reading notes on the Pipelines SDK.
For later review: How to build a Kubeflow Pipeline.
The earlier notes mostly added necessary background knowledge; this page covers the more important best practices.
A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.
Install the SDK
pip3 install kfp --upgrade
After successful installation, the command dsl-compile should be available.
kfp.compiler includes classes and methods for compiling the pipeline Python DSL into a workflow YAML spec. Methods in this package include, but are not limited to, the following:
- kfp.compiler.Compiler.compile compiles your Python DSL code into a single static configuration (in YAML format) that the Kubeflow Pipelines service can process. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution (see the sketch below).
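A minimal sketch of that compile step with the v1 SDK; the echo_op component and hello_pipeline names are made up for illustration:

```python
import kfp
from kfp import dsl

# A trivial one-step component (hypothetical; any container image works).
def echo_op():
    return dsl.ContainerOp(
        name='echo',
        image='alpine:3.16',
        command=['sh', '-c'],
        arguments=['echo "hello from kubeflow pipelines"'],
    )

@dsl.pipeline(name='hello-pipeline', description='A one-step demo pipeline.')
def hello_pipeline():
    echo_op()

if __name__ == '__main__':
    # Emits a static workflow spec that the Pipelines service can execute.
    kfp.compiler.Compiler().compile(hello_pipeline, 'hello_pipeline.yaml')
```

The dsl-compile command mentioned above wraps the same step from the shell: dsl-compile --py hello_pipeline.py --output hello_pipeline.yaml.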
API Service
The Kubeflow Pipelines REST API is available at the same endpoint as the Kubeflow Pipelines user interface (UI).
The SDK client can send requests to this endpoint to upload pipelines, create pipeline runs, schedule recurring runs, and more.
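A hedged sketch of that client usage; the host assumes the kubectl port-forward shown below, and hello_pipeline.yaml is the spec compiled earlier:

```python
import kfp

# Assumes the port-forward below is running, so the API is on localhost:3000.
client = kfp.Client(host='http://localhost:3000')

# Register the compiled spec so it appears in the UI...
client.upload_pipeline('hello_pipeline.yaml', pipeline_name='hello-pipeline')

# ...or launch a one-off run straight from the package.
run = client.create_run_from_pipeline_package('hello_pipeline.yaml', arguments={})
print(run.run_id)
```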
- Exposing the Port/API
You can use kubectl port-forward to port forward the Kubernetes service locally to your laptop outside of the cluster:
# Change the namespace if you deployed Kubeflow Pipelines in a different namespace.
$ kubectl port-forward svc/ml-pipeline-ui 3000:80 --namespace kubeflow
- Access Kubeflow Pipelines from Jupyter notebook
An additional per-namespace (profile) manifest is required; in effect this also just spins up a Notebook server.
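A minimal sketch under that assumption (a notebook server running inside the cluster, in a namespace where the manifest has been applied):

```python
import kfp

# From a notebook running inside the cluster, kfp.Client() can usually
# resolve the default ml-pipeline endpoint without an explicit host
# (assumption: the per-namespace/profile manifest grants this notebook
# access to the Pipelines API).
client = kfp.Client()
print(client.list_experiments())
```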
Deploying Kubeflow Pipelines
Ref: https://www.kubeflow.org/docs/components/pipelines/v1/installation/localcluster-deployment/
Just follow the official installation guide.
Build a Pipeline
This document provides an overview of pipeline concepts and best practices, and instructions describing how to build an ML pipeline.
Ref: Using Kubeflow Pipelines
Similar code: https://github.com/liuweibin6566396837/kubeflow-examples/tree/master/mnist_stage
mnist_stage$ tree .
.
├── client.py            # ----> how does this call the cluster's API?
├── load_data
│   ├── build_image.sh
│   ├── Dockerfile
│   └── load_data.py     # read raw data locally -> feature transforms -> split out a training set -> save the datasets locally
├── predict
│   ├── build_image.sh
│   ├── Dockerfile
│   └── predict.py
└── train
    ├── build_image.sh
    ├── Dockerfile
    └── train.py
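A hedged sketch of what client.py might contain (the image names and host below are hypothetical; the real repo may differ): it wires the three containerized steps into one pipeline and submits a run through the cluster's API endpoint, answering the question in the tree above.

```python
import kfp
from kfp import dsl

@dsl.pipeline(name='mnist-stage', description='load_data -> train -> predict')
def mnist_pipeline():
    # Each step runs one of the images built by the build_image.sh scripts
    # (hypothetical image tags).
    load = dsl.ContainerOp(name='load-data', image='example/mnist-load-data:latest')
    train = dsl.ContainerOp(name='train', image='example/mnist-train:latest')
    predict = dsl.ContainerOp(name='predict', image='example/mnist-predict:latest')
    # No data is passed explicitly here; only the step ordering is enforced.
    train.after(load)
    predict.after(train)

if __name__ == '__main__':
    # Submit through the same endpoint as the UI (e.g. the forwarded port).
    client = kfp.Client(host='http://localhost:3000')
    client.create_run_from_pipeline_func(mnist_pipeline, arguments={})
```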