[ERROR: kfp.Client().upload_pipeline(...) hangs and never returns]
kfp.Client().upload_pipeline("/home/maye/pipeline.yaml", "wafer_test", "wafer test pipeline")
Traceback (most recent call last):
  File "
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp/_client.py", line 1232, in upload_pipeline
    response = self._upload_api.upload_pipeline(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 69, in upload_pipeline
    return self.upload_pipeline_with_http_info(uploadfile, **kwargs)  # noqa: E501
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 163, in upload_pipeline_with_http_info
    return self.api_client.call_api(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 364, in call_api
    return self.__call_api(resource_path, method,
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 181, in __call_api
    response_data = self.request(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 407, in request
    return self.rest_client.POST(url,
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/rest.py", line 265, in POST
    return self.request("POST", url,
  File "/home/maye/anaconda3/lib/python3.9/site-packages/kfp_server_api/rest.py", line 182, in request
    r = self.pool_manager.request(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "
  File "/home/maye/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/home/maye/anaconda3/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/home/maye/anaconda3/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/home/maye/anaconda3/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/maye/anaconda3/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/home/maye/anaconda3/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/maye/anaconda3/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt
(The KeyboardInterrupt is just Ctrl-C after a long wait; the client is blocked in a socket read, waiting for an HTTP response that never arrives.)
On the ml-pipeline-ui upload-pipeline web page, after clicking the "Create" button, the spinner keeps circling. This is the same failure as above, just triggered through a different user interface.
And in the mlpipeline database, there is no record of the uploaded pipeline.
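One way to confirm this (a sketch, assuming the standard MySQL pod of a standalone KFP deployment; the pod name is a placeholder and the table layout may differ between KFP versions):
$ kubectl exec -it mysql-pod-name -n kubeflow -- mysql -u root mlpipeline -e "SELECT Name, Status FROM pipelines;"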
$ top
-->
The %CPU of api-server (the api-server container in the ml-pipeline pod) is very high; in my case > 200% (my laptop has a 4-core CPU).
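Note that kfp.Client().upload_pipeline() takes no timeout, which is why the call blocks until interrupted. As a client-side workaround while experimenting, here is a minimal sketch that POSTs the file directly with a bounded timeout. It assumes the v1beta1 upload route and form key shown in the server code under [ANALYSIS], and that the API is reachable at HOST (e.g. via kubectl port-forward); upload_with_timeout is my own helper, not part of the kfp SDK:

# upload_with_timeout.py -- hypothetical helper, not part of kfp
import requests

HOST = "http://localhost:8888"  # adjust, e.g. kubectl port-forward svc/ml-pipeline-ui 8888:80 -n kubeflow

def upload_with_timeout(path, name, description, timeout_s=30):
    # POST the pipeline file to the upload endpoint; give up after timeout_s
    # instead of blocking forever in a socket read like the traceback above.
    with open(path, "rb") as f:
        resp = requests.post(
            HOST + "/apis/v1beta1/pipelines/upload",
            params={"name": name, "description": description},
            files={"uploadfile": f},  # FormFileKey in pipeline_upload_server.go
            timeout=timeout_s,
        )
    resp.raise_for_status()
    return resp.json()

This does not cure the server-side hang; it only keeps the client from blocking indefinitely.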
[ANALYSIS]
1. ml-pipeline handles creating a pipeline, deleting a pipeline, and creating a run, so check its log to see if anything is wrong.
kubectl logs ml-pipeline-pod-name -n kubeflow
-->
$ kubectl logs ml-pipeline-66ffb6cbfc-vrwvd -n kubeflow
I0117 13:56:36.924957 7 client_manager.go:170] Initializing client manager
I0117 13:56:36.925074 7 config.go:57] Config DBConfig.MySQLConfig.ExtraParams not specified, skipping
I0117 13:56:49.981373 7 client_manager.go:503] We already own mlpipeline
I0117 13:56:49.981735 7 swf.go:64] (Expected when in cluster) Failed to create scheduled workflow client by out of cluster kubeconfig. Error: stat /root/.kube/config: no such file or directory
I0117 13:56:49.981748 7 swf.go:66] Starting to create scheduled workflow client by in cluster config.
I0117 13:56:49.982220 7 client_manager.go:214] Client manager initialized successfully
I0117 13:56:49.982579 7 main.go:221] Samples already loaded in the past. Skip loading.
I0117 13:56:49.982833 7 resource_manager.go:1388] Default experiment already exists! ID: c56b3c01-45c9-46b4-93d8-ab1e93487711
I0117 13:56:49.982852 7 main.go:143] Starting Http Proxy
I0117 13:56:49.982973 7 main.go:96] Starting RPC server
I0117 14:08:01.122437 7 interceptor.go:29] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler starting
I0117 14:08:01.132516 7 interceptor.go:37] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler finished
I0117 14:12:45.017460 7 pipeline_upload_server.go:93] Upload pipeline called
The ml-pipeline log stops at "I0117 14:12:45.017460 7 pipeline_upload_server.go:93] Upload pipeline called": after kfp.Client().upload_pipeline() is called, no further log lines appear.
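To watch whether any later output ever appears, the log can also be followed live while reproducing the upload:
$ kubectl logs -f ml-pipeline-66ffb6cbfc-vrwvd -n kubeflow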
2. Follow the function call chain of uploadPipeline() in the KFP backend source.
"""
standalone kubeflow pipelines github repository: pipelines/backend/src/apiserver/server/pipeline_upload_server.go
"""
func (s *PipelineUploadServer) uploadPipeline(api_version string, w http.ResponseWriter, r *http.Request) {
    if s.options.CollectMetrics {
        uploadPipelineRequests.Inc()
        uploadPipelineVersionRequests.Inc()
    }

    glog.Infof("Upload pipeline called")
    file, header, err := r.FormFile(FormFileKey)
    if err != nil {
        s.writeErrorToResponse(w, http.StatusBadRequest, util.Wrap(err, "Failed to read pipeline from file"))
        return
    }
    defer file.Close()

    pipelineFile, err := ReadPipelineFile(header.Filename, file, common.MaxFileLength)
    if err != nil {
        s.writeErrorToResponse(w, http.StatusBadRequest, util.Wrap(err, "Failed to read a pipeline spec file"))
        return
    }

    pipelineNamespace := r.URL.Query().Get(NamespaceStringQuery)
    pipelineNamespace = s.resourceManager.ReplaceNamespace(pipelineNamespace)
    resourceAttributes := &authorizationv1.ResourceAttributes{
        Namespace: pipelineNamespace,
        Verb:      common.RbacResourceVerbCreate,
    }
    err = s.canUploadVersionedPipeline(r, "", resourceAttributes)
    if err != nil {
        s.writeErrorToResponse(w, http.StatusBadRequest, util.Wrap(err, "Failed to create a pipeline due to authorization error"))
        return
    }

    fileNameQueryString := r.URL.Query().Get(NameQueryStringKey)
    pipelineName := buildPipelineName(fileNameQueryString, header.Filename)
    pipeline := &model.Pipeline{
        Name:        pipelineName,
        Description: r.URL.Query().Get(DescriptionQueryStringKey),
        Namespace:   pipelineNamespace,
    }
    pipelineVersion := &model.PipelineVersion{
        Name:         pipeline.Name,
        Description:  pipeline.Description,
        PipelineSpec: string(pipelineFile),
    }

    newPipeline, newPipelineVersion, err := s.resourceManager.CreatePipelineAndPipelineVersion(pipeline, pipelineVersion) ## (1)
    if err != nil {
        s.writeErrorToResponse(w, http.StatusInternalServerError, util.Wrap(err, "Failed to create a pipeline and a pipeline version"))
        return
    }

    if s.options.CollectMetrics {
        pipelineVersionCount.Inc()
    }

    w.Header().Set("Content-Type", "application/json")
    marshaler := &jsonpb.Marshaler{EnumsAsInts: false, OrigName: true}
    if api_version == "v1beta1" {
        err = marshaler.Marshal(w, toApiPipelineV1(newPipeline, newPipelineVersion))
    } else if api_version == "v2beta1" {
        err = marshaler.Marshal(w, toApiPipeline(newPipeline))
    } else {
        s.writeErrorToResponse(w, http.StatusInternalServerError, util.Wrap(err, "Failed to create a pipeline. Invalid API version"))
        return
    }
    if err != nil {
        s.writeErrorToResponse(w, http.StatusInternalServerError, util.Wrap(err, "Failed to create a pipeline due to error marshalling the pipeline"))
        return
    }
}
"""
standalone kubeflow pipelines github repository: pipelines/backend/src/apiserver/resource/resource_manager.go
"""
func (r *ResourceManager) CreatePipelineAndPipelineVersion(p *model.Pipeline, pv *model.PipelineVersion) (*model.Pipeline, *model.PipelineVersion, error) {
    // Fetch pipeline spec, verify it, and parse parameters
    pipelineSpecBytes, pipelineSpecURI, err := r.fetchTemplateFromPipelineVersion(pv)
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to create a pipeline and a pipeline version as template is broken")
    }
    pv.PipelineSpec = string(pipelineSpecBytes)
    if pipelineSpecURI != "" {
        pv.PipelineSpecURI = pipelineSpecURI
    }

    tmpl, err := template.New(pipelineSpecBytes)
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to create a pipeline and a pipeline version due to template creation error")
    }

    // Validate pipeline's name in:
    // 1. pipeline spec for v2 pipelines and v2-compatible pipeline must comply with MLMD requirements
    // 2. display name must be non-empty
    pipelineSpecName := ""
    if tmpl.IsV2() {
        pipelineSpecName = tmpl.V2PipelineName()
        if err := common.ValidatePipelineName(pipelineSpecName); err != nil {
            return nil, nil, err
        }
    }
    if pv.Name == "" && p.Name == "" {
        if pipelineSpecName == "" {
            return nil, nil, util.NewInvalidInputError("pipeline's name cannot be empty")
        }
        pv.Name = pipelineSpecName
        p.Name = pipelineSpecName
    } else if pv.Name == "" {
        pv.Name = p.Name
    } else if p.Name == "" {
        p.Name = pv.Name
    }

    // Parse parameters
    paramsJSON, err := tmpl.ParametersJSON()
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to create a pipeline and a pipeline version due to error converting parameters to json")
    }
    pv.Parameters = paramsJSON
    pv.PipelineSpec = string(tmpl.Bytes())

    // Create records in KFP DB (both pipelines and pipeline_versions tables)
    newPipeline, newVersion, err := r.pipelineStore.CreatePipelineAndPipelineVersion(p, pv) ## (2)
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to create a pipeline and a pipeline version")
    }

    // TODO(gkcalat): consider removing this after v2beta1 GA if we adopt storing PipelineSpec in DB.
    // Store the pipeline file
    err = r.objectStore.AddFile(tmpl.Bytes(), r.objectStore.GetPipelineKey(newVersion.UUID))
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to create a pipeline and a pipeline version due to error saving PipelineSpec to ObjectStore")
    }

    newPipeline.Status = model.PipelineReady
    err = r.pipelineStore.UpdatePipelineStatus(
        newPipeline.UUID,
        newPipeline.Status,
    )
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to update status of a new pipeline after creation")
    }
    newVersion.Status = model.PipelineVersionReady
    err = r.pipelineStore.UpdatePipelineVersionStatus(
        newVersion.UUID,
        newVersion.Status,
    )
    if err != nil {
        return nil, nil, util.Wrap(err, "Failed to update status of a new pipeline version after creation")
    }
    return newPipeline, newVersion, nil
}
"""
github.com/kubeflow/pipelines/backend/src/apiserver/resource/resource_manager.go
"""
func (r *ResourceManager) fetchTemplateFromPipelineVersion(pipelineVersion *model.PipelineVersion) ([]byte, string, error) {
    if len(pipelineVersion.PipelineSpec) != 0 {
        // Check pipeline spec string first
        bytes := []byte(pipelineVersion.PipelineSpec)
        return bytes, pipelineVersion.PipelineSpecURI, nil
    } else {..}
"""
github.com/kubeflow/pipelines/backend/src/apiserver/template/template.go
"""
func New(bytes []byte) (Template, error) {
    format := inferTemplateFormat(bytes)
    switch format {
    case V1:
        return NewArgoTemplate(bytes)
    case V2:
        return NewV2SpecTemplate(bytes)
    default:
        return nil, util.NewInvalidInputErrorWithDetails(ErrorInvalidPipelineSpec, "unknown template format")
    }
}
"""
github.com/kubeflow/pipelines/backend/src/apiserver/template/argo_template.go
"""
func NewArgoTemplate(bytes []byte) (*Argo, error) {
    wf, err := ValidateWorkflow(bytes)
    if err != nil {
        return nil, err
    }
    return &Argo{wf}, nil
}
"""
github.com/kubeflow/pipelines/backend/src/apiserver/template/argo_template.go
"""
func ValidateWorkflow(template []byte) (*util.Workflow, error) {
    var wf workflowapi.Workflow
    err := yaml.Unmarshal(template, &wf)
    if err != nil {
        return nil, util.NewInvalidInputErrorWithDetails(err, "Failed to parse the workflow template")
    }
    if wf.APIVersion != argoVersion {
        return nil, util.NewInvalidInputError("Unsupported argo version. Expected: %v. Received: %v", argoVersion, wf.APIVersion)
    }
    if wf.Kind != argoK8sResource {
        return nil, util.NewInvalidInputError("Unexpected resource type. Expected: %v. Received: %v", argoK8sResource, wf.Kind)
    }
    _, err = validate.ValidateWorkflow(nil, nil, &wf, validate.ValidateOpts{
        Lint:                       true,
        IgnoreEntrypoint:           true,
        WorkflowTemplateValidation: false, // not used by kubeflow
    })
    if err != nil {
        return nil, err
    }
    return util.NewWorkflow(&wf), nil
}
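Since the upload stalls inside the server, it is worth catching obvious spec problems locally before uploading at all. Below is a minimal pre-check sketch; it mirrors only the first, cheap checks of ValidateWorkflow() above (YAML parse, apiVersion, kind), so it cannot catch something like a misplaced volumes field; the constants and function name are my own:

# preflight_check.py -- hypothetical local pre-check, not part of kfp
import yaml

ARGO_VERSION = "argoproj.io/v1alpha1"  # argoVersion in argo_template.go
ARGO_KIND = "Workflow"                 # argoK8sResource in argo_template.go

def preflight_check(path):
    # The file must parse as YAML and carry the expected apiVersion/kind,
    # the same first checks ValidateWorkflow() performs on the server.
    with open(path) as f:
        wf = yaml.safe_load(f)
    if wf.get("apiVersion") != ARGO_VERSION:
        raise ValueError("unexpected apiVersion: %r" % wf.get("apiVersion"))
    if wf.get("kind") != ARGO_KIND:
        raise ValueError("unexpected kind: %r" % wf.get("kind"))
    return wf

If the Argo CLI is installed, argo lint pipeline.yaml runs Argo's full validation locally and should reject a broken spec before it ever reaches api-server.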
Since there is no record of the uploaded pipeline in the mlpipeline database, the error must occur before step (2):
// Create records in KFP DB (both pipelines and pipeline_versions tables)
newPipeline, newVersion, err := r.pipelineStore.CreatePipelineAndPipelineVersion(p, pv) ## (2)
Together with the high CPU of api-server, this points at the template parsing/validation path (template.New -> ValidateWorkflow): the request seems to spin there instead of failing fast.
3. Fall back to the pipeline.yaml that used to upload fine, to see if it still works.
Yes, it still works.
4. Re-apply the modification to pipeline.yaml, to see if the error comes back.
Yes, the error occurs again.
In my case, the modification was adding a "volumes: ..." field inside task.template:
# pipeline.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: detect-anomolies-on-wafer-tfdv-schema-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0, pipelines.kubeflow.org/pipeline_compilation_time: '2024-01-07T22:16:36.438482',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Constructs a Kubeflow
      pipeline.", "inputs": [{"default": "pipelines/detect_anomolies_on_wafer_tfdv_schema",
      "name": "pipeline-root"}], "name": "detect_anomolies_on_wafer_tfdv_schema"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0}
spec:
  entrypoint: detect-anomolies-on-wafer-tfdv-schema
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - maye-inspiron-5547
  templates:
  - name: detect-anomolies-on-wafer-tfdv-schema
    inputs:
      parameters:
      - {name: pipeline-root}
    dag:
      tasks:
      - name: importexamplegen
        template: importexamplegen
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
  - name: importexamplegen
    container:
      ...
      volumeMounts:
      - mountPath: /maye/trainEvalData
        name: wafer-data
    volumes:            ## syntax error: volumes should be defined in the workflow spec
    - name: wafer-data  ## and referenced in the task, not defined in task.template
      hostPath:
        path: /home/maye/trainEvalData
        type: Directory
[SOLUTION]
First, delete the busy ml-pipeline pod; its Deployment will automatically create a new ml-pipeline pod.
kubectl delete pod ml-pipeline-pod-name -n kubeflow
This error is caused by a syntax error in the Kubeflow pipeline definition file (e.g. pipeline.yaml).
For example:
- A misspelled field name: volumns instead of the correct volumes.
- A resource definition in the wrong place, e.g. the volumes definition placed inside task.template.
  The correct layout is to define volumes in the workflow spec, reference the volume in the task, and declare volumeMounts in task.template.container:
# pipeline.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: detect-anomolies-on-wafer-tfdv-schema-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0, pipelines.kubeflow.org/pipeline_compilation_time: '2024-01-07T22:16:36.438482',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Constructs a Kubeflow
      pipeline.", "inputs": [{"default": "pipelines/detect_anomolies_on_wafer_tfdv_schema",
      "name": "pipeline-root"}], "name": "detect_anomolies_on_wafer_tfdv_schema"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0}
spec:
  entrypoint: detect-anomolies-on-wafer-tfdv-schema
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - maye-inspiron-5547
  volumes:
  - name: wafer-data
    hostPath:
      path: /home/maye/trainEvalData
      type: Directory
  templates:
  - name: detect-anomolies-on-wafer-tfdv-schema
    inputs:
      parameters:
      - {name: pipeline-root}
    dag:
      tasks:
      - name: importexamplegen
        template: importexamplegen
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
        volumes:
        - name: wafer-data
  - name: importexamplegen
    container:
      ...
      volumeMounts:
      - mountPath: /maye/trainEvalData
        name: wafer-data
- Objects at the same level that are not left-aligned, for example:
# pipeline.yaml
...
      - name: trainer
        template: trainer
        dependencies: [importexamplegen, transform]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
         volumes:        #### this key is at the same level as arguments,
         - name: tfx-pv  #### but not left-aligned with it: a syntax error
...
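Finally, a usage sketch combining the two hypothetical helpers defined earlier in this note, so that a broken spec is caught locally and a busy server cannot hang the client:

# preflight_check() and upload_with_timeout() are the hypothetical helpers above
wf = preflight_check("/home/maye/pipeline.yaml")
print(upload_with_timeout("/home/maye/pipeline.yaml", "wafer_test", "wafer test pipeline"))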