February 2024 Archive

kubernetes loadbalancer: metallb
Summary: https://zhuanlan.zhihu.com/p/617807098 (read more)
posted @ 2024-02-19 11:47 zhenxia-jiuyou Views(4) Comments(0) Recommended(0)
run kubeflow/serving on a kubernetes cluster
Summary: 1. build kubeflow/serving container image which contains serving_model [1] # run container tensorflow/serving, the image of this container # is the ba (read more)
posted @ 2024-02-18 22:14 zhenxia-jiuyou Views(7) Comments(0) Recommended(0)
tensorflow/serving: REST request
Summary: 1. save the trained model # in module file of tfx component trainer def _apply_preprocessing(raw_features, tft_layer): transformed_features = tft_laye (read more)
posted @ 2024-02-18 14:45 zhenxia-jiuyou Views(10) Comments(0) Recommended(0)
tensorflow distributed training in tfx pipeline run by kubeflow
Summary: 1. deploy worker, parameter server on kubernetes cluster 1.1 build container image of worker, parameter server $ git clone https://github.com/tensorfl (read more)
posted @ 2024-02-15 16:37 zhenxia-jiuyou Views(38) Comments(0) Recommended(0)
Debug: tf distribute strategy parameter server: tfx component trainer: model.save(): failed to connect to all addresses
Summary: [ERROR: tf distribute strategy parameter server: tfx component trainer: model.save(): failed to connect to all addresses] log of pod tfx-component-tra (read more)
posted @ 2024-02-15 00:01 zhenxia-jiuyou Views(70) Comments(0) Recommended(0)
Debug: tf distribute strategy parameter server: tfx component trainer: OutOfRangeError(), End of sequence
Summary: [ERROR: tf distribute strategy parameter server: tfx component trainer: OutOfRangeError(), Node: 'cond/IteratorGetNext' End of sequence] log of pod tf (read more)
posted @ 2024-02-14 23:39 zhenxia-jiuyou Views(24) Comments(0) Recommended(0)
Debug: tf distribute strategy parameter server: NOT_FOUND: No such file or directory
Summary: [ERROR: NOT_FOUND: /tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/examples/67/Split-train/data_tfrecord-00000-of-00001.g (read more)
posted @ 2024-02-14 17:58 zhenxia-jiuyou Views(19) Comments(0) Recommended(0)
Debug : kfp.Client().upload_pipeline(): Failed to start a transaction to create a new pipeline and a new pipeline version: dial tcp: lookup mysql on 10.96.0.10:53: no such host","
Summary: [ERROR: Failed to start a transaction to create a new pipeline and a new pipeline version: dial tcp: lookup mysql on 10.96.0.10:53: no such host","] > (read more)
posted @ 2024-02-14 17:29 zhenxia-jiuyou Views(29) Comments(0) Recommended(0)
Debug: tf distribute strategy parameter server: stuck at "INFO:tensorflow:ParameterServerStrategyV2 is now connecting to cluster
Summary: [ERROR: stuck at "INFO:tensorflow:ParameterServerStrategyV2 is now connecting to cluster with cluster_spec: ClusterSpec({'ps': ['dist-strat-example-ps (read more)
posted @ 2024-02-14 16:17 zhenxia-jiuyou Views(36) Comments(0) Recommended(0)
Debug: tf_ditribute_strategy_worker.yaml: unknown field "spec.template.spec.nodeAffinity"
Summary: [ERROR: unknown field "spec.template.spec.nodeAffinity"] (base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy (read more)
posted @ 2024-02-14 14:44 zhenxia-jiuyou Views(53) Comments(0) Recommended(0)
Debug: tf_ditribute_strategy_worker.yaml: resource mapping not found for name: "dist-strat-example-worker-0" namespace: "" from "maye_template.yaml": no matches for kind "Deployment" in version "v1"
Summary: [ERROR: resource mapping not found for name: "dist-strat-example-worker-0" namespace: "" from "maye_template.yaml": no matches for kind "Deployment" i (read more)
posted @ 2024-02-14 14:39 zhenxia-jiuyou Views(13) Comments(0) Recommended(0)
Debug: tf_distribute_strategy_worker.yaml: Exit Code: 132, and log of pod is empty.
Summary: [ERROR: Exit Code: 132, and log of pod is empty.] (base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ kubec (read more)
posted @ 2024-02-14 14:30 zhenxia-jiuyou Views(11) Comments(0) Recommended(0)
Debug: tf-distribute-strategy-worker: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182
Summary: [ERROR: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182] # in file pipeline.yaml - name: TF_CONFIG (read more)
posted @ 2024-02-14 11:53 zhenxia-jiuyou Views(17) Comments(0) Recommended(0)
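The summary above is truncated, so the post's actual cause and fix are not shown here. As a minimal sketch of one common way this error arises, assume the TF_CONFIG value was written as a Python-style dict string with single quotes; the host names below are hypothetical placeholders, not values from the post. Building the value with json.dumps yields a string that json.loads accepts.

```python
import json

# Hypothetical cluster spec for illustration only; the host names are
# placeholders and do not come from the original post.
tf_config = {
    "cluster": {
        "worker": ["worker-0.example:5000"],
        "ps": ["ps-0.example:5000"],
    },
    "task": {"type": "worker", "index": 0},
}

# str() of a dict produces Python repr syntax with single quotes,
# which is not valid JSON.
bad_value = str(tf_config)
try:
    json.loads(bad_value)
    single_quotes_parse = True
except json.JSONDecodeError as err:
    single_quotes_parse = False
    error_message = err.msg

# json.dumps produces double-quoted keys and strings, which parse cleanly.
good_value = json.dumps(tf_config)
round_trip = json.loads(good_value)
```

In a YAML manifest, the double-quoted JSON string is typically wrapped in single quotes so that the inner double quotes survive YAML parsing.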
Start a kubernetes cluster
Summary: 1. check that the container runtime service, containerd, is running on every machine that will join the kubernetes cluster. $ systemctl status cont (read more)
posted @ 2024-02-09 18:13 zhenxia-jiuyou Views(28) Comments(0) Recommended(0)
Mechanism of Machine Learning
Summary: What is Machine Learning? Terms: L: loss, the expectation of the difference between y_infer and y_true. L_D: loss of distribution, $L_D \over (read more)
posted @ 2024-02-06 18:23 zhenxia-jiuyou Views(5) Comments(0) Recommended(0)
Debug: javascript: while loop never exit due to state variable
Summary: 1. Illustration In a javascript function, if a variable is assigned a value without being declared first, it is treated as a state variable of the function, (read more)
posted @ 2024-02-03 19:21 zhenxia-jiuyou Views(3) Comments(0) Recommended(0)
Run a tfx pipeline using kubeflow pipeline
Summary: 1. What is kubeflow pipeline for a tfx pipeline? kubeflow pipeline is an orchestrator for tfx pipelines, which runs on a kubernetes cluster. LocalDagRunner (read more)
posted @ 2024-02-02 15:18 zhenxia-jiuyou Views(325) Comments(0) Recommended(0)
Deploy standalone kubeflow pipeline on a kubernetes cluster
Summary: 1. Computer environment OS: ubuntu 20.04 kubernetes: 1.26.12 Attention: kubeflow pipeline 2.0 is compatible up to kubernetes v1.26. 2. Prepare file kub (read more)
posted @ 2024-02-02 13:42 zhenxia-jiuyou Views(137) Comments(0) Recommended(0)
ml-pipeline-ui of kubeflow pipeline
Summary: 1. Creating a pipeline on the ml-pipeline-ui webpage saves the pipeline to the database mlpipeline; deleting a pipeline on the ml-pipeline-ui webpage is deleting (read more)
posted @ 2024-02-02 13:39 zhenxia-jiuyou Views(14) Comments(0) Recommended(0)
nerdctl build -- command to build container image from docker file
Summary: 1. Prerequisite of using nerdctl build: buildctl needs to be installed and buildkitd needs to be running. 2. Check whether buildctl is installed $ nerdctl vers (read more)
posted @ 2024-02-01 18:32 zhenxia-jiuyou Views(111) Comments(0) Recommended(0)
Solution of downloading from github.com too slow
Summary: Acceleration mirrors for githubusercontent and github. List of mirror addresses: fastgit.org: https://doc.fastgit.org/ http://cnpmjs.org: https://github.com.cnpmjs.org/ http://gitclone.com: https:// (read more)
posted @ 2024-02-01 17:04 zhenxia-jiuyou Views(24) Comments(0) Recommended(0)
Install nerdctl -- cli of containerd, compatible with docker
Summary: 1. Download binary package of nerdctl-full Attention: Downloading from https://github.com/containerd/nerdctl/releases may be very slow in China, and e (read more)
posted @ 2024-02-01 14:39 zhenxia-jiuyou Views(61) Comments(0) Recommended(0)