K8s && K3s: Modifying an Application's Startup Configuration by Pod Name


Modifying the K3s configuration of an application installed on TrueNAS

Environment:

Linux: Linux version 6.1.74-production+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #2 SMP PREEMPT_DYNAMIC Wed Feb 21 20:30:38 UTC 2024
TrueNAS: TrueNAS-SCALE-23.10.2
K3s version:
    Client Version: v1.26.6+k3s-e18037a7-dirty
    Kustomize Version: v4.5.7
    Server Version: v1.26.6+k3s-e18037a7-dirty
Date: 2024-06-29

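Problem overview:

While using TrueNAS, I found that search and image recognition did not work in Immich, the photo management system I had installed from the app catalog. The cause turned out to be the Immich machine-learning service: its logs showed that the models it uses could not be downloaded automatically because of network problems, so they failed to load. The model files therefore had to be downloaded manually and imported into the machine-learning service.

After some trial and error I found that copying the model files into the pod's container with k3s kubectl cp does not survive a restart: whenever TrueNAS or K3s restarts, the pod is recreated and the copied models are lost (by default /mlcache is an emptyDir volume, which is deleted together with the pod). The idea is therefore to mount the NFS volume on the NAS into the pod, so that the pod uses model files kept in persistent storage on the NAS and they no longer disappear when the pod restarts.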

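Implementation plan:

Overview:

Use the pod name to find the controller (Deployment) that manages the application, export the controller's configuration to a local file, and edit it. Many things can be changed this way, including but not limited to volumes. This tutorial changes a volume; for other changes, look up the relevant parameters yourself. Once the file has been edited, apply it.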
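As a rough illustration of going from a pod name to its controller: pods created by a Deployment are normally named after the Deployment, followed by a ReplicaSet hash and a random pod suffix. The sketch below guesses the Deployment name from that pattern. The function name `guess_deployment_name` and the exact suffix lengths in the regular expression are assumptions made for this example, so treat the result as a hint and confirm it against the pod's `metadata.ownerReferences` or the Deployment list.

```python
import re
from typing import Optional

# Assumed naming pattern: <deployment>-<replicaset-hash>-<pod-suffix>.
# The suffix lengths are a heuristic, not a guarantee.
_POD_NAME = re.compile(r"^(?P<deployment>.+)-[a-z0-9]{5,10}-[a-z0-9]{5}$")


def guess_deployment_name(pod_name: str) -> Optional[str]:
    """Return the likely Deployment name for a pod, or None when the
    name does not look like a Deployment-managed pod."""
    match = _POD_NAME.match(pod_name)
    return match.group("deployment") if match else None


# Pod name taken from this article's example.
print(guess_deployment_name("immich-machinelearning-86d4f98657-dtbr6"))
```

A name that does not match the pattern, such as a StatefulSet pod ending in `-0`, yields `None` rather than a wrong guess.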

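Implementation steps:

Note: the steps below use the case from the problem overview, that is, changing the K3s startup configuration of the Immich machine-learning pod.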
  1. Get the name of the controller that manages the pod to be modified

    1. Switch to the root user

      root@truenas[~]# sudo -i
    2. Look up the controller name

      root@truenas[~]# k3s kubectl get deployments -n ix-immich
      NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
      immich-postgres          1/1     1            1           3d23h
      immich-redis             1/1     1            1           3d23h
      immich                   1/1     1            1           3d23h
      immich-machinelearning   1/1     1            1           3d23h   # the controller needed for this example
  2. Export the controller configuration to a local file

    root@truenas[~]# k3s kubectl get deployment immich-machinelearning -n ix-immich -o yaml > deployment.yaml
    root@truenas[~]# ls
    deployment.yaml  my-deployment.yaml  samba  tdb
  3. Edit the volumes section of the configuration file

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "3"
        meta.helm.sh/release-name: immich
        meta.helm.sh/release-namespace: ix-immich
      creationTimestamp: "2024-06-25T12:27:28Z"
      generation: 4
      labels:
        app: immich-4.0.3
        app.kubernetes.io/instance: immich
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: immich
        app.kubernetes.io/version: 1.106.4
        helm-revision: "3"
        helm.sh/chart: immich-4.0.3
        release: immich
      name: immich-machinelearning
      namespace: ix-immich
      resourceVersion: "1968579"
      uid: 78bb9425-18a8-4be9-b033-f0252ad46737
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 3
      selector:
        matchLabels:
          app.kubernetes.io/instance: immich
          app.kubernetes.io/name: immich
          pod.name: machinelearning
      strategy:
        type: Recreate
      template:
        metadata:
          annotations:
            rollme: eKH7i
          creationTimestamp: null
          labels:
            app: immich-4.0.3
            app.kubernetes.io/instance: immich
            app.kubernetes.io/managed-by: Helm
            app.kubernetes.io/name: immich
            app.kubernetes.io/version: 1.106.4
            helm-revision: "3"
            helm.sh/chart: immich-4.0.3
            pod.name: machinelearning
            release: immich
        spec:
          automountServiceAccountToken: false
          containers:
          - env:
            - name: TZ
              value: Asia/Shanghai
            - name: UMASK
              value: "002"
            - name: UMASK_SET
              value: "002"
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: all
            - name: PUID
              value: "568"
            - name: USER_ID
              value: "568"
            - name: UID
              value: "568"
            - name: PGID
              value: "568"
            - name: GROUP_ID
              value: "568"
            - name: GID
              value: "568"
            envFrom:
            - configMapRef:
                name: immich-ml-config
            image: altran1502/immich-machine-learning:v1.106.4
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 5
              httpGet:
                path: /ping
                port: 32002
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            name: immich
            ports:
            - containerPort: 32002
              name: machinelearning
              protocol: TCP
            readinessProbe:
              failureThreshold: 5
              httpGet:
                path: /ping
                port: 32002
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 2
              timeoutSeconds: 5
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
                nvidia.com/gpu: "1"
              requests:
                cpu: 10m
                memory: 50Mi
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              privileged: false
              readOnlyRootFilesystem: false
              runAsGroup: 0
              runAsNonRoot: false
              runAsUser: 0
              seccompProfile:
                type: RuntimeDefault
            startupProbe:
              failureThreshold: 60
              httpGet:
                path: /ping
                port: 32002
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 2
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /mlcache
              name: mlcache
          dnsConfig:
            options:
            - name: ndots
              value: "2"
          dnsPolicy: ClusterFirst
          enableServiceLinks: false
          initContainers:
          - command:
            - /bin/ash
            - -c
            - |-
              echo "Pinging [http://immich:30041/api/server-info/ping] until it is ready..."
              until wget --spider --quiet --timeout=3 --tries=1 "http://immich:30041/api/server-info/ping"; do
                echo "Waiting for [http://immich:30041/api/server-info/ping] to be ready..."
                sleep 2
              done
              echo "URL [http://immich:30041/api/server-info/ping] is ready!"
            env:
            - name: TZ
              value: Asia/Shanghai
            - name: UMASK
              value: "002"
            - name: UMASK_SET
              value: "002"
            - name: NVIDIA_VISIBLE_DEVICES
              value: void
            - name: S6_READ_ONLY_ROOT
              value: "1"
            image: bash:4.4.23
            imagePullPolicy: IfNotPresent
            name: immich-init-wait-url
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
              requests:
                cpu: 10m
                memory: 50Mi
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              privileged: false
              readOnlyRootFilesystem: true
              runAsGroup: 568
              runAsNonRoot: true
              runAsUser: 568
              seccompProfile:
                type: RuntimeDefault
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          restartPolicy: Always
          runtimeClassName: nvidia
          schedulerName: default-scheduler
          securityContext:
            fsGroup: 568
            fsGroupChangePolicy: OnRootMismatch
            supplementalGroups:
            - 44
            - 107
          serviceAccount: default
          serviceAccountName: default
          terminationGracePeriodSeconds: 30
          volumes:
          # - emptyDir: {}
          #   name: mlcache
          # Replace the mlcache volume definition above with the one below
          - hostPath:
              path: /mnt/ssd/data/model-cache  # path to the model files on the NAS (NFS share)
              type: ""
            name: mlcache  # this name must match the name under volumeMounts above
    status:
      availableReplicas: 1
      conditions:
      - lastTransitionTime: "2024-06-25T12:27:28Z"
        lastUpdateTime: "2024-06-29T09:11:02Z"
        message: ReplicaSet "immich-machinelearning-754d78cf48" has successfully progressed.
        reason: NewReplicaSetAvailable
        status: "True"
        type: Progressing
      - lastTransitionTime: "2024-06-29T09:49:17Z"
        lastUpdateTime: "2024-06-29T09:49:17Z"
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: Available
      observedGeneration: 4
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1
  4. Apply the configuration file

    root@truenas[~]# k3s kubectl apply -f deployment.yaml
  5. Verify the result

    root@immich-machinelearning-86d4f98657-dtbr6:/mlcache# ls
    clip  facial-recognition  image-classification  models--M-CLIP--XLM-Roberta-Large-Vit-B-16Plus  models--google--vit-base-patch16-224  models--immich-app--XLM-Roberta-Large-Vit-B-16Plus  version.txt

As the listing shows, the models stored on the NAS are now mounted inside the container.
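The volume change made to deployment.yaml above can also be scripted instead of edited by hand. The following is a minimal sketch of one way to do that, not the method used in this article: it assumes the Deployment was exported with `-o json` rather than `-o yaml`, so that only Python's standard library is needed. The names `use_host_path` and `SERVER_MANAGED_METADATA` are made up for this example, and dropping `status` and the server-populated metadata fields is optional tidying that the manual steps do not perform.

```python
import copy

# Fields filled in by the API server; removing them from an exported
# manifest is optional tidying (an assumption of this sketch).
SERVER_MANAGED_METADATA = ("creationTimestamp", "generation", "resourceVersion", "uid")


def use_host_path(deployment: dict, volume_name: str, host_path: str) -> dict:
    """Return a copy of the Deployment in which the named pod volume is
    backed by a hostPath directory instead of its current source."""
    patched = copy.deepcopy(deployment)  # leave the caller's dict untouched
    patched.pop("status", None)
    for field in SERVER_MANAGED_METADATA:
        patched["metadata"].pop(field, None)

    volumes = patched["spec"]["template"]["spec"].setdefault("volumes", [])
    replacement = {"name": volume_name, "hostPath": {"path": host_path, "type": ""}}
    for index, volume in enumerate(volumes):
        if volume.get("name") == volume_name:
            # Keep the name so it still matches the container's volumeMounts entry.
            volumes[index] = replacement
            break
    else:
        raise KeyError(f"no volume named {volume_name!r} in the pod template")
    return patched
```

To use it, load the exported JSON with `json.load`, call `use_host_path(manifest, "mlcache", "/mnt/ssd/data/model-cache")`, write the result back with `json.dump`, and pass that file to `k3s kubectl apply -f`, which accepts JSON as well as YAML.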

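Command summary:

    # Switch to the root user
    sudo -i
    # Get the controller name; -n is followed by the namespace (ix-immich in my case, adjust to your setup)
    k3s kubectl get deployments -n ix-immich
    # Export the controller configuration to a local file
    k3s kubectl get deployment immich-machinelearning -n ix-immich -o yaml > deployment.yaml
    # Apply the modified configuration file
    k3s kubectl apply -f deployment.yaml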
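Since the namespace and controller name differ from one application to another, it can help to generate the commands from those two values. The sketch below only builds the command strings and does not run anything; `deployment_commands` is a hypothetical helper written for this example, and the export command mirrors the one used in the steps above.

```python
import shlex
from typing import List


def deployment_commands(namespace: str, deployment: str,
                        manifest: str = "deployment.yaml") -> List[str]:
    """Build the k3s kubectl commands for listing, exporting and
    re-applying a Deployment, with every argument shell-quoted."""
    def render(*argv: str) -> str:
        return " ".join(shlex.quote(arg) for arg in argv)

    return [
        render("k3s", "kubectl", "get", "deployments", "-n", namespace),
        # The redirection is appended after quoting so the shell still sees it.
        render("k3s", "kubectl", "get", "deployment", deployment,
               "-n", namespace, "-o", "yaml") + " > " + shlex.quote(manifest),
        render("k3s", "kubectl", "apply", "-f", manifest),
    ]


for command in deployment_commands("ix-immich", "immich-machinelearning"):
    print(command)
```

Printing the commands rather than executing them leaves room to edit the exported file between the export and apply steps.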

Acknowledgements

Thank you for taking the time to read this article. If it helped you, please give it a like.



Author: CodeHai
Article link: https://www.cnblogs.com/codeHai/p/18275672.html
Copyright notice: unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!