K8s && K3s: Modifying an Application's Startup Configuration by Pod Name
TrueNAS: Modifying the K3s Configuration of an Installed Application
Environment:
Linux:
Linux version 6.1.74-production+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #2 SMP PREEMPT_DYNAMIC Wed Feb 21 20:30:38 UTC 2024
TrueNAS release:
TrueNAS-SCALE-23.10.2
K3s version:
Client Version: v1.26.6+k3s-e18037a7-dirty
Kustomize Version: v4.5.7
Server Version: v1.26.6+k3s-e18037a7-dirty
Date:
June 29, 2024
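Problem overview:
While using TrueNAS, I found that the search and image recognition features of Immich, the photo management system installed from the app catalog, did not work. The cause turned out to be a failure in Immich's machine learning service: the logs showed that the models it uses could not be downloaded automatically because of network problems, so they could not be loaded. The model files therefore had to be downloaded manually and imported into the machine learning service.
After some trial and error, I found that copying the model files into the Pod's container with k3s kubectl cp does not last: whenever TrueNAS or K3s restarts, the Pod is recreated and the copied models are gone. So I decided to mount the NAS directory that holds the models (the one shared over NFS) into the Pod, so that the Pod uses the persistently stored model files and they no longer disappear when the Pod restarts.
Solution:
Solution overview:
Use the Pod name to find the application's controller, export the controller's configuration to a local file, and edit it there. You can change many things, including but not limited to volumes; this tutorial changes a volume, so look up the relevant parameters yourself for other changes. Once the file is edited, apply it.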
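The lookup from Pod name to controller can be sketched in shell. This is a minimal sketch and not part of the original procedure: the helper names `deployment_from_pod_name` and `owning_deployment` are hypothetical. The first relies on the usual naming pattern of Deployment-managed Pods (`<deployment>-<replicaset-hash>-<suffix>`) and is only a heuristic; the second asks the API for `ownerReferences`, assuming the Pod is owned by a ReplicaSet that is in turn owned by a Deployment.

```shell
# Heuristic: strip the trailing "-<replicaset-hash>-<suffix>" from a Pod name.
# $1: Pod name
deployment_from_pod_name() {
  pod=$1
  printf '%s\n' "${pod%-*-*}"
}

# Ask the cluster instead: Pod -> ReplicaSet -> Deployment via ownerReferences.
# $1: Pod name, $2: namespace
owning_deployment() {
  rs=$(k3s kubectl get pod "$1" -n "$2" \
        -o jsonpath='{.metadata.ownerReferences[0].name}') || return 1
  k3s kubectl get replicaset "$rs" -n "$2" \
        -o jsonpath='{.metadata.ownerReferences[0].name}'
}
```

For example, `deployment_from_pod_name immich-machinelearning-86d4f98657-dtbr6` prints `immich-machinelearning`. The heuristic needs no cluster access, but the `ownerReferences` lookup is the safer choice when a name does not follow the usual pattern.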
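Procedure:
Note: the procedure follows the case from the problem overview, i.e. modifying the K3s startup configuration of the Immich machine learning Pod.
- Get the name of the controller that owns the Pod to be modified
- Switch to the root user
root@truenas[~]# sudo -i
- Query the controller name
root@truenas[~]# k3s kubectl get deployments -n ix-immich
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
immich-postgres          1/1     1            1           3d23h
immich-redis             1/1     1            1           3d23h
immich                   1/1     1            1           3d23h
immich-machinelearning   1/1     1            1           3d23h   # this is the controller we are looking for in this example
- Export the controller configuration to a local file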
root@truenas[~]# k3s kubectl get deployment immich-machinelearning -n ix-immich -o yaml > deployment.yaml
root@truenas[~]# ls
deployment.yaml my-deployment.yaml samba tdb
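The exported file also contains fields that the server maintains itself (`status`, `resourceVersion`, `uid` and so on). As a precaution, it may be worth keeping a backup of the export and stripping those fields before editing, since a stale `resourceVersion` can make a later apply fail with a conflict if the Deployment has changed since the export. The following is a sketch only: `clean_manifest` is a hypothetical helper, and its awk filter assumes the two-space indentation that `kubectl get -o yaml` normally produces.

```shell
# Hypothetical helper: print the manifest in $1 without server-managed fields.
clean_manifest() {
  awk '
    /^status:/      { skip = 1; next }   # start of the top-level status block
    skip && /^[^ ]/ { skip = 0 }         # a new top-level key ends it
    skip            { next }
    /^  (resourceVersion|uid|creationTimestamp|generation):/ { next }
    { print }
  ' "$1"
}

# Usage (commented out so that sourcing this snippet changes nothing):
# cp deployment.yaml deployment.yaml.bak
# clean_manifest deployment.yaml.bak > deployment.yaml
```

The backup also gives you something to re-apply if the edited Deployment fails to start.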
- Edit the volumes in the configuration file
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: immich
    meta.helm.sh/release-namespace: ix-immich
  creationTimestamp: "2024-06-25T12:27:28Z"
  generation: 4
  labels:
    app: immich-4.0.3
    app.kubernetes.io/instance: immich
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: immich
    app.kubernetes.io/version: 1.106.4
    helm-revision: "3"
    helm.sh/chart: immich-4.0.3
    release: immich
  name: immich-machinelearning
  namespace: ix-immich
  resourceVersion: "1968579"
  uid: 78bb9425-18a8-4be9-b033-f0252ad46737
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: immich
      app.kubernetes.io/name: immich
      pod.name: machinelearning
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        rollme: eKH7i
      creationTimestamp: null
      labels:
        app: immich-4.0.3
        app.kubernetes.io/instance: immich
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: immich
        app.kubernetes.io/version: 1.106.4
        helm-revision: "3"
        helm.sh/chart: immich-4.0.3
        pod.name: machinelearning
        release: immich
    spec:
      automountServiceAccountToken: false
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        - name: UMASK
          value: "002"
        - name: UMASK_SET
          value: "002"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
        - name: PUID
          value: "568"
        - name: USER_ID
          value: "568"
        - name: UID
          value: "568"
        - name: PGID
          value: "568"
        - name: GROUP_ID
          value: "568"
        - name: GID
          value: "568"
        envFrom:
        - configMapRef:
            name: immich-ml-config
        image: altran1502/immich-machine-learning:v1.106.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /ping
            port: 32002
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: immich
        ports:
        - containerPort: 32002
          name: machinelearning
          protocol: TCP
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /ping
            port: 32002
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 2
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: false
          runAsGroup: 0
          runAsNonRoot: false
          runAsUser: 0
          seccompProfile:
            type: RuntimeDefault
        startupProbe:
          failureThreshold: 60
          httpGet:
            path: /ping
            port: 32002
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /mlcache
          name: mlcache
      dnsConfig:
        options:
        - name: ndots
          value: "2"
      dnsPolicy: ClusterFirst
      enableServiceLinks: false
      initContainers:
      - command:
        - /bin/ash
        - -c
        - |-
          echo "Pinging [http://immich:30041/api/server-info/ping] until it is ready..."
          until wget --spider --quiet --timeout=3 --tries=1 "http://immich:30041/api/server-info/ping"; do
            echo "Waiting for [http://immich:30041/api/server-info/ping] to be ready..."
            sleep 2
          done
          echo "URL [http://immich:30041/api/server-info/ping] is ready!"
        env:
        - name: TZ
          value: Asia/Shanghai
        - name: UMASK
          value: "002"
        - name: UMASK_SET
          value: "002"
        - name: NVIDIA_VISIBLE_DEVICES
          value: void
        - name: S6_READ_ONLY_ROOT
          value: "1"
        image: bash:4.4.23
        imagePullPolicy: IfNotPresent
        name: immich-init-wait-url
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsGroup: 568
          runAsNonRoot: true
          runAsUser: 568
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      runtimeClassName: nvidia
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 568
        fsGroupChangePolicy: OnRootMismatch
        supplementalGroups:
        - 44
        - 107
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      volumes:
      # - emptyDir: {}
      #   name: mlcache
      # Replace the mlcache volume definition above with the one below
      - hostPath:
          path: /mnt/ssd/data/model-cache # path to the model files on the NAS (the NFS-shared directory)
          type: ""
        name: mlcache # this name must match the name under volumeMounts above
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-06-25T12:27:28Z"
    lastUpdateTime: "2024-06-29T09:11:02Z"
    message: ReplicaSet "immich-machinelearning-754d78cf48" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-06-29T09:49:17Z"
    lastUpdateTime: "2024-06-29T09:49:17Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 4
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
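The same volume change can also be expressed as a JSON patch instead of editing the exported file by hand. This is a sketch under assumptions, not the method used in this article: `mlcache_patch` is a hypothetical helper, and the patch targets index 0 of `volumes` because `mlcache` is the only volume in the manifest above. Adjust the index if your Deployment defines more volumes.

```shell
# Hypothetical helper: emit a JSON patch that replaces volume 0 with a hostPath.
# $1: host directory that holds the model files
mlcache_patch() {
  printf '[{"op":"replace","path":"/spec/template/spec/volumes/0","value":{"name":"mlcache","hostPath":{"path":"%s","type":""}}}]\n' "$1"
}

# Usage (commented out; run as root on the TrueNAS host):
# k3s kubectl patch deployment immich-machinelearning -n ix-immich \
#   --type=json -p "$(mlcache_patch /mnt/ssd/data/model-cache)"
```

A patch touches only the one field, so it avoids sending the exported metadata and status back to the server; editing the file, as done here, is easier to review when several fields change at once.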
- Apply the configuration file
root@truenas[~]# k3s kubectl apply -f deployment.yaml
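If you want some feedback beyond the apply itself, the step can be wrapped so that the file is first validated by the server and the rollout is then awaited. This is a minimal sketch: the function name `apply_and_wait` and the 180-second timeout are assumptions chosen for illustration.

```shell
# Hypothetical wrapper: validate, apply, then wait for the new Pod.
# $1: manifest file, $2: deployment name, $3: namespace
apply_and_wait() {
  k3s kubectl apply --dry-run=server -f "$1" || return 1   # validation only, nothing is changed
  k3s kubectl apply -f "$1" || return 1
  k3s kubectl rollout status "deployment/$2" -n "$3" --timeout=180s
}

# Usage (commented out; run as root on the TrueNAS host):
# apply_and_wait deployment.yaml immich-machinelearning ix-immich
```

Because this Deployment uses the Recreate strategy, the old Pod is stopped before the new one starts, so the machine learning service is briefly unavailable while the rollout runs.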
- Verify that it worked
root@immich-machinelearning-86d4f98657-dtbr6:/mlcache# ls
clip facial-recognition image-classification models--M-CLIP--XLM-Roberta-Large-Vit-B-16Plus models--google--vit-base-patch16-224 models--immich-app--XLM-Roberta-Large-Vit-B-16Plus version.txt
As you can see, the models on the NAS are now mounted inside the container.
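The same check can be run from the TrueNAS host without opening a shell in the Pod. This is a sketch with hypothetical helper names: `list_mlcache` lists `/mlcache` through the Deployment, and `host_entries_in_pod` reads that listing on stdin and succeeds only if every entry of the given host directory appears in it.

```shell
# Hypothetical check: list /mlcache inside the running Pod from the host.
# $1: deployment name, $2: namespace
list_mlcache() {
  k3s kubectl exec -n "$2" "deploy/$1" -- ls /mlcache
}

# Hypothetical check: succeed only if every entry of host directory $1
# appears in the Pod listing read from stdin (one name per line).
host_entries_in_pod() {
  listing=$(cat)
  for entry in "$1"/*; do
    [ -e "$entry" ] || continue   # unmatched glob, i.e. an empty directory
    printf '%s\n' "$listing" | grep -qxF -- "$(basename "$entry")" || return 1
  done
}

# Usage (commented out; run as root on the TrueNAS host):
# list_mlcache immich-machinelearning ix-immich | host_entries_in_pod /mnt/ssd/data/model-cache
```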
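Command summary:
# Switch to the root user
sudo -i
# Get the controller name; -n is followed by the namespace (ix-immich here, adjust to your setup)
k3s kubectl get deployments -n ix-immich
# Export the controller configuration to a local file
k3s kubectl get deployment immich-machinelearning -n ix-immich -o yaml > deployment.yaml
# Apply the modified configuration file
k3s kubectl apply -f deployment.yaml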
Acknowledgments
Thank you for taking the time to read my article. If it helped you, please give it a like.
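Author: CodeHai
Link: https://www.cnblogs.com/codeHai/p/18275672.html
About the author: comments and private messages are answered as soon as possible. You can also message me directly.
Copyright: unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA license. Please credit the source when reposting!
Support the author: if this article helped you, you can click [Recommend] at the bottom right of the article. Your encouragement is my greatest motivation!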