Slurm test environment configuration
1. Machine planning
Host:
HPC_Slurm_Main:192.168.141.135
Clients:
HPC_Slurm_Client01:192.168.141.136
HPC_Slurm_Client02:192.168.141.137
HPC_Slurm_Client03:192.168.141.138
2. Set the hostnames in /etc/hosts and /etc/hostname
192.168.141.136 node1-nfs
192.168.141.137 node2-nfs
192.168.141.138 node3-nfs
192.168.141.135 control1-nfs
192.168.141.136 node1
192.168.141.137 node2
192.168.141.138 node3
192.168.141.135 control1
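The host entries above can also be generated from a list of IP/name pairs, which helps when the same block has to go onto every machine. This is a minimal sketch; the function name gen_hosts is hypothetical, and the output is grouped per host rather than in the order of the listing above.

```shell
#!/bin/sh
# Print /etc/hosts entries for the cluster.
# Arguments are "IP NAME" pairs; two lines are emitted per host:
# the "-nfs" alias first, then the plain name.
gen_hosts() {
    while [ "$#" -ge 2 ]; do
        printf '%s %s-nfs\n' "$1" "$2"
        printf '%s %s\n' "$1" "$2"
        shift 2
    done
}

gen_hosts \
    192.168.141.135 control1 \
    192.168.141.136 node1 \
    192.168.141.137 node2 \
    192.168.141.138 node3
```

After reviewing the output, it can be appended on each machine, for example by piping it to sudo tee -a /etc/hosts.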
3. NFS deployment
3.1 Server: sudo apt-get install nfs-kernel-server
Add the following line to /etc/exports (check with cat /etc/exports):
/home/xxx/software *(insecure,rw,sync,no_root_squash)
/etc/init.d/nfs-kernel-server restart && systemctl enable nfs-kernel-server
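Verify: sudo exportfs -rv
3.2 Client: sudo apt-get install nfs-common
a. Mount the NFS share at boot on the client: add the following line to /etc/fstab to mount software permanently (the remote path should match the directory exported on the server):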
control1-nfs:/software /software nfs defaults 0 0
(Temporary option for testing, not recommended: sudo mount -t nfs control1-nfs:/home/jose/software /home/jose/software)
b. Unmount on the client: sudo umount /software
sudo reboot
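Editing /etc/fstab by hand on several clients makes it easy to add the same line twice. The sketch below appends the entry only when that exact line is missing. The function name add_nfs_entry is hypothetical, and the demonstration writes to a scratch file rather than the real /etc/fstab.

```shell
#!/bin/sh
# Append an NFS entry to an fstab-style file only if that exact line is missing.
# Usage: add_nfs_entry FILE SERVER:EXPORT MOUNTPOINT
add_nfs_entry() {
    file=$1
    line="$2 $3 nfs defaults 0 0"
    # -x matches whole lines, -F treats the pattern as a fixed string
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present: $line"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added: $line"
    fi
}

# Demonstration against a scratch file; use /etc/fstab (as root) for real use.
scratch=$(mktemp)
add_nfs_entry "$scratch" control1-nfs:/software /software
add_nfs_entry "$scratch" control1-nfs:/software /software
cat "$scratch"
rm -f "$scratch"
```

Once the entry is in /etc/fstab, sudo mount -a mounts it without a reboot.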
4. Munge deployment
1. useradd -m munge
2. apt install munge
Host:
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key # create the key shared by the whole cluster on the master node
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
chown -R munge: /var/lib/munge
chown -R munge: /var/run/munge
chown -R munge: /var/log/munge
scp /etc/munge/munge.key jose@node1:/etc/munge/
scp /etc/munge/munge.key jose@node2:/etc/munge/
scp /etc/munge/munge.key jose@node3:/etc/munge/
systemctl start munge
systemctl enable munge
Permission settings (important):
sudo chmod 1775 /etc/munge
sudo chmod 0600 /etc/munge/munge.key
# If munge.key has the wrong owner, run the following command
sudo chown munge: /etc/munge/munge.key
Client:
sudo apt install rng-tools5
sudo rngd -r /dev/urandom
sudo chmod 700 /etc/munge
sudo chown -R munge: /etc/munge
sudo chown -R munge: /var/lib/munge
sudo chown -R munge: /var/run/munge
sudo chown -R munge: /var/log/munge
sudo systemctl start rngd
sudo systemctl start munge
sudo systemctl enable rngd
sudo systemctl enable munge
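Since the key permissions matter for munge, a quick check before starting the service can save a debugging round. The sketch below accepts the two modes used in the steps above (400 and 600). The function name check_key_mode is hypothetical, and it relies on GNU stat as shipped with Ubuntu/Debian.

```shell
#!/bin/sh
# Report whether a key file has an owner-only mode (400 or 600).
# Uses GNU stat (-c '%a' prints the octal mode).
check_key_mode() {
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        400|600) echo "ok: $1 has mode $mode" ;;
        *)       echo "bad: $1 has mode $mode (expected 400 or 600)"; return 1 ;;
    esac
}

# Demonstration on a scratch file standing in for /etc/munge/munge.key.
demo_key=$(mktemp)
chmod 644 "$demo_key"
check_key_mode "$demo_key" || true
chmod 400 "$demo_key"
check_key_mode "$demo_key"
rm -f "$demo_key"

# On a real node, with munged running:
#   munge -n | unmunge              # local encode/decode round trip
#   munge -n | ssh node1 unmunge    # credential created here, decoded on node1
```

The cross-node round trip in the trailing comments only succeeds when both machines hold the same munge.key, so it also confirms that the key was copied correctly.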
5. Slurm deployment
Host:
sudo apt install slurm-wlm -y
sudo apt install slurmctld -y
sudo chmod +r /usr/share/doc/slurmctld/slurm-wlm-configurator.html
Client:
sudo apt install slurmd -y
sudo slurmd -c
sudo slurmd -D -s # -D keeps slurmd in the foreground
Host:
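cd /usr/share/doc/slurmctld && python3 -m http.server # serve the configurator pages on port 8000
Open: http://192.168.141.135:8000/slurm-wlm-configurator.easy.html
Paste the generated content into the configuration file: /etc/slurm/slurm.conf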
sudo mkdir /var/spool/slurmd
sudo mkdir /var/spool/slurmctld
# Start slurmd; its log file is `/var/log/slurmd.log`
sudo systemctl start slurmd
# Start slurmctld; its log file is `/var/log/slurmctld.log`
sudo systemctl start slurmctld
# Check the status of slurmd
sudo systemctl status slurmd
# Check the status of slurmctld
sudo systemctl status slurmctld
# ProctrackType=proctrack/cgroup needs to be changed to ProctrackType=proctrack/pgid
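For orientation, the configurator output for this cluster might look roughly like the fragment printed below. This is a sketch, not the author's actual file: the cluster name testcluster, the partition name debug, and CPUs=1 are assumptions, and gen_slurm_conf is a hypothetical helper. The host names, spool directories, log paths, and the ProctrackType value come from the steps above.

```shell
#!/bin/sh
# Print a minimal slurm.conf fragment for one controller and a node range.
# Usage: gen_slurm_conf CONTROLLER NODE_RANGE CPUS_PER_NODE
# ClusterName and PartitionName below are placeholder values.
gen_slurm_conf() {
    cat <<EOF
ClusterName=testcluster
SlurmctldHost=$1
AuthType=auth/munge
ProctrackType=proctrack/pgid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmctld
SlurmdLogFile=/var/log/slurmd.log
SlurmctldLogFile=/var/log/slurmctld.log
NodeName=$2 CPUs=$3 State=UNKNOWN
PartitionName=debug Nodes=$2 Default=YES MaxTime=INFINITE State=UP
EOF
}

# CPUs=1 is an assumption; use the real core count of each node.
gen_slurm_conf control1 'node[1-3]' 1
```

The same slurm.conf has to be present on the controller and on every compute node, so generating it once and copying it avoids drift between machines.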
6. Slurm MySQL
sudo apt-get install mysql-server libmysqlclient-dev -y
Create the corresponding user in MySQL:
$ mysql -u root -p
create user 'slurm'@'localhost' identified by '2023@Slurm';
grant all on slurm_acct_db.* to 'slurm'@'localhost';
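The grant above refers to slurm_acct_db, but the steps do not show that database being created. The sketch below prints the full set of statements, assuming the database has to be created first. The helper name gen_acct_sql and the placeholder password are hypothetical.

```shell
#!/bin/sh
# Print the SQL for the accounting database and its user.
# Usage: gen_acct_sql USER PASSWORD
gen_acct_sql() {
    cat <<EOF
create database if not exists slurm_acct_db;
create user if not exists '$1'@'localhost' identified by '$2';
grant all on slurm_acct_db.* to '$1'@'localhost';
flush privileges;
EOF
}

# 'change-me' is a placeholder password, not a value from this setup.
gen_acct_sql slurm 'change-me'
```

The output can be piped into the client, for example gen_acct_sql slurm 'change-me' | mysql -u root -p.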
# If a node is stuck in the DOWN state, bring it back with:
# scontrol update NodeName=<node> State=RESUME
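With several nodes, the resume command can be generated for every node that is currently down or drained. The sketch below only prints the commands so they can be reviewed before running. The function name resume_cmds is hypothetical, and the input is canned text standing in for sinfo output.

```shell
#!/bin/sh
# Read "NODE STATE" lines on stdin and print one resume command
# for every node whose state starts with "down" or "drained".
resume_cmds() {
    while read -r node state; do
        case "$state" in
            down*|drained*) echo "scontrol update NodeName=$node State=RESUME" ;;
        esac
    done
}

# Canned input standing in for: sinfo -h -N -o '%N %T'
printf '%s\n' 'node1 idle' 'node2 down*' 'node3 drained' | resume_cmds
```

On a live cluster the printf line would be replaced by the sinfo command from the comment. A node that belongs to several partitions appears once per partition in that output, so piping through sort -u first avoids duplicate commands.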