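Environment setup notes: installing a job scheduling system (CentOS 7) - configuring queues
The previous chapter got the job system installed more or less by accident; this chapter configures the queues. A complete, pitfall-free version will appear on the official account later.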
===========================================================================
Starting from the error that ended the previous chapter:
$ qsub -W queue=normal zsleep.sh
qsub: submit error (Queue is not enabled MSG=queue is disabled: user fuyuan@ , queue normal)
============================================================================
qmgr -c "c q abyss"                        # c q = create queue
qmgr -c "s q abyss queue_type=Execution"   # s q = set queue
qmgr -c "s q abyss enabled=true"           # let the queue accept submissions
qmgr -c "s s default_queue=abyss"          # s s = set server
--------------------
qmgr -c 'set queue abyss acl_user_enable = true'
qmgr -c 'set queue abyss acl_users = ???'
qmgr -c 'set queue abyss acl_users += ???'
-------------------------------
qmgr -c 'set queue abyss acl_host_enable = true'
qmgr -c 'set queue abyss acl_hosts = note8'
qmgr -c 'set queue abyss acl_hosts += note9'
---------------------
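OK, jobs can be submitted now, but they stay in the Q (queued) state forever. More troubleshooting:
Fix the time zone:
ln -sf ../usr/share/zoneinfo/Asia/Shanghai /etc/localtime   # -f replaces an existing /etc/localtime
service pbs restart
ps -e | grep pbs_   # list leftover pbs_* daemons
kill <PID>          # kill each PID listed above
Delete all the test queues and set them up again: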
qmgr -c "d q normal"
Edit /var/spool/torque/server_priv/nodes:
vim /etc/hosts
vi /var/spool/torque/server_priv/nodes
vim /etc/profile
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 2"
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
Still not working?
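When a job sits in Q, a few read-only queries usually narrow down where it is stuck. The following is only a sketch: the function name `diagnose_queued_job` is made up for illustration, and it assumes `qstat`, `qmgr`, and `pbsnodes` are on the PATH of the server node.

```shell
# Print the scheduler-relevant state for one queued job.
# Usage: diagnose_queued_job JOB_ID QUEUE
diagnose_queued_job() {
    job_id=$1
    queue=$2
    echo "== job $job_id =="
    qstat -f "$job_id"            # full job attributes, including requested resources
    echo "== queue $queue =="
    qmgr -c "list queue $queue"   # check that both enabled and started are True
    echo "== nodes =="
    pbsnodes -a                   # nodes should report state = free, not down
}
```

A call such as `diagnose_queued_job 12.note9 abyss` (the job id is hypothetical) prints all three views in one go. It is also worth confirming that `pbs_sched` is running, since nothing leaves the Q state without a scheduler. In these notes `started=true` is only set on the queue further down, which is one plausible reason for the stuck jobs.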
=====================================================
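OK, let me retrace the steps from the top.
1. Set up public keys so the nodes can log in to each other without a password:
ssh-keygen -t rsa # just press Enter three times
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # do this on every node
Then merge the public keys of all nodes into each node's authorized_keys, so passwordless login works in every direction.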
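The merge step can be sketched as a small helper that appends each key only once. The function name `merge_pubkeys` and the file names in the usage note are assumptions for illustration, not part of the original setup.

```shell
# Append every key from the given .pub files to DEST, skipping keys already present.
# Usage: merge_pubkeys DEST KEYFILE...
merge_pubkeys() {
    dest=$1
    shift
    mkdir -p "$(dirname "$dest")"
    touch "$dest"
    chmod 600 "$dest"   # sshd ignores authorized_keys files with loose permissions
    for keyfile in "$@"; do
        # "|| [ -n ... ]" also handles a last line without a trailing newline
        while IFS= read -r key || [ -n "$key" ]; do
            [ -n "$key" ] || continue
            grep -qxF -e "$key" "$dest" || printf '%s\n' "$key" >> "$dest"
        done < "$keyfile"
    done
}
```

For example, after copying each node's `id_rsa.pub` to one machine as `note8.pub` and `note9.pub`, run `merge_pubkeys ~/.ssh/authorized_keys note8.pub note9.pub` and copy the resulting file back to every node. Where password login is still available, `ssh-copy-id user@node` does the same job per node.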
=====================================================
2. Edit the config file /etc/profile
Add the following variables:
TORQUE=/usr/local
MAUI=/usr/local
if [ "$(id -u)" -eq 0 ]; then
    PATH=$TORQUE/sbin:$TORQUE/bin:$MAUI/sbin:$MAUI/bin:$PATH
else
    PATH=$TORQUE/bin:$MAUI/bin:$PATH
fi
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
### My file ended up with these three export lines twice; the second copy is redundant, so it is shown only once here.
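Because TORQUE and MAUI both point at /usr/local, the snippet above adds the same directories to PATH several times. That is harmless but noisy. A minimal sketch of a helper that prepends a directory only when it is missing (the name `prepend_unique` is made up for illustration):

```shell
# Print LIST with DIR prepended, unless DIR is already one of its colon-separated entries.
# Usage: prepend_unique LIST DIR
prepend_unique() {
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *)          printf '%s\n' "$dir${list:+:$list}" ;;
    esac
}

PATH=$(prepend_unique "$PATH" /usr/local/sbin)
PATH=$(prepend_unique "$PATH" /usr/local/bin)
LD_LIBRARY_PATH=$(prepend_unique "${LD_LIBRARY_PATH:-}" /usr/local/lib)
export PATH LD_LIBRARY_PATH
```

Sourcing /etc/profile repeatedly then leaves PATH unchanged after the first time.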
========================================================
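3. Add the line /usr/local/lib to /etc/ld.so.conf, so that the file reads:
$ cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib
Then rebuild the linker cache (ldconfig reads /etc/ld.so.conf by default, so no argument is needed):
/sbin/ldconfig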
========================================================
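4. Add the account
./torque.setup name
If pbs daemons are already running, stop them first with `pkill pbs_` and retry. (kill does not read PIDs from a pipe, so `ps -e | grep pbs | kill` will not work.)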
========================================================
5. Start pbs
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i start;done
========================================================
6. The nodes data was lost, so add the nodes again:
vi /var/spool/torque/server_priv/nodes
Also check the pbs_mom config on the compute nodes:
$ cat /var/spool/torque/mom_priv/config
$pbsserver note9
$logevent 255
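The notes do not show what goes into the nodes file. Torque expects one line per compute node, in the form `hostname np=<cores>`. A minimal sketch that generates such lines follows; the helper name `nodes_entries`, the core count, and the choice of note8 and note9 as compute nodes are assumptions.

```shell
# Print one nodes-file line per host: "<hostname> np=<cores>".
# Usage: nodes_entries CORES HOST...
nodes_entries() {
    cores=$1
    shift
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$cores"
    done
}
```

For example, `nodes_entries 4 note8 note9 | sudo tee /var/spool/torque/server_priv/nodes` writes two entries with four slots each. Running `nproc` on a node gives a sensible value for `np`. The hostnames must resolve through /etc/hosts on every node.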
=========================================================
7. Restart pbs
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i restart;done
=========================================================
8. Submit a job
$ cat zsleep.sh
while [ 1 ];do
echo `date` >> zzz.test
sleep 5s
done
$ qsub zsleep.sh
The output landed in the home directory:
$ cat ~/zzz.test
Thu Feb 25 21:41:26 CST 2021
Thu Feb 25 21:41:31 CST 2021
Thu Feb 25 21:41:36 CST 2021
Thu Feb 25 21:41:41 CST 2021
Thu Feb 25 21:41:46 CST 2021
Thu Feb 25 21:41:51 CST 2021
Thu Feb 25 21:41:56 CST 2021
Thu Feb 25 21:42:01 CST 2021
Thu Feb 25 21:42:06 CST 2021
Thu Feb 25 21:42:11 CST 2021
Thu Feb 25 21:42:16 CST 2021
Thu Feb 25 21:42:21 CST 2021
A small script can take care of that.
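Torque starts a job in the submitting user's home directory, which would explain why zzz.test ended up there. Below is a minimal sketch of a job script that first returns to the directory where qsub was run. The file name `zsleep_job.sh`, the `#PBS` values, and the `COUNT` and `INTERVAL` variables are assumptions for illustration. The loop is bounded so the job ends on its own.

```shell
# Write a hypothetical job script; submit it afterwards with: qsub zsleep_job.sh
cat > zsleep_job.sh <<'EOF'
#!/bin/bash
#PBS -N zsleep
#PBS -q abyss
#PBS -l nodes=1,walltime=00:10:00
#PBS -j oe

# PBS_O_WORKDIR is set by Torque to the directory where qsub was run.
cd "${PBS_O_WORKDIR:-.}" || exit 1

count=${COUNT:-12}       # number of timestamps to write (assumed default)
interval=${INTERVAL:-5}  # seconds between timestamps
i=0
while [ "$i" -lt "$count" ]; do
    date >> zzz.test
    sleep "$interval"
    i=$((i + 1))
done
EOF
```

Environment variables are not passed to the job by default. To change the count, use something like `qsub -v COUNT=3 zsleep_job.sh`.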
==========================================
After that, add your own queue again. This is optional, since a default queue already exists.
qmgr -c 'print server'
qmgr -c "c q abyss"
qmgr -c "s q abyss queue_type=Execution"
qmgr -c "s q abyss enabled=true"
qmgr -c "s s default_queue=abyss"
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 1"
qmgr -c 'print server'
qmgr -c 'd q batch'
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
qmgr -c 'print server'
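The command sequence above can be folded into one reusable helper. This is a sketch rather than a tested production script: the function name `queue_directives` and the user names in the usage note are hypothetical. It only prints qmgr directives, so nothing changes until the output is piped into `qmgr`.

```shell
# Print the qmgr directives that create an execution queue and make it the default.
# Usage: queue_directives QUEUE [USER...]
queue_directives() {
    queue=$1
    shift
    printf 'create queue %s\n' "$queue"
    printf 'set queue %s queue_type = Execution\n' "$queue"
    printf 'set queue %s enabled = true\n' "$queue"   # accept submissions
    printf 'set queue %s started = true\n' "$queue"   # allow jobs to be scheduled
    if [ "$#" -gt 0 ]; then
        printf 'set queue %s acl_user_enable = true\n' "$queue"
        op='='
        for user in "$@"; do
            printf 'set queue %s acl_users %s %s\n' "$queue" "$op" "$user"
            op='+='   # the first user replaces the list, later ones append
        done
    fi
    printf 'set server default_queue = %s\n' "$queue"
}
```

Review the output of `queue_directives abyss alice bob` first, then apply it with `queue_directives abyss alice bob | qmgr`. Setting `enabled` and `started` together avoids the two symptoms seen in these notes: the "queue is disabled" submit error and jobs that never leave the Q state.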
Done. It took two days.
posted on 2021-02-25 19:13 Yuan-SW-F(abysw)