Environment Setup Notes: Installing a Job Scheduling System (CentOS 7), Configuring Queues

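The previous chapter got the job system installed more or less by accident; this chapter configures the queues. A complete, pitfall-free version will be published later on my WeChat official account; feel free to follow.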

 

 ===========================================================================

Starting from the error that ended the previous chapter:

$ qsub -W queue=normal zsleep.sh
qsub: submit error (Queue is not enabled MSG=queue is disabled: user fuyuan@ , queue normal)

============================================================================

qmgr -c "c q abyss"

qmgr -c "s q abyss queue_type=Execution"

qmgr -c "s q abyss enabled=true"

qmgr -c "s s default_queue=abyss"
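
The short forms above (c q, s q, s s) abbreviate qmgr's create queue, set queue and set server. As a rough sketch only, the same setup can be written out in long form and piped into qmgr in one go. The function name queue_setup_commands is made up for illustration, and the started = True line is included because, further down in these notes, the queue also had to be started.

```shell
#!/bin/sh
# Print the qmgr directives that create and open an execution queue.
# Nothing is executed here; pipe the output into qmgr to apply it.
queue_setup_commands() {
    queue=$1
    cat <<EOF
create queue $queue
set queue $queue queue_type = Execution
set queue $queue enabled = True
set queue $queue started = True
set server default_queue = $queue
EOF
}

# Example (run as a Torque manager):
#   queue_setup_commands abyss | qmgr
```

Printing the directives first makes it easy to review them before anything on the server changes.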

--------------------

qmgr -c 'set queue abyss acl_user_enable = true'

qmgr -c 'set queue abyss acl_users = ???'

qmgr -c 'set queue abyss acl_users += ???'

-------------------------------

qmgr -c 'set queue abyss acl_host_enable = true'

qmgr -c 'set queue abyss acl_hosts = note8'

qmgr -c 'set queue abyss acl_hosts += note9'

---------------------

OK, job submission now works, but the job stays in the Q (queued) state. More troubleshooting:

Fix the time zone:

ln -sf ../usr/share/zoneinfo/Asia/Shanghai /etc/localtime

 

service pbs restart

ps -e |grep pbs_

kill <PID>   # for each pbs_ process listed by the command above

Delete all the test queues and set them up again

qmgr -c "d q normal"

Edit /var/spool/torque/server_priv/nodes

vim /etc/hosts

vi /var/spool/torque/server_priv/nodes

vim /etc/profile

 

qmgr -c "s q abyss started=true"

qmgr -c 'print server'

qmgr -c "s q abyss resources_default.nodes = 2"

qmgr -c "s q abyss resources_default.walltime = 1000:00:00"

Still not working?
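
One thing that may be worth checking when a job sits in Q is whether the server sees any node as free. The sketch below is an assumption about how to check that, not something taken from the original troubleshooting: it filters the text that pbsnodes -a prints (a node name on an unindented line, followed by indented "key = value" lines) and lists every node whose state is anything other than free. The name nodes_not_free is hypothetical.

```shell
#!/bin/sh
# Read `pbsnodes -a` style text on stdin and print each node whose state
# is not "free", followed by that state.
nodes_not_free() {
    awk '
        # A line holding a single unindented word is taken as a node name.
        NF == 1 && $0 == $1 { node = $1 }
        # Indented attribute line of the form: state = <value>
        $1 == "state" && $2 == "=" && $3 != "free" { print node, $3 }
    '
}

# Example:
#   pbsnodes -a | nodes_not_free
```

An empty result means every node reports free, so the cause is more likely in the queue or scheduler settings.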

=====================================================

OK, let me roughly retrace the steps

1. Set up public keys so the nodes can log in to each other without a password:

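ssh-keygen -t rsa # just press Enter three times

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # do this on every node

Merge the public keys of all nodes so that every node can log in to every other node without a password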
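
The merge step is not spelled out in these notes. A minimal sketch, assuming each node's id_rsa.pub has already been copied to one machine: the helper below (merge_pubkeys is an illustrative name) folds the key files into a single authorized_keys, keeps what is already there, and drops exact duplicates. The merged file still has to be copied back to every node.

```shell
#!/bin/sh
# Merge the public-key files given as arguments into one authorized_keys
# file, keeping existing entries and dropping exact duplicate lines.
merge_pubkeys() {
    target=$1
    shift
    touch "$target"
    tmp=$(mktemp) || return 1
    # awk keeps the first occurrence of each non-empty line, in order.
    cat "$target" "$@" | awk 'NF && !seen[$0]++' > "$tmp" && cat "$tmp" > "$target"
    rm -f "$tmp"
    chmod 600 "$target"
}

# Example, with hypothetical file names for the collected keys:
#   merge_pubkeys ~/.ssh/authorized_keys note8.pub note9.pub
```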

=====================================================

2. Edit the configuration file /etc/profile

Add the following variables:

TORQUE=/usr/local
MAUI=/usr/local
if [ "$(id -u)" -eq 0 ]; then
PATH=$TORQUE/bin:$TORQUE/sbin:$MAUI/sbin:$MAUI/bin:$PATH
else
PATH=$TORQUE/bin:$MAUI/bin:$PATH
fi
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

### My file had these three export lines twice; the second copy is redundant, one is enough

========================================================

3. Add one line (/usr/local/lib) to /etc/ld.so.conf

cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib

Then rebuild the linker cache (ldconfig reads /etc/ld.so.conf by default):

/sbin/ldconfig
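
Since this file tends to get edited more than once while retrying, here is a small sketch for adding the line idempotently. The helper name add_lib_dir is made up for illustration; it appends the directory only when that exact line is not already in the file.

```shell
#!/bin/sh
# Append a library directory to an ld.so.conf style file only if that
# exact line is not already present, so repeated runs do not duplicate it.
add_lib_dir() {
    conf=$1
    dir=$2
    grep -qxF -- "$dir" "$conf" 2>/dev/null || printf '%s\n' "$dir" >> "$conf"
}

# Example (as root):
#   add_lib_dir /etc/ld.so.conf /usr/local/lib && /sbin/ldconfig
```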

========================================================

4. Add an account

./torque.setup name

If pbs is already running, list its processes with ps -e | grep pbs, kill them by PID, then retry

========================================================

5. Start pbs

for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i start;done

========================================================

6. The nodes data was lost; add it again

vi /var/spool/torque/server_priv/nodes

$ cat /var/spool/torque/mom_priv/config
$pbsserver note9
$logevent 255
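
The contents of server_priv/nodes are not shown in these notes. As a hedged sketch, assuming the usual format of one "hostname np=N" line per compute node, the helper below (nodes_file_lines is an illustrative name) prints such lines for a list of hosts. The host names note8 and note9 come from the ACL commands earlier; the np value in the example is a placeholder, not the real core count.

```shell
#!/bin/sh
# Print one "hostname np=N" line per host, the format used by
# server_priv/nodes. The np value is passed first; host names follow.
nodes_file_lines() {
    np=$1
    shift
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np"
    done
}

# Example (np=4 is a placeholder; use each node's real core count):
#   nodes_file_lines 4 note8 note9
```

Reviewing the printed lines before pasting them into the file avoids a typo in a host name, which would leave that node unknown to pbs_server.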

=========================================================

7. Restart pbs

for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i restart;done

=========================================================

8. Submit a job

$ cat zsleep.sh
while [ 1 ];do
echo `date` >> zzz.test
sleep 5s
done

$ qsub zsleep.sh

 

The output ended up in the home directory

 

~]$ cat zzz.test
Thu Feb 25 21:41:26 CST 2021
Thu Feb 25 21:41:31 CST 2021
Thu Feb 25 21:41:36 CST 2021
Thu Feb 25 21:41:41 CST 2021
Thu Feb 25 21:41:46 CST 2021
Thu Feb 25 21:41:51 CST 2021
Thu Feb 25 21:41:56 CST 2021
Thu Feb 25 21:42:01 CST 2021
Thu Feb 25 21:42:06 CST 2021
Thu Feb 25 21:42:11 CST 2021
Thu Feb 25 21:42:16 CST 2021
Thu Feb 25 21:42:21 CST 2021

A small script can take care of that.
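
Torque starts a job in the user's home directory, which is why zzz.test landed there. One possible way to handle it, offered as a sketch rather than what was actually done here, is to change into the submission directory at the top of the job script. Torque exports that directory as PBS_O_WORKDIR; the helper name job_workdir is hypothetical.

```shell
#!/bin/sh
# Pick the directory a job should work in: the directory qsub was run
# from (exported by Torque as PBS_O_WORKDIR), or the current directory
# when the script is run outside the scheduler.
job_workdir() {
    printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

# In a job script, before writing any output:
#   cd "$(job_workdir)" || exit 1
```

The fallback to the current directory keeps the same script usable for a quick test run from an ordinary shell.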

==========================================

After that it is just a matter of adding my own queue again. Skipping this is fine too, since there is a default queue.

qmgr -c 'print server'
qmgr -c "c q abyss"
qmgr -c "s q abyss queue_type=Execution"
qmgr -c "s q abyss enabled=true"
qmgr -c "s s default_queue=abyss"
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 1"
qmgr -c 'print server'
qmgr -c 'd q batch'
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
qmgr -c 'print server'

Done. This took two days.

 

posted on 2021-02-25 19:13  Yuan-SW-F(abysw)