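Configuring compute nodes with openlava
The steps below install and deploy the open-source openlava; LSF also has a community edition (LSF Community Edition) available.
Installation and configuration (install directory: /opt/openlava-4.0/):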
yum install tcl-devel # dependency
yum install ncurses-devel #dependency
# Unpack openlava-4.0.tar.gz
tar -xzvf openlava-4.0.tar.gz
# Enter the source directory
cd openlava-4.0
# Build and install; the default install location is /opt/openlava-4.0/
./configure
make
make install
# Create the openlava account
useradd -r openlava
# Copy the config files into the install directory
cp -rf config/* /opt/openlava-4.0/etc/
# Install the init and environment scripts, then fix permissions and ownership (with multiple nodes, do this on every node!)
chown -R openlava:openlava /opt/openlava-4.0
cp -rf /opt/openlava-4.0/etc/openlava /etc/init.d/
cp -rf /opt/openlava-4.0/etc/openlava.* /etc/profile.d/
chmod 755 /etc/init.d/openlava
chmod 755 /etc/profile.d/openlava.*
chown -R openlava:openlava /etc/init.d/openlava
chown -R openlava:openlava /etc/profile.d/openlava.*
# Enable the service at boot
chkconfig openlava on
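Before moving on, it can help to confirm that the copied scripts are really in place. The following is a minimal sketch, not part of openlava itself: the function name check_openlava_files and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: verify that the init and profile scripts copied above exist.
# $1 is a hypothetical root prefix for dry runs; omit it on a real node.
check_openlava_files() {
    root="${1:-}"
    status=0
    # The init script must exist and be executable (chmod 755 above).
    if [ ! -x "$root/etc/init.d/openlava" ]; then
        echo "missing or not executable: $root/etc/init.d/openlava"
        status=1
    fi
    # The profile script sets up the openlava environment at login.
    if [ ! -f "$root/etc/profile.d/openlava.sh" ]; then
        echo "missing: $root/etc/profile.d/openlava.sh"
        status=1
    fi
    return "$status"
}
```

On a real node, calling check_openlava_files with no argument prints nothing and returns 0 when both files are present.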
The following is an example of adding a compute (load) node to the cluster
Adding a node:
# Do this on every node, and start the openlava service on each
# Create the openlava account
useradd -r openlava
# Install the init and environment scripts, then fix permissions and ownership
cp -rf /public/openlava/openlava-4.0-release/etc/openlava /etc/init.d/
cp -rf /public/openlava/openlava-4.0-release/etc/openlava.* /etc/profile.d/
chown -R openlava:openlava /public/openlava/openlava-4.0-release
chmod 755 /etc/init.d/openlava
chmod 755 /etc/profile.d/openlava.*
chown -R openlava:openlava /etc/init.d/openlava
chown -R openlava:openlava /etc/profile.d/openlava.*
# Enable the service at boot
chkconfig openlava on
# Start openlava on this node:
service openlava start
# Load the environment variables
source /etc/profile.d/openlava.sh
# Test the openlava service
Run lsid, lshosts, and bhosts and check that the reported status is ok
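The three status commands can be wrapped in a small helper that stops at the first one that is missing or fails. This is only a sketch: check_cluster is a hypothetical name, and it looks only at each command's exit status, not at its output, so the output should still be read by eye.

```shell
#!/bin/sh
# Sketch: run each status command in turn and report the first failure.
# With no arguments it runs lsid, lshosts and bhosts, as in the step above.
check_cluster() {
    [ "$#" -gt 0 ] || set -- lsid lshosts bhosts
    for cmd in "$@"; do
        # A missing command usually means openlava.sh was not sourced.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: not found (was /etc/profile.d/openlava.sh sourced?)"
            return 1
        fi
        # Only the exit status is checked here; the output is discarded.
        if ! "$cmd" >/dev/null 2>&1; then
            echo "$cmd: failed"
            return 1
        fi
        echo "$cmd: ok"
    done
}
```

Run check_cluster after sourcing /etc/profile.d/openlava.sh; a non-zero return means one of the commands could not be found or did not complete.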
Reference: https://www.geek-share.com/detail/2791760708.html
Problems:
- Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer
#Run:
lsadmin reconfig
badmin reconfig
badmin mbdrestart
- Checking and switching node state (after a state change, bhosts takes a moment to show it)
/etc/init.d/openlava status    # show node status
/etc/init.d/openlava stop      # stop
/etc/init.d/openlava restart   # restart
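- The configuration files are in the etc directory under the openlava install path
- lsb.hosts: sets the maximum number of jobs per host
- MXJ may exceed the number of cores; a value of 0 puts the host in the closed state
- lsf.cluster.openlava: main configuration file
- lsb.users: user configuration file (limits on the number of submitted jobs, etc.)
- After modifying a configuration file, run:
#Run: badmin reconfig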
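As an illustration of the MXJ setting, a Host section in lsb.hosts might look like the fragment below. This is a sketch under assumptions: the host name node01 and the value 8 are hypothetical, and the column layout is modeled on the usual LSF-style sample file, so compare it with the lsb.hosts shipped in your own etc directory before copying anything.

```
Begin Host
HOST_NAME     MXJ   JL/U   r1m   pg   ls   tmp   DISPATCH_WINDOW   # Keywords
node01        8     ()     ()    ()   ()   ()    ()                # hypothetical host, at most 8 jobs
default       !     ()     ()    ()   ()   ()    ()                # in LSF, "!" means one slot per CPU
End Host
```

After saving the file, badmin reconfig makes the batch system reread it.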
- batch system daemon not responding ... still trying batch system daemon not responding ... still trying
- Reference: https://www.ibm.com/support/pages/lsf-cluster-not-responding-because-too-many-interactive-or-block-mode-jobs
To enable and define the parameter, follow these steps:
1) In lsf.conf, find LSB_NUM_NIOS_CALLBACK_THREADS=<n>
2) As "n" is the number of threads in the thread pool, you can start with value "4".
3) Restart mbatchd by "badmin mbdrestart"
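The edit to lsf.conf can also be scripted. The helper below is a minimal sketch with a hypothetical name, set_conf_param; it simply rewrites or appends a KEY=VALUE line and keeps a backup. Note that the parameter comes from IBM's LSF article, and the source does not confirm that openlava 4.0 honors it, so treat that as an assumption to verify.

```shell
#!/bin/sh
# Sketch: set KEY=VALUE in a key=value config file such as lsf.conf.
# Replaces an existing line for KEY, or appends one if KEY is absent.
# A backup of the previous contents is kept as FILE.bak.
# Assumes VALUE contains no "|" character (used as the sed delimiter).
set_conf_param() {
    file="$1"; key="$2"; value="$3"
    cp "$file" "$file.bak"
    if grep -q "^$key=" "$file"; then
        # Rewrite the existing line, reading from the backup copy.
        sed "s|^$key=.*|$key=$value|" "$file.bak" > "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}
```

For example, set_conf_param /opt/openlava-4.0/etc/lsf.conf LSB_NUM_NIOS_CALLBACK_THREADS 4 followed by badmin mbdrestart would apply the three steps, assuming lsf.conf lives under the install directory used earlier.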