How to use IBEX

sbatch

#!/bin/bash
#SBATCH --job-name=myjobname
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --mail-user=useremail@kaust.edu.sa
#SBATCH --mail-type=ALL
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out

#SBATCH --partition=batch
#SBATCH --gpus-per-node=v100:1
#SBATCH --cpus-per-gpu=12
#SBATCH --mem=64G

# Go to your working directory
cd /my_working_dir/

# Load the desired application module if necessary
module load module_name

# Replace the line below with your launch command:
your_commands_go_here
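Copying directives from a web page can silently turn "--" into an en-dash (–), and the resulting #SBATCH line will not be read as the intended option. The sketch below is a hypothetical pre-submission check (check_sbatch_directives is a made-up name, not an Ibex tool) that lists every #SBATCH line containing a non-ASCII character. It relies on the C locale (LC_ALL=C) so that grep compares raw bytes, and it will also flag tabs and Windows carriage returns on those lines.

```bash
#!/bin/bash
# check_sbatch_directives FILE
# Hypothetical helper: prints every #SBATCH line in FILE that contains a
# non-ASCII character (e.g. an en-dash pasted in place of "--").
# Returns 0 if the directives are clean, 1 if problems were found,
# 2 if the file cannot be read.
check_sbatch_directives() {
    local file="$1"
    [ -r "$file" ] || { echo "error: cannot read $file" >&2; return 2; }
    # In the C locale, [^ -~] matches any byte outside printable ASCII
    # (space through tilde), which catches multi-byte UTF-8 characters.
    if LC_ALL=C grep -n '^#SBATCH.*[^ -~]' "$file"; then
        echo "error: non-ASCII characters in #SBATCH directives of $file" >&2
        return 1
    fi
    return 0
}
```

For example: check_sbatch_directives myjobscript.sh && sbatch myjobscript.sh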

To submit a job

sbatch myjobscript.sh
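When scripting around Slurm it helps to capture the job ID at submission time. The hypothetical wrapper below (submit_job is an illustrative name) uses sbatch's --parsable option, which prints only the job ID, optionally followed by ";cluster", instead of "Submitted batch job <jobid>". All arguments are passed straight to sbatch, so any options must come before the script name; options given on the command line override the matching #SBATCH lines in the script.

```bash
#!/bin/bash
# submit_job [SBATCH_OPTIONS...] SCRIPT
# Hypothetical wrapper: submits a job and prints only its numeric job ID.
submit_job() {
    local out
    # --parsable prints "jobid" or "jobid;cluster" on success.
    out=$(sbatch --parsable "$@") || return 1
    # Strip the optional ";cluster" suffix, keeping just the ID.
    printf '%s\n' "${out%%;*}"
}
```

For example, jobid=$(submit_job --time=01:00:00 myjobscript.sh) stores the ID, which can later be given to scancel or squeue -j.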

To cancel a job

scancel jobid
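scancel can also select jobs by owner, name, or state rather than by a single ID. The two hypothetical helpers below sketch this with scancel's --user, --name, and --state options; they assume $USER holds your Ibex username and that the name matches what you set with --job-name in the job script.

```bash
#!/bin/bash
# cancel_by_name NAME: cancel all of your jobs whose job name is NAME.
cancel_by_name() {
    scancel --user="$USER" --name="$1"
}

# cancel_pending: cancel only your jobs still waiting in the queue,
# leaving jobs that are already running untouched.
cancel_pending() {
    scancel --user="$USER" --state=PENDING
}
```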

To check the status of your jobs

squeue -u username
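To block until a job finishes (for example, before post-processing its output), you can poll squeue. The sketch below defines a hypothetical wait_for_job function. It relies on squeue -h (no header line) printing nothing once the job has left the queue, so a transient squeue failure would also end the loop early; treat it as a convenience rather than a guarantee. The comment at the top shows a one-off status query with a custom output format (job ID, name, state, elapsed time, and node list or pending reason).

```bash
#!/bin/bash
# One-off status check with a custom format:
#   squeue -u "$USER" -o "%.10i %.20j %.8T %.10M %R"

# wait_for_job JOBID [INTERVAL_SECONDS]
# Hypothetical helper: polls squeue until JOBID no longer appears in the
# queue (finished, failed, or cancelled). Default interval is 60 seconds.
wait_for_job() {
    local jobid="$1" interval="${2:-60}"
    # Any output from squeue -h means the job is still pending or running.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

For example: wait_for_job 12345 30 && echo "job 12345 has left the queue"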

srun

srun lets you use the cluster interactively, just like a terminal on your local machine, which is very useful when you want to debug your code. srun is convenient, but the interactive session (and anything running in it) is killed when you lose your connection to Ibex. To protect it, start a tmux session on the login node and run srun inside it; if the connection drops, log back in and reattach to the tmux session to get back onto the node.
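A minimal sketch of the tmux workflow, wrapped in a hypothetical debug_session function. It assumes tmux is available on the login node, and the resource flags simply reuse the values from the sbatch example; they are placeholders to adjust. tmux new-session -A attaches to the named session if it already exists, so the same command both starts and resumes the session.

```bash
#!/bin/bash
# debug_session [SESSION_NAME]
# Hypothetical helper: runs an interactive srun shell inside a named tmux
# session on the login node, so a dropped SSH connection does not kill it.
# The resource values below are placeholders copied from the sbatch example.
debug_session() {
    local name="${1:-debug}"
    # -A: attach to the session if it already exists instead of creating it.
    tmux new-session -A -s "$name" \
        "srun --partition=batch --gpus-per-node=v100:1 --cpus-per-gpu=12 --mem=64G --time=01:00:00 --pty bash"
}
```

Run debug_session on the login node and detach with Ctrl-b d whenever you like; after reconnecting, run debug_session (or tmux attach -t debug) again to return to the compute-node shell. A tmux session lives on the login node where it was started, so if there are several login nodes you must reconnect to the same one. When the srun shell exits or hits its time limit, the tmux session closes with it.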

You can also srun into a node that is already allocated to one of your jobs using: srun --jobid=yourjobid --pty bash. To do that, first use sbatch to request the resources and start your training there; srun is then just used as a tunnel into the running job. After you srun into the node, you can check resource usage there, for example GPU memory with nvidia-smi.
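The same idea as two small hypothetical helpers (attach_to_job and gpu_usage are illustrative names). Both add --overlap, which lets the new step share the CPUs and GPUs already held by the running batch step; that option exists from Slurm 20.11 onward, where without it the extra step may wait for resources, and it should be dropped on older versions. Whether nvidia-smi sees the job's GPUs inside the extra step can depend on how the cluster is configured, so treat gpu_usage as a sketch.

```bash
#!/bin/bash
# attach_to_job JOBID: open an interactive shell on the node running JOBID.
attach_to_job() {
    # --overlap (Slurm >= 20.11) lets this step share the batch step's resources.
    srun --jobid="$1" --overlap --pty bash
}

# gpu_usage JOBID: run nvidia-smi once on the job's node and print the result.
gpu_usage() {
    srun --jobid="$1" --overlap nvidia-smi
}
```

For example, gpu_usage 12345 gives a quick snapshot without opening a shell, while attach_to_job 12345 lets you run watch nvidia-smi or other monitoring tools interactively.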

posted @ 2021-12-11 14:56 by 梦想家肾小球