Avoiding the same hassles twice
Installing gcc-8 on Ubuntu 22
sudo apt update
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/gcc-8_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.edge.kernel.org/ubuntu/pool/universe/g/gcc-8/gcc-8-base_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/libgcc-8-dev_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/cpp-8_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/libmpx2_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/main/i/isl/libisl22_0.22.1-1_amd64.deb
sudo apt install ./libisl22_0.22.1-1_amd64.deb ./libmpx2_8.4.0-3ubuntu2_amd64.deb ./cpp-8_8.4.0-3ubuntu2_amd64.deb ./libgcc-8-dev_8.4.0-3ubuntu2_amd64.deb ./gcc-8-base_8.4.0-3ubuntu2_amd64.deb ./gcc-8_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/libstdc++-8-dev_8.4.0-3ubuntu2_amd64.deb
wget http://mirrors.kernel.org/ubuntu/pool/universe/g/gcc-8/g++-8_8.4.0-3ubuntu2_amd64.deb
sudo apt install ./libstdc++-8-dev_8.4.0-3ubuntu2_amd64.deb ./g++-8_8.4.0-3ubuntu2_amd64.deb
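The downloads above all follow one naming pattern, so the URLs can be generated instead of typed out. This is a minimal sketch, not a tested installer. `gcc8_urls` is a hypothetical helper name, and the sketch assumes every package is served from mirrors.kernel.org (the original fetches gcc-8-base from mirrors.edge.kernel.org).

```shell
# Hypothetical helper: print the download URL of every .deb used above.
# Assumption: all packages are available from the same mirror host.
base="http://mirrors.kernel.org/ubuntu/pool"
ver="8.4.0-3ubuntu2_amd64"

gcc8_urls() {
  for pkg in gcc-8 gcc-8-base libgcc-8-dev cpp-8 libmpx2 libstdc++-8-dev g++-8; do
    echo "${base}/universe/g/gcc-8/${pkg}_${ver}.deb"
  done
  # libisl22 lives in a different pool directory and has its own version.
  echo "${base}/main/i/isl/libisl22_0.22.1-1_amd64.deb"
}

# Print the list; to download, pipe it into wget instead:
#   gcc8_urls | xargs -n 1 wget
gcc8_urls
```

Once the packages are installed, running `gcc-8 --version` and `g++-8 --version` is a quick way to confirm that both compilers are on the PATH.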
Transferring large numbers of small files
An alternative to sftp:
rsync -avz --progress -e ssh user@host:remote_path local_path
Speeding up PIL
pip uninstall pillow
sudo apt-get install libjpeg-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libpng-dev
pip install pillow-simd
Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com
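The export above only lasts for the current shell session. One way to make it permanent is to append it to a shell config file, taking care not to add it twice. The sketch below assumes a POSIX shell; `add_line_once` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so that running this sketch changes nothing):
#   add_line_once 'export HF_ENDPOINT=https://hf-mirror.com' ~/.bashrc
```

Because the helper checks for an exact line match first, running it repeatedly leaves a single copy of the export in the file.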
Upgrading g++ inside conda
- Latest
conda install -c conda-forge gcc gxx
Appending a version pin such as =11.2 to the package names also works.
- 5.4.0
conda install https://anaconda.org/brown-data-science/gcc/5.4.0/download/linux-64/gcc-5.4.0-0.tar.bz2
One version combination that works with bias_act:
cuda=12.0 gcc=8.5.0 gxx=8.5.0
The three ZSH essentials
sudo apt install zsh
wget https://gitee.com/mirrors/oh-my-zsh/raw/master/tools/install.sh
chmod +x install.sh
./install.sh
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting
git clone https://github.com/zsh-users/zsh-autosuggestions ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/plugins/zsh-autosuggestions
git clone https://github.com/romkatv/powerlevel10k.git ${ZSH_CUSTOM:-~/.oh-my-zsh/custom}/themes/powerlevel10k
Edit these two lines in ~/.zshrc:
plugins=(z tmux git zsh-syntax-highlighting zsh-autosuggestions)
ZSH_THEME="powerlevel10k/powerlevel10k"
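The two .zshrc lines can also be set from a script rather than by hand. The following is a rough sketch under one assumption: both settings already exist as single lines starting at the beginning of a line, as in the template oh-my-zsh installs. `set_omz_config` is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the plugins=(...) and ZSH_THEME=... lines of a zshrc file.
# Assumption: each setting is a single line that starts at column 0.
set_omz_config() {
  rc=$1
  tmp=$(mktemp)
  # In basic regular expressions, ( and ) are literal characters.
  sed -e 's|^plugins=(.*)$|plugins=(z tmux git zsh-syntax-highlighting zsh-autosuggestions)|' \
      -e 's|^ZSH_THEME=.*$|ZSH_THEME="powerlevel10k/powerlevel10k"|' \
      "$rc" > "$tmp" && cat "$tmp" > "$rc"
  rm -f "$tmp"
}

# Example (commented out so that running this sketch changes nothing):
#   cp ~/.zshrc ~/.zshrc.bak && set_omz_config ~/.zshrc
```

Writing to a temporary file and copying the result back avoids `sed -i`, whose syntax differs between GNU and BSD sed.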
Slurm: show the reason a node is drained (long format)
sinfo -o "%200E %9u %19H %N"
Can't optimize non-leaf tensor
When defining the tensor, call .cuda() before setting requires_grad.
Correct form: xxx = torch.zeros_like(ccc).cuda().requires_grad_(True)
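Uneven GPU memory usage
Every GPU's process takes up an extra 1789MB of memory on GPU 0.
The likely cause is torch.load; add map_location='cpu' to the call.
This mainly shows up with DECA, where the fix is needed at line 89 of deca.py.
Another cause is that NCCL itself uses GPU memory, and the more GPUs there are, the more it uses.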
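A bizarre one: torch.jit.load hangs
After a long chase, it turned out that the line baton = FileBaton(os.path.join(build_directory, 'lock')) inside _jit_compile was hanging.
https://www.jianshu.com/p/a0d769971b2a gives the fix: clear ~/.cache/torch_extensions. It appears to be a concurrency problem in which the lock can never be acquired.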
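A narrower variation on that fix is to delete only the leftover `lock` files rather than the whole cache, so that already-built extensions are kept. This is a sketch of that variation, not the method from the linked post. It assumes the stale file is named `lock`, as in the FileBaton path quoted above, and that no other process is building an extension at the time. `clear_ext_locks` is a hypothetical helper name.

```shell
# Hypothetical helper: delete leftover lock files under an extensions cache.
# Assumption: no other process is currently building an extension there.
clear_ext_locks() {
  dir=${1:-"$HOME/.cache/torch_extensions"}
  # Nothing to do if the cache directory does not exist.
  [ -d "$dir" ] || return 0
  # Print each lock file that is found, then remove it.
  find "$dir" -type f -name lock -print -delete
}

# Example (commented out so that running this sketch changes nothing):
#   clear_ext_locks
```

If the hang persists after this, falling back to clearing the whole directory as described above is the safer option.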
A nicer-looking bash prompt
Add to ~/.bashrc:
PS1="\[\033[m\]|\[\033[1;35m\]\t\[\033[m\]|\[\e[1;31m\]\u\[\e[1;36m\]\[\033[m\]@\[\e[1;36m\]\h\[\033[m\]:\[\e[0m\]\[\e[1;32m\][\W]> \[\e[0m\]"
alias ls='ls --color'
Quick fix for the VS Code Server "XHR failed" error
As a shell script:
read commit_id
# Create the folder in advance; ${commit_id} stands for your VS Code commit ID, that long string of characters (spelled out for beginners)
mkdir -p ~/.vscode-server/bin/${commit_id}
# Enter the folder and download the server package
cd ~/.vscode-server/bin/${commit_id}
# This mirror in mainland China downloads quickly; mind the Remote-SSH version, this URL is for stable
wget https://vscode.cdn.azure.cn/stable/${commit_id}/vscode-server-linux-x64.tar.gz --no-check-certificate
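# Extract the archive into the current folder; since we already cd'd here, this is exactly where VS Code looks
# When VS Code finds the files in place, it skips the download and starts the remote terminal and its processes directly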
tar zxvf vscode-server-linux-x64.tar.gz --strip 1
# This command is essential; without it the setup will not succeed
touch ~/.vscode-server/bin/${commit_id}/0
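Since the script builds a directory name and a download URL from whatever is typed in, a quick sanity check on the input can catch a mistyped ID before anything is created. The sketch below assumes the commit ID is a full 40-character lowercase hexadecimal hash; `is_commit_id` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the argument looks like a full commit hash.
# Assumption: VS Code commit IDs are 40 lowercase hexadecimal characters.
is_commit_id() {
  case $1 in
    *[!0123456789abcdef]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]
}

# Possible use right after `read commit_id` (commented out in this sketch):
#   is_commit_id "$commit_id" || { echo "not a commit ID: $commit_id" >&2; exit 1; }
```

The check is purely syntactic: it cannot tell whether a well-formed ID actually corresponds to a published VS Code build.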
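The clear command does not work inside a conda environment
Either export these variables in the current shell or add them to your shell config file: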
export TERMINFO=/usr/share/terminfo
export TERM=vt100
Installing g++ without root privileges
conda install https://anaconda.org/brown-data-science/gcc/5.4.0/download/linux-64/gcc-5.4.0-0.tar.bz2
Update: this also works:
conda install -c conda-forge gcc
conda install -c conda-forge gxx
This article is from cnblogs (博客园), by GhostCai. When reposting, please cite the original link: https://www.cnblogs.com/ghostcai/p/16806392.html