RNA velocity | RNA速率 | velocyto教程
2023年10月18日
补全最后一个WT1的velocity数据: http://172.24.136.119:8888/notebooks/scPipeline/velocity/VelocityBasics-ApcKO.ipynb
# for scMulti-omics data # do before # ln -s gex_possorted_bam.bam possorted_genome_bam.bam # ln -s gex_possorted_bam.bam.bai possorted_genome_bam.bam.bai # conda create -n velocyto numpy scipy cython numba matplotlib scikit-learn h5py click # conda activate velocyto # pip install pysam # pip install velocyto # error # error: command '/usr/bin/gcc' failed with exit code 1 # conda install gcc rmsk=/home/zz950/reference/mm10_rmsk.gtf gtf=/home/zz950/reference/refdata-cellranger-arc-mm10-2020-A-2.0.0/genes.gtf WT1report=/home/zz950/tmpData/ApcKO_cellranger/cellranger_report/WT1_report_full # for this special case # tar zxvf filtered_feature_bc_matrix.tar.gz # mkdir filtered_feature_bc_matrix # mv matrix.mtx.gz features.tsv.gz barcodes.tsv.gz filtered_feature_bc_matrix # fail to run # /home/lizhixin/softwares/anaconda3/bin/velocyto run10x -@ 10 -m $rmsk $WT1report $gtf velocyto run10x -@ 10 -m $rmsk $WT1report $gtf --bcfile filtered_feature_bc_matrix/barcodes.tsv.gz # solution # conda install samtools velocyto run \ --bcfile filtered_feature_bc_matrix/barcodes.tsv.gz \ --outputfolder results \ -@ 20 \ -m $rmsk \ possorted_genome_bam.bam \ $gtf
2022年09月17日
试一下cellrank,必须单独建一个env因为依赖环境过于复杂,容易紊乱。
conda create --name cellrank cellrank -c conda-forge -c bioconda source activate cellrank conda install ipykernel ipython kernel install --name "cellrank" --user
2022年09月16日
RNA velocity分析需要两个工具
- 第一个是velocyto,命令行版,用于从bam file里提取出splicing matrix;CLI Usage Guide
- 第二个是scVelo,Python版,用于后续处理分析,作图;scVelo
参考:
下载mm10_rmsk.gtf:https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Mouse&db=0&hgta_group=allTracks&hgta_track=rmsk&hgta_table=rmsk&hgta_regionType=genome&position=&hgta_outputType=gff&hgta_outFileName=mm10_rmsk.gtf
下载hg38:https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Human&db=0&hgta_group=allTracks&hgta_track=rmsk&hgta_table=rmsk&hgta_regionType=genome&position=&hgta_outputType=gff&hgta_outFileName=mm10_rmsk.gtf
一个比原版更好用的RNA velocity分析工具:velocyto 过期
GitHub:https://github.com/velocyto-team/velocyto.R 过期
readthedoc:http://velocyto.org/velocyto.py/index.html
历史分析参考:
project/scRNA-seq/velocyto - for smart-seq data
project/scRNA-seq/rawData/smart-seq/analysis/velocyto - for smart-seq data
project/scRNA-seq/rawData/10x/Jun_2020/analysis/velocyto - for 10x Genomics data
project/scPipeline/human/Human-Organoid/PHOX2B-Organoid-BO1/VelocityBasics.ipynb - NB
需要1一个bam文件和2个注释文件
-m, --mask FILE .gtf file containing intervals to mask GTFFILE
注意注释文件有人和鼠的区别,均需要与cellranger处理时用的注释文件版本一致。
10x的数据处理非常简单,一行命令即可搞定。
velocyto run10x -m ensembl/release91/GRCh38_rmsk.gtf project/scRNA-seq/rawData/10x/Jun_2020/analysis/7Ala-D60-BO_report cellranger_ref/2019_Aug/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf
smart-seq的数据处理现在也变简单了,一行命令搞定。
velocyto run-smartseq2 -o OUTPUT -m databases/hg19/hg19_rmsk.gtf -e HSCR bam.link.2650/*.bam databases/hg19/gencode.v27.annotation.gtf
分析经验:
首先要检验bam文件的完整性,不然肯定会报错;
其次就是要大致知道运行时间,2500左右的细胞大概要运行80个小时;
Job information and usage summary of your CGS-HPCF job 689116 : +--------------+----------+-------+----------+------+---------------------+-----------+------+------+ | jobid | username | queue | jobname | E S | End Time | walltime% | mem% | cpu% | +--------------+----------+-------+----------+------+---------------------+-----------+------+------+ | 689116.omics | lizhixin | large | velocyto | 0 | 2021-01-16 07:13:21 | 94.98 | 7.95 | 8.32 | +--------------+----------+-------+----------+------+---------------------+-----------+------+------+ +---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+ | Submit Time | Start Time | wtime@ | wtime# | mem@ | mem# | vmem@ | CPUTime@ | nproc# | +---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+ | 2021-01-12 23:06:51 | 2021-01-12 23:26:26 | 79:46:55 | 84:00:00 | 7.95 | 104857600kb | 10.38 | 79:39:54 | 12 | +---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+ +-------------+ | hostlist | +-------------+ | hpch06/1*12 | +-------------+ E S = Exit Status ; % = usage percentage; # = requested ; @ = used ; mem@/vmem@ in GB ; nproc = number of processors
结果图: