RNA velocity | RNA速率 | velocyto教程

 

2023年10月18日

补全最后一个WT1的velocity数据: http://172.24.136.119:8888/notebooks/scPipeline/velocity/VelocityBasics-ApcKO.ipynb 

# for scMulti-omics data

# do before
# ln -s gex_possorted_bam.bam possorted_genome_bam.bam
# ln -s gex_possorted_bam.bam.bai possorted_genome_bam.bam.bai

# conda create -n velocyto numpy scipy cython numba matplotlib scikit-learn h5py click
# conda activate velocyto
# pip install pysam
# pip install velocyto # error
# error: command '/usr/bin/gcc' failed with exit code 1
# conda install gcc

rmsk=/home/zz950/reference/mm10_rmsk.gtf
gtf=/home/zz950/reference/refdata-cellranger-arc-mm10-2020-A-2.0.0/genes.gtf
WT1report=/home/zz950/tmpData/ApcKO_cellranger/cellranger_report/WT1_report_full

# for this special case
# tar zxvf filtered_feature_bc_matrix.tar.gz
# mkdir filtered_feature_bc_matrix
# mv matrix.mtx.gz features.tsv.gz barcodes.tsv.gz filtered_feature_bc_matrix

# fail to run
# /home/lizhixin/softwares/anaconda3/bin/velocyto run10x -@ 10 -m $rmsk $WT1report $gtf

velocyto run10x -@ 10 -m $rmsk $WT1report $gtf --bcfile filtered_feature_bc_matrix/barcodes.tsv.gz

# solution
# conda install samtools

velocyto run \
    --bcfile filtered_feature_bc_matrix/barcodes.tsv.gz \
    --outputfolder results \
    -@ 20 \
    -m $rmsk \
    possorted_genome_bam.bam \
    $gtf

  

 

 

2022年09月17日 

试一下cellrank,必须单独建一个env因为依赖环境过于复杂,容易紊乱。

conda create --name cellrank cellrank -c conda-forge -c bioconda
source activate cellrank
conda install ipykernel
ipython kernel install --name "cellrank" --user

  

 


 

2022年09月16日

RNA velocity分析需要两个工具

  1. 第一个是velocyto,命令行版,用于从bam file里提取出splicing matrix;CLI Usage Guide
  2. 第二个是scVelo,Python版,用于后续处理分析,作图;scVelo

参考:

下载mm10_rmsk.gtf:https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Mouse&db=0&hgta_group=allTracks&hgta_track=rmsk&hgta_table=rmsk&hgta_regionType=genome&position=&hgta_outputType=gff&hgta_outFileName=mm10_rmsk.gtf

下载hg38:https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=611454127_NtvlaW6xBSIRYJEBI0iRDEWisITa&clade=mammal&org=Human&db=0&hgta_group=allTracks&hgta_track=rmsk&hgta_table=rmsk&hgta_regionType=genome&position=&hgta_outputType=gff&hgta_outFileName=mm10_rmsk.gtf

 


 

一个比原版更好用的RNA velocity分析工具:velocyto  过期

GitHub:https://github.com/velocyto-team/velocyto.R  过期

 

readthedoc:http://velocyto.org/velocyto.py/index.html

 

历史分析参考:

project/scRNA-seq/velocyto - for smart-seq data

project/scRNA-seq/rawData/smart-seq/analysis/velocyto - for smart-seq data

project/scRNA-seq/rawData/10x/Jun_2020/analysis/velocyto - for 10x Genomics data

project/scPipeline/human/Human-Organoid/PHOX2B-Organoid-BO1/VelocityBasics.ipynb - NB

 

需要1一个bam文件和2个注释文件

-m, --mask FILE                 .gtf file containing intervals to mask
GTFFILE

 

注意注释文件有人和鼠的区别,均需要与cellranger处理时用的注释文件版本一致。

 

10x的数据处理非常简单,一行命令即可搞定。

velocyto run10x -m ensembl/release91/GRCh38_rmsk.gtf project/scRNA-seq/rawData/10x/Jun_2020/analysis/7Ala-D60-BO_report cellranger_ref/2019_Aug/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf

 

smart-seq的数据处理现在也变简单了,一行命令搞定。

velocyto run-smartseq2 -o OUTPUT -m databases/hg19/hg19_rmsk.gtf -e HSCR bam.link.2650/*.bam databases/hg19/gencode.v27.annotation.gtf

  

分析经验:

首先要检验bam文件的完整性,不然肯定会报错;

其次就是要大致知道运行时间,2500左右的细胞大概要运行80个小时;

 

Job information and usage summary of your CGS-HPCF job 689116 :
+--------------+----------+-------+----------+------+---------------------+-----------+------+------+
| jobid        | username | queue | jobname  | E S  | End Time            | walltime% | mem% | cpu% |
+--------------+----------+-------+----------+------+---------------------+-----------+------+------+
| 689116.omics | lizhixin | large | velocyto |    0 | 2021-01-16 07:13:21 |     94.98 | 7.95 | 8.32 |
+--------------+----------+-------+----------+------+---------------------+-----------+------+------+
+---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+
| Submit Time         | Start Time          | wtime@   | wtime#   | mem@ | mem#        | vmem@ | CPUTime@ | nproc# |
+---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+
| 2021-01-12 23:06:51 | 2021-01-12 23:26:26 | 79:46:55 | 84:00:00 | 7.95 | 104857600kb | 10.38 | 79:39:54 |     12 |
+---------------------+---------------------+----------+----------+------+-------------+-------+----------+--------+
+-------------+
| hostlist    |
+-------------+
| hpch06/1*12 |
+-------------+
E S = Exit Status ; % = usage percentage; # = requested ; @ = used ; mem@/vmem@ in GB ; nproc = number of processors

  

结果图:

 

 

posted @ 2021-01-12 16:57  Life·Intelligence  阅读(4524)  评论(0编辑  收藏  举报
TOP