全基因组测序流程 | WGS pipeline

创建conda环境，安装必要软件

conda create -n wgs

conda activate wgs

conda install bioconda::bwa

下载最佳ref fasta

gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME SAVE_TO_LOCATION

gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta

Resource bundle

https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/

构建索引【可以直接下载，broad有标准流程】

../refdata-gex-GRCh38-2020-A/fasta/genome.fa

bwa index -p GRCh38 hg38.fasta

/home/zz950/reference/bwa_wgs/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta.64

bwa标准比对

然后合并两个lane

部分下载的数据以CRAM格式保存

How to convert CRAM back to BAM using samtools

It's the same command, just use the -b option instead of -C.

需要最新版samtools读取cram文件

conda install bioconda::samtools

下载GATk： https://github.com/broadinstitute/gatk/releases

下载Jar：https://gatk.broadinstitute.org/hc/en-us/community/posts/360073027611-Where-and-how-I-can-get-GenomeAnalysisTK-jar-script

VariantRecalibrator java.lang.IllegalArgumentException: No data found.

No data found when using VariantRecalibrator

Using CRAM within Samtools

还有配对的RNA-seq

但是使用的是hg19的参考基因组，用的STAR比对。

https://data.broadinstitute.org/snowman/hg19/annotation/

这个bam可以用htseq-count来计数

conda install -c bioconda htseq

conda install bioconda::htseq

进入R里面合并文件

参考：http://localhost:17435/notebooks/projects/TAP_WGS/analysis/RNA-seq.ipynb

从gtf文件里提取信息

grep -v '#' Homo_sapiens_assembly19.gtf | cut -f9 | cut -d' ' -f2,10 | uniq > id.symbol

grep -v '#' Homo_sapiens_assembly19.gtf | cut -f1-5 > id.length
grep -v '#' Homo_sapiens_assembly19.gtf | cut -f9 | cut -d' ' -f2,10 > full.id.symbol

转化count为TPM

# counts:转录组的count矩阵，行为基因，列为样本
# effLen：一个数值型向量，值是基因长度，顺序应该与count的列一致对应。
Counts2TPM <- function(counts, effLen){
  rate <- log(counts) - log(effLen)
  denom <- log(sum(exp(rate)))
  exp(rate - denom + log(1e6))
}

trans_tpm <- apply(count, 2, Counts2TPM, effLen = effLen)
head(trans_tpm)[,1:2]

参考：RNAseq数据分析中count、FPKM和TPM之间的转换

参考流程：

posted @ 2024-02-10 15:57 Life·Intelligence 阅读(220) 评论(0) 编辑收藏举报

刷新页面返回顶部

Digital-LI

全基因组测序流程 | WGS pipeline