ASE分析

1、Prepare necessary input files（可以参考上次的博客http://www.cnblogs.com/renping/p/7391028.html）

1）对fq1和fq2合并

cat fq1 fq2

2）对bam 文件转换成psl格式

/share/nas2/genome/biosoft/Python/2.7.8/bin/python /share/nas1/wenyh/develop/tools/Au-public-master/iron/utilities/sam_to_psl.py -r transcript.fa T16.bam >T16.psl

3）gtf format convert to gpd format

/share/nas1/wenyh/develop/tools/gtfToGenePred transcript.gtf -genePredExt transcript.gpd.tmp

awk '{print 0"\t"$0}'transcript.gpd.tmp >transcript.gpd.tmp2

/share/nas1/wenyh/develop/pacbio/IDP-ASE/julia/bin/julia /home/wenyh/.julia/v0.4/IDPASE/scripts/convert_gpd.jl transcript.gpd.tmp2 >transcript.gpd.tmp3

4）vcf注释和选杂合的vcf文件

注释vcf文件。（参考博客：http://www.cnblogs.com/renping/p/7467348.html）

awk '$10!~/1\/1/;$10!~/\.\/\./{print}'|le >final.snp.anno.vcf1 ##筛选杂合

le final.snp.anno.vcf1|grep -v '#'|cut -f 1 |sort |uniq -c | awk '{print $2,$1}'|less -S|sort -k 2nr|le >Snp.distribution

2、Prepare Gene level data

1) mkdir temp/; mkdir gene_files; mkdir isoform_files; mkdir gene_out; mkdir isoform_out;

2) for i in `le snp.distribution |awk '$1<10 {print $2}'|le`; do echo "/share/nas1/yangch/tools/julia/bin/julia -p 4 /home/yangch/.julia/v0.4/IDPASE/src/prep_runs.jl \

　　-a /share/nas1/yangch/RENPP/out/T19.psl \

　　-g /share/nas1/yangch/RENPP/out/transcript.gpd.tmp3 \

　　-v /share/nas1/yangch/RENPP/out/final.snp.anno.vcf1 \

　　-q /share/nas1/yangch/RENPP/out/T19.fq \

　　-d /share/nas1/yangch/RENPP/out/temp \

　　-c ${i} \

　　 -f 1 \

-o /share/nas1/yangch/RENPP/out/gene_files/ \

-p T19 "; done >A1.sh #####Prepare Gene level data

3) for i in `ls /share/nas1/yangch/RENPP/out/gene_files/|perl -lne '{next if /^\s+$/;/T19_(reads|true)_(.*)\.txt/;print $2}'|sort|uniq|less`;\

do echo "/share/nas1/yangch/tools/julia/bin/julia -p 4 /home/yangch/.julia/v0.4/IDPASE/scripts//phase_by_loci_sub.jl \

-t /share/nas1/yangch/RENPP/out/gene_files/T19_true_${i}.txt \

-a /share/nas1/yangch/RENPP/out/gene_files/T19_reads_${i}.txt \

-o /share/nas1/yangch/RENPP/out/gene_out/ \

-l 1 \

-r ${i} \

-i 10000 \

-b 1000 \

-c 4 \

-d /home/yangch/.julia/v0.4/IDPASE/scripts/ \

-n SGS \

-m 1 0 \

-s 1.0"; done >to_run_curr.sh #### Get commands to run each gene individually

4) Concatenate all gene level results

find gene_out/ -name "REAL*" | xargs cat > gene_out/gene_results.txt

posted @ 2017-09-07 09:46 风中之铃阅读(858) 评论(0) 收藏举报

刷新页面返回顶部

风中之铃

ASE分析

公告