Gene-level motif enrichment analysis for alternative splicing regulators | motif enrichment | FIMO | MEME

Related post: TF motif enrichment analysis around the TSS | motif enrichment | HOMER | FIMO | MEME

 

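This is a new area for me. I am now looking at alternative splicing regulators such as PTBP1. Like transcription factors (TFs), they have specific RNA-binding motifs.

Similarities:

  • Both are sequence regions bound by proteins
  • Both have specific sequence motifs

Differences:

  • TF motifs are bound mainly in promoters and enhancers and control gene transcription
  • Motifs of alternative splicing factors (ASFs) are bound mainly in the intronic regions of genes and control alternative splicing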

 

PTBP1 is used as the example here.

 

The paper that inspired this: 2018 - Cancer Cell - PTBP1-Mediated Alternative Splicing Regulates the Inflammatory Secretome and the Pro-tumorigenic Effects of Senescent Cells

RNA-Binding Motif Analysis
FIMO (Grant et al., 2011) was used to scan the human gene sequences for the PTBP1 RNA-binding motifs inferred by (Ray et al., 2013). The thereby predicted occurrences were mapped to the analyzed splicing events. To generate the RNA-maps (Figures 7B and S7D), for each comparison alternative exons were divided into those with PSIs significantly increasing upon PTBP1 knockdown (putatively repressed), those with PSIs significantly decreasing upon PTBP1 knockdown (putatively enhanced), and those with PSIs not altered upon PTBP1 knockdown (putatively not regulated). Statistical significance for local motif enrichment is associated with Fisher’s exact tests for differences in motif occurrences between groups of exons within 31 bp moving windows.
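
The windowed Fisher's exact test described above can be sketched roughly as follows. This is only my reading of the methods text, not the paper's code: counting an exon once per window if it has any motif hit there is an assumption, and the function names and the toy input are hypothetical. The test itself is implemented with the standard library so the sketch runs without extra packages.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def window_enrichment(hits_a, hits_b, length, window=31):
    """Compare two exon groups window by window.

    hits_a / hits_b: one collection of motif start positions per exon,
    0-based and relative to a shared anchor (an assumption of this sketch).
    Returns a list of (window_start, p_value).
    """
    results = []
    for start in range(0, length - window + 1):
        end = start + window
        # Number of exons in each group with at least one hit in the window.
        a = sum(any(start <= p < end for p in exon) for exon in hits_a)
        c = sum(any(start <= p < end for p in exon) for exon in hits_b)
        p_value = fisher_exact_two_sided(a, len(hits_a) - a, c, len(hits_b) - c)
        results.append((start, p_value))
    return results


# Toy data: three "repressed" exons with a hit, three unregulated exons without.
toy = window_enrichment([{5}, {6}, {7}], [set(), set(), set()], length=40)
```

With real data the p-values across windows would still need multiple-testing consideration, which the quoted methods text does not detail.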

 

Finding the RNA motif

Look up Ray et al., 2013, "A compendium of RNA-binding motifs for decoding gene regulation".

Following that lead, I found a database: CISBP-RNA Database: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins

 

Procedure: export the hg38 gene sequences (exons and introns included) from the UCSC Table Browser:

http://www.genome.ucsc.edu/cgi-bin/hgTables

 

Predict motif occurrences with FIMO: https://meme-suite.org/meme/tools/fimo

 

To get the short motif in MEME format: the web version of FIMO generates it for you, so you can simply download it.

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 HYTTTYT

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
0.333333 0.333333 0.000000 0.333333
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.500000 0.000000 0.500000
0.000000 0.000000 0.000000 1.000000
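
If the web tool is not at hand, a MEME-format file like the one above can also be written from the IUPAC consensus. The following is a minimal sketch, assuming each ambiguity code is split evenly over the bases it stands for (which reproduces the matrix above for HYTTTYT); the function names are my own and not part of the MEME Suite.

```python
# IUPAC nucleotide codes mapped to the bases they represent (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_matrix(consensus):
    """Return one [A, C, G, T] probability row per consensus position."""
    rows = []
    for code in consensus.upper():
        bases = IUPAC[code]
        rows.append([1.0 / len(bases) if b in bases else 0.0 for b in "ACGT"])
    return rows


def to_meme(name, rows):
    """Format the matrix as minimal MEME text with a uniform background."""
    lines = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies (from unknown source):",
        "A 0.250 C 0.250 G 0.250 T 0.250", "",
        f"MOTIF {name}", "",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 1 E= 0e+0",
    ]
    lines += [" ".join(f"{p:.6f}" for p in row) for row in rows]
    return "\n".join(lines) + "\n"


meme_text = to_meme("1 HYTTTYT", consensus_to_matrix("HYTTTYT"))
# To save it under the file name used in the fimo command:
# with open("PTBP1.motif.meme", "w") as fh:
#     fh.write(meme_text)
```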

  

fimo --alpha 1 --max-strand -oc target PTBP1.motif.meme hg38_gene.fasta

  

A small tool for converting between DNA, RNA and protein sequences: http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm

 

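Notes:

The motif and the sequences must use the same alphabet: T for DNA, U for RNA. Otherwise nothing will match.

The hg38 gene FASTA is DNA, so an RNA motif has to be turned into a DNA motif first. Here I built the reverse-complement DNA motif. Keep in mind that FIMO scans both strands of DNA sequences by default (unless --norc is given), so the forward and reverse-complement motifs hit the same sites, just reported on opposite strands.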

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from unknown source):
A 0.250 C 0.250 G 0.250 T 0.250

MOTIF 1 ARAAARD

letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.500000 0.000000 0.500000 0.000000
0.333333 0.000000 0.333333 0.333333
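
The reverse-complement matrix above can be derived from the HYTTTYT matrix mechanically. A minimal sketch, assuming columns are in A, C, G, T order as in the MEME files here (the function name is hypothetical): reversing the row order reverses the motif, and reversing each row swaps A with T and C with G.

```python
def reverse_complement_matrix(rows):
    """Reverse-complement a motif matrix whose columns are A, C, G, T.

    Reversing the list of rows reverses the positions; reversing each row
    maps A<->T and C<->G because of the column order.
    """
    return [list(reversed(row)) for row in reversed(rows)]


# The HYTTTYT matrix from the first MEME file.
forward = [
    [0.333333, 0.333333, 0.0, 0.333333],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

revcomp = reverse_complement_matrix(forward)
for row in revcomp:
    print(" ".join(f"{p:.6f}" for p in row))
```

Applying the function twice returns the original matrix, which is a quick sanity check before handing the file to fimo.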

 

fimo --alpha 1 --max-strand -oc target PTBP1.DNA.motif.meme hg38_gene.fasta --max-stored-scores 1000000 --thresh 1e-4

  

Lesson learned: test on a small dataset first next time, otherwise an overnight run can be wasted.
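
One way to get such a small test set is to copy the first few records of the gene FASTA into a separate file and run fimo on that first. A minimal sketch using only the standard library; the function names and the output file name hg38_gene.subset.fasta are hypothetical, and the record count is arbitrary.

```python
from itertools import islice


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def write_subset(in_path, out_path, n_records=100):
    """Copy the first n_records FASTA records to out_path; return how many were written."""
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in islice(iter_fasta(src), n_records):
            dst.write(f">{header}\n{seq}\n")
            count += 1
    return count


# Example call (file names are placeholders):
# write_subset("hg38_gene.fasta", "hg38_gene.subset.fasta", n_records=100)
```

Timing fimo on the subset gives a rough idea of how long the full run will take before committing to it.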

 

--max-strand

If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random.

 

Resource usage

--max-stored-scores 1000000: used 1.48 GB of memory and 1 CPU

--max-stored-scores 10000000: used (not recorded) memory and (not recorded) CPU

 

Latest commands:

fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --text --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output.tsv
fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --skip-matched-sequence --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output2.tsv

  

--skip-matched-sequence [much faster output: runtime dropped from an hour and a half to 10 minutes]

Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably.

  

--text [writes results to standard output]

Limits output to TSV (tab-separated values) formatted results sent to standard output. The results are unsorted and no q-values are output, allowing very large files to be searched.
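
The TSV written to standard output can then be summarized per gene before mapping hits to splicing events. The sketch below is an assumption-laden example: it expects a header row with the column names sequence_name, strand and p-value, as I understand recent FIMO versions to write them, so check the first line of your own output.tsv. The function name and the sample rows are made up for illustration.

```python
import csv
from collections import Counter


def count_hits_per_sequence(lines, max_pvalue=1e-4):
    """Count FIMO hits per (sequence_name, strand) from TSV lines.

    Blank lines and lines starting with '#' are skipped; the first remaining
    line must be the header row (assumed column names, see the note above).
    """
    rows = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = Counter()
    for rec in reader:
        if float(rec["p-value"]) <= max_pvalue:
            counts[(rec["sequence_name"], rec["strand"])] += 1
    return counts


# Made-up rows in the assumed layout (q-value is empty, as with --text).
sample = [
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n",
    "1\tARAAARD\tgeneA\t10\t16\t+\t9.1\t5e-05\t\tAAAAAAA\n",
    "1\tARAAARD\tgeneA\t40\t46\t-\t8.0\t9e-05\t\tAGAAAAG\n",
    "1\tARAAARD\tgeneB\t5\t11\t+\t3.0\t2e-03\t\tAAAAAGT\n",
]
hit_counts = count_hits_per_sequence(sample)

# For a real run, pass the open file instead:
# with open("output.tsv") as fh:
#     hit_counts = count_hits_per_sequence(fh)
```

Because the function only reads named columns, it should work the same whether or not the matched sequence column is present.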

 

Reference:

~/project/scPipeline/motifEnrichment/ASF_motif/

 

posted @ 2021-08-08 23:12  Life·Intelligence