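Gene-level motif enrichment analysis for alternative splicing regulators | motif enrichment | FIMO | MEME
Related post: Transcription factor motif enrichment in TSS regions | motif enrichment | HOMER | FIMO | MEME
This is a new area for me. I am now looking at alternative splicing regulators such as PTBP1, which have specific RNA-binding motifs, much like TFs.
Similarities:
- Both are sequence regions bound by a protein
- Both have a specific sequence motif
Differences:
- TF motifs are mainly bound in promoters and enhancers and control gene transcription
- ASF (alternative splicing factor) motifs are mainly bound in the intronic regions of genes and control alternative splicing
PTBP1 is used as the example here.
Inspired by this paper: 2018 - Cancer Cell - PTBP1-Mediated Alternative Splicing Regulates the Inflammatory Secretome and the Pro-tumorigenic Effects of Senescent Cells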
RNA-Binding Motif Analysis
FIMO (Grant et al., 2011) was used to scan the human gene sequences for the PTBP1 RNA-binding motifs inferred by (Ray et al., 2013). The thereby predicted occurrences were mapped to the analyzed splicing events. To generate the RNA-maps (Figures 7B and S7D), for each comparison alternative exons were divided into those with PSIs significantly increasing upon PTBP1 knockdown (putatively repressed), those with PSIs significantly decreasing upon PTBP1 knockdown (putatively enhanced), and those with PSIs not altered upon PTBP1 knockdown (putatively not regulated). Statistical significance for local motif enrichment is associated with Fisher’s exact tests for differences in motif occurrences between groups of exons within 31 bp moving windows.
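The windowed test described in the quoted methods can be sketched roughly as follows. This is only an illustration, not the paper's code: it assumes each exon is represented by a list of motif hit positions relative to a splice site, and that a window "contains the motif" for an exon if at least one hit falls inside it. The function names and the input layout are hypothetical, and Fisher's exact test is written out with the standard library so the sketch has no dependencies.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def window_enrichment(hits_a, hits_b, region_start, region_end, width=31, step=1):
    """Compare two exon groups window by window.

    hits_a / hits_b: one list of motif positions per exon (assumed layout).
    Returns tuples (start, end, exons_with_hit_a, exons_with_hit_b, p_value).
    """
    results = []
    for start in range(region_start, region_end - width + 2, step):
        end = start + width - 1
        a = sum(any(start <= p <= end for p in exon) for exon in hits_a)
        c = sum(any(start <= p <= end for p in exon) for exon in hits_b)
        p_value = fisher_exact_two_sided(a, len(hits_a) - a, c, len(hits_b) - c)
        results.append((start, end, a, c, p_value))
    return results
```

With real data, hits_a would hold the exons whose PSI changes upon knockdown and hits_b the unchanged exons; the resulting p-values would still need multiple-testing correction across windows.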
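Finding the RNA motif
Look up Ray et al., 2013, A compendium of RNA-binding motifs for decoding gene regulation
Following that trail leads to a database: CISBP-RNA Database: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins
Procedure: export the hg38 gene sequences (including exons and introns)
http://www.genome.ucsc.edu/cgi-bin/hgTables
Predict motif occurrences with FIMO: https://meme-suite.org/meme/tools/fimo
Get the MEME-format file for the short motif. The web version generates it, so just download it.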
    MEME version 4

    ALPHABET= ACGT

    strands: + -

    Background letter frequencies (from unknown source):
    A 0.250 C 0.250 G 0.250 T 0.250

    MOTIF 1 HYTTTYT

    letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
    0.333333 0.333333 0.000000 0.333333
    0.000000 0.500000 0.000000 0.500000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.500000 0.000000 0.500000
    0.000000 0.000000 0.000000 1.000000
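The matrix in this file is just the IUPAC consensus HYTTTYT expanded into uniform probabilities per position. If the web tool is not at hand, something like the following sketch could produce an equivalent file. The helper names are made up for illustration, and the header is copied from the downloaded file above rather than taken from the MEME format specification.

```python
# IUPAC nucleotide codes mapped to the bases they stand for (U is treated as T)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_matrix(consensus):
    """Return one [A, C, G, T] probability row per consensus position."""
    rows = []
    for code in consensus.upper():
        bases = IUPAC[code]
        rows.append([1.0 / len(bases) if b in bases else 0.0 for b in "ACGT"])
    return rows

def to_meme(name, rows):
    """Format the rows as a minimal MEME motif file with a uniform background."""
    lines = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies (from unknown source):",
        "A 0.250 C 0.250 G 0.250 T 0.250", "",
        f"MOTIF 1 {name}", "",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 1 E= 0e+0",
    ]
    lines += [" ".join(f"{v:.6f}" for v in row) for row in rows]
    return "\n".join(lines) + "\n"

meme_text = to_meme("HYTTTYT", iupac_to_matrix("HYTTTYT"))
```

Writing meme_text to a file such as PTBP1.motif.meme would give an input comparable to the downloaded one.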
    fimo --alpha 1 --max-strand -oc target PTBP1.motif.meme hg38_gene.fasta
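A small DNA/RNA/protein conversion tool: http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm
Notes:
The motif alphabet must match the sequence alphabet: DNA uses T and RNA uses U, otherwise nothing will match.
If you start from an RNA motif, build a reverse-complement DNA motif from it: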
    MEME version 4

    ALPHABET= ACGT

    strands: + -

    Background letter frequencies (from unknown source):
    A 0.250 C 0.250 G 0.250 T 0.250

    MOTIF 1 ARAAARD

    letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
    1.000000 0.000000 0.000000 0.000000
    0.500000 0.000000 0.500000 0.000000
    1.000000 0.000000 0.000000 0.000000
    1.000000 0.000000 0.000000 0.000000
    1.000000 0.000000 0.000000 0.000000
    0.500000 0.000000 0.500000 0.000000
    0.333333 0.000000 0.333333 0.333333
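The ARAAARD matrix is the reverse complement of HYTTTYT. A minimal sketch of that conversion, assuming the matrix columns are in A, C, G, T order as in the files here (the function names are hypothetical):

```python
# Complement of every IUPAC code; U is treated like T
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")

def reverse_complement_iupac(consensus):
    """Reverse-complement an IUPAC consensus string."""
    return consensus.upper().translate(COMPLEMENT)[::-1]

def reverse_complement_matrix(rows):
    """Reverse-complement a letter-probability matrix.

    Positions are reversed, and each [A, C, G, T] row is reversed too,
    which swaps A with T and C with G.
    """
    return [row[::-1] for row in reversed(rows)]

# Rows of the HYTTTYT matrix, columns in A, C, G, T order
hytttyt = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
araaard = reverse_complement_matrix(hytttyt)
```

Note that FIMO scans both strands unless told otherwise, so scanning with either orientation should report the same sites with the strand label flipped.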
    fimo --alpha 1 --max-strand -oc target PTBP1.DNA.motif.meme hg38_gene.fasta --max-stored-scores 1000000 --thresh 1e-4
Next time, test on a small dataset first; otherwise a whole night of running is wasted.
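One way to get such a small test set is to keep only the first few records of the FASTA file. A rough sketch, where the function name and the example file names in the comment are assumptions for illustration:

```python
def head_fasta(lines, n_records):
    """Yield the lines belonging to the first n_records FASTA records."""
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n_records:
                break
        if seen:  # skip anything before the first header
            yield line

# Example use (file names are placeholders):
# with open("hg38_gene.fasta") as src, open("test_100.fasta", "w") as dst:
#     dst.writelines(head_fasta(src, 100))
```

Because the input is read lazily, this stops as soon as the requested number of records has been copied and never loads the full genome-scale file.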
--max-strand
If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random.
Resource usage
--max-stored-scores 1000000 used 1.48 GB of memory and 1 CPU
--max-stored-scores 10000000: memory and CPU usage were not recorded
Latest commands:
    fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --text --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output.tsv
    fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --skip-matched-sequence --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output2.tsv
--skip-matched-sequence [much faster output: the run dropped from an hour and a half to 10 minutes]
Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably.
--text [results go to standard output]
Limits output to TSV (tab-separated values) formatted results sent to standard output. The results are unsorted and no q-values are output, allowing very large files to be searched.
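Once the TSV is written, the hits still have to be summarized per gene before they can be mapped to splicing events. A minimal sketch of that step follows. It assumes the file starts with a header row containing the column names sequence_name and p-value, as in recent MEME Suite releases; check the header of your own output, since the layout may differ between versions. The function name is hypothetical.

```python
import csv
from collections import Counter

def count_hits_per_sequence(lines, max_pvalue=1e-4):
    """Count FIMO hits per sequence from TSV lines (header row assumed)."""
    reader = csv.reader(lines, delimiter="\t")
    # Some versions prefix the header with '#', so strip it before matching names
    header = [name.lstrip("# ") for name in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    counts = Counter()
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[col["p-value"]]) <= max_pvalue:
            counts[row[col["sequence_name"]]] += 1
    return counts

# Example use (file name taken from the command above):
# with open("output2.tsv") as handle:
#     hits = count_hits_per_sequence(handle)
```

Reading the file line by line keeps memory use flat, which matters here because the unsorted text output for all hg38 genes can be very large.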
Reference:
~/project/scPipeline/motifEnrichment/ASF_motif/