Similarities and Differences Between the Nr, GenBank, RefSeq, and UniProt Databases
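Some papers align reads to the RefSeq transcriptome when doing DEG (differentially expressed gene) analysis. I haven't yet figured out how this differs from aligning directly to a conventional transcriptome.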
Paper: Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming
Methods: For differential expression analysis, we aligned reads against the refSeq mouse transcriptome using Bowtie version 0.12.7 (Langmead et al., 2009). Expression levels were then estimated using eXpress (Roberts and Pachter, 2013) (version 1.3.0), with gene-level effective counts and RPKM values derived from the sum of the corresponding values for all isoforms of a gene.
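The last step in that methods excerpt (gene-level values as the sum over all isoforms of a gene) can be sketched in a few lines. This is only an illustration of the summation, not the paper's actual pipeline: the function name `sum_isoforms_to_genes`, the transcript and gene IDs, and the numbers are all made up for the example.

```python
from collections import defaultdict


def sum_isoforms_to_genes(isoform_values, transcript_to_gene):
    """Sum per-isoform values (e.g. effective counts or RPKM) per gene."""
    gene_totals = defaultdict(float)
    for transcript_id, value in isoform_values.items():
        gene = transcript_to_gene[transcript_id]
        gene_totals[gene] += value
    return dict(gene_totals)


# Hypothetical toy data: two isoforms of GeneA, one isoform of GeneB.
eff_counts = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}
tx2gene = {"tx1": "GeneA", "tx2": "GeneA", "tx3": "GeneB"}

gene_counts = sum_isoforms_to_genes(eff_counts, tx2gene)
print(gene_counts)
```

The same function works for effective counts and for RPKM, since the excerpt describes both as simple sums over a gene's isoforms.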
What does the RefSeq database look like?
ftp://ftp.ncbi.nlm.nih.gov/refseq/
Go into the mouse directory:
mRNA_Prot
    mRNA_Prot directory
    Contents: organisms-specific RefSeq transcript and protein data

    {org-name}.files.installed: reports the md5checksum and files included in the directory
    For example: /refseq/H_sapiens/mRNA_Prot/human.files.installed

    File Name Conventions:
    File name formats are as follows:
    common_name.#.molecule_type.format_type

    Multiple files may be provided for any given molecule and format type
    and file names include a numerical increment. Files with the same
    numerical increment are related by content.

    For example, the files provided for human are named as:
    human.#.rna.fna.gz      --fasta report for transcript records
    human.#.protein.faa.gz  --fasta report for protein records
    human.#.rna.gbff.gz     --flatfile report for transcript records
    human.#.protein.gpff.gz --flatfile report for protein records
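The naming convention in the README (`common_name.#.molecule_type.format_type`) is regular enough to parse mechanically. Below is a minimal sketch that assumes file names follow exactly the four patterns listed in the README. The regular expression and the function name `parse_refseq_filename` are my own and are not part of any NCBI tool.

```python
import re

# Matches names such as human.1.rna.fna.gz or mouse.2.protein.gpff.gz.
REFSEQ_FILE_RE = re.compile(
    r"^(?P<common_name>[^.]+)\.(?P<increment>\d+)\."
    r"(?P<molecule_type>rna|protein)\.(?P<format_type>fna|faa|gbff|gpff)\.gz$"
)


def parse_refseq_filename(name):
    """Split a RefSeq mRNA_Prot file name into its parts, or return None."""
    match = REFSEQ_FILE_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["increment"] = int(parts["increment"])
    return parts


print(parse_refseq_filename("human.1.rna.fna.gz"))
print(parse_refseq_filename("human.files.installed"))  # not a data file
```

Grouping the parsed results by `increment` would put together the files that the README describes as related by content.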
Download one of the rna.fna files; its contents look like this:
    >NM_001013372.2 Mus musculus neural regeneration protein (Nrp), mRNA
    CGGTCCAAGGAATTTTTCTGACAAACGCAATAGGCCGACCAGTACTGGAACGCAGTGCGCTTAGCCCCTTTATGGCGGAG
    GCTGCCATGTTAAAACGGAATGAATCGAAACCCTGGAGTCGTGACCCCGGAAGAACCTGCCAGAGCCGGAATTTCGAGTT
    CTGCTTCCGGGCCAAACTGTTGGCAGCCTCGAGATGGGGAAGATGGCGGCTGCTGTGGCTTCATTAGCCACGCTGGCTGC
    AGAGCCCAGAGAGGATGCTTTCCGGAAGCTTTTCCGCTTCTACCGGCAGAGCCGGCCGGGGACAGCGGACCTGGGAGCCG
    TCATCGACTTCTCAGAGGCGCACTTGGCTCGGAGCCCGAAGCCCGGCGTGCCCCAGGTAGGAAAGGAGGAGTAGTGTGTG
    CCAGCCTAGCGGCCGACTGGGCCACCCGAGACTGGGCCGCCTCCGGGCCGGCTTTGGAGGGAAGCCCCTGCTGGGCCTGT
    CCAGTGAGCTGTAATGTCGAGCGATGAGCGACCAGCTGCCTCGCTGTCCCAACGCTCTGGCCACGGCTTGTGCCTTGCCG
    CCATTTCCCCCAACCCACGCGGGCCACGGCTTGTGCCCTGCCGCCATTTCCCCCAACCCACGCGACCTTGCTAAAAAAAA
    AAAAAGAAAGAAAAGAAAAGAAAGAAAGAAAGAAAAAAATCTGGAAATTGCTTGTACCTCCTTAACTATCTGTTTAATAC
    TAATACGATATTTTGTGTAAAGCTCAGAAGAACATCTTCGTGGACGTTAGGGTGGCCTCATAACTTCAGATAAAAGCAGC
    CATTTAATAAGTCTCAAACCGTTAATCCGTTGGGCCTGAGACTCGATCGACCCTGTCTTCTCTGAGGCTTTGAAAGTAAA
    GGTAAAATTAGCAGGTTTTTTTCCTGAGAATCTAGGAGCCTGGAGAGATAGCTCAGTAATTAAGAGCATTTACCTACTGG
    TGTTCCCAAGAACACCAAGTAGATTTGGTTCCTTGCAGCCACGTGGCAGCTCACAGCCTTCTTGTAACTCTTCCGGAGGA
    TCAGACACCCTCTCTTGAGCTCCACAGGAGAGCACTCGTAGACATGTAAATAAACTTCTAAGCTAAATCTAAACAATTTA
    TGTACCCTCCCTATTTCTTCGTGATGAGAAGAAAGGGGCCAGAGGGTATG
    >NR_046233.2 Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA
    ACTGACACGCTGTCCTTTCCCTATTAACACTAAAGGACACTATAAAGAGACCCTTTCGATTTAAGGCTGTTTTGCTTGTC
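One visible feature of these records is the accession prefix in each header: `NM_` marks an mRNA and `NR_` a non-coding RNA, while `XM_` and `XR_` are the corresponding predicted (model) transcripts. The sketch below pulls the accession out of each FASTA header and labels it by prefix. The helper names are my own, the sequences are truncated for brevity, and the prefix table covers only these four transcript prefixes.

```python
import io

# RefSeq transcript accession prefixes (transcript records only).
PREFIX_MEANING = {
    "NM_": "mRNA (protein-coding transcript)",
    "NR_": "non-coding RNA",
    "XM_": "predicted model mRNA",
    "XR_": "predicted model non-coding RNA",
}


def iter_fasta_headers(handle):
    """Yield (accession, description) for every '>' header line."""
    for line in handle:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            accession, _, description = header.partition(" ")
            yield accession, description


def classify(accession):
    """Label an accession by its three-character prefix."""
    return PREFIX_MEANING.get(accession[:3], "other")


# The two headers shown above, with truncated sequences.
example = io.StringIO(
    ">NM_001013372.2 Mus musculus neural regeneration protein (Nrp), mRNA\n"
    "CGGTCCAAGG\n"
    ">NR_046233.2 Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA\n"
    "ACTGACACGC\n"
)
records = [(acc, classify(acc)) for acc, _ in iter_fasta_headers(example)]
for accession, label in records:
    print(accession, "->", label)
```

For a real download, `example` could be replaced by a handle from `gzip.open(path, "rt")` on the `.rna.fna.gz` file.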
I still don't see any difference!
The RefSeq transcripts are a subset of the transcripts obtained from the GTF.
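One rough way to test the subset statement on your own files is to collect the `transcript_id` values from the GTF and the accessions from the RefSeq FASTA, then compare the two sets. The sketch below rests on a big assumption: both files must use the same identifier scheme (for example, a GTF that carries RefSeq accessions), otherwise the IDs cannot be compared directly. The function names are my own, and the GTF lines use made-up coordinates purely for illustration.

```python
import re

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect transcript_id values from the attribute column of a GTF."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_accessions(lines):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in lines:
        tokens = line[1:].split() if line.startswith(">") else []
        if tokens:
            ids.add(tokens[0])
    return ids


# Toy GTF: coordinates and the third transcript are invented.
toy_gtf = [
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "Nrp"; transcript_id "NM_001013372.2";\n',
    'chr2\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "Rn45s"; transcript_id "NR_046233.2";\n',
    'chr3\ttoy\texon\t500\t600\t.\t-\t.\tgene_id "GeneX"; transcript_id "TOY_TX_1";\n',
]
toy_fasta = [
    ">NM_001013372.2 Mus musculus neural regeneration protein (Nrp), mRNA\n",
    "CGGTCCAAGG\n",
    ">NR_046233.2 Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA\n",
    "ACTGACACGC\n",
]

gtf_ids = gtf_transcript_ids(toy_gtf)
refseq_ids = fasta_accessions(toy_fasta)
is_subset = refseq_ids <= gtf_ids
print("RefSeq IDs are a subset of GTF IDs:", is_subset)
print("Only in GTF:", sorted(gtf_ids - refseq_ids))
```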
I'll go into more detail on this later.