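Bioinformatics tool recommendations (2): datasets
datasets is a cross-platform command-line tool from NCBI for easily downloading data in bulk from its databases.
Guide:
The tool is under rapid development and new features are being added steadily. Reference: https://www.ncbi.nlm.nih.gov/datasets/docs/v1/how-tos/
Installation: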
curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v1/linux-amd64/datasets'
curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v1/linux-amd64/dataformat'
chmod +x dataformat datasets
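The later examples in this post call `datasets` without the `./` prefix, which only works when the binaries are on `PATH`. A minimal sketch of one way to arrange that, assuming a per-user directory such as `$HOME/bin`; the directory and the function name `install_ncbi_tools` are illustrative choices, not something taken from the NCBI documentation:

```shell
# Hypothetical helper: move the two downloaded binaries from the current
# directory into a target directory and put that directory on PATH.
install_ncbi_tools() {
    target_dir="${1:-$HOME/bin}"   # assumed install location
    # Check both files first so nothing is moved on a partial download.
    for tool in datasets dataformat; do
        if [ ! -f "$tool" ]; then
            echo "missing file: $tool" >&2
            return 1
        fi
    done
    mkdir -p "$target_dir" || return 1
    for tool in datasets dataformat; do
        chmod +x "$tool"
        mv "$tool" "$target_dir/" || return 1
    done
    # Prepend the directory only if it is not already on PATH.
    case ":$PATH:" in
        *":$target_dir:"*) ;;
        *) PATH="$target_dir:$PATH"; export PATH ;;
    esac
}
```

Run `install_ncbi_tools` from the directory that holds the two downloaded files; afterwards `command -v datasets` should print the installed path. The `PATH` change lasts only for the current shell session, so a permanent setup would need the same export in a shell startup file.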
Usage:
For example, the help text for downloading gene and genome sequence data is as follows:
./datasets download -h
Download genome, gene and coronavirus data packages, including sequence, annotation, and metadata, as a zip file.
Refer to NCBI's [download and install](https://www.ncbi.nlm.nih.gov/datasets/docs/download-and-install/) documentation for information about getting started with the command-line tools.
Usage
datasets download [flags]
datasets download [command]
Examples
datasets download genome accession GCF_000001405.40 --chromosomes X,Y --exclude-gff3 --exclude-rna
datasets download genome taxon "bos taurus"
datasets download gene gene-id 672
datasets download gene symbol brca1 --taxon mouse
datasets download gene accession NP_000483.3
datasets download virus genome taxon sars-cov-2 --host dog
datasets download virus protein S --host dog --filename SARS2-spike-dog.zip
datasets download --input-json request_file.json --filename output.zip
Available Commands
gene download a gene dataset
genome download a genome dataset
virus download a coronavirus dataset
ortholog download an ortholog dataset
Flags
--filename string specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
-h, --help help for download
--input-json string a file that contains a valid json request object for genome or gene queries
Global Flags
--api-key string NCBI Datasets API Key
--no-progressbar hide progress bar
Use datasets download help <command> for detailed help about a command.
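Since the tool is meant for bulk downloads, the `gene gene-id` subcommand and the `--filename` flag from the help text above can be combined in a loop. The following is a sketch under stated assumptions rather than an official recipe: the helper name `download_genes`, the `DATASETS_BIN` variable, and the one-ID-per-line input file are all hypothetical conventions introduced here.

```shell
# Binary to call. Overridable so the loop can be dry-run,
# e.g. DATASETS_BIN=echo prints the commands instead of running them.
DATASETS_BIN="${DATASETS_BIN:-datasets}"

# Hypothetical helper: download one zip per gene ID listed in a text file
# (one NCBI Gene ID per line; empty lines and lines starting with '#' are skipped).
download_genes() {
    id_file="$1"
    out_dir="${2:-.}"
    mkdir -p "$out_dir" || return 1
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r gene_id || [ -n "$gene_id" ]; do
        case "$gene_id" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the ID list.
        "$DATASETS_BIN" download gene gene-id "$gene_id" \
            --filename "$out_dir/gene_${gene_id}.zip" < /dev/null || return 1
    done < "$id_file"
}
```

With a file `ids.txt` containing `672` on its own line, `download_genes ids.txt downloads` would request that gene and write `downloads/gene_672.zip`. Giving each package its own file name avoids every download overwriting the default `ncbi_dataset.zip`.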
Test:
Download the sequences for the human gene with Gene ID 672. For more usage, see: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/genomes/download-genome/
datasets download gene gene-id 672
Downloading: ncbi_dataset.zip 818kB done
# Unzip the result
unzip ncbi_dataset.zip
Archive: ncbi_dataset.zip
inflating: README.md
inflating: ncbi_dataset/data/gene.fna
inflating: ncbi_dataset/data/rna.fna
inflating: ncbi_dataset/data/protein.faa
inflating: ncbi_dataset/data/data_report.jsonl
inflating: ncbi_dataset/data/data_table.tsv
inflating: ncbi_dataset/data/dataset_catalog.json
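Once the package is unzipped, a quick sanity check is to count the sequences in each FASTA file and look at which columns the table provides. A minimal sketch using only standard Unix tools; the function names are illustrative, and nothing is assumed about the files beyond FASTA headers starting with `>` and the `.tsv` file being tab-separated with a header row:

```shell
# Count FASTA records: every record begins with a '>' header line.
# (grep still prints 0 for a file without records, but exits non-zero.)
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the column names of a tab-separated table, one per line.
list_tsv_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}
```

For the package above this could be run as `count_fasta_records ncbi_dataset/data/protein.faa` and `list_tsv_columns ncbi_dataset/data/data_table.tsv`.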
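Author: 天使不设防
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be displayed prominently on the page; otherwise the author reserves the right to pursue legal liability.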