How to use GenomicConsensus (quiver, arrow) | sequence consensus
https://github.com/PacificBiosciences/GenomicConsensus
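GenomicConsensus was developed by PacBio. Personally I really dislike the tools PacBio develops; they are hard to use.
Just installing GenomicConsensus nearly cost me half my life.
The purpose of the tool: Compute genomic consensus and call variants relative to the reference.
In other words, it uses a set of reads to correct errors in the final reference. The model is broadly applicable and can be used in many settings. In particular, when we develop our own tools we can embed it directly and cut down the amount of development work.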
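As a minimal sketch of what embedding the tool in our own code could look like, the Python helper below assembles an `arrow` command line and runs it as a subprocess. It uses only flags that appear in the tool's own `-h` output (`--algorithm`, `--referenceFilename`, `-j`, `-o`). The function names `build_arrow_command` and `polish`, the file names in the usage note, and the assumption that `arrow` is on `PATH` are all illustrative and not part of GenomicConsensus.

```python
import subprocess
from typing import List, Sequence

# Algorithm names as listed in the tool's help output.
ALGORITHMS = ("quiver", "arrow", "plurality", "poa", "best")


def build_arrow_command(
    aligned_reads: str,
    reference_fasta: str,
    outputs: Sequence[str],
    workers: int = 1,
    algorithm: str = "arrow",
    executable: str = "arrow",  # assumption: the executable is on PATH
) -> List[str]:
    """Assemble the argument list for one consensus/variant-calling run."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if not outputs:
        raise ValueError("at least one output filename is required")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [
        executable,
        "--algorithm", algorithm,
        "--referenceFilename", reference_fasta,
        "-j", str(workers),
        # The help text says -o takes a comma-separated list of filenames.
        "-o", ",".join(outputs),
        aligned_reads,
    ]


def polish(aligned_reads: str, reference_fasta: str,
           outputs: Sequence[str], **options) -> subprocess.CompletedProcess:
    """Run the tool and raise CalledProcessError if it exits non-zero."""
    cmd = build_arrow_command(aligned_reads, reference_fasta, outputs, **options)
    return subprocess.run(cmd, check=True)
```

A call such as `polish("aligned.bam", "draft.fasta", ["polished.fasta", "variants.gff"], workers=8)` would then run one polishing round. Keeping command construction separate from execution makes the wrapper easy to check without having the tool installed.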
./bin/arrow -h
usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--dumpEvidence [{variants,all,outliers}]]
                     [--evidenceDirectory EVIDENCEDIRECTORY] [--annotateGFF]
                     [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename

Compute genomic consensus and call variants relative to the reference.

optional arguments:
  -h, --help
      show this help message and exit
  --version
      show program's version number and exit
  --emit-tool-contract
      Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
      Run Tool directly from a PacBio Resolved tool contract (default: None)
  --log-file LOG_FILE
      Write the log to file. Default(None) will write to stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
      Set log level (default: WARN)
  --debug
      Alias for setting log level to DEBUG (default: False)
  --quiet
      Alias for setting log level to CRITICAL to suppress output. (default: False)
  -v, --verbose
      Set the verbosity level. (default: None)

Basic required options:
  inputFilename
      The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
      The filename of the reference FASTA file (default: None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
      The output filename(s), as a comma-separated list. Valid output formats are .fa/.fasta, .fq/.fastq, .gff, .vcf (default: [])

Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
      The number of worker processes to be used (default: 1)

Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
      The minimum confidence for a variant call to be output to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
      The minimum site coverage that must be achieved for variant calls and consensus to be calculated for a site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
      The consensus base that will be output for sites with no effective coverage. (default: lowercasereference)

Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
      A designation of the maximum coverage level to be used for analysis. Exact interpretation is algorithm-specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
      The minimum MapQV for reads that will be used for analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
      The window (or multiple comma-delimited windows) of the reference to be processed, in the format refGroup:refStart-refEnd (default: entire reference). (default: None)
  --alignmentSetRefWindows
      The window (or multiple comma-delimited windows) of the reference to be processed, in the format refGroup:refStart-refEnd will be pulled from the alignment file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
      A file containing reference window designations, one per line (default: None)
  --barcode _BARCODE
      Only process reads with the given barcode name. (default: None)
  --readStratum READSTRATUM
      A string of the form 'n/N', where n, and N are integers, 0 <= n < N, designating that the reads are to be deterministically split into N strata of roughly even size, and stratum n is to be used for variant and consensus calling. This is mostly useful for Quiver development. (default: None)
  --minReadScore MINREADSCORE
      The minimum ReadScore for reads that will be used for analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
      The minimum acceptable signal-to-noise over all channels for reads that will be used for analysis (arrow-only). (default: 3.75)
  --minZScore MINZSCORE
      The minimum acceptable z-score for reads that will be used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
      The minimum acceptable window-global alignment accuracy for reads that will be used for the analysis (arrow-only). (default: 0.82)

Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
      Parameter set filename (such as ArrowParameters.json or QuiverParameters.ini), or directory D such that either D/*/GenomicConsensus/QuiverParameters.ini, or D/GenomicConsensus/QuiverParameters.ini, is found. In the former case, the lexically largest path is chosen. (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
      Name of parameter set (chemistry.model) to select from the parameters file, or just the name of the chemistry, in which case the best available model is chosen. Default is 'auto', which selects the best parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
      Radius of window to use when excluding local regions for exceeding maskMinErrorRate, where 0 disables any filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
      Maximum local error rate before the local region defined by maskRadius is excluded from polishing (arrow-only). (default: 0.7)

Verbosity and debugging/profiling:
  --pdb
      Enable Python debugger (default: False)
  --notrace
      Suppress stacktrace for exceptions (to simplify testing) (default: False)
  --pdbAtStartup
      Drop into Python debugger at startup (requires ipdb) (default: False)
  --profile
      Enable Python-level profiling (using cProfile). (default: False)
  --dumpEvidence [{variants,all,outliers}], -d [{variants,all,outliers}]
  --evidenceDirectory EVIDENCEDIRECTORY
  --annotateGFF
      Augment GFF variant records with additional information (default: False)
  --reportEffectiveCoverage
      Additionally record the *post-filtering* coverage at variant sites (default: False)

Advanced configuration options:
  --diploid
      Enable detection of heterozygous variants (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T
      Run threads instead of processes (for debugging purposes only) (default: False)
  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
  --fancyChunking
      Adaptive reference chunking designed to handle coverage cutouts better (default: True)
  --simpleChunking
      Disable adaptive reference chunking (default: True)
  --referenceChunkOverlap REFERENCECHUNKOVERLAP
  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
      Disable the HDF5 chunk cache when the number of datasets in the cmp.h5 exceeds the given threshold (default: 500)
  --aligner {affine,simple}, -a {affine,simple}
      The pairwise alignment algorithm that will be used to produce variant calls from the consensus (Quiver only). (default: affine)
  --refineDinucleotideRepeats
      Require quiver maximum likelihood search to try one less/more repeat copy in dinucleotide repeats, which seem to be the most frequent cause of suboptimal convergence (getting trapped in local optimum) (Quiver only) (default: True)
  --noRefineDinucleotideRepeats
      Disable dinucleotide refinement (default: True)
  --fast
      Cut some corners to run faster. Unsupported! (default: False)
  --skipUnrecognizedContigs
      Do not abort when told to process a reference window (via -w/--referenceWindow[s]) that has no aligned coverage. Outputs emptyish files if there are no remaining non-degenerate windows. Only intended for use by smrtpipe scatter/gather. (default: False)
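Two of the options above take small string formats that are easy to get wrong by hand: `-w`/`--referenceWindow` expects comma-delimited windows of the form `refGroup:refStart-refEnd`, and `--readStratum` expects `n/N` with `0 <= n < N`. The Python sketch below builds and checks those strings. The helper names and the contig names in the usage note are hypothetical, and the help text does not say whether coordinates are 0-based or whether the end is inclusive, so the code passes the numbers through unchanged.

```python
from typing import Iterable, Tuple


def format_reference_windows(windows: Iterable[Tuple[str, int, int]]) -> str:
    """Join (refGroup, refStart, refEnd) tuples into a value for -w."""
    parts = []
    for ref_group, start, end in windows:
        if start < 0 or end < start:
            raise ValueError(f"invalid window: {ref_group}:{start}-{end}")
        parts.append(f"{ref_group}:{start}-{end}")
    return ",".join(parts)


def parse_read_stratum(spec: str) -> Tuple[int, int]:
    """Validate an 'n/N' string for --readStratum and return (n, N)."""
    n_text, sep, total_text = spec.partition("/")
    if not sep:
        raise ValueError(f"expected 'n/N', got {spec!r}")
    n, total = int(n_text), int(total_text)
    # The help text requires 0 <= n < N.
    if not 0 <= n < total:
        raise ValueError(f"need 0 <= n < N, got {spec!r}")
    return n, total
```

For example, `format_reference_windows([("ctg1", 0, 1000), ("ctg2", 500, 2000)])` produces a single string that can be passed to `-w`, and `parse_read_stratum("1/4")` confirms that the stratum specification is well formed before the tool is launched.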
To be continued.