Using GenomicConsensus (quiver, arrow) | sequence consensus
https://github.com/PacificBiosciences/GenomicConsensus
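GenomicConsensus is developed by PacBio. Personally, I really dislike the tools PacBio writes; they are hard to use.
Just getting GenomicConsensus installed nearly cost me half my life.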
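The purpose of the tool, in its own words: Compute genomic consensus and call variants relative to the reference.
In other words, it takes a set of reads aligned to a reference and uses them to correct errors in that reference. The approach is fairly general and fits many situations. In particular, when we are developing tools of our own, we can embed it directly and save ourselves development work.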
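As a rough idea of what "embedding" could look like, here is a minimal sketch that assembles an `arrow` command line from Python. The flags (`--referenceFilename`, `-o`, `-j`, `--algorithm`) are taken from the tool's own help output reproduced in this post. Everything else is my assumption: the helper names `build_arrow_command` and `run_arrow`, the file names, and that an `arrow` executable is on the `PATH`.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_arrow_command(
    alignments: str,
    reference: str,
    outputs: Sequence[str],
    workers: int = 1,
    algorithm: str = "arrow",
    executable: str = "arrow",  # assumption: the entry point is on PATH
) -> List[str]:
    """Assemble an argv list for the consensus caller; nothing is executed here."""
    if not outputs:
        raise ValueError("at least one output file is required")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [
        executable,
        alignments,                          # positional inputFilename (cmp.h5 or BAM)
        "--referenceFilename", reference,    # reference FASTA
        "-o", ",".join(outputs),             # outputs go in as one comma-separated list
        "-j", str(workers),                  # number of worker processes
        "--algorithm", algorithm,
    ]


def run_arrow(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the assembled command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_arrow_command(
        "aligned.bam",
        "draft.fasta",
        ["polished.fasta", "variants.gff"],
        workers=8,
    )
    print(shlex.join(cmd))  # inspect the command before actually running it
```

Keeping the command construction separate from `run_arrow` means the argument list can be logged or checked without the tool being installed.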
./bin/arrow -h

usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--dumpEvidence [{variants,all,outliers}]]
                     [--evidenceDirectory EVIDENCEDIRECTORY] [--annotateGFF]
                     [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename

Compute genomic consensus and call variants relative to the reference.

optional arguments:
  -h, --help
      show this help message and exit
  --version
      show program's version number and exit
  --emit-tool-contract
      Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
      Run Tool directly from a PacBio Resolved tool contract (default: None)
  --log-file LOG_FILE
      Write the log to file. Default(None) will write to stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
      Set log level (default: WARN)
  --debug
      Alias for setting log level to DEBUG (default: False)
  --quiet
      Alias for setting log level to CRITICAL to suppress output. (default: False)
  -v, --verbose
      Set the verbosity level. (default: None)

Basic required options:
  inputFilename
      The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
      The filename of the reference FASTA file (default: None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
      The output filename(s), as a comma-separated list. Valid output formats are .fa/.fasta, .fq/.fastq, .gff, .vcf (default: [])

Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
      The number of worker processes to be used (default: 1)

Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
      The minimum confidence for a variant call to be output to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
      The minimum site coverage that must be achieved for variant calls and consensus to be calculated for a site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
      The consensus base that will be output for sites with no effective coverage. (default: lowercasereference)

Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
      A designation of the maximum coverage level to be used for analysis. Exact interpretation is algorithm-specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
      The minimum MapQV for reads that will be used for analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
      The window (or multiple comma-delimited windows) of the reference to be processed, in the format refGroup:refStart-refEnd (default: entire reference). (default: None)
  --alignmentSetRefWindows
      The window (or multiple comma-delimited windows) of the reference to be processed, in the format refGroup:refStart-refEnd will be pulled from the alignment file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
      A file containing reference window designations, one per line (default: None)
  --barcode _BARCODE
      Only process reads with the given barcode name. (default: None)
  --readStratum READSTRATUM
      A string of the form 'n/N', where n, and N are integers, 0 <= n < N, designating that the reads are to be deterministically split into N strata of roughly even size, and stratum n is to be used for variant and consensus calling. This is mostly useful for Quiver development. (default: None)
  --minReadScore MINREADSCORE
      The minimum ReadScore for reads that will be used for analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
      The minimum acceptable signal-to-noise over all channels for reads that will be used for analysis (arrow-only). (default: 3.75)
  --minZScore MINZSCORE
      The minimum acceptable z-score for reads that will be used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
      The minimum acceptable window-global alignment accuracy for reads that will be used for the analysis (arrow-only). (default: 0.82)

Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
      Parameter set filename (such as ArrowParameters.json or QuiverParameters.ini), or directory D such that either D/*/GenomicConsensus/QuiverParameters.ini, or D/GenomicConsensus/QuiverParameters.ini, is found. In the former case, the lexically largest path is chosen. (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
      Name of parameter set (chemistry.model) to select from the parameters file, or just the name of the chemistry, in which case the best available model is chosen. Default is 'auto', which selects the best parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
      Radius of window to use when excluding local regions for exceeding maskMinErrorRate, where 0 disables any filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
      Maximum local error rate before the local region defined by maskRadius is excluded from polishing (arrow-only). (default: 0.7)

Verbosity and debugging/profiling:
  --pdb
      Enable Python debugger (default: False)
  --notrace
      Suppress stacktrace for exceptions (to simplify testing) (default: False)
  --pdbAtStartup
      Drop into Python debugger at startup (requires ipdb) (default: False)
  --profile
      Enable Python-level profiling (using cProfile). (default: False)
  --dumpEvidence [{variants,all,outliers}], -d [{variants,all,outliers}]
  --evidenceDirectory EVIDENCEDIRECTORY
  --annotateGFF
      Augment GFF variant records with additional information (default: False)
  --reportEffectiveCoverage
      Additionally record the *post-filtering* coverage at variant sites (default: False)

Advanced configuration options:
  --diploid
      Enable detection of heterozygous variants (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T
      Run threads instead of processes (for debugging purposes only) (default: False)
  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
  --fancyChunking
      Adaptive reference chunking designed to handle coverage cutouts better (default: True)
  --simpleChunking
      Disable adaptive reference chunking (default: True)
  --referenceChunkOverlap REFERENCECHUNKOVERLAP
  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
      Disable the HDF5 chunk cache when the number of datasets in the cmp.h5 exceeds the given threshold (default: 500)
  --aligner {affine,simple}, -a {affine,simple}
      The pairwise alignment algorithm that will be used to produce variant calls from the consensus (Quiver only). (default: affine)
  --refineDinucleotideRepeats
      Require quiver maximum likelihood search to try one less/more repeat copy in dinucleotide repeats, which seem to be the most frequent cause of suboptimal convergence (getting trapped in local optimum) (Quiver only) (default: True)
  --noRefineDinucleotideRepeats
      Disable dinucleotide refinement (default: True)
  --fast
      Cut some corners to run faster. Unsupported! (default: False)
  --skipUnrecognizedContigs
      Do not abort when told to process a reference window (via -w/--referenceWindow[s]) that has no aligned coverage. Outputs emptyish files if there are no remaining non-degenerate windows. Only intended for use by smrtpipe scatter/gather. (default: False)
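Two of the options above take small string formats that are easy to get wrong when the command is generated by another program: `-w` expects comma-delimited windows of the form refGroup:refStart-refEnd, and `-o` only accepts the listed extensions. The sketch below checks both before the tool is launched. It is my own pre-flight check, not code from GenomicConsensus: the names `RefWindow`, `parse_reference_windows` and `unsupported_outputs` are made up, the `start <= end` rule is an assumption, and it only handles the full window form described in the help text.

```python
import re
from typing import List, NamedTuple, Sequence

# Extensions listed in the help text for -o/--outputFilename.
VALID_OUTPUT_EXTENSIONS = (".fa", ".fasta", ".fq", ".fastq", ".gff", ".vcf")

# refGroup:refStart-refEnd; the greedy group lets contig names contain ':'.
_WINDOW_RE = re.compile(r"^(?P<group>.+):(?P<start>\d+)-(?P<end>\d+)$")


class RefWindow(NamedTuple):
    ref_group: str
    start: int
    end: int


def parse_reference_windows(spec: str) -> List[RefWindow]:
    """Split a comma-delimited window string and validate each entry."""
    windows = []
    for item in spec.split(","):
        item = item.strip()
        match = _WINDOW_RE.match(item)
        if match is None:
            raise ValueError(f"not in refGroup:refStart-refEnd form: {item!r}")
        start, end = int(match["start"]), int(match["end"])
        if start > end:  # assumption: a window must not run backwards
            raise ValueError(f"window start exceeds end: {item!r}")
        windows.append(RefWindow(match["group"], start, end))
    return windows


def unsupported_outputs(names: Sequence[str]) -> List[str]:
    """Return the output names whose extension is not in the documented list."""
    return [n for n in names if not n.lower().endswith(VALID_OUTPUT_EXTENSIONS)]


if __name__ == "__main__":
    # Hypothetical contig and file names, for illustration only.
    print(parse_reference_windows("ctg1:0-500,ctg2:100-200"))
    print(unsupported_outputs(["polished.fasta", "variants.gff", "notes.txt"]))
```

Failing early in the wrapper is cheaper than waiting for a long polishing job to reject its arguments.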
To be continued~~