How to use GenomicConsensus (quiver, arrow) | sequence consensus

 https://github.com/PacificBiosciences/GenomicConsensus

 

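GenomicConsensus is developed by PacBio. Personally, I really dislike PacBio's tools; they are hard to use.

Installing GenomicConsensus nearly cost me half my life, too.

The purpose of this tool: Compute genomic consensus and call variants relative to the reference.

In other words, it uses a set of reads to error-correct the final reference. The approach is broadly applicable and fits many scenarios. In particular, when we develop our own tools, we can embed it directly and reduce the development effort.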
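As a rough idea of what "embedding" could look like, here is a minimal Python sketch that only assembles the command line for our own pipeline to launch. It uses just the options listed in the tool's help output (`--referenceFilename`, `-o`, `-j`). The function name `build_arrow_command`, the file names, and the assumption that the executable is reachable as `arrow` are all mine for illustration, not part of GenomicConsensus.

```python
import shlex
from typing import List, Sequence

# Output extensions named in the tool's help text for -o/--outputFilename.
VALID_OUTPUT_SUFFIXES = (".fa", ".fasta", ".fq", ".fastq", ".gff", ".vcf")


def build_arrow_command(
    aligned_reads: str,
    reference_fasta: str,
    outputs: Sequence[str],
    workers: int = 1,
    executable: str = "arrow",  # assumption: adjust to e.g. "./bin/arrow"
) -> List[str]:
    """Assemble an argv list for the consensus caller (hypothetical helper)."""
    if not outputs:
        raise ValueError("at least one output filename is required")
    for name in outputs:
        if not name.endswith(VALID_OUTPUT_SUFFIXES):
            raise ValueError(f"unsupported output format: {name!r}")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return [
        executable,
        "--referenceFilename", reference_fasta,
        "-o", ",".join(outputs),  # the help text asks for a comma-separated list
        "-j", str(workers),
        aligned_reads,  # positional input: cmp.h5 or BAM alignment file
    ]


# Hypothetical file names, only to show the resulting command line.
cmd = build_arrow_command(
    "aligned.bam", "draft.fasta", ["polished.fasta", "variants.gff"], workers=8
)
print(shlex.join(cmd))
```

In a real pipeline the list could be handed to `subprocess.run(cmd, check=True)`; passing a list instead of a shell string avoids quoting problems with unusual file names.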

./bin/arrow -h
usage: variantCaller [-h] [--version] [--emit-tool-contract]
                     [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                     [--log-file LOG_FILE]
                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                     --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                     [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                     [--minCoverage MINCOVERAGE]
                     [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                     [--coverage COVERAGE] [--minMapQV MINMAPQV]
                     [--referenceWindow REFERENCEWINDOWSASSTRING]
                     [--alignmentSetRefWindows]
                     [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                     [--barcode _BARCODE] [--readStratum READSTRATUM]
                     [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                     [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                     [--algorithm {quiver,arrow,plurality,poa,best}]
                     [--parametersFile PARAMETERSFILE]
                     [--parametersSpec PARAMETERSSPEC]
                     [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                     [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                     [--dumpEvidence [{variants,all,outliers}]]
                     [--evidenceDirectory EVIDENCEDIRECTORY] [--annotateGFF]
                     [--reportEffectiveCoverage] [--diploid]
                     [--queueSize QUEUESIZE] [--threaded]
                     [--referenceChunkSize REFERENCECHUNKSIZE]
                     [--fancyChunking] [--simpleChunking]
                     [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                     [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                     [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                     [--noRefineDinucleotideRepeats] [--fast]
                     [--skipUnrecognizedContigs]
                     inputFilename
 
Compute genomic consensus and call variants relative to the reference.
 
optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --emit-tool-contract  Emit Tool Contract to stdout (default: False)
  --resolved-tool-contract RESOLVED_TOOL_CONTRACT
                        Run Tool directly from a PacBio Resolved tool contract
                        (default: None)
  --log-file LOG_FILE   Write the log to file. Default(None) will write to
                        stdout. (default: None)
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set log level (default: WARN)
  --debug               Alias for setting log level to DEBUG (default: False)
  --quiet               Alias for setting log level to CRITICAL to suppress
                        output. (default: False)
  -v, --verbose         Set the verbosity level. (default: None)
 
Basic required options:
  inputFilename         The input cmp.h5 or BAM alignment file
  --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
                        The filename of the reference FASTA file (default:
                        None)
  -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
                        The output filename(s), as a comma-separated
                        list.Valid output formats are .fa/.fasta, .fq/.fastq,
                        .gff, .vcf (default: [])
 
Parallelism:
  -j NUMWORKERS, --numWorkers NUMWORKERS
                        The number of worker processes to be used (default: 1)
 
Output filtering:
  --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
                        The minimum confidence for a variant call to be output
                        to variants.{gff,vcf} (default: 40)
  --minCoverage MINCOVERAGE, -x MINCOVERAGE
                        The minimum site coverage that must be achieved for
                        variant calls and consensus to be calculated for a
                        site. (default: 5)
  --noEvidenceConsensusCall {nocall,reference,lowercasereference}
                        The consensus base that will be output for sites with
                        no effective coverage. (default: lowercasereference)
 
Read selection/filtering:
  --coverage COVERAGE, -X COVERAGE
                        A designation of the maximum coverage level to be used
                        for analysis. Exact interpretation is algorithm-
                        specific. (default: 100)
  --minMapQV MINMAPQV, -m MINMAPQV
                        The minimum MapQV for reads that will be used for
                        analysis. (default: 10)
  --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd (default: entire reference).
                        (default: None)
  --alignmentSetRefWindows
                        The window (or multiple comma-delimited windows) of
                        the reference to be processed, in the format refGroup
                        :refStart-refEnd will be pulled from the alignment
                        file. (default: False)
  --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
                        A file containing reference window designations, one
                        per line (default: None)
  --barcode _BARCODE    Only process reads with the given barcode name.
                        (default: None)
  --readStratum READSTRATUM
                        A string of the form 'n/N', where n, and N are
                        integers, 0 <= n < N, designating that the reads are
                        to be deterministically split into N strata of roughly
                        even size, and stratum n is to be used for variant and
                        consensus calling. This is mostly useful for Quiver
                        development. (default: None)
  --minReadScore MINREADSCORE
                        The minimum ReadScore for reads that will be used for
                        analysis (arrow-only). (default: 0.65)
  --minSnr MINHQREGIONSNR
                        The minimum acceptable signal-to-noise over all
                        channels for reads that will be used for analysis
                        (arrow-only). (default: 3.75)
  --minZScore MINZSCORE
                        The minimum acceptable z-score for reads that will be
                        used for analysis (arrow-only). (default: -3.5)
  --minAccuracy MINACCURACY
                        The minimum acceptable window-global alignment
                        accuracy for reads that will be used for the analysis
                        (arrow-only). (default: 0.82)
 
Algorithm and parameter settings:
  --algorithm {quiver,arrow,plurality,poa,best}
  --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
                        Parameter set filename (such as ArrowParameters.json
                        or QuiverParameters.ini), or directory D such that
                        either D/*/GenomicConsensus/QuiverParameters.ini, or
                        D/GenomicConsensus/QuiverParameters.ini, is found. In
                        the former case, the lexically largest path is chosen.
                        (default: None)
  --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
                        Name of parameter set (chemistry.model) to select from
                        the parameters file, or just the name of the
                        chemistry, in which case the best available model is
                        chosen. Default is 'auto', which selects the best
                        parameter set from the alignment data (default: auto)
  --maskRadius MASKRADIUS
                        Radius of window to use when excluding local regions
                        for exceeding maskMinErrorRate, where 0 disables any
                        filtering (arrow-only). (default: 3)
  --maskErrorRate MASKERRORRATE
                        Maximum local error rate before the local region
                        defined by maskRadius is excluded from polishing
                        (arrow-only). (default: 0.7)
 
Verbosity and debugging/profiling:
  --pdb                 Enable Python debugger (default: False)
  --notrace             Suppress stacktrace for exceptions (to simplify
                        testing) (default: False)
  --pdbAtStartup        Drop into Python debugger at startup (requires ipdb)
                        (default: False)
  --profile             Enable Python-level profiling (using cProfile).
                        (default: False)
  --dumpEvidence [{variants,all,outliers}], -d [{variants,all,outliers}]
  --evidenceDirectory EVIDENCEDIRECTORY
  --annotateGFF         Augment GFF variant records with additional
                        information (default: False)
  --reportEffectiveCoverage
                        Additionally record the *post-filtering* coverage at
                        variant sites (default: False)
 
Advanced configuration options:
  --diploid             Enable detection of heterozygous variants
                        (experimental) (default: False)
  --queueSize QUEUESIZE, -Q QUEUESIZE
  --threaded, -T        Run threads instead of processes (for debugging
                        purposes only) (default: False)
  --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
  --fancyChunking       Adaptive reference chunking designed to handle
                        coverage cutouts better (default: True)
  --simpleChunking      Disable adaptive reference chunking (default: True)
  --referenceChunkOverlap REFERENCECHUNKOVERLAP
  --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
                        Disable the HDF5 chunk cache when the number of
                        datasets in the cmp.h5 exceeds the given threshold
                        (default: 500)
  --aligner {affine,simple}, -a {affine,simple}
                        The pairwise alignment algorithm that will be used to
                        produce variant calls from the consensus (Quiver
                        only). (default: affine)
  --refineDinucleotideRepeats
                        Require quiver maximum likelihood search to try one
                        less/more repeat copy in dinucleotide repeats, which
                        seem to be the most frequent cause of suboptimal
                        convergence (getting trapped in local optimum) (Quiver
                        only) (default: True)
  --noRefineDinucleotideRepeats
                        Disable dinucleotide refinement (default: True)
  --fast                Cut some corners to run faster. Unsupported! (default:
                        False)
  --skipUnrecognizedContigs
                        Do not abort when told to process a reference window
                        (via -w/--referenceWindow[s]) that has no aligned
                        coverage. Outputs emptyish files if there are no
                        remaining non-degenerate windows. Only intended for
                        use by smrtpipe scatter/gather. (default: False)
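The `-w/--referenceWindow` option above takes one or more comma-delimited windows in the form `refGroup:refStart-refEnd`. If a wrapper tool wants to check such a string before launching a long run, a small validator like the following might help. This is only my sketch of the format as the help text describes it, not how GenomicConsensus parses it internally; `RefWindow` and `parse_reference_windows` are hypothetical names, and rejecting `end < start` is my own assumption.

```python
from typing import List, NamedTuple


class RefWindow(NamedTuple):
    ref_group: str
    start: int
    end: int


def parse_reference_windows(spec: str) -> List[RefWindow]:
    """Split 'refGroup:refStart-refEnd[,...]' into structured windows."""
    windows = []
    for item in spec.split(","):
        item = item.strip()
        # Split on the last ':' so contig names containing ':' still work.
        ref_group, colon, span = item.rpartition(":")
        start_text, dash, end_text = span.partition("-")
        if not colon or not dash or not ref_group:
            raise ValueError(f"expected refGroup:refStart-refEnd, got {item!r}")
        start, end = int(start_text), int(end_text)  # ValueError if not integers
        if end < start:  # assumption: a window must not run backwards
            raise ValueError(f"window end before start in {item!r}")
        windows.append(RefWindow(ref_group, start, end))
    return windows


# Hypothetical contig names, only to show the parsed structure.
for window in parse_reference_windows("contig1:0-5000,contig2:100-200"):
    print(window.ref_group, window.start, window.end)
```

The same windows could also be written one per line into a file for `-W/--referenceWindowsFile`.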

  

To be continued~~

 

posted @ Life·Intelligence