OpenKiwi Study Notes
Printing the call stack in Python:
import traceback
traceback.print_stack()
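This is handy when reading the OpenKiwi source: dropping print_stack() into a function prints the chain of callers that reached it. A tiny self-contained demo (the function names here are made up for illustration):

import traceback

def inner():
    # Prints the frames that led here: <module> -> outer -> inner
    traceback.print_stack()

def outer():
    inner()

outer()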
For CLI usage, the general command is:
kiwi (train|pretrain|predict|evaluate|search) CONFIG_FILE
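For example, to train the BERT-based QE system with the configuration file used throughout these notes (the path config/bert.yaml comes from the CLI dump further down):

kiwi train config/bert.yaml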
Importing the BERT model:
from transformers import (
    BERT_PRETRAINED_MODEL_ARCHIVE_LIST,
    DISTILBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
    AutoTokenizer,
    BertConfig,
    BertModel,
)
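Independently of OpenKiwi, the encoder checkpoint can be sanity-checked directly with transformers. This is only a sketch, assuming a v4-style transformers API and the bert-base-multilingual-cased checkpoint named in the config below (its vocabulary size 119547 and hidden size 768 match the model dump further down):

from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

# Tokenize a sample sentence and run it through the encoder
inputs = tokenizer("OpenKiwi estimates translation quality.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])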
Components of a QE system:
Furthermore, all of our QE systems share a similar architecture. They are composed of:

- Encoder
- Decoder
- Output
- (Optionally) TLM Output

Encoder: embedding and creating features to be used for downstream tasks, e.g. Predictor, BERT, etc.

Decoder: responsible for learning feature transformations better suited for the downstream task, e.g. MLP, LSTM, etc.

Output: simple feedforwards that take decoder features and transform them into the prediction required by the downstream task, much like the common "classification heads" used with transformers (a minimal sketch of such a head follows below).

TLM Output: a simple output layer that trains for the specific TLM objective. It can be useful for continuing to fine-tune the predictor during training of the complete QE system.
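To make the "Output" part concrete: in the model dump further down, WordLevelOutput is just a Linear(768, 2) plus a CrossEntropyLoss. A minimal sketch of such a classification head (hypothetical code, not OpenKiwi's actual class) could look like this:

import torch
from torch import nn

class WordTagHead(nn.Module):
    """Hypothetical word-level output head: decoder features -> OK/BAD logits."""

    def __init__(self, hidden_size: int = 768, num_tags: int = 2):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_tags)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, features, tags=None):
        # features: (batch, seq_len, hidden_size)
        logits = self.linear(features)
        if tags is None:
            return logits
        # Flatten so CrossEntropyLoss sees (N, num_tags) vs (N,)
        loss = self.loss_fn(logits.view(-1, logits.size(-1)), tags.view(-1))
        return logits, loss

# Usage sketch with random features and random OK/BAD tags
head = WordTagHead()
features = torch.randn(2, 10, 768)
tags = torch.randint(0, 2, (2, 10))
logits, loss = head(features, tags)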
QE system inheritance:
All QE systems inherit from :class:`kiwi.systems.qe_system.QESystem`. Use ``kiwi train`` to train these systems. Currently available are:

- :class:`kiwi.systems.nuqe.NuQE`
- :class:`kiwi.systems.predictor_estimator.PredictorEstimator`
- :class:`kiwi.systems.bert.Bert`
- :class:`kiwi.systems.xlm.XLM`
- :class:`kiwi.systems.xlmroberta.XLMRoberta`
TLM inheritance:
TLM (:mod:`kiwi.systems.tlm_system`)

All TLM systems inherit from :class:`kiwi.systems.tlm_system.TLMSystem`. Use ``kiwi pretrain`` to train these systems. These systems can then be used as the encoder part in QE systems by using the `load_encoder` flag. Currently available are:

- :class:`kiwi.systems.predictor.Predictor`
Configuration classes:
kiwi.lib.train.Configuration
kiwi.lib.train.RunConfig
kiwi.lib.train.TrainerConfig
kiwi.data.datasets.wmt_qe_dataset.WMTQEDataset.Config
kiwi.systems.qe_system.QESystem.Config
The encoder uses a BERT model; the overall model structure is:
Bert(
  (encoder): BertEncoder(
    (bert): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(119547, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          ................
          (11): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
    (scalar_mix): ScalarMixWithDropout(
      (scalar_parameters): ParameterList(
          (0): Parameter containing: [torch.FloatTensor of size 1]
          (1): Parameter containing: [torch.FloatTensor of size 1]
          (2): Parameter containing: [torch.FloatTensor of size 1]
          (3): Parameter containing: [torch.FloatTensor of size 1]
          (4): Parameter containing: [torch.FloatTensor of size 1]
          (5): Parameter containing: [torch.FloatTensor of size 1]
          (6): Parameter containing: [torch.FloatTensor of size 1]
          (7): Parameter containing: [torch.FloatTensor of size 1]
          (8): Parameter containing: [torch.FloatTensor of size 1]
          (9): Parameter containing: [torch.FloatTensor of size 1]
          (10): Parameter containing: [torch.FloatTensor of size 1]
          (11): Parameter containing: [torch.FloatTensor of size 1]
          (12): Parameter containing: [torch.FloatTensor of size 1]
      )
    )
    (output_embeddings): Embedding(119547, 768, padding_idx=0)
  )
  (decoder): LinearDecoder(
    (linear_outs): ModuleDict(
      (target): Sequential(
        (0): Linear(in_features=768, out_features=768, bias=True)
        (1): Tanh()
      )
      (source): Sequential(
        (0): Linear(in_features=768, out_features=768, bias=True)
        (1): Tanh()
      )
      (target_sentence): Sequential(
        (0): Linear(in_features=768, out_features=768, bias=True)
        (1): Tanh()
        (2): Dropout(p=0.1, inplace=False)
        (3): Linear(in_features=768, out_features=768, bias=True)
        (4): Tanh()
        (5): Dropout(p=0.1, inplace=False)
      )
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (outputs): QEOutputs(
    (word_outputs): ModuleDict(
      (target_tags): WordLevelOutput(
        (linear): Linear(in_features=768, out_features=2, bias=True)
        (loss_fn): CrossEntropyLoss()
      )
      (gap_tags): GapTagsOutput(
        (linear): Linear(in_features=1536, out_features=2, bias=True)
        (loss_fn): CrossEntropyLoss()
      )
      (source_tags): WordLevelOutput(
        (linear): Linear(in_features=768, out_features=2, bias=True)
        (loss_fn): CrossEntropyLoss()
      )
    )
    (sentence_outputs): ModuleDict(
      (sentence_scores): SentenceScoreRegression(
        (sentence_pred): Sequential(
          (linear_0): Linear(in_features=768, out_features=384, bias=True)
          (activation_0): Tanh()
          (dropout_0): Dropout(p=0.0, inplace=False)
          (linear_1): Linear(in_features=384, out_features=1, bias=True)
        )
        (loss_fn): MSELoss()
      )
    )
  )
  (tlm_outputs): TLMOutputs(
    (masked_word_outputs): ModuleDict()
  )
)
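The scalar_mix module above holds 13 scalar parameters, one per set of hidden states (the embedding output plus the 12 transformer layers). Conceptually it learns a softmax-weighted average of those layers; a rough sketch of the idea (ignoring the dropout handling of the real ScalarMixWithDropout) might be:

import torch
from torch import nn

class SimpleScalarMix(nn.Module):
    """Sketch of a scalar mix: learn one weight per layer and average the layers."""

    def __init__(self, num_layers: int = 13):
        super().__init__()
        self.scalar_parameters = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layers):
        # layers: list of (batch, seq_len, hidden) tensors, one per layer
        weights = torch.softmax(self.scalar_parameters, dim=0)
        return sum(w * layer for w, layer in zip(weights, layers))

# Usage: mix the 13 hidden-state tensors BERT returns with output_hidden_states=True
mix = SimpleScalarMix()
hidden_states = [torch.randn(2, 5, 768) for _ in range(13)]
mixed = mix(hidden_states)  # (2, 5, 768)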
In cli.py, print(arguments) shows the parsed command-line arguments:
{'--example': False,
 '--help': False,
 '--quiet': False,
 '--verbose': False,
 '--version': False,
 'CONFIG_FILE': 'config/bert.yaml',
 'OVERWRITES': [],
 'evaluate': False,
 'predict': False,
 'pretrain': False,
 'search': False,
 'train': True}
In cli.py, the resulting config_dict is:
{'run': {'experiment_name': 'BERT WMT20 EN-ZH', 'seed': 42, 'use_mlflow': False},
 'trainer': {'deterministic': True,
             'gpus': 1,
             'epochs': 10,
             'main_metric': ['WMT19_MCC', 'PEARSON'],
             'gradient_max_norm': 1.0,
             'gradient_accumulation_steps': 1,
             'amp_level': 'O2',
             'precision': 16,
             'log_interval': 100,
             'checkpoint': {'validation_steps': 0.2, 'early_stop_patience': 10}},
 'system': {'class_name': 'Bert',
            'batch_size': 2,
            'num_data_workers': 1,
            'model': {'encoder': {'model_name': 'bert-base-multilingual-cased',
                                  'use_mlp': False,
                                  'freeze': False},
                      'decoder': {'hidden_size': 768,
                                  'bottleneck_size': 768,
                                  'dropout': 0.1},
                      'outputs': {'word_level': {'target': True,
                                                 'gaps': True,
                                                 'source': True,
                                                 'class_weights': {'target_tags': {'BAD': 3.0},
                                                                   'gap_tags': {'BAD': 5.0},
                                                                   'source_tags': {'BAD': 3.0}}},
                                  'sentence_level': {'hter': True,
                                                     'use_distribution': False,
                                                     'binary': False},
                                  'n_layers_output': 2,
                                  'sentence_loss_weight': 1},
                      'tlm_outputs': {'fine_tune': False}},
            'optimizer': {'class_name': 'adamw',
                          'learning_rate': 1e-05,
                          'warmup_steps': 0.1,
                          'training_steps': 12000},
            'data_processing': {'share_input_fields_encoders': True}},
 'data': {'train': {'input': {'source': 'data/WMT20/en-zh/train/train.src',
                              'target': 'data/WMT20/en-zh/train/train.mt',
                              'alignments': 'data/WMT20/en-zh/train/train.src-mt.alignments',
                              'post_edit': 'data/WMT20/en-zh/train/train.pe'},
                    'output': {'source_tags': 'data/WMT20/en-zh/train/train.source_tags',
                               'target_tags': 'data/WMT20/en-zh/train/train.tags',
                               'sentence_scores': 'data/WMT20/en-zh/train/train.hter'}},
          'valid': {'input': {'source': 'data/WMT20/en-zh/dev/dev.src',
                              'target': 'data/WMT20/en-zh/dev/dev.mt',
                              'alignments': 'data/WMT20/en-zh/dev/dev.src-mt.alignments',
                              'post_edit': 'data/WMT20/en-zh/dev/dev.pe'},
                    'output': {'source_tags': 'data/WMT20/en-zh/dev/dev.source_tags',
                               'target_tags': 'data/WMT20/en-zh/dev/dev.tags',
                               'sentence_scores': 'data/WMT20/en-zh/dev/dev.hter'}},
          'test': {'input': {'source': 'data/WMT20/en-zh/test-blind/test.src',
                             'target': 'data/WMT20/en-zh/test-blind/test.mt',
                             'alignments': 'data/WMT20/en-zh/test-blind/test.src-mt.alignments'}}},
 'verbose': False,
 'quiet': False}
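This config_dict is essentially the parsed YAML file (plus any OVERWRITES given on the command line). Assuming PyYAML is installed and config/bert.yaml follows the structure above, it can be roughly reproduced outside of kiwi with:

import yaml

with open("config/bert.yaml") as f:
    config_dict = yaml.safe_load(f)

print(config_dict["system"]["model"]["encoder"]["model_name"])
# bert-base-multilingual-cased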
The QESystem forward pass:
def forward(self, batch_inputs):
    encoder_features = self.encoder(batch_inputs)
    features = self.decoder(encoder_features, batch_inputs)
    outputs = self.outputs(features, batch_inputs)

    # For fine-tuning the encoder
    if self.tlm_outputs:
        outputs.update(self.tlm_outputs(encoder_features, batch_inputs))

    return outputs
- pytorch_lightning execution flow:
The PL workflow is simple: like a production line, it follows a fixed order of steps.
This part of the code runs only once:
1. `__init__()` (initialize the LightningModule)
2. `prepare_data()` (prepare the data: downloading, preprocessing, and so on)
3. `configure_optimizers()` (configure the optimizers)
It then tests the "validation code". The point of running it this early is that you do not have to wait through a long training process before discovering that the validation code has bugs (a minimal skeleton with these hooks follows the list below):
1. `val_dataloader()`
2. `validation_step()`
3. `validation_epoch_end()`
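For reference, here is a minimal, self-contained LightningModule skeleton that defines these hooks in the order listed above. It is a generic sketch, not OpenKiwi code; the module name and toy data are made up, and hook names such as `validation_epoch_end` follow the PyTorch Lightning version of that era and may differ in newer releases:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class ToyQEModule(pl.LightningModule):
    def __init__(self):                      # 1. runs once when the module is built
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def prepare_data(self):                  # 2. download / preprocess the data
        self.dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))

    def configure_optimizers(self):          # 3. build the optimizer
        return torch.optim.AdamW(self.parameters(), lr=1e-5)

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=8)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    # Validation hooks: run as a short sanity check before the first training epoch
    def val_dataloader(self):
        return DataLoader(self.dataset, batch_size=8)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        return {"val_loss": nn.functional.mse_loss(self.layer(x), y)}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([out["val_loss"] for out in outputs]).mean()
        self.log("val_loss", avg_loss)

# pl.Trainer(max_epochs=1).fit(ToyQEModule()) would then execute the hooks in roughly
# the order above, starting with the validation sanity check.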
An example batch (as received by the model's forward pass):
{'source': BatchedSentence(
     tensor=tensor([[15846, 10491, 82978, 10226, 75312, 10571, 10105, 106095, 11942, 38587,
                     10108, 10955, 11586, 119, 102, 0, 0, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 0, 0],
                    [10135, 10386, 11288, 10207, 117, 17668, 11945, 39091, 10226, 10751,
                     14310, 107, 17316, 10230, 10108, 11589, 107, 117, 17846, 15736,
                     10188, 13209, 14951, 13240, 117, 11406, 14320, 10270, 10108, 10226,
                     84977, 11305, 68999, 10731, 11202, 31419, 119, 102]], device='cuda:0'),
     lengths=tensor([15, 38], device='cuda:0'),
     bounds=tensor([[0, 1, 3, 4, 5, 6, 7, 9, 10, 11,
                     12, 13, 14, -1, -1, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
                     -1],
                    [0, 1, 2, 3, 4, 5, 7, 8, 9, 10,
                     11, 12, 14, 15, 16, 17, 18, 19, 20, 21,
                     23, 24, 25, 26, 27, 28, 29, 30, 34, 36,
                     37]], device='cuda:0'),
     bounds_lengths=tensor([13, 31], device='cuda:0'),
     strict_masks=tensor([[True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False],
                          [True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, False]], device='cuda:0'),
     number_of_tokens=tensor([12, 30], device='cuda:0', dtype=torch.int32)),
 'target': BatchedSentence(
     tensor=tensor([[101, 4877, 113183, 113227, 118188, 3031, 2128, 114696, 217, 2674,
                     115512, 118188, 5718, 7724, 111978, 6348, 114696, 2079, 5740, 115551,
                     2196, 5718, 2773, 111915, 2206, 119, 102, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                     0, 0],
                    [101, 10207, 3642, 10186, 4460, 10386, 4348, 10064, 4536, 4580,
                     114286, 7735, 117459, 2196, 5718, 84977, 11305, 68999, 10731, 4237,
                     113162, 6063, 10270, 8272, 10064, 4163, 112293, 2146, 2196, 5718,
                     4333, 5718, 107, 3976, 5718, 4614, 113664, 107, 10064, 2460,
                     111870, 4461, 3228, 118573, 113826, 217, 4608, 5718, 4784, 112939,
                     1882, 102]], device='cuda:0'),
     lengths=tensor([27, 52], device='cuda:0'),
     bounds=tensor([[0, 1, 5, 6, 8, 9, 12, 13, 15, 17,
                     18, 20, 21, 22, 24, 25, 26, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
                    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     11, 13, 14, 15, 19, 21, 22, 23, 24, 25,
                     27, 28, 29, 30, 31, 32, 33, 34, 35, 37,
                     38, 39, 41, 42, 45, 46, 47, 48, 50, 51]], device='cuda:0'),
     bounds_lengths=tensor([17, 40], device='cuda:0'),
     strict_masks=tensor([[False, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False,
                           False, False],
                          [False, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, False]], device='cuda:0'),
     number_of_tokens=tensor([15, 38], device='cuda:0', dtype=torch.int32)),
 'alignments': tensor([[[1, 0, 0, ..., 0, 0, 0],
                        [0, 1, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 0, 0],
                        ...,
                        [0, 0, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 0, 0]],
                       [[1, 0, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 0, 0],
                        ...,
                        [0, 0, 0, ..., 0, 0, 0],
                        [0, 0, 0, ..., 0, 1, 0],
                        [0, 0, 0, ..., 0, 0, 0]]], device='cuda:0', dtype=torch.int32),
 'pe': BatchedSentence(
     tensor=tensor([[101, 4877, 113183, 113227, 118188, 10060, 15846, 10061, 3031, 2128,
                     114696, 217, 2674, 115512, 118188, 10060, 10955, 11586, 10061, 3740,
                     117490, 2775, 5718, 2212, 114236, 2468, 7475, 117244, 5740, 115551,
                     2196, 5718, 2773, 112165, 1882, 102, 0, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    [101, 10207, 3642, 10186, 4460, 10386, 4348, 10064, 2234, 114346,
                     4520, 7735, 117459, 2196, 5718, 84977, 11305, 68999, 10731, 4237,
                     113162, 6063, 10270, 8272, 8505, 114003, 2196, 5718, 4333, 4447,
                     115521, 100, 17316, 10230, 10108, 11589, 100, 10064, 4792, 114213,
                     5611, 13209, 14951, 13240, 2726, 111847, 5162, 112652, 1882, 102]], device='cuda:0'),
     lengths=tensor([36, 50], device='cuda:0'),
     bounds=tensor([[0, 1, 5, 6, 7, 8, 9, 11, 12, 15,
                     16, 17, 18, 19, 21, 22, 23, 25, 26, 28,
                     30, 31, 32, 34, 35, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1],
                    [0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
                     11, 13, 14, 15, 19, 21, 22, 23, 24, 26,
                     27, 28, 29, 31, 32, 34, 35, 36, 37, 38,
                     40, 41, 43, 44, 46, 48, 49]], device='cuda:0'),
     bounds_lengths=tensor([25, 37], device='cuda:0'),
     strict_masks=tensor([[False, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False],
                          [False, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, False]], device='cuda:0'),
     number_of_tokens=tensor([23, 35], device='cuda:0', dtype=torch.int32)),
 'source_tags': BatchedSentence(
     tensor=tensor([[0, 0, 1, 1, 0, 0, 0, 0, 1, 0,
                     0, 0, 2, 2, 2, 2, 2, 2, 2, 2,
                     2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
                    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
                     1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
                     0, 1, 1, 1, 1, 1, 1, 0, 1, 1]], device='cuda:0'),
     lengths=tensor([12, 30], device='cuda:0'),
     bounds=tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, -1, -1, -1, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
                    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                     20, 21, 22, 23, 24, 25, 26, 27, 28, 29]], device='cuda:0'),
     bounds_lengths=tensor([12, 30], device='cuda:0'),
     strict_masks=tensor([[True, True, True, True, True, True, True, True, True, True,
                           True, True, False, False, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False],
                          [True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True]], device='cuda:0'),
     number_of_tokens=tensor([12, 30], device='cuda:0', dtype=torch.int32)),
 'target_tags': BatchedSentence(
     tensor=tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
                     1, 1, 0, 0, 0, 2, 2, 2, 2, 2,
                     2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                     2, 2, 2, 2, 2, 2, 2, 2],
                    [1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
                     1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
                     1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
                     0, 0, 0, 0, 0, 0, 0, 1]], device='cuda:0'),
     lengths=tensor([15, 38], device='cuda:0'),
     bounds=tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, 12, 13, 14, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1],
                    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                     20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                     30, 31, 32, 33, 34, 35, 36, 37]], device='cuda:0'),
     bounds_lengths=tensor([15, 38], device='cuda:0'),
     strict_masks=tensor([[True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False],
                          [True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True]], device='cuda:0'),
     number_of_tokens=tensor([15, 38], device='cuda:0', dtype=torch.int32)),
 'sentence_scores': tensor([0.6522, 0.5143], device='cuda:0'),
 'gap_tags': BatchedSentence(
     tensor=tensor([[1, 0, 1, 1, 1, 0, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
                     2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                     2, 2, 2, 2, 2, 2, 2, 2, 2],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                     1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0'),
     lengths=tensor([16, 39], device='cuda:0'),
     bounds=tensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, 12, 13, 14, 15, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
                     -1, -1, -1, -1, -1, -1, -1, -1, -1],
                    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                     10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                     20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                     30, 31, 32, 33, 34, 35, 36, 37, 38]], device='cuda:0'),
     bounds_lengths=tensor([16, 39], device='cuda:0'),
     strict_masks=tensor([[True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, False, False, False, False,
                           False, False, False, False, False, False, False, False, False, False,
                           False, False, False, False, False, False, False, False, False],
                          [True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True, True,
                           True, True, True, True, True, True, True, True, True]], device='cuda:0'),
     number_of_tokens=tensor([16, 39], device='cuda:0', dtype=torch.int32)),
 'binary': tensor([1, 1], device='cuda:0')}
Example result (best validation metrics logged during training):
2021-08-27T08:15:17Z INFO kiwi.training.callbacks:117: Best validation so far was in epoch 0:
    val_loss: 85.9468
    val_WMT19_MCC: 0.5628
    val_loss_target_tags: 30.9671
    val_loss_gap_tags: 22.8837
    val_loss_source_tags: 32.0960
    val_WMT19_F1_MULT: 0.5807
    val_F1_BAD: 0.7120
    val_F1_OK: 0.3879
    val_WMT19_CORRECT: 0.7842
    val_target_tags_F1_MULT: 0.3510
    val_target_tags_MCC: 0.3523
    val_target_tags_CORRECT: 0.6544
    val_gap_tags_F1_MULT: 0.2005
    val_gap_tags_MCC: 0.1625
    val_gap_tags_CORRECT: 0.9068
    val_source_tags_F1_MULT: 0.2762
    val_source_tags_MCC: 0.2858
    val_source_tags_CORRECT: 0.6083
To be continued...