D:\app\work_app\anaconda3\local\envs\ProphetNet-master\python.exe C:\Users\PC\Desktop\GENIE\Genie_Finetune.py --checkpoint_path=output --model_channels 128 --in_channel 128 --out_channel 128 --vocab_size 30522 --config_name=bert-base-uncased --token_emb_type=random --model_arch=s2s_CAT --diffusion_steps 2000 --predict_xstart --noise_schedule=sqrt --training_mode=s2s --schedule_sampler=loss-second-moment --tgt_max_len 64 --src_max_len 512 --data_name=cnndm_data --data_path=data --lr_anneal_steps 120000 --batch_size 64 --lr 5e-05 --warmup_steps 7200 --train_type=S2S_Diffusion --eval_interval 200 --log_interval 200 --save_interval 20000 --pretrain_model_path=pretrain_ckpt/GENIE_ckpt-cnndm.ckpt
Logging to C:\Users\PC\AppData\Local\Temp\openai-2023-07-06-16-30-45-715447
saving the hyperparameters to output/training_args.json
Logging to output\log.txt
creating model, based on s2s_CAT
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'cls.predictions.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.query.bias', 
'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'cls.seq_relationship.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'cls.seq_relationship.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
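The unused keys listed in the warning are exactly encoder layers 6 through 11 plus the cls.* pre-training heads, which is the pattern Transformers reports when a BertModel is built with fewer hidden layers than the 12-layer bert-base-uncased checkpoint provides. A minimal sketch that reproduces this warning; the 6-layer figure is inferred from the key names above, not stated by the log:

from transformers import BertModel

# Assumption: the s2s_CAT encoder keeps only the lower BERT layers. Building a
# 6-layer BertModel from the 12-layer checkpoint leaves layers 6-11 and the
# cls.* heads unused, producing the warning shown above.
encoder = BertModel.from_pretrained("bert-base-uncased", num_hidden_layers=6)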
BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.30.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
noise_schedule: sqrt
Diffusion Steps: 2000
betas: [0.01464131 0.00888909 0.00706818 ... 0.35722328 0.55561113 0.999 ]
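The printed betas match the "sqrt" noise schedule from Diffusion-LM: the cumulative alpha is alpha_bar(t) = 1 - sqrt(t/T + s), and each beta is one minus the ratio of consecutive alpha_bar values, clipped at 0.999. A small sketch that reproduces the numbers above; the offset s = 1e-4 and the clip value are inferred from those numbers rather than read from the code:

import numpy as np

def sqrt_beta_schedule(num_steps, s=1e-4, max_beta=0.999):
    # alpha_bar(t) = 1 - sqrt(t + s); beta_i = 1 - alpha_bar(t_{i+1}) / alpha_bar(t_i)
    alpha_bar = lambda t: 1.0 - np.sqrt(t + s)
    betas = [
        min(1.0 - alpha_bar((i + 1) / num_steps) / alpha_bar(i / num_steps), max_beta)
        for i in range(num_steps)
    ]
    return np.array(betas)

betas = sqrt_beta_schedule(2000)
print(betas[:3], betas[-1])  # ~[0.01464131 0.00888909 0.00706818] ... 0.999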
Diffusion Loss Type: LossType.E2E_MSE, Whether to learn sigma: False
Diffusion predict xstart: True
training mode is s2s
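With predict_xstart=True and a fixed (not learned) sigma, the heart of the E2E_MSE objective is a mean-squared error between the model's x0 prediction at a noised timestep and the clean target embeddings; GENIE's full loss adds embedding and rounding terms that are left out here. A reduced sketch of that core term, with an assumed call signature model(x_t, t, src_hidden) that is illustrative rather than the repository's API:

import torch

def x0_mse_loss(model, x_start, t, alphas_cumprod, src_hidden=None):
    # x_start: clean target embeddings [batch, seq, dim]; t: LongTensor of timesteps [batch]
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    noise = torch.randn_like(x_start)
    x_t = a_bar.sqrt() * x_start + (1.0 - a_bar).sqrt() * noise  # q_sample
    x0_pred = model(x_t, t, src_hidden)  # predict_xstart=True: the network outputs x_0 directly
    return ((x0_pred - x_start) ** 2).mean()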
load model ckpt at: pretrain_ckpt/GENIE_ckpt-cnndm.ckpt
Reading saved model from pretrain_ckpt/GENIE_ckpt-cnndm.ckpt
model_state_dict keys: odict_keys(['model_dict', 'optimizer_dict', 'scheduler_dict', 'offset'])
the parameter count is 143983034
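The top-level keys printed for the checkpoint suggest it bundles the model weights with optimizer state, scheduler state, and a step offset. A minimal loading sketch based only on those key names; how the extra entries are consumed when resuming is an assumption:

import torch

ckpt = torch.load("pretrain_ckpt/GENIE_ckpt-cnndm.ckpt", map_location="cpu")
model.load_state_dict(ckpt["model_dict"])  # model: the s2s_CAT diffusion model created above
# ckpt["optimizer_dict"], ckpt["scheduler_dict"] and ckpt["offset"] would restore the
# optimizer, LR schedule and training-step counter when resuming (assumption).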
using loss-second-moment time schedule sampler
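A loss-second-moment sampler draws training timesteps with probability proportional to sqrt(E[loss_t^2]), estimated from a short rolling loss history per timestep, and falls back to uniform sampling until every timestep has enough history. A simplified sketch modelled on the resampler popularized by OpenAI's improved-diffusion; the history length of 10 and the small uniform mixing probability are assumed defaults:

import numpy as np

class LossSecondMomentSampler:
    def __init__(self, num_timesteps, history=10, uniform_prob=0.001):
        self.loss_history = np.zeros([num_timesteps, history])
        self.counts = np.zeros(num_timesteps, dtype=int)
        self.history = history
        self.uniform_prob = uniform_prob

    def weights(self):
        if not (self.counts == self.history).all():
            return np.ones(len(self.counts))  # uniform until every timestep has a full history
        w = np.sqrt((self.loss_history ** 2).mean(axis=-1))
        w /= w.sum()
        return w * (1 - self.uniform_prob) + self.uniform_prob / len(w)

    def sample(self, batch_size, rng=np.random):
        p = self.weights()
        p = p / p.sum()
        t = rng.choice(len(p), size=batch_size, p=p)
        return t, 1.0 / (len(p) * p[t])  # sampled timesteps and their importance weights

    def update(self, ts, losses):
        # push each observed loss into that timestep's rolling history
        for t, loss in zip(ts, losses):
            if self.counts[t] == self.history:
                self.loss_history[t, :-1] = self.loss_history[t, 1:]
                self.loss_history[t, -1] = loss
            else:
                self.loss_history[t, self.counts[t]] = loss
                self.counts[t] += 1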
loading tokenizer...
***** load cnndm_data train src dataset *****
287113it [00:01, 214791.14it/s]
***** load cnndm_data train tgt dataset *****
287113it [00:00, 1882824.06it/s]
example of src text: ( CNN ) - - China has suspended exports of the A ##qua Dot ##s toys contaminated with a chemical that can convert to a powerful " date rape " drug , the state - run Xi ##nh ##ua news agency reported Saturday . The toys have caused some children who swallowed the craft toys to vomit and lose consciousness . [SEP_0] China suspended exports of the A ##qua Dot ##s toys that contain a chemical that converts into a " date rape " drug . [SEP_1] The agency said that the General Administration of Quality Super ##vision , In ##spect ##ion , and Q ##ua ##rant ##ine ( A ##Q ##SI ##Q ) has ordered an investigation by quality control agencies and will release results as soon as they are available . [SEP_2] The A ##Q ##SI ##Q did not reveal the name of the toys ' producer , Xi ##nh ##ua said . [SEP_3] U . S . safety officials voluntarily recalled about 4 . 2 million of the Chinese - made toys Wednesday . [SEP_4] Scientists have found the highly popular holiday toy contains a chemical that , once meta ##bol ##ized , converts into the toxic " date rape " drug G ##H ##B ( gamma - h ##ydro ##xy but ##yra ##te ) , Scott Wolf ##son , a spokesman with the U . S . Consumer Product Safety Commission ( CP ##SC ) , told CNN . [SEP_5] " Children who swallow the beads can become coma ##tos ##e , develop respiratory depression or have seizure ##s , " a CP ##SC statement warned . [SEP_6] The arts - and - craft beads , which have been selling since April at major U . S . retail stores under the name " A ##qua Dot ##s , " have also been distributed in Australia under the name " Bin ##de ##ez Be ##ads . " [SEP_7] The Bin ##de ##ez toys were recalled Tuesday by Melbourne - based Moose Enterprise P ##ty . Ltd . after three children in Australia swallowed large quantities of the beads and were hospital ##ized . [SEP_8] " I was so frightened because I thought she wasn ' t going to make it , " Heather Le ##hane told CNN affiliate Network 7 of her 10 - year - old daughter , Charlotte , who was hospital ##ized in Australia after ing ##est ##ing some of the beads . [SEP_9] In the United States , the Washington - based safety commission said it has in recent days received two reports detailing the severe effects of the dig ##ested beads , which are part of a craft kit aimed at kids 4 years and older . [SEP_9] The CP ##SC said a boy nearly 2 years old " swallowed several dozen beads . He became dizzy and vomit ##ed several times before slipping into a coma ##tos ##e state for a period of time . " [SEP_9] The commission said the to ##ddler was hospital ##ized and has since " fully recovered . " [SEP_9] The second incident involved a child who vomit ##ed , fell into a coma and was hospital ##ized for five days . It was not immediately clear whether the child had made a full recovery . [SEP_9] Toronto - based toy distributor Spin Master Ltd . stopped shipping the A ##qua Dot ##s toys and asked retailers to pull them off their shelves , where they were previously sold for $ 17 to $ 30 . [SEP_9] Anyone with A ##qua Dot ##s at home should return the product to the company , CP ##SC spoke ##s ##woman Julie Valle ##se said . [SEP_9] The toy had been named toy of the year in Australia and recently crest ##ed W ##al - Mart ' s list of top 12 Christmas toys . [SEP_9] W ##al - Mart on Thursday listed the toys on its Web site as " out of stock online " and had removed them from their top toy list as well . 
[SEP_9] This latest recall is part of a larger batch of recalls of Chinese - made toys that have swept across the country . [SEP_9] Last month alone , U . S . government safety officials and retailers voluntarily recalled at least 69 , 000 Chinese - made toys over concerns of excessive amounts of lead paint , which can cause hazardous lead poisoning . E - mail to a friend . [SEP_9] CNN ' s Jan ##ine Brady , Jason Carroll , Laura Do ##lan , Julie O ' Neill and Leslie W ##ig ##gins contributed to this report .
example of tgt text: State - run news agency : China orders an investigation by quality control agencies . [X_SEP] Children who swallow the beads can become coma ##tos ##e or have seizure ##s . [X_SEP] Toy ##s are sold as A ##qua Dot ##s in the U . S . , as Bin ##de ##ez Be ##ads in Australia . [X_SEP] Three children were hospital ##ized in Australia after swallowing large quantities .
example of src id lists: tensor([[ 101, 1006, 13229, 1007, 1011, 1011, 2859, 2038, 6731, 14338,
1997, 1996, 1037, 1001, 1001, 24209, 2050, 11089, 1001, 1001,
1055, 10899, 19450, 2007, 1037, 5072, 2008, 2064, 10463, 2000,
1037, 3928, 1000, 3058, 9040, 1000, 4319, 1010, 1996, 2110,
1011, 2448, 8418, 1001, 1001, 18699, 1001, 1001, 25423, 2739,
4034, 2988, 5095, 1012, 1996, 10899, 2031, 3303, 2070, 2336,
2040, 7351, 1996, 7477, 10899, 2000, 23251, 1998, 4558, 8298,
1012, 1031, 19802, 1035, 1014, 1033, 2859, 6731, 14338, 1997,
1996, 1037, 1001, 1001, 24209, 2050, 11089, 1001, 1001, 1055,
10899, 2008, 5383, 1037, 5072, 2008, 19884, 2046, 1037, 1000,
3058, 9040, 1000, 4319, 1012, 1031, 19802, 1035, 1015, 1033,
1996, 4034, 2056, 2008, 1996, 2236, 3447, 1997, 3737, 3565,
1001, 1001, 4432, 1010, 1999, 1001, 1001, 28699, 2102, 1001,
1001, 10163, 1010, 1998, 1053, 1001, 1001, 25423, 1001, 1001,
2743, 2102, 1001, 1001, 1999, 2063, 1006, 1037, 1001, 1001,
1053, 1001, 1001, 9033, 1001, 1001, 1053, 1007, 2038, 3641,
2019, 4812, 2011, 3737, 2491, 6736, 1998, 2097, 2713, 3463,
2004, 2574, 2004, 2027, 2024, 2800, 1012, 1031, 19802, 1035,
1016, 1033, 1996, 1037, 1001, 1001, 1053, 1001, 1001, 9033,
1001, 1001, 1053, 2106, 2025, 7487, 1996, 2171, 1997, 1996,
10899, 1005, 3135, 1010, 8418, 1001, 1001, 18699, 1001, 1001,
25423, 2056, 1012, 1031, 19802, 1035, 1017, 1033, 1057, 1012,
1055, 1012, 3808, 4584, 17912, 7383, 2055, 1018, 1012, 1016,
2454, 1997, 1996, 2822, 1011, 2081, 10899, 9317, 1012, 1031,
19802, 1035, 1018, 1033, 6529, 2031, 2179, 1996, 3811, 2759,
6209, 9121, 3397, 1037, 5072, 2008, 1010, 2320, 18804, 1001,
1001, 8945, 2140, 1001, 1001, 1045, 5422, 1010, 19884, 2046,
1996, 11704, 1000, 3058, 9040, 1000, 4319, 1043, 1001, 1001,
1044, 1001, 1001, 1038, 1006, 13091, 1011, 1044, 1001, 1001,
21076, 3217, 1001, 1001, 1060, 2100, 2021, 1001, 1001, 1061,
2527, 1001, 1001, 8915, 1007, 1010, 3660, 4702, 1001, 1001,
2365, 1010, 1037, 14056, 2007, 1996, 1057, 1012, 1055, 1012,
7325, 4031, 3808, 3222, 1006, 18133, 1001, 1001, 8040, 1007,
1010, 2409, 13229, 1012, 1031, 19802, 1035, 1019, 1033, 1000,
2336, 2040, 10577, 1996, 17530, 2064, 2468, 16571, 1001, 1001,
2000, 2015, 1001, 1001, 1041, 1010, 4503, 16464, 6245, 2030,
2031, 18634, 1001, 1001, 1055, 1010, 1000, 1037, 18133, 1001,
1001, 8040, 4861, 7420, 1012, 1031, 19802, 1035, 1020, 1033,
1996, 2840, 1011, 1998, 1011, 7477, 17530, 1010, 2029, 2031,
2042, 4855, 2144, 2258, 2012, 2350, 1057, 1012, 1055, 1012,
7027, 5324, 2104, 1996, 2171, 1000, 1037, 1001, 1001, 24209,
2050, 11089, 1001, 1001, 1055, 1010, 1000, 2031, 2036, 2042,
5500, 1999, 2660, 2104, 1996, 2171, 1000, 8026, 1001, 1001,
2139, 1001, 1001, 1041, 2480, 2022, 1001, 1001, 14997, 1012,
1000, 1031, 19802, 1035, 1021, 1033, 1996, 8026, 1001, 1001,
2139, 1001, 1001, 1041, 2480, 10899, 2020, 7383, 9857, 2011,
4940, 1011, 2241, 17716, 6960, 1052, 1001, 1001, 5939, 1012,
5183, 1012, 2044, 2093, 2336, 1999, 2660, 7351, 2312, 12450,
1997, 1996, 17530, 1998, 2020, 2902, 1001, 1001, 1045, 5422,
1012, 1031, 19802, 1035, 1022, 1033, 1000, 1045, 2001, 2061,
10363, 2138, 1045, 2245, 2016, 2347, 1005, 1056, 2183, 2000,
2191, 102]])
example of tgt id lists: tensor([[ 101, 2110, 1011, 2448, 2739, 4034, 1024, 2859, 4449, 2019,
4812, 2011, 3737, 2491, 6736, 1012, 1031, 1060, 1035, 19802,
1033, 2336, 2040, 10577, 1996, 17530, 2064, 2468, 16571, 1001,
1001, 2000, 2015, 1001, 1001, 1041, 2030, 2031, 18634, 1001,
1001, 1055, 1012, 1031, 1060, 1035, 19802, 1033, 9121, 1001,
1001, 1055, 2024, 2853, 2004, 1037, 1001, 1001, 24209, 2050,
11089, 1001, 1001, 102]])
total query dataset len : 287113
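In the dumps above, the [SEP_i] markers between source sentences and the [X_SEP] markers between summary sentences are plain strings in the preprocessed data, so the bert-base-uncased tokenizer splits them into sub-word pieces ('[', 'sep', '_', '0', ']'), which is why runs such as 1031, 19802, 1035, ... recur in the id tensors. A minimal sketch of how such tensors could be produced; the truncation lengths follow --src_max_len 512 and --tgt_max_len 64, and the exact preprocessing code is an assumption:

from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
# Shortened stand-ins for the example texts shown above.
src_text = "( CNN ) - - China has suspended exports ... [SEP_0] China suspended exports ..."
tgt_text = "State - run news agency : China orders an investigation ... [X_SEP] ..."
src_ids = tok(src_text, max_length=512, truncation=True, return_tensors="pt").input_ids
tgt_ids = tok(tgt_text, max_length=64, truncation=True,
              padding="max_length", return_tensors="pt").input_ids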
***** load cnndm_data dev src dataset *****
13368it [00:00, 227334.10it/s]
***** load cnndm_data dev tgt dataset *****
13368it [00:00, 1676567.77it/s]
example of src text: ( CNN ) Outside of Israeli politics , Isaac Her ##zo ##g is not a well - known name . That may change on March 17 , when Israeli ##s head to the polls for election day . In the final round of polling before the elections , Her ##zo ##g ' s Zionist Union party is in the lead , holding a four - seat edge over Prime Minister Benjamin Net ##any ##ahu ' s Li ##ku ##d party . [SEP_0] " I believe in a certain type of leadership that is not always customary in this region . I ' m not a general . I don ' t give orders . I know how to work together , " he says . [SEP_1] Throughout the campaign , Her ##zo ##g has been seen as an under ##dog , lacking the ch ##aris ##ma and the English flu ##ency of Net ##any ##ahu . Her ##zo ##g says that doesn ' t bother him at all . [SEP_2] " I have always suffered from a certain under ##est ##imation , " Her ##zo ##g said , " and I have always surprised . " He promised , " I will surprise again , and I will show my leadership and s ##tam ##ina . " [SEP_3] Her ##zo ##g began his political career in 2003 , when he first won a seat in the K ##ness ##et with the Labor Party . He held a variety of ministerial positions , including minister of housing and construction , minister of tourism , and minister of welfare and social services , before becoming leader of the Labor Party in 2013 . In those elections , he also became the leader of the opposition , as Benjamin Net ##any ##ahu won another term as prime minister . [SEP_4] But when Net ##any ##ahu called for early elections in 2014 , Her ##zo ##g p ##eg ##ged his bid for the premiership on social reform . [SEP_5] " What I run for is social justice . I will change the nature of the division of wealth in a fair and more balanced way , close inequality and give a sense of purpose to the people here in the workplace , in the housing , and in the cost of living , " promised Her ##zo ##g . [SEP_6] Before the election , the issue of a nuclear Iran garnered international headlines as it further a ##gg ##ra ##vated tense relations between the White House and Net ##any ##ahu . Her ##zo ##g , in a speech almost immediately after Net ##any ##ahu ' s address to Congress , promised to work with the United States and European powers , not against , to ensure the safety of Israel . He echoed that sentiment in an interview with CNN ' s Elise Lab ##ott . [SEP_7] " A nuclear - armed Iran is dangerous to world peace , is dangerous to our region , is dangerous to Israel . As leader of Israel , I will never accept a nuclear - armed Iran . Never . And all options are on the table . " [SEP_8] In these elections , negotiations with the Palestinians haven ' t been one of the major issues , but Her ##zo ##g promised to restart the stalled peace talks with the Palestinian Authority . [SEP_9] " I will do my best to i ##gni ##te a political process with our Palestinian neighbors . . . . Although I cannot promise 100 % results , I promise 100 % effort . " [SEP_9] Her ##zo ##g comes from Israeli political royalty . His grandfather , Rabbi Yi ##tz ##hak Ha ##L ##ev ##i Her ##zo ##g , was the first chief rabbi of the state of Israel . His father , Cha ##im Her ##zo ##g , was an Army general , an ambassador to the United Nations and the president of Israel . Her ##zo ##g believes it is his destiny to be the next prime minister of Israel . [SEP_9] " What I carry with me is a unique legacy , a family legacy , but most important , an experience that brings me to be able to lead our nation . "
example of tgt text: Poll ##s show Isaac Her ##zo ##g ' s Zionist Union party four seats ahead of Benjamin Net ##any ##ahu ' s party . [X_SEP] Israeli parliamentary elections will be on March 17 .
example of src id lists: tensor([[ 101, 1006, 13229, 1007, 2648, 1997, 5611, 4331, 1010, 7527,
2014, 1001, 1001, 1062, 2080, 1001, 1001, 1043, 2003, 2025,
1037, 2092, 1011, 2124, 2171, 1012, 2008, 2089, 2689, 2006,
2233, 2459, 1010, 2043, 5611, 1001, 1001, 1055, 2132, 2000,
1996, 14592, 2005, 2602, 2154, 1012, 1999, 1996, 2345, 2461,
1997, 17888, 2077, 1996, 3864, 1010, 2014, 1001, 1001, 1062,
2080, 1001, 1001, 1043, 1005, 1055, 21379, 2586, 2283, 2003,
1999, 1996, 2599, 1010, 3173, 1037, 2176, 1011, 2835, 3341,
2058, 3539, 2704, 6425, 5658, 1001, 1001, 2151, 1001, 1001,
6289, 2226, 1005, 1055, 5622, 1001, 1001, 13970, 1001, 1001,
1040, 2283, 1012, 1031, 19802, 1035, 1014, 1033, 1000, 1045,
2903, 1999, 1037, 3056, 2828, 1997, 4105, 2008, 2003, 2025,
2467, 16120, 1999, 2023, 2555, 1012, 1045, 1005, 1049, 2025,
1037, 2236, 1012, 1045, 2123, 1005, 1056, 2507, 4449, 1012,
1045, 2113, 2129, 2000, 2147, 2362, 1010, 1000, 2002, 2758,
1012, 1031, 19802, 1035, 1015, 1033, 2802, 1996, 3049, 1010,
2014, 1001, 1001, 1062, 2080, 1001, 1001, 1043, 2038, 2042,
2464, 2004, 2019, 2104, 1001, 1001, 3899, 1010, 11158, 1996,
10381, 1001, 1001, 10488, 2015, 1001, 1001, 5003, 1998, 1996,
2394, 19857, 1001, 1001, 4372, 5666, 1997, 5658, 1001, 1001,
2151, 1001, 1001, 6289, 2226, 1012, 2014, 1001, 1001, 1062,
2080, 1001, 1001, 1043, 2758, 2008, 2987, 1005, 1056, 8572,
2032, 2012, 2035, 1012, 1031, 19802, 1035, 1016, 1033, 1000,
1045, 2031, 2467, 4265, 2013, 1037, 3056, 2104, 1001, 1001,
9765, 1001, 1001, 10047, 3370, 1010, 1000, 2014, 1001, 1001,
1062, 2080, 1001, 1001, 1043, 2056, 1010, 1000, 1998, 1045,
2031, 2467, 4527, 1012, 1000, 2002, 5763, 1010, 1000, 1045,
2097, 4474, 2153, 1010, 1998, 1045, 2097, 2265, 2026, 4105,
1998, 1055, 1001, 1001, 17214, 1001, 1001, 27118, 1012, 1000,
1031, 19802, 1035, 1017, 1033, 2014, 1001, 1001, 1062, 2080,
1001, 1001, 1043, 2211, 2010, 2576, 2476, 1999, 2494, 1010,
2043, 2002, 2034, 2180, 1037, 2835, 1999, 1996, 1047, 1001,
1001, 23384, 1001, 1001, 3802, 2007, 1996, 4450, 2283, 1012,
2002, 2218, 1037, 3528, 1997, 18645, 4460, 1010, 2164, 2704,
1997, 3847, 1998, 2810, 1010, 2704, 1997, 6813, 1010, 1998,
2704, 1997, 7574, 1998, 2591, 2578, 1010, 2077, 3352, 3003,
1997, 1996, 4450, 2283, 1999, 2286, 1012, 1999, 2216, 3864,
1010, 2002, 2036, 2150, 1996, 3003, 1997, 1996, 4559, 1010,
2004, 6425, 5658, 1001, 1001, 2151, 1001, 1001, 6289, 2226,
2180, 2178, 2744, 2004, 3539, 2704, 1012, 1031, 19802, 1035,
1018, 1033, 2021, 2043, 5658, 1001, 1001, 2151, 1001, 1001,
6289, 2226, 2170, 2005, 2220, 3864, 1999, 2297, 1010, 2014,
1001, 1001, 1062, 2080, 1001, 1001, 1043, 1052, 1001, 1001,
1041, 2290, 1001, 1001, 16216, 2094, 2010, 7226, 2005, 1996,
11264, 2006, 2591, 5290, 1012, 1031, 19802, 1035, 1019, 1033,
1000, 2054, 1045, 2448, 2005, 2003, 2591, 3425, 1012, 1045,
2097, 2689, 1996, 3267, 1997, 1996, 2407, 1997, 7177, 1999,
1037, 4189, 1998, 2062, 12042, 2126, 1010, 2485, 16440, 1998,
2507, 1037, 3168, 1997, 3800, 2000, 1996, 2111, 2182, 1999,
1996, 16165, 1010, 1999, 1996, 3847, 1010, 1998, 1999, 1996,
3465, 1997, 2542, 1010, 1000, 5763, 2014, 1001, 1001, 1062,
2080, 102]])
example of tgt id lists: tensor([[ 101, 8554, 1001, 1001, 1055, 2265, 7527, 2014, 1001, 1001,
1062, 2080, 1001, 1001, 1043, 1005, 1055, 21379, 2586, 2283,
2176, 4272, 3805, 1997, 6425, 5658, 1001, 1001, 2151, 1001,
1001, 6289, 2226, 1005, 1055, 2283, 1012, 1031, 1060, 1035,
19802, 1033, 5611, 5768, 3864, 2097, 2022, 2006, 2233, 2459,
1012, 102, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]])
total query dataset len : 13368
training Diffusion LM model...
D:\app\work_app\anaconda3\local\envs\ProphetNet-master\lib\site-packages\transformers\optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
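The FutureWarning is raised by transformers' own AdamW implementation; as the message suggests, it can be avoided by switching to torch.optim.AdamW. A sketch using this run's hyperparameters (--lr 5e-05, --warmup_steps 7200, --lr_anneal_steps 120000); whether GENIE's schedule is exactly a linear warmup plus linear decay is an assumption:

import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # model: the diffusion model above
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=7200, num_training_steps=120000)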
Iteration: 0%| | 0/4487 [00:00<?, ?it/s]***** there is no checkpoint in output *****
***** Running training *****
Max steps = 120000
Gradient Accumulation steps = 1
Iteration: 1%| | 51/4487 [1:38:37<148:11:35, 120.27s/it]
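Tying the pieces together, each of the 120000 training steps samples timesteps from the importance sampler, computes the diffusion loss, feeds the observed losses back into the sampler, and steps the optimizer and scheduler once per batch (gradient accumulation is 1 here). A skeleton of such a loop; train_dataloader, embed_batch and the other names come from the sketches above or are hypothetical helpers, not the repository's API:

import torch

for step, (src_ids, tgt_ids) in enumerate(train_dataloader):  # hypothetical loader over the 287113 pairs
    if step >= 120000:  # --lr_anneal_steps
        break
    x_start, src_hidden = embed_batch(src_ids, tgt_ids)  # hypothetical embedding/encoding step
    ts, importance_w = sampler.sample(src_ids.size(0))   # LossSecondMomentSampler from the sketch above
    t = torch.as_tensor(ts, dtype=torch.long)
    loss = x0_mse_loss(model, x_start, t, alphas_cumprod, src_hidden)  # importance weights omitted for brevity
    loss.backward()  # gradient accumulation steps = 1, so every batch takes an optimizer step
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    sampler.update(ts.tolist(), [loss.item()] * len(ts))  # simplification: mean loss reused per timestep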