『Source Code Reading』 Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

https://github.com/cshizhe/hgr_v2t

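Pass the path of the extracted dataset to the config script. Inside the results folder of the dataset directory it creates RET.released/mlmatch/<folder name generated from the configuration>, which holds the training configuration and the intermediate outputs: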

python configs/prepare_mlmatch_configs.py $datadir

The generated folder mainly contains the following five entries:

log model model.json path.json pred

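Of these, path.json stores the relevant paths, in particular the paths of the pre-extracted features (shown here as the Python dict obtained after loading it):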

{'output_dir': '~/workspace_zm/HGR_T2V/MSRVTT/results/RET.released/mlmatch/vis.resnet152.pth.txt.bigru.16role.gcn.1L.attn.1024.loss.bi.af.embed.4.glove.init',
 'attn_ft_files': {'trn': ['~/workspace_zm/HGR_T2V/MSRVTT/ordered_feature/SA/resnet152.pth/trn_ft.hdf5'],
                   'val': ['~/workspace_zm/HGR_T2V/MSRVTT/ordered_feature/SA/resnet152.pth/val_ft.hdf5'],
                   'tst': ['~/workspace_zm/HGR_T2V/MSRVTT/ordered_feature/SA/resnet152.pth/tst_ft.hdf5']},
 'name_file': {'trn': '~/workspace_zm/HGR_T2V/MSRVTT/public_split/trn_names.npy',
               'val': '~/workspace_zm/HGR_T2V/MSRVTT/public_split/val_names.npy',
               'tst': '~/workspace_zm/HGR_T2V/MSRVTT/public_split/tst_names.npy'},
 'word2int_file': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/word2int.json',
 'int2word_file': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/int2word.npy',
 'ref_caption_file': {'trn': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/ref_captions.json',
                      'val': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/ref_captions.json',
                      'tst': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/ref_captions.json'},
 'ref_graph_file': {'trn': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/sent2rolegraph.augment.json',
                    'val': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/sent2rolegraph.augment.json',
                    'tst': '~/workspace_zm/HGR_T2V/MSRVTT/annotation/RET/sent2rolegraph.augment.json'}}
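Training reads every file listed in path.json, so it can be worth checking that they all exist before launching a run. The following is only a minimal sketch and is not part of the repository: the helper names iter_paths, missing_paths and check_path_json are hypothetical, and it assumes path.json is plain JSON whose values are path strings, lists, or nested dicts, as in the dump above.

```python
import json
import os


def iter_paths(node, prefix=""):
    """Yield (key, path) pairs for every string found in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, (list, tuple)):
        for idx, value in enumerate(node):
            yield from iter_paths(value, f"{prefix}[{idx}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(path_cfg):
    """Return the (key, path) pairs whose path does not exist on disk."""
    return [
        (key, path)
        for key, path in iter_paths(path_cfg)
        # "~" is expanded here because the dump above uses home-relative paths.
        if not os.path.exists(os.path.expanduser(path))
    ]


def check_path_json(path_json_file):
    """Load a path.json file (hypothetical usage) and list missing entries."""
    with open(path_json_file) as f:
        return missing_paths(json.load(f))
```

Calling check_path_json on the generated path.json returns an empty list when every feature, split and annotation file is in place; otherwise each entry names the offending key, for example attn_ft_files.val[0].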

model.json, in turn, stores the model parameters:

{'subcfgs': {'video_encoder': {'freeze': False,
                               'lr_mult': 1.0,
                               'opt_alg': 'Adam',
                               'weight_decay': 0,
                               'dim_fts': [2048],
                               'dim_embed': 1024,
                               'dropout': 0.2,
                               'num_levels': 3,
                               'share_enc': False},
             'text_encoder': {'freeze': False,
                              'lr_mult': 1.0,
                              'opt_alg': 'Adam',
                              'weight_decay': 0,
                              'num_words': 10510,
                              'dim_word': 300,
                              'fix_word_embed': False,
                              'rnn_type': 'gru',
                              'bidirectional': True,
                              'rnn_hidden_size': 1024,
                              'num_layers': 1,
                              'dropout': 0.2,
                              'num_roles': 16,
                              'gcn_num_layers': 1,
                              'gcn_attention': True,
                              'gcn_dropout': 0.5}},
 'trn_batch_size': 64,
 'tst_batch_size': 300,
 'num_epoch': 50,
 'val_per_epoch': True,
 'save_per_epoch': True,
 'val_iter': -1,
 'save_iter': -1,
 'monitor_iter': 1000,
 'summary_iter': 1000,
 'base_lr': 0.0001,
 'decay_schema': None,
 'decay_boundarys': [],
 'decay_rate': 1,
 'max_frames_in_video': 20,
 'max_words_in_sent': 30,
 'margin': 0.2,
 'max_violation': True,
 'hard_topk': 1,
 'loss_direction': 'bi',
 'num_verbs': 4,
 'num_nouns': 6,
 'attn_fusion': 'embed',
 'simattn_sigma': 4,
 'loss_weights': None}
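The keys margin, max_violation, hard_topk and loss_direction look like the settings of a hinge-based ranking loss with hard negatives. As a rough illustration, here is a pure-Python sketch of the commonly used max-violation formulation (as in VSE++-style losses). It is an assumption about how these keys are interpreted, not the repository's implementation: the function name, the direction labels "t2v" and "v2t", and the averaging over the batch are all hypothetical choices made for this example.

```python
def hard_negative_ranking_loss(sims, margin=0.2, direction="bi"):
    """Max-violation triplet ranking loss on a square similarity matrix.

    sims[i][j] is the similarity between text i and video j, so the matched
    pairs sit on the diagonal. Only the hardest negative per row or column
    contributes, which corresponds to the hard_topk = 1 case.
    """
    n = len(sims)
    if n < 2:
        return 0.0  # no negatives available in the batch
    total = 0.0
    for i in range(n):
        pos = sims[i][i]
        if direction in ("bi", "t2v"):
            # Text i as the query: hardest non-matching video in row i.
            hardest_video = max(sims[i][j] for j in range(n) if j != i)
            total += max(0.0, margin + hardest_video - pos)
        if direction in ("bi", "v2t"):
            # Video i as the query: hardest non-matching text in column i.
            hardest_text = max(sims[j][i] for j in range(n) if j != i)
            total += max(0.0, margin + hardest_text - pos)
    return total / n  # averaged over the batch (an assumed reduction)


sims = [[0.9, 0.8],
        [0.1, 0.5]]
print(hard_negative_ranking_loss(sims, margin=0.2, direction="bi"))
```

In this toy batch the first text is almost as similar to the wrong video (0.8) as to its own (0.9), so it violates the 0.2 margin. The second video is closer to the wrong text (0.8) than to its own (0.5). Together these two violations give a loss of roughly 0.3.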

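Then run the training script, where $resdir is the folder generated in the previous step (the one containing model.json and path.json):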

python multilevel_match.py \
    $resdir/model.json \
    $resdir/path.json \
    --load_video_first \
    --is_train \
    --resume_file ~/workspace_zm/HGR_T2V/MSRVTT/results/RET.released/mlmatch/vis.resnet152.pth.txt.bigru.16role.gcn.1L.attn.1024.loss.bi.af.embed.4.glove.init/../../../RET/word_embeds.glove32b.th
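The --resume_file argument climbs three levels up from the output folder with ../../../, which makes it hard to see where the GloVe-initialised embedding file actually lives. The short sketch below resolves the path lexically with posixpath.normpath. It uses only the strings from the command above and assumes no symlinks are involved, since normpath does not look at the file system.

```python
import posixpath

output_dir = (
    "~/workspace_zm/HGR_T2V/MSRVTT/results/RET.released/mlmatch/"
    "vis.resnet152.pth.txt.bigru.16role.gcn.1L.attn.1024.loss.bi.af.embed.4.glove.init"
)
resume_file = posixpath.join(output_dir, "../../../RET/word_embeds.glove32b.th")

# normpath collapses each ".." together with the directory component before it.
resolved = posixpath.normpath(resume_file)
print(resolved)
```

This prints ~/workspace_zm/HGR_T2V/MSRVTT/results/RET/word_embeds.glove32b.th, so the embedding file sits in the RET folder next to RET.released under results.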

posted @ 2020-09-30 21:39 叠加态的猫
