Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Shortcomings of prior work:
Controllable image captioning (CIC) work mainly relies on two kinds of control signals: (1) subjective control signals and (2) objective control signals. Objective control signals are further divided into (1) content-controlled and (2) structure-controlled signals.
Almost all existing objective control signals overlook two indispensable characteristics of an ideal control signal:
1) Event-compatible: all visual content referred to in a single sentence should be compatible with the described activity.
2) Sample-suitable: the control signal should be suitable for the specific image sample.
Contributions of the paper:
The paper proposes a new event-oriented objective control signal, Verb-specific Semantic Roles (VSR), that meets the event-compatible and sample-suitable requirements simultaneously.
A VSR consists of a verb and the semantic roles the user is interested in.
Grounded Semantic Role Labeling (GSRL): produces the visual features of all grounded proposal sets.
Semantic Structure Planner (SSP): a hierarchical semantic structure learning model that aims to learn a reasonable sequence of sub-roles S.
The VSR framework builds on these two components: Grounded Semantic Role Labeling and the Semantic Structure Planner.
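To make the shape of the control signal concrete, here is a minimal sketch of a VSR as a data structure: one verb plus the semantic roles the user cares about. The class name, the field names, the verb "ride" and the role labels "Arg0", "Arg1" and "LOC" are all illustrative assumptions, not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VSR:
    """Hypothetical container for a control signal: a verb and the roles of interest."""
    verb: str
    roles: Tuple[str, ...]

    def __post_init__(self) -> None:
        # A VSR is centred on a verb, so an empty verb is rejected.
        if not self.verb:
            raise ValueError("a VSR needs a verb")


# Illustrative values only; the verb and role labels are assumptions.
signal = VSR(verb="ride", roles=("Arg0", "Arg1", "LOC"))
print(signal.verb, signal.roles)
```

Freezing the dataclass keeps the control signal immutable, so the same VSR can be passed to later stages without the risk of being changed along the way.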
Steps for merging two semantic structures: first use GSRL and SSP to obtain the semantic structures and grounded region features (Sa, Ra) and (Sb, Rb).
Then, as shown in the figure above, merge them in two steps:
(a) Find the sub-roles in both Sa and Sb that refer to the same visual regions.
(b) Insert all other sub-roles between the nearest two selected sub-roles.
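The two merge steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes that each sub-role is a (label, region id) pair, that a sub-role without a region (such as the verb) carries None, that each region appears at most once per structure, and that shared regions occur in the same order in both structures. When the two structures label a shared region differently, the sketch keeps the label from the first one. The function name and the example verbs are hypothetical.

```python
from typing import List, Optional, Tuple

# (role label, grounded region id); None marks a sub-role with no region.
SubRole = Tuple[str, Optional[int]]


def merge_structures(sa: List[SubRole], sb: List[SubRole]) -> List[SubRole]:
    # Step (a): find the regions referred to by sub-roles in both structures.
    regions_a = {r for _, r in sa if r is not None}
    regions_b = {r for _, r in sb if r is not None}
    shared = regions_a & regions_b

    def split(seq: List[SubRole]):
        # Separate shared sub-roles (anchors) from the runs of other
        # sub-roles that sit before, between and after them (gaps).
        anchors: List[SubRole] = []
        gaps: List[List[SubRole]] = [[]]
        for sub in seq:
            if sub[1] in shared:
                anchors.append(sub)
                gaps.append([])
            else:
                gaps[-1].append(sub)
        return anchors, gaps

    anchors_a, gaps_a = split(sa)
    anchors_b, gaps_b = split(sb)
    if [r for _, r in anchors_a] != [r for _, r in anchors_b]:
        raise ValueError("shared regions appear in a different order")

    # Step (b): place the remaining sub-roles of both structures in the
    # gap between the nearest two shared sub-roles.
    merged: List[SubRole] = []
    for i, anchor in enumerate(anchors_a):
        merged += gaps_a[i] + gaps_b[i] + [anchor]
    merged += gaps_a[-1] + gaps_b[-1]
    return merged


# Hypothetical example: both structures share region 1.
sa = [("Arg0", 1), ("V:ride", None), ("Arg1", 2)]
sb = [("Arg0", 1), ("V:wear", None), ("Arg1", 3)]
print(merge_structures(sa, sb))
```

Within each gap, the sketch places the sub-roles from Sa before those from Sb. That ordering is a simplification chosen here; the notes above do not specify it.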
Model architecture:
Faster R-CNN (ResNet-101) + Controllable LSTM + Controllable UpDn + SCT
Original paper: https://arxiv.org/abs/2103.12204