Video Highlight Extraction - Research Notes

Approach 1: Find the dialogue-heavy parts from the subtitles or the audio track

- Extract the audio track:

ffmpeg -i a.mp4 -map 0:a:0 a.mp3

- Extract the RMS level frame by frame:

ffmpeg -i a.mp3 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=log.txt -f null -

Determining audio level peaks with ffmpeg

https://superuser.com/questions/1183663/determining-audio-level-peaks-with-ffmpeg
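
The log written by the command above alternates a frame header line containing pts_time with a line holding the lavfi.astats.Overall.RMS_level value. Below is a minimal sketch of reading it into (time, level) pairs. The function name parse_astats_log is hypothetical, and the sample lines are made up to match the layout that the script further down expects.

```python
import math

KEY = "lavfi.astats.Overall.RMS_level="


def parse_astats_log(lines):
    """Yield (pts_time_seconds, rms_db) for every frame with a finite RMS level."""
    ts = None
    for line in lines:
        if "pts_time:" in line:
            ts = float(line.split("pts_time:")[1])
        elif KEY in line and ts is not None:
            rms = float(line.split(KEY)[1])
            if math.isfinite(rms):  # silent frames may be reported as -inf
                yield ts, rms


# Illustrative input only, not real measurements.
sample = [
    "frame:0    pts:0       pts_time:0\n",
    "lavfi.astats.Overall.RMS_level=-31.5\n",
    "frame:1    pts:1152    pts_time:0.0261224\n",
    "lavfi.astats.Overall.RMS_level=-inf\n",
]
print(list(parse_astats_log(sample)))
```

With a real log, the same function can be fed an open file object, for example the log.txt produced above.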

- Analyze the volume of the whole file:

ffmpeg -i input.wav -filter:a volumedetect -f null /dev/null

https://trac.ffmpeg.org/wiki/AudioVolume

https://ffmpeg.org/ffmpeg-filters.html#volumedetect 
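
volumedetect writes its summary to ffmpeg's stderr as lines such as "mean_volume: -27.0 dB". A minimal sketch of pulling the two headline numbers out of that text follows. The function name parse_volumedetect and the sample log lines are illustrative assumptions, not output captured from a real run.

```python
import re

VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")


def parse_volumedetect(stderr_text):
    """Return the mean_volume and max_volume values (in dB) found in the text."""
    return {name: float(value) for name, value in VOLUME_RE.findall(stderr_text)}


# Illustrative log lines in the shape the volumedetect filter prints.
sample = (
    "[Parsed_volumedetect_0 @ 0x5581] n_samples: 184320\n"
    "[Parsed_volumedetect_0 @ 0x5581] mean_volume: -27.0 dB\n"
    "[Parsed_volumedetect_0 @ 0x5581] max_volume: -4.0 dB\n"
)
print(parse_volumedetect(sample))
```

The text to parse could come from subprocess.run(..., stderr=subprocess.PIPE, text=True).stderr when the command above is launched from Python.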

- Cut out a segment:

ffmpeg -ss $ss -t 00:05:00 -i $vfile.mp4 -vcodec copy -acodec copy -y $vfile.${ss//:/_}.mp4

https://stackoverflow.com/questions/21420296/how-to-extract-time-accurate-video-segments-with-ffmpeg
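
The shell line above relies on bash's ${ss//:/_} substitution to turn the start time into a suffix that is safe in a file name. If the cut is driven from Python instead, the same command could be assembled as below. This is only a sketch: build_cut_command is a hypothetical helper, and the five-minute default simply mirrors the -t value used above.

```python
import shlex


def build_cut_command(vfile, ss, duration="00:05:00"):
    """Return the ffmpeg argv that stream-copies `duration` starting at `ss`."""
    # Same naming scheme as the shell version: colons become underscores.
    out = "%s.%s.mp4" % (vfile, ss.replace(":", "_"))
    return ["ffmpeg", "-ss", ss, "-t", duration, "-i", vfile + ".mp4",
            "-vcodec", "copy", "-acodec", "copy", "-y", out]


cmd = build_cut_command("movie", "00:12:00")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run. Because the streams are copied rather than re-encoded, the cut starts at a keyframe near the requested time, not necessarily at the exact second.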

 

Extract the time ranges of the highlights (the script reads the astats log, e.g. log.txt, from stdin):

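import sys

BIN = 60  # width of one time bin, in seconds
KEY = 'lavfi.astats.Overall.RMS_level='


def getv(rms):
    # Map an RMS level in dB (0 is loudest) to a loudness score in 0..100.
    return max(0, 100 - abs(rms))


def extract(diff):
    # diff holds the loudness increases of the last five bins, oldest first.
    # Report a highlight when at least three bins got louder and at least
    # two of the three oldest bins rose by 3 or more.
    pos = sum(1 for d in diff if d > 0)
    pos3 = sum(1 for d in diff[:3] if d >= 3)
    return 1 if pos >= 3 and pos3 >= 2 else 0


timebin = 0        # start of the current bin, in seconds
s = []             # RMS samples collected in the current bin
v = []             # loudness score of every finished bin
diff = (0,) * 5
for line in sys.stdin:
    if 'pts_time' in line:
        ts = float(line.split('pts_time:')[1])
        if ts > timebin + BIN:
            # A bin without finite samples is treated as silence (-100 dB).
            avgrms = int(sum(s) / len(s)) if s else -100
            if v:
                delta = getv(avgrms) - v[-1]
                d = max(0, delta)
                diff = diff[1:] + (d,)
                ext = extract(diff)
                # Per-bin trace on stderr: minute, second, average RMS, change.
                print('%3d %2d %s %3d' % (timebin // 60, timebin % 60, avgrms, delta),
                      '-' * d, '*' * ext, file=sys.stderr)
                if ext:
                    # Highlight start time as HH:MM:00 on stdout.
                    h = timebin // 3600
                    print('%.2d:%.2d:00' % (h, (timebin - 3600 * h) // 60))
                    diff = (0,) * 5
            v.append(getv(avgrms))
            timebin += BIN
            s = []
    if KEY in line:
        rms = float(line.split(KEY)[1])
        if rms > -1000:  # skip -inf, reported for digital silence
            s.append(rms)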
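
The script prints one HH:MM:00 start time per detected highlight, and the cut command earlier takes five minutes from each start. Detections that fall within five minutes of each other therefore produce overlapping clips. One possible post-processing step, which is not part of the original script, is to merge such windows before cutting. The helper names and the sample start times below are hypothetical.

```python
def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to a number of seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return 3600 * h + 60 * m + s


def merge_intervals(starts, length=300):
    """Merge [start, start + length) windows that overlap or touch."""
    merged = []
    for start in sorted(to_seconds(t) for t in starts):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Extend the previous window instead of opening a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


# Hypothetical start times, as they might be printed on stdout.
print(merge_intervals(["00:12:00", "00:14:00", "01:03:00"]))
```

Each merged pair gives a start and an end in seconds, from which the -ss and -t arguments of the cut command can be derived.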

 

Debugging:

ffmpeg volumedetect returns unstable result

https://stackoverflow.com/questions/48673923/ffmpeg-volumedetect-returns-unstable-result

 

Approach 2: Approach 1 plus shot boundary detection

Installing OpenCV: https://www.cnblogs.com/yaoyaohust/p/10228888.html

Shot boundary detection: https://www.cnblogs.com/lynsyklate/p/7840881.html

Yahoo's open-source tool Hecate: https://github.com/yahoo/hecate
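
To make the idea of shot boundary detection concrete, here is a small sketch of one common technique: comparing the color histograms of consecutive frames and flagging a cut when they differ sharply. It is not necessarily the method used in the linked article or in Hecate. The function names and the 0.5 threshold are assumptions chosen for illustration, and the toy histograms stand in for real per-frame histograms, which could for instance be computed with OpenCV's cv2.calcHist.

```python
def histogram_distance(h1, h2):
    """L1 distance between two histograms after normalising each to sum to 1."""
    t1 = float(sum(h1)) or 1.0
    t2 = float(sum(h2)) or 1.0
    return sum(abs(a / t1 - b / t2) for a, b in zip(h1, h2))


def shot_boundaries(histograms, threshold=0.5):
    """Return indices of frames whose histogram differs sharply from the previous frame."""
    cuts = []
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts


# Toy 4-bin histograms: two dark frames followed by two bright ones.
frames = [[90, 10, 0, 0], [88, 12, 0, 0], [0, 5, 15, 80], [0, 6, 14, 80]]
print(shot_boundaries(frames))
```

Shot boundaries found this way could be combined with the loudness-based start times from Approach 1, for example by snapping each start time to the nearest boundary.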

 

Approach 3: More time-consuming and technically harder methods

A brief introduction to and analysis of Baidu's BROAD-Video Highlights dataset

https://zhuanlan.zhihu.com/p/31770408

 

A roundup of 2017 conference papers on Temporal Action Detection

https://zhuanlan.zhihu.com/p/31501316

 

Video Analysis fields explained: Temporal Action Detection

https://zhuanlan.zhihu.com/p/26603387

 

Video Analysis fields explained: Action Recognition

https://zhuanlan.zhihu.com/p/26460437

 

Temporal Action Detection with Structured Segment Networks

From Dahua Lin's team (The Chinese University of Hong Kong)

https://github.com/yjxiong/action-detection

Based on PyTorch and DenseFlow

 

UntrimmedNets for Weakly Supervised Action Recognition and Detection

From Dahua Lin's team (The Chinese University of Hong Kong)

https://github.com/wanglimin/UntrimmedNet

https://github.com/yjxiong/caffe/tree/untrimmednet

Based on Caffe

 

posted on 2019-06-06 21:18  冰山上的博客