CVPR 2016 paper study notes: Video object segmentation

 

Abstract: Video object segmentation, a binary labelling
problem, is vital in various applications including object tracking,
action recognition, video summarization, video editing, object
based encoding and video retrieval. This paper presents an
overview of recent strategies in video object segmentation,
focusing on the techniques for solving challenges like complex
and moving background, illumination changes, occlusions,
motion blur, shadow effect and view point variation. Significant
works evolved in this research field over recent years are
categorized based on the challenges solved by the researchers. A
list of challenging datasets and evaluation metrics available for
video object segmentation is presented. Finally, research gaps in
this domain are discussed.
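Summary: Video segmentation is an important technique in many areas, such as object tracking, action recognition, video summarization, video editing, object-based encoding and video retrieval.
This paper mainly presents recent strategies in video segmentation, focusing on how they handle complex and moving backgrounds, illumination changes, occlusions, motion blur, shadow effects and viewpoint variation.
The significant works covered in the paper are categorized by the challenges they solve. It also lists datasets and evaluation metrics, and finally discusses the research gaps in this field.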
The internet today carries a massive amount of
video data thanks to the development in storage devices and
handy imaging systems. Terabytes of video are regularly
generated for various useful applications like surveillance,
news broadcasting, telemedicine, etc. Based on the
information provided by CISCO on ‘Visual Networking Index
(VNI)’, the growth of internet video traffic will be three fold
from 2015 to 2020. Manually extracting semantic information
from this enormous amount of internet video is highly
unfeasible, which creates the need for automated methods to
annotate/derive useful information from the video data for
video management and retrieval [1]. Hence, one of the
essential steps for video processing and retrieval is video
object segmentation, a binary labelling problem for
differentiating the foreground objects accurately from the
background. Video object segmentation aims at partitioning
every frame in a video into meaningful objects by grouping
the pixels along spatio-temporal direction that exhibit
coherency in appearance and motion [2]. The video object
segmentation task is highly challenging due to the following
reasons: (i) unknown number of objects in a video, (ii) varying
background in a video and (iii) occurrence of multiple objects
in a video [3]. Existing approaches in video segmentation can
be broadly classified into two categories, viz. interactive
methods and unsupervised methods. Interactive object
segmentation methods require human intervention in the initialization
process while unsupervised approaches can perform object
segmentation automatically. In semi-supervised approaches,
user intervention is required for annotating initial frames and
these annotations are transferred to all the frames in the
video. Automated object segmentation approaches [7][8][9]
can segment any video data into meaningful objects without
user interaction based on object proposals and motion cues
from the video. The common assumption followed by most of
the automated methods is that only a single object is moving
throughout the video, and they use only the motion information for
segmenting the object from the background. This assumption
will lead to poor segmentation under discontinuous motion of
the object [11]. The surveys on video object segmentation in the
literature [12] [13] [14] [15] describe the
techniques available for image segmentation, not for video
data. In [15] authors classified the approaches in video
segmentation as inference and feature modes. The
segmentation techniques proposed so far to improve the
segmentation results are grouped as inference modes and
methods that depend on features like depth, motion and
histogram are termed as feature modes. From this observation,
it is evident that none of the researchers have discussed the
segmentation approaches from the perspective of the
challenges solved by the algorithm. Hence this paper
categorizes the significant work contributed by researchers in
video object segmentation based on the issues resolved by the
respective authors. Several issues degrading the segmentation
performance are moving background, moving camera,
illumination variation, occlusion, shadow effect, viewpoint
variation, etc. Moreover, a proposed algorithm should
provide a tradeoff between segmentation accuracy and
complexity. As depicted in fig. 1, this paper classifies the
video object segmentation task as:
1. Issue tackling mode
2. Complexity reduction mode and
3. Inference mode
The main contributions of this paper are:
• Summarizing the recent activities in video object
segmentation domain.
• Categorizing the significant works in this research
field meaningfully and
• Presenting a list of database and evaluation metrics
needed for developing an efficient video object
segmentation framework.
Organization of this paper: Section II describes the algorithms
that contributed significantly in tackling the issues (discussed
earlier) involved in video object segmentation. Section III
presents an overview on segmentation approaches with
reduced complexity available in literature. Section IV provides
a gist on object segmentation techniques that fall under
inference mode. Section V lists the dataset and the evaluation
metrics used in these segmentation approaches and discusses
research gaps in video object segmentation field.
Section VI concludes this study.
 
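Recently, our online world has been flooded with all kinds of video data... In short, a long passage tells you that this topic is important. Then it says that the goal (video object segmentation) is to partition every frame of the video, grouping pixels so that the objects are coherent in appearance and motion (presumably some kind of consistency). However, video segmentation has the following difficulties:
1. The number of objects in the video is unknown
2. Varying background
3. Multiple objects
There are two main kinds of methods today: interactive and unsupervised. Presumably this paper focuses on the unsupervised kind, which segments video objects automatically.
In semi-supervised methods, annotating the initial frames with the objects to segment is required, whereas unsupervised methods do not need this. Many existing video segmentation algorithms assume that only a single object is moving, which leads to poor results when the object moves discontinuously. The authors of reference [15] classify video segmentation approaches into "inference modes" and "feature modes"; references [12]-[15] are surveys that describe image segmentation techniques. From these observations, evidently nobody has summarized video segmentation methods from the perspective of the challenges the algorithms solve, so this paper categorizes the work by the issues that can degrade segmentation accuracy, such as moving background, moving camera, illumination variation, occlusion, shadow effect and viewpoint variation.
Furthermore, a proposed algorithm should strike a trade-off between segmentation accuracy and complexity.
So the paper classifies the task into:
1. Issue tackling mode
2. Complexity reduction mode
3. Inference mode
The three main contributions of the paper: 1. it summarizes recent work in this field; 2. it categorizes the significant works; 3. it provides a list of datasets and evaluation metrics for readers to work with.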
II. ISSUE TACKLING MODE
This section details the ‘issue tackling mode’, the first
category of the video object segmentation approach. Though
several issues (as discussed earlier) affect the performance of
the segmentation approaches, commonly occurring problems
are moving background, occlusion, shadow, rain, moving
camera, illumination and view point variation.
A. Surveillance video systems
The traffic surveillance systems include detection and
recognition of moving vehicles (objects) from traffic video
sequence. For any traffic surveillance system, vehicle
segmentation is the fundamental step and base for tracking the
vehicle movements. However, vehicle segmentation in traffic
video is still challenging due to the moving objects and
illumination variations. To solve this issue, an unsupervised
neural network based background modelling has been
proposed for real time objects segmentation. In this work, a
neural network serves as both an adaptive model of the
background in a video sequence and a classifier of pixels as
background/foreground. The segmentation time taken by the
neural network is improved by implementing it on an FPGA kit.
Though this neural network based background subtraction
method achieves good segmentation accuracy, it works well
only under slightly varying illumination and moving
background. A high cost is involved in reducing time
complexity [16]. Following this, Appiah et al. [17] proposed
an integrated hardware implementation of moving object
segmentation in real time video stream under varying lighting
conditions. Two algorithms for multimodal background
modelling and connected component analysis are implemented
on a single chip FPGA. This method segments objects under
varying illumination condition at high processing speed. The
two algorithms described so far do not take rain into
account. In rainy conditions, shadows and colour
reflections are the major problems to be tackled. A
conventional video object segmentation algorithm that
combines the background construction-based video object
segmentation and the foreground extraction-based video
object segmentation has been proposed. The foreground is
separated from the background using histogram-based change
detection technique and object regions are segmented
accurately by detecting the initial moving object masks based
on a frame difference mask. Shadow and colour reflection
regions are removed by diamond window mask and colour
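analysis of moving object respectively. Segmentation of
moving objects is refined by morphological operations. The
segmentation results of moving objects under rainy situations
using [18] are shown in fig. 2. In the future, we will adaptively
obtain the threshold and adjust the content of the video
automatically. Later, Chien et al. [19] proposed a video object
segmentation and tracking technique for smart cameras in
visual surveillance networks. A multi-background model
based on threshold decision algorithm for video object
segmentation under drastic changes in illumination and
background clutter has been developed. In this method, the
threshold is selected robustly without user requirement and it
differs from the per pixel background model, which avoids
possible error propagation. Another algorithm for extracting
objects from videos captured by static camera has been
proposed to solve issues like waving tree, camouflage region
and sleeping [20]. In this method, reference
background is obtained by averaging of some initial frames.
Temporal processing for object extraction does not consider
spatial correlation amongst the moving objects across frames.
Hence, an approximate motion field is derived using the
background subtraction and temporal difference mechanism.
The background model adapts to temporal changes (swaying
trees, rippling water, etc.) which extracts the complementary
object in the scene.
For traffic surveillance, the fundamental step is segmenting the vehicles, but this is still hard because the objects are moving and the illumination varies. To solve this, an unsupervised neural network is used as both an adaptive model of the background and a classifier of pixels as background/foreground. Its running time is reduced by implementing it on an FPGA. Although this neural-network background subtraction achieves good segmentation accuracy, it only works under slightly varying illumination and a slightly moving background, and reducing the time complexity is costly. Appiah et al. then proposed an integrated hardware implementation: two algorithms (multimodal background modelling and connected component analysis) run on a single-chip FPGA, and it handles varying lighting well. But it does not handle rain, where shadows and colour reflections are the main problems. A conventional algorithm combines background construction-based segmentation with foreground extraction-based segmentation. The foreground is separated using histogram-based change detection, and object regions are segmented by detecting the initial moving object masks from a frame difference mask (I have not figured out what this means yet). Anyway, it says that shadow and colour reflection regions are removed by a diamond window mask and by colour analysis of the moving object, respectively. The segmentation is then refined by morphological operations, and the results are shown in fig. 2. In the future, the algorithm will adaptively obtain the "threshold" and "adjust the content" automatically. After that, Chien proposed an algorithm for smart cameras in visual surveillance networks. In this method the "threshold" is selected robustly without help from the user, and it differs from the per-pixel background model, which avoids possible error propagation?? (what does that mean, I do not understand). There is also an algorithm for static cameras that deals with waving trees and camouflaged regions. The reference background is obtained by averaging some initial frames (what does that mean??), but temporal processing alone does not consider the spatial correlation among the moving objects across frames??.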
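To make the "frame difference mask" and the "morphological operations" mentioned above more concrete, here is a minimal NumPy sketch. It is only an illustration under my own assumptions, not the method of [18]: the function names, the fixed threshold of 25 and the 3x3 structuring element are all made up for the example, and the diamond window mask and colour analysis steps are left out.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose grey level changed by more than `threshold` (assumed value)."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def _neighbourhood_stack(mask):
    """Stack the 3x3 neighbourhood of every pixel, padding the border with False."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return _neighbourhood_stack(mask).all(axis=0)

def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return _neighbourhood_stack(mask).any(axis=0)

def refine(mask):
    """Morphological opening (erosion, then dilation) removes isolated speckles."""
    return dilate(erode(mask))

# Toy frames: a 4x4 object appears, plus one isolated noisy pixel.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[2:6, 2:6] = 200
curr_frame[0, 7] = 200

raw_mask = frame_difference_mask(prev_frame, curr_frame)
clean_mask = refine(raw_mask)
print(raw_mask.sum(), clean_mask.sum())
```

In this toy case the raw mask contains the object and the noisy pixel, while the opened mask keeps only the 4x4 object, which is the kind of clean-up the refinement step is there for.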
 
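Judging from the result figure for this algorithm, my own take is that it filters out lighting and shadow effects and keeps only the real target.

I really could not understand the last passage, so I just ran it through Google Translate????

Hence, an approximate motion field is derived using
background subtraction and a temporal difference mechanism.
The background model adapts to temporal changes (swaying
trees, rippling water, etc.) to extract the complementary
objects in the scene.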
????
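Since the passage on [20] is hard to parse, a small sketch may help with "averaging some initial frames", "background subtraction" and "temporal difference". This is my own simplified reading, not the algorithm of [20]: the function names, the threshold of 30, the update rate `alpha` and the running-average update are assumptions chosen for illustration.

```python
import numpy as np

def reference_background(frames, n_init=5):
    """Reference background = pixel-wise average of the first `n_init` frames."""
    return frames[:n_init].astype(np.float64).mean(axis=0)

def background_mask(frame, background, threshold=30.0):
    """Background subtraction: pixels that differ from the reference background."""
    return np.abs(frame.astype(np.float64) - background) > threshold

def temporal_mask(prev_frame, frame, threshold=30.0):
    """Temporal difference: pixels that changed since the previous frame."""
    return np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64)) > threshold

def update_background(background, frame, foreground, alpha=0.05):
    """Assumed running-average update, applied only to background pixels."""
    blended = background + alpha * (frame.astype(np.float64) - background)
    return np.where(foreground, background, blended)

# Toy video: 5 empty frames, then an object that moves once and then stops.
frames = np.full((8, 6, 10), 50, dtype=np.uint8)
frames[5, 2:5, 1:4] = 200      # object appears
frames[6, 2:5, 4:7] = 200      # object has moved to the right
frames[7] = 60                 # slight global illumination change
frames[7, 2:5, 4:7] = 210      # object stays where it was ("sleeping")

background = reference_background(frames)
bg6 = background_mask(frames[6], background)
td6 = temporal_mask(frames[5], frames[6])
bg7 = background_mask(frames[7], background)
td7 = temporal_mask(frames[6], frames[7])
new_background = update_background(background, frames[7], bg7)
print(bg6.sum(), td6.sum(), bg7.sum(), td7.sum())
```

In frame 7 the object has stopped, so the temporal difference sees nothing while background subtraction still finds it. That is one plausible reason for using the two cues together, and the slow background update shows how a model can absorb gradual changes such as lighting drift.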
 
B. Generic video sequences
Moving foreground object extraction from a given generic
video shot is one of the vital tasks for content representation
and retrieval in many computer vision applications. An
iterative method based on energy minimization has been
proposed for segmenting the primary moving object efficiently
from moving camera video sequences. Initial object
segmentation obtained using graph-cut is improved repeatedly
by the features extracted over a set of neighbouring frames
[21]. Thus, this iterative method can efficiently segment the
objects in video shots captured with a moving camera. A
conditional random field model based video object
segmentation system, capable of segmenting multiple moving
objects from complex background, has been proposed [22]. In
this work, a complementary property of point and region
trajectories is utilized effectively by transferring the labels of
sparse point trajectories to region trajectories. Region
trajectories based on shape consistency provide a robust design
to segment spatially overlapping region trajectories. As region
trajectories are extracted from hierarchical image
over-segmentation, the method segments meaningful regions over time.
Unsupervised segmentation of moving camera video sequence
using inter-frame change detection has been proposed [23].
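Generic video sequences
An "iterative method" is mentioned here. It starts from an initial segmentation obtained with graph-cut and then repeatedly improves it using features extracted from a set of neighbouring frames, so it can extract objects from "moving camera" footage??? Paper 22 proposes
a conditional random field model based video object
segmentation system, capable of segmenting multiple moving
objects from complex backgrounds (??? Google Translate result)
Paper 22 is mainly about going from sparse (point) trajectories to dense (region) trajectories?
Paper 23 describes an unsupervised method?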
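On the question about paper 22: the idea of "transferring the labels of sparse point trajectories to region trajectories" can be illustrated with a very rough sketch. The code below is not the conditional random field model of [22]. It is a single-frame majority vote that I made up for illustration: `region_map` is assumed to come from some over-segmentation, and each region simply takes the most common label among the labelled points that fall inside it.

```python
import numpy as np

def transfer_labels(region_map, points, point_labels, n_labels=2, unlabeled=-1):
    """Give each region the majority label of the sparse points inside it.

    region_map   : (H, W) int array of region ids 0..R-1 (assumed input)
    points       : (N, 2) int array of (row, col) point positions
    point_labels : (N,) int array with values in 0..n_labels-1
    Returns an (R,) array; regions without any point get `unlabeled`.
    """
    n_regions = int(region_map.max()) + 1
    votes = np.zeros((n_regions, n_labels), dtype=np.int64)
    region_of_point = region_map[points[:, 0], points[:, 1]]
    # Unbuffered add, so repeated (region, label) pairs are all counted.
    np.add.at(votes, (region_of_point, point_labels), 1)
    region_labels = votes.argmax(axis=1)
    region_labels[votes.sum(axis=1) == 0] = unlabeled
    return region_labels

# Toy frame with three vertical regions and five labelled points.
region_map = np.repeat(np.array([0, 0, 1, 1, 2, 2])[None, :], 4, axis=0)
points = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [0, 3]])
point_labels = np.array([0, 0, 1, 1, 0])   # 0 = background, 1 = foreground

region_labels = transfer_labels(region_map, points, point_labels)
foreground_mask = region_labels[region_map] == 1   # dense mask from sparse labels
print(region_labels, foreground_mask.sum())
```

The point is only that a few labelled points can be spread to every pixel once regions are available; the real system reasons over trajectories across frames instead of voting inside one frame.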
III. COMPLEXITY REDUCTION MODE
This section describes the earlier works proposed in the
video object segmentation field to reduce time and
computational complexity. Unsupervised segmentation of
moving camera video sequence using inter-frame change
detection has been proposed [24]. This technique achieves a better
trade-off between segmentation accuracy and complexity.
The object segmentation is performed by thresholding two
error frames generated by motion compensation for every
frame in the video. The thresholding technique used here is
weighted mean thresholding algorithm, where the weights are
set optimally. The unsupervised segmentation of the video
object is obtained by a further post processing step for
enhancing the temporal consistency. Moreover, the lack of an
explicit model of how an object looks or moves also affects
the segmentation accuracy. Approaches addressing an explicit
model should also comply with minimal complexity. To
address this issue, a moving object segmentation algorithm
[11] which uses improved point trajectories has been
proposed. This algorithm includes three steps:
1. Point trajectories are obtained from densely sampled
points at different scales from video and tracked
through optical flow.
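Step 1 above can be sketched roughly as follows. This is a simplified illustration and not the implementation of [11]: the optical flow fields are assumed to be given as arrays (computing them is out of scope here), points are sampled at a single scale controlled by a made-up `step` parameter, and the flow is read at the nearest pixel without interpolation.

```python
import numpy as np

def sample_points(height, width, step):
    """Densely sample points on a regular grid with spacing `step`."""
    rows, cols = np.mgrid[0:height:step, 0:width:step]
    return np.stack([rows.ravel(), cols.ravel()], axis=1).astype(np.float64)

def track_points(points, flows):
    """Track points through a sequence of dense flow fields.

    flows : (T-1, H, W, 2) array, flows[t, r, c] = (d_row, d_col) from frame t to t+1
    Returns trajectories of shape (T, N, 2) (NaN after a point leaves the frame)
    and a boolean array telling which points survived to the last frame.
    """
    n_steps, height, width, _ = flows.shape
    trajectories = np.full((n_steps + 1, len(points), 2), np.nan)
    trajectories[0] = points
    alive = np.ones(len(points), dtype=bool)
    for t in range(n_steps):
        alive_idx = np.flatnonzero(alive)
        pos = trajectories[t, alive_idx]
        nearest = np.rint(pos).astype(int)          # nearest-pixel flow lookup
        new_pos = pos + flows[t, nearest[:, 0], nearest[:, 1]]
        inside = ((new_pos[:, 0] >= 0) & (new_pos[:, 0] <= height - 1) &
                  (new_pos[:, 1] >= 0) & (new_pos[:, 1] <= width - 1))
        trajectories[t + 1, alive_idx[inside]] = new_pos[inside]
        alive[alive_idx[~inside]] = False           # stop tracks that left the frame
    return trajectories, alive

# Toy flow: everything moves one pixel to the right per frame, for 3 steps.
flows = np.zeros((3, 6, 8, 2))
flows[..., 1] = 1.0
points = sample_points(6, 8, step=2)
trajectories, alive = track_points(points, flows)
print(trajectories.shape, alive.sum())
```

To mimic "different scales", one could call `sample_points` with several `step` values or on resized frames, but that is left out to keep the sketch short.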
 
posted @ 2019-09-05 22:10  coolwx