ICCV 2015, Days 1, 2, 3, 4
ICCV 2015, Day 1
ICCV 2015, the International Conference on Computer Vision, is one of the premier venues for computer vision research, together with the CVPR conference. This ICCV is happening in Santiago, Chile, a beautiful city with amazing food.
The computer vision community is growing, and this ICCV is the largest so far (1460 attendees, 525 papers). For a few years now computer vision has been broadly relevant to industry, and no fewer than 22 companies are sponsoring the conference. The acceptance rate this year was 30.92%, with the acceptance rate for oral presentations at 3.30%. All papers of the conference are available as open-access PDFs.
A lot of interesting work was presented on the first day; below is my subjective selection.
Aligning Books and Movies
By Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler.
Movies and the books they are based on form a rich paired data source. In this work the authors propose a recurrent neural network model to align these two sources semantically. The challenge is that movies and books are often substantially different, but apparently modern recurrent neural networks have enough semantic discrimination ability to enable such alignment.
Convolutional Color Constancy
By Jonathan Barron.
Color constancy deals with the correction of colors in digital images. While there have been a large number of works in this area, the issue remains challenging and important.
In this work the author convincingly demonstrates that common changes in colors correspond to simple translation of a color histogram in a transformed 2D histogram space. Then, the problem of correcting for these translations can be posed as simply recognizing the true center position of the observed color histogram and undoing the translation.
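The translation property can be checked numerically. The sketch below assumes log-chroma coordinates u = log(g/r), v = log(g/b) as the "transformed 2D histogram space"; that choice, the pixel values, and the illuminant gains are assumptions made for illustration, not taken from the paper.

```python
import math

def log_chroma(pixel):
    """Map an (r, g, b) triple to assumed log-chroma coordinates (u, v)."""
    r, g, b = pixel
    return (math.log(g / r), math.log(g / b))

def apply_illuminant(pixel, gains):
    """Tint a pixel by scaling each channel with a per-channel gain."""
    return tuple(c * k for c, k in zip(pixel, gains))

# Hypothetical pixels and illuminant gains, chosen only for illustration.
pixels = [(0.2, 0.4, 0.1), (0.5, 0.5, 0.5), (0.9, 0.3, 0.6)]
gains = (1.8, 1.0, 0.6)

# Shift of each pixel in log-chroma space caused by the tint.
shifts = []
for p in pixels:
    u0, v0 = log_chroma(p)
    u1, v1 = log_chroma(apply_illuminant(p, gains))
    shifts.append((u1 - u0, v1 - v0))

# Shift predicted from the gains alone, independent of pixel content.
expected_shift = log_chroma(gains)

for du, dv in shifts:
    print(f"shift = ({du:.4f}, {dv:.4f})")
print(f"expected = ({expected_shift[0]:.4f}, {expected_shift[1]:.4f})")
```

Every pixel moves by the same offset, so the whole histogram is translated rigidly. Estimating the illuminant then amounts to locating the histogram in this space.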
Self-Calibration of Optical Lenses
By Michael Hirsch and Bernhard Schoelkopf.
Both cheap and expensive camera lenses suffer from many optical effects, leading to deterioration in image quality. This work proposes an automatic way to obtain non-parametric kernel estimates of the point spread functions characterising a lens. The resulting model can then be used to deblur images. In effect, this allows better image quality even when using cheap lenses.
ICCV 2015, Day 2
This article summarizes the second day of the ICCV 2015 conference, the International Conference on Computer Vision. A summary of the first day is also available.
Awards
The following awards were given at ICCV 2015.
Achievement awards
- PAMI Distinguished Researcher Award (1): Yann LeCun
- PAMI Distinguished Researcher Award (2): David Lowe
- PAMI Everingham Prize Winner (1): Andrea Vedaldi for VLFeat
- PAMI Everingham Prize Winner (2): Daniel Scharstein and Rick Szeliski for the Middlebury Datasets
Paper awards
- PAMI Helmholtz Prize (1): David Martin, Charles Fowlkes, Doron Tal, and Jitendra Malik for their ICCV 2001 paper "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics".
- PAMI Helmholtz Prize (2): Serge Belongie, Jitendra Malik, and Jan Puzicha, for their ICCV 2001 paper "Matching Shapes".
- Marr Prize: Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo, for "Deep Neural Decision Forests".
- Marr Prize honorable mention: Saining Xie and Zhuowen Tu for "Holistically-Nested Edge Detection".
Interesting Papers
The above Marr Prize winning papers are very nice, but here I also highlight a few other papers I found interesting today.
Fast R-CNN
By Ross Girshick.
Since 2014 the standard object detection pipeline for natural images has been the R-CNN system, which first extracts a set of object proposals and then scores them using a convolutional neural network. The approach has two key weaknesses: first, the separation between proposal generation and scoring, which prevents joint training of model parameters; and second, the separate scoring of each hypothesis, which leads to significant runtime overhead. This work and the follow-up work ("Faster R-CNN" at NIPS this year) address both issues by proposing a joint model that is trained end-to-end, including proposal generation, leading to a new state of the art in object detection.
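To make the runtime argument concrete: as I understand it, the convolutional features are computed once per image and each proposal is then pooled from that shared map to a fixed size, rather than running the network once per proposal. Below is a minimal sketch of such region pooling on a toy 2D feature map. The function name `roi_max_pool`, the box format, and the data are hypothetical, and a real detector pools multi-channel maps.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a shared 2D feature map to out_size x out_size.

    roi is (top, left, bottom, right) in feature-map cells, end-exclusive.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(out_size):
        # Row range of output bin i; keep at least one row per bin.
        r0 = top + (i * height) // out_size
        r1 = max(top + ((i + 1) * height) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = left + (j * width) // out_size
            c1 = max(left + ((j + 1) * width) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 6x6 "feature map", computed once and shared by all proposals.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

# Two hypothetical proposals, both pooled from the same map.
pooled_a = roi_max_pool(feature_map, (0, 0, 4, 4))
pooled_b = roi_max_pool(feature_map, (2, 1, 6, 5))
print(pooled_a)
print(pooled_b)
```

The expensive part (the feature map) is shared, and only the cheap pooling and scoring steps run per proposal.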
Unsupervised Visual Representation Learning by Context Prediction
By Carl Doersch, Abhinav Gupta, and Alexei A. Efros.
Supervised deep learning needs lots of labeled training data to achieve good performance. This paper investigates whether we can create and train deep neural networks on artificial tasks for which we can create large amounts of training data. In particular, the paper proposes to predict where a certain patch appears within the image. For this task, an almost infinite amount of training data is easily created. Perhaps surprisingly the resulting network, despite being trained on this artificial task, has learned useful representations for real vision tasks such as image classification.
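The reason training data is essentially free for this task is that the labels come from the image geometry itself. The sketch below samples a centre patch and one of its eight neighbours from an unlabeled image, using the neighbour's index as the label. The eight-neighbour layout, patch size, gap, and all names are assumptions for illustration, not the authors' exact setup.

```python
import random

# Offsets (row, col) of the 8 neighbours in a 3x3 grid; the index is the label.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     (0, -1),           (0, 1),
                     (1, -1),  (1, 0),  (1, 1)]

def crop(image, top, left, size):
    """Return a size x size patch from an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]

def sample_pair(image, patch=8, gap=2, rng=random):
    """Sample (centre patch, neighbour patch, label) from one unlabeled image."""
    height, width = len(image), len(image[0])
    step = patch + gap  # distance between patch origins in the grid
    # Choose the centre so that all 8 neighbours stay inside the image.
    top = rng.randrange(step, height - step - patch + 1)
    left = rng.randrange(step, width - step - patch + 1)
    label = rng.randrange(len(NEIGHBOUR_OFFSETS))
    d_row, d_col = NEIGHBOUR_OFFSETS[label]
    centre = crop(image, top, left, patch)
    neighbour = crop(image, top + d_row * step, left + d_col * step, patch)
    return centre, neighbour, label

# Synthetic 64x64 "image" whose pixel value encodes its own position.
image = [[r * 64 + c for c in range(64)] for r in range(64)]
rng = random.Random(0)
pairs = [sample_pair(image, rng=rng) for _ in range(100)]
print(len(pairs), "training pairs, first label:", pairs[0][2])
```

A network trained to predict the label from the two patches never sees a human annotation. Any image collection yields as many pairs as one cares to sample.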
Deep Fried Convnets
By Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, and Ziyu Wang.
In deep convolutional networks the last few densely connected layers hold most of the parameters and therefore account for most of the memory required at training and test time. This work proposes to leverage the fastfood kernel approximation to replace the densely connected layers with specific operations that are efficient and need few parameters.
The empirical results are impressive and the fastfood justification is plausible, but I wonder if this work may even provide a hint at a more general approach to construct efficient neural network architectures by using arbitrary dense but efficient matrix operations (FFT, DCT, Walsh-Hadamard, etcetera).
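To illustrate what a "dense but efficient" operation looks like, here is a minimal sketch of a fastfood-style layer of the form S H G P H B, where H is the Walsh-Hadamard transform, P a fixed permutation, and S, G, B diagonal matrices. Treating the three diagonals as the only trainable parameters is my assumption about how such a layer would be used. The sketch omits normalisation constants and checks the fast path against an explicit matrix product.

```python
import random

def fwht(vec):
    """Unnormalised fast Walsh-Hadamard transform; length must be a power of 2."""
    out = list(vec)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def fastfood_layer(x, S, G, B, perm):
    """Apply S H G P H B to x in O(d log d) without forming a d x d matrix."""
    v = fwht([b * xi for b, xi in zip(B, x)])   # H B x
    v = [v[p] for p in perm]                    # P
    v = fwht([g * vi for g, vi in zip(G, v)])   # H G
    return [s * vi for s, vi in zip(S, v)]      # S

def hadamard_matrix(d):
    """Explicit d x d Walsh-Hadamard matrix, used only to check the fast path."""
    return [[(-1) ** bin(i & j).count("1") for j in range(d)] for i in range(d)]

def matvec(matrix, vec):
    """Plain O(d^2) matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

d = 8
rng = random.Random(0)
S = [rng.gauss(0, 1) for _ in range(d)]
G = [rng.gauss(0, 1) for _ in range(d)]
B = [rng.choice([-1.0, 1.0]) for _ in range(d)]
perm = list(range(d))
rng.shuffle(perm)
x = [rng.gauss(0, 1) for _ in range(d)]

y_fast = fastfood_layer(x, S, G, B, perm)

# Same computation with explicit O(d^2) Hadamard products.
H = hadamard_matrix(d)
v = matvec(H, [b * xi for b, xi in zip(B, x)])
v = [v[p] for p in perm]
v = matvec(H, [g * vi for g, vi in zip(G, v)])
y_slow = [s * vi for s, vi in zip(S, v)]

dense_params = d * d        # parameters of an unstructured d x d layer
structured_params = 3 * d   # the three diagonals S, G, B
print(max(abs(a - b) for a, b in zip(y_fast, y_slow)))
print(dense_params, "vs", structured_params, "parameters")
```

The layer still mixes every input into every output, but it stores a number of parameters linear in d. Swapping the Hadamard transform for an FFT or DCT would keep the same structure, which is the generalisation I have in mind.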
ICCV 2015, Day 3
This article summarizes the third day of the ICCV 2015 conference, the International Conference on Computer Vision. Summaries of the first day and second day are also available.
Interesting Papers
Registering Images to Untextured Geometry Using Average Shading Gradients
By Tobias Ploetz and Stefan Roth.
This work considers the problem of aligning an untextured 3D surface to a real image of the same object. This is challenging because edges in the image appear or disappear depending on texture and lighting, so they need not match the edges of the geometry.
The authors propose an alignment procedure that uses efficiently computable average shading gradient images, which capture the edges expected to be visible due to shading despite the unknown light direction.
Robust Nonrigid Registration by Convex Optimization
By Qifeng Chen and Vladlen Koltun.
The authors consider the problem of aligning two 3D shapes to each other, where each shape may be corrupted by missing surfaces (non-watertight surfaces) and undergo severe nonrigid deformations. Previous work has proposed to minimize a specific geodesic distortion measure over suitable classes of continuous transformations; however, this yields difficult non-convex optimization problems.
Because the distortion measure itself is sensible, this work keeps it and proposes a way to approximate the problem while simultaneously convexifying it. This is achieved by representing the transformation nonparametrically through correspondences on randomly sampled points. While the original problem was continuous and non-convex, it is now a discrete energy minimization problem that can be approximately solved using a standard LP-based relaxation approach, for which the authors use TRW-S.
What is surprising is how much the results improve on benchmark data sets; the error is reduced by a factor of three compared to strong baseline methods.
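The discrete formulation can be illustrated on a toy problem. In the sketch below each sampled source point receives a label (a target point), and the energy sums, over pairs of source points, how much their distance changes under the labeling. Two substitutions are mine and not the authors': Euclidean distances stand in for geodesic distances, and exhaustive search stands in for TRW-S, which is only feasible at this tiny size. All names and points are hypothetical.

```python
import itertools
import math

def pairwise_distances(points):
    """Euclidean distance matrix (a stand-in for geodesic distances)."""
    return [[math.dist(p, q) for q in points] for p in points]

def distortion_energy(labels, dist_src, dist_tgt):
    """Sum over source pairs of the change in distance under the labeling."""
    n = len(labels)
    return sum(abs(dist_src[i][j] - dist_tgt[labels[i]][labels[j]])
               for i in range(n) for j in range(i + 1, n))

def brute_force_correspondence(src, tgt):
    """Try every labeling; a placeholder for an LP-relaxation solver."""
    d_src, d_tgt = pairwise_distances(src), pairwise_distances(tgt)
    best = min(itertools.product(range(len(tgt)), repeat=len(src)),
               key=lambda labels: distortion_energy(labels, d_src, d_tgt))
    return list(best), distortion_energy(best, d_src, d_tgt)

# Source points, and a rotated, shifted, and reordered copy as the target.
src = [(0, 0), (1, 0), (0, 2), (3, 3)]
tgt = [(3, 5), (2, 8), (5, 5), (5, 6)]

labels, energy = brute_force_correspondence(src, tgt)
print(labels, energy)
```

The search space grows as the number of target points raised to the number of source points, which is why a relaxation-based solver is needed for realistic sample counts.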
ICCV 2015, Day 4
This article summarizes the fourth day of the ICCV 2015 conference, the International Conference on Computer Vision. Summaries of the first day, second day, and third day are also available.
ICCV 2017 and 2019
ICCV 2017 will be in Venice, Italy.
For ICCV 2019 there was an open vote between Seoul (Korea) and Shanghai (China), with Seoul winning the election. Both proposals were strong, and because I have lived in Shanghai for two years I favored that proposal, but I am confident that ICCV 2019 in Seoul will be wonderful as well.
Parties
Computer vision is now fully recognized as having an impact on industry. All large tech companies have invested heavily in the last three years or so, and one of the visible results is the increased number of conference sponsors and conference parties.
Conferences such as NIPS, CVPR, and ICCV now host invite-only open bar parties with several hundred attendees; this year at ICCV there were parties by Microsoft, Intel, Google, and Facebook.
Interestingly they do not come across as recruiting events: there is a minimal announcement perhaps, but otherwise people just chat with food and drinks. It is more a show of strength and goodwill towards the community that computer vision is taken seriously and the parties do demonstrate that the companies are in good shape, much like banks invest in a marble floor and shiny glass facades to gain the trust of their customers.
Interesting Papers
Polarized 3D: High-Quality Depth Sensing with Polarization Cues
By Achuta Kadambi, Vage Taamazyan, Boxin Shi, and Ramesh Raskar.
Polarization of light is a rarely exploited cue for 3D reconstruction. This work revisits shape-from-polarization and shows fine-detail 3D reconstruction from polarization information (with non-trivial post-processing).
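For readers unfamiliar with the cue, the textbook starting point is that the intensity seen through a rotating linear polarizer varies sinusoidally with the polarizer angle. The phase of that sinusoid constrains the surface azimuth (up to an ambiguity of 180 degrees) and the degree of polarization relates to the zenith angle. The sketch below recovers both from three polarizer angles. It shows only this standard model with made-up numbers, not the paper's pipeline, and the function names are hypothetical.

```python
import math

def polarizer_intensity(angle, mean, amplitude, phase):
    """Intensity behind a linear polarizer at the given angle (radians)."""
    return mean + amplitude * math.cos(2.0 * (angle - phase))

def phase_and_dop(i0, i45, i90):
    """Recover sinusoid phase and degree of polarization from 0/45/90 degrees."""
    s0 = i0 + i90            # total intensity
    s1 = i0 - i90            # cosine component of the sinusoid
    s2 = 2.0 * i45 - s0      # sine component of the sinusoid
    phase = 0.5 * math.atan2(s2, s1)
    dop = math.hypot(s1, s2) / s0
    return phase, dop

# Simulate one pixel with hypothetical ground truth, then recover it.
true_mean, true_amplitude, true_phase = 1.0, 0.3, 0.4
measurements = [polarizer_intensity(math.radians(a), true_mean,
                                    true_amplitude, true_phase)
                for a in (0, 45, 90)]
phase, dop = phase_and_dop(*measurements)
print(phase, dop)
```

Here the degree of polarization equals amplitude divided by mean. Resolving the azimuth ambiguity and converting these quantities into a surface is where the non-trivial post-processing comes in.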