[Object Tracking] Identify and Track Specific Object

Abstract—Augmented Reality (AR) has become increasingly popular in recent years and has broad application prospects. In 2016 in particular, Pokémon Go, a location-based augmented reality game, had a dramatic impact on the global mobile-game market and became a milestone for AR technology. AR is a comprehensive application that lets users experience digital gameplay in a real-world environment. Although a mature AR product involves many technologies, such as localization and 3D reconstruction, its core is object identification and tracking. In this report, I implement the basic functions of object identification and tracking, and then render a 2D model overlaid on the object in the camera preview, achieving the basic effect of Pokémon Go.


Bag of Visual Words

In this project, three key techniques implement the required functions. First, we transform images into SURF descriptors and cluster these descriptors to build a bag-of-visual-words model. Next, we use this model to obtain the vector representation of the object image and verify the validity of the resulting homography matrix to identify the object. Finally, once the object is identified, we track it with the optical flow method and overlay a 2D model on the frame buffer according to the object's position.

The bag-of-visual-words (BoW) model is a simplifying assumption used in natural language processing and information retrieval, and it has been widely applied in computer vision. The procedure has two parts: learning and recognition. As shown in Figure 1, in the learning stage we (i) select a large set of images as the training data; (ii) extract interest points from all images in the training set and compute a SURF descriptor for each interest point; (iii) quantize these descriptors with the k-means clustering algorithm to form a codebook; and (iv) take the clustering result as the visual vocabulary. In the recognition stage we (v) extract the interest points of the object image; (vi) compute a SURF descriptor for each interest point; (vii) match the descriptors against the vocabulary with the KNN algorithm; and (viii) build the histogram of the object image.

Figure 1. The bag-of-visual-words learning and recognition procedure.

 

Tracking 

When tracking the object with the multi-scale (pyramidal) Lucas-Kanade algorithm, we can estimate the homography matrix from the previous frame to the current frame using the tracked points. Thus, if we know the initial positions of the four corners, which are given at the moment the object is identified, the positions of the four corners can always be inferred. This is shown in Figure 2; a code sketch of the corner update follows the figure.

Figure 2. Inferring the four object corners in each frame from the frame-to-frame homography.
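A minimal sketch of this corner update, assuming H is the 3x3 homography estimated between the two frames and corners holds the four corner positions recorded when the object was identified (all names are illustrative):

// Assumed names: H (CV_64F 3x3 homography, previous frame -> current frame),
// corners (the four object corners in the previous frame, set at identification time).
std::vector<cv::Point2f> corners(4);
std::vector<cv::Point2f> new_corners;
cv::perspectiveTransform(corners, new_corners, H);   // map the corners into the current frame
corners = new_corners;                                // the 2D model is rendered inside these corners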

 


Implementation steps (the pipeline is built on these OpenCV classes):

cv::FeatureDetector

cv::DescriptorExtractor

cv::DescriptorMatcher

 

(1) Create FeatureDetector and DescriptorExtractor.
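A minimal sketch of this step, assuming OpenCV 2.4.x with the nonfree module (where SURF lives):

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/nonfree.hpp>

cv::initModule_nonfree();   // register SURF with the algorithm factory
cv::Ptr<cv::FeatureDetector>     detector  = cv::FeatureDetector::create("SURF");
cv::Ptr<cv::DescriptorExtractor> extractor = cv::DescriptorExtractor::create("SURF");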

 

(2) Calculate SURF descriptors for each image.
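A hedged sketch of this step; training_images is an assumed std::vector<cv::Mat> holding the training set, not a name from the original code:

cv::Mat all_descriptors;                      // one 64-dim SURF descriptor per row (CV_32F)
for (size_t i = 0; i < training_images.size(); ++i) {
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    detector->detect(training_images[i], keypoints);                 // find interest points
    extractor->compute(training_images[i], keypoints, descriptors);  // 64-dim SURF vectors
    all_descriptors.push_back(descriptors);                          // pool for k-means
}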

 

 

(3) Cluster the descriptors using the k-means algorithm.

//! clusters the input data using k-Means algorithm
CV_EXPORTS_W double kmeans( InputArray data, int K, CV_OUT InputOutputArray bestLabels,
                            TermCriteria criteria, int attempts,
                            int flags, OutputArray centers=noArray() );
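A usage sketch of the call above; the vocabulary size K = 1000 and the termination criteria are assumed values, not ones stated in this post:

int K = 1000;                                 // assumed vocabulary size
cv::Mat labels;                               // cluster index of every descriptor
cv::Mat centroid;                             // K x 64 matrix of visual words
cv::kmeans(all_descriptors, K, labels,
           cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 100, 0.001),
           3, cv::KMEANS_PP_CENTERS, centroid);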

 

 

(4) Create a FlannBasedMatcher, which is backed by a KD-tree index.
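A one-line sketch, assuming the "FlannBased" factory name (its default index is a KD-tree):

cv::Ptr<cv::DescriptorMatcher> descriptor_matcher =
    cv::DescriptorMatcher::create("FlannBased");   // FLANN matcher with KD-tree index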


(5) Add the centroids to the FlannBasedMatcher's train collection; these centroids serve as the bins (indexes) of the histogram.

// add() takes a vector of descriptor matrices, so wrap the centroid matrix accordingly
descriptor_matcher->add(std::vector<cv::Mat>(1, centroid));


(6) Update and balance the KD-Tree.

descriptor_matcher->train();   // builds the FLANN KD-tree index over the centroids


(7) Use the BoW model to obtain the vector representation of the object image via the KNN algorithm.
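A hedged sketch of building the histogram; object_image and K are assumed to come from the earlier steps, and each SURF descriptor is assigned to its nearest visual word (1-NN):

std::vector<cv::KeyPoint> obj_keypoints;
cv::Mat obj_descriptors;
detector->detect(object_image, obj_keypoints);
extractor->compute(object_image, obj_keypoints, obj_descriptors);

std::vector<cv::DMatch> matches;
descriptor_matcher->match(obj_descriptors, matches);   // nearest centroid for each descriptor

cv::Mat histogram = cv::Mat::zeros(1, K, CV_32F);       // BoW vector of the object image
for (size_t i = 0; i < matches.size(); ++i)
    histogram.at<float>(0, matches[i].trainIdx) += 1.0f;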

   

(8) Tracking.
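A minimal sketch of the tracking loop, assuming prev_gray and curr_gray are consecutive grayscale frames and prev_pts holds the object's interest points from the previous frame (names are illustrative):

std::vector<cv::Point2f> curr_pts;
std::vector<uchar> status;
std::vector<float> err;

// pyramidal (multi-scale) Lucas-Kanade optical flow
cv::calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts, status, err);

// keep only the successfully tracked points
std::vector<cv::Point2f> good_prev, good_curr;
for (size_t i = 0; i < status.size(); ++i)
    if (status[i]) { good_prev.push_back(prev_pts[i]); good_curr.push_back(curr_pts[i]); }

// homography from the previous frame to the current frame; the four corners are then
// updated with cv::perspectiveTransform as in the Tracking section above
cv::Mat H = cv::findHomography(good_prev, good_curr, cv::RANSAC);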

 

  


Result:

 

code

posted @ 2016-11-03 08:19  郝壹贰叁