Motion-Based Multiple Object Tracking

Kalman filter tracking...

%% Motion-Based Multiple Object Tracking  
% This example shows how to perform automatic detection and motion-based  
% tracking of moving objects in a video from a stationary camera.  
%  
%   Copyright 2014 The MathWorks, Inc.  
  
%%  
% Detection of moving objects and motion-based tracking are important   
% components of many computer vision applications, including activity  
% recognition, traffic monitoring, and automotive safety.  The problem of  
% motion-based object tracking can be divided into two parts:  
%  
% # detecting moving objects in each frame   
% # associating the detections corresponding to the same object over time  
%  
% The detection of moving objects uses a background subtraction algorithm  
% based on Gaussian mixture models. Morphological operations are applied to  
% the resulting foreground mask to eliminate noise. Finally, blob analysis  
% detects groups of connected pixels, which are likely to correspond to  
% moving objects.   
%  
% The association of detections to the same object is based solely on  
% motion. The motion of each track is estimated by a Kalman filter. The  
% filter is used to predict the track's location in each frame, and  
% determine the likelihood of each detection being assigned to each   
% track.  
%  
% Track maintenance is an important aspect of this example. In any
% given frame, some detections may be assigned to tracks, while other
% detections and tracks may remain unassigned. The assigned tracks are
% updated using the corresponding detections. The unassigned tracks are   
% marked invisible. An unassigned detection begins a new track.   
%  
% Each track keeps count of the number of consecutive frames in which it
% remained unassigned. If the count exceeds a specified threshold, the
% example assumes that the object has left the field of view and deletes
% the track.
%  
% For more information, see
% <matlab:helpview(fullfile(docroot,'toolbox','vision','vision.map'),'multipleObjectTracking') Multiple Object Tracking>.  
%  
% This example is a function with the main body at the top and helper   
% routines in the form of   
% <matlab:helpview(fullfile(docroot,'toolbox','matlab','matlab_prog','matlab_prog.map'),'nested_functions') nested functions>   
% below.  
  
function multiObjectTracking()  
  
% Create System objects used for reading video, detecting moving objects,  
% and displaying the results.  
obj = setupSystemObjects();  
  
tracks = initializeTracks(); % Create an empty array of tracks.  
  
nextId = 1; % ID of the next track  
  
% Detect moving objects, and track them across video frames.  
while ~isDone(obj.reader)  
    frame = readFrame();  
    [centroids, bboxes, mask] = detectObjects(frame);  
    predictNewLocationsOfTracks();  
    [assignments, unassignedTracks, unassignedDetections] = ...  
        detectionToTrackAssignment();  
      
    updateAssignedTracks();  
    updateUnassignedTracks();  
    deleteLostTracks();  
    createNewTracks();  
      
    displayTrackingResults();  
end  
  
  
%% Create System Objects  
% Create System objects used for reading the video frames, detecting  
% foreground objects, and displaying results.  
  
    function obj = setupSystemObjects()  
        % Initialize Video I/O  
        % Create objects for reading a video from a file, drawing the tracked  
        % objects in each frame, and playing the video.  
          
        % Create a video file reader.  
        obj.reader = vision.VideoFileReader('atrium.avi');  
          
        % Create two video players, one to display the video,  
        % and one to display the foreground mask.  
        obj.videoPlayer = vision.VideoPlayer('Position', [20, 400, 700, 400]);  
        obj.maskPlayer = vision.VideoPlayer('Position', [740, 400, 700, 400]);  
          
        % Create System objects for foreground detection and blob analysis  
          
        % The foreground detector is used to segment moving objects from  
        % the background. It outputs a binary mask, where the pixel value  
        % of 1 corresponds to the foreground and the value of 0 corresponds  
        % to the background.   
          
        obj.detector = vision.ForegroundDetector('NumGaussians', 3, ...  
            'NumTrainingFrames', 40, 'MinimumBackgroundRatio', 0.7);  
          
        % Connected groups of foreground pixels are likely to correspond to moving  
        % objects.  The blob analysis System object is used to find such groups  
        % (called 'blobs' or 'connected components'), and compute their  
        % characteristics, such as area, centroid, and the bounding box.  
          
        obj.blobAnalyser = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...  
            'AreaOutputPort', true, 'CentroidOutputPort', true, ...  
            'MinimumBlobArea', 400);  
    end  
  
%% Initialize Tracks  
% The |initializeTracks| function creates an array of tracks, where each  
% track is a structure representing a moving object in the video. The  
% purpose of the structure is to maintain the state of a tracked object.  
% The state consists of information used for detection to track assignment,  
% track termination, and display.   
%  
% The structure contains the following fields:  
%  
% * |id| :                  the integer ID of the track  
% * |bbox| :                the current bounding box of the object; used  
%                           for display  
% * |kalmanFilter| :        a Kalman filter object used for motion-based  
%                           tracking  
% * |age| :                 the number of frames since the track was first  
%                           detected  
% * |totalVisibleCount| :   the total number of frames in which the track  
%                           was detected (visible)  
% * |consecutiveInvisibleCount| : the number of consecutive frames for   
%                                  which the track was not detected (invisible).  
%  
% Noisy detections tend to result in short-lived tracks. For this reason,  
% the example only displays an object after it was tracked for some number  
% of frames. This happens when |totalVisibleCount| exceeds a specified   
% threshold.      
%  
% When no detections are associated with a track for several consecutive  
% frames, the example assumes that the object has left the field of view   
% and deletes the track. This happens when |consecutiveInvisibleCount|  
% exceeds a specified threshold. A track may also be deleted as noise if
% it was tracked for only a short time and was marked invisible for most
% of those frames.
  
    function tracks = initializeTracks()  
        % create an empty array of tracks  
        tracks = struct(...  
            'id', {}, ...  
            'bbox', {}, ...  
            'kalmanFilter', {}, ...  
            'age', {}, ...  
            'totalVisibleCount', {}, ...  
            'consecutiveInvisibleCount', {});  
    end  
  
%% Read a Video Frame  
% Read the next video frame from the video file.  
    function frame = readFrame()  
        frame = obj.reader.step();  
    end  
  
%% Detect Objects  
% The |detectObjects| function returns the centroids and the bounding boxes  
% of the detected objects. It also returns the binary mask, which has the   
% same size as the input frame. Pixels with a value of 1 correspond to the  
% foreground, and pixels with a value of 0 correspond to the background.     
%  
% The function performs motion segmentation using the foreground detector.   
% It then performs morphological operations on the resulting binary mask to  
% remove noisy pixels and to fill the holes in the remaining blobs.    
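%
% As a rough illustration of that clean-up step, here is a sketch on a
% synthetic mask. It is not part of the example; the mask size and shapes
% are assumed for illustration only:
%
%   toyMask = false(40, 40);
%   toyMask(10:30, 10:30) = true;    % one large blob
%   toyMask(18:20, 18:20) = false;   % a hole inside the blob
%   toyMask(2, 2) = true;            % an isolated noise pixel
%   toyMask = imopen(toyMask, strel('rectangle', [3, 3]));     % drops the noise pixel
%   toyMask = imclose(toyMask, strel('rectangle', [15, 15]));  % bridges small gaps
%   toyMask = imfill(toyMask, 'holes');                        % fills enclosed holes
%   imshow(toyMask)
%
% The opening uses a small structuring element so that only specks are
% removed, while the larger closing element merges nearby fragments of the
% same object.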
  
    function [centroids, bboxes, mask] = detectObjects(frame)  
          
        % Detect foreground.  
        mask = obj.detector.step(frame);  
          
        % Apply morphological operations to remove noise and fill in holes.  
        mask = imopen(mask, strel('rectangle', [3,3]));  
        mask = imclose(mask, strel('rectangle', [15, 15]));   
        mask = imfill(mask, 'holes');  
          
        % Perform blob analysis to find connected components.  
        [~, centroids, bboxes] = obj.blobAnalyser.step(mask);  
    end  
  
%% Predict New Locations of Existing Tracks  
% Use the Kalman filter to predict the centroid of each track in the  
% current frame, and update its bounding box accordingly.  
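%
% For instance, with made-up numbers (a sketch, not values taken from the
% video), the box is re-centered on the predicted centroid like this:
%
%   bbox = int32([0, 0, 20, 40]);          % [x, y, width, height]
%   predictedCentroid = [100.4, 50.2];     % stand-in for the output of predict
%   corner = int32(predictedCentroid) - bbox(3:4) / 2;   % [90, 30]
%   bbox = [corner, bbox(3:4)];            % [90, 30, 20, 40]
%
% With an |int32| box as here, the division rounds to the nearest integer,
% so an odd width or height shifts the corner by at most one pixel.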
  
    function predictNewLocationsOfTracks()  
        for i = 1:length(tracks)  
            bbox = tracks(i).bbox;  
              
            % Predict the current location of the track.  
            predictedCentroid = predict(tracks(i).kalmanFilter);  
              
            % Shift the bounding box so that its center is at   
            % the predicted location.  
            predictedCentroid = int32(predictedCentroid) - bbox(3:4) / 2;  
            tracks(i).bbox = [predictedCentroid, bbox(3:4)];  
        end  
    end  
  
%% Assign Detections to Tracks  
% Assigning object detections in the current frame to existing tracks is  
% done by minimizing cost. The cost is defined as the negative  
% log-likelihood of a detection corresponding to a track.    
%  
% The algorithm involves two steps:   
%  
% Step 1: Compute the cost of assigning every detection to each track using  
% the |distance| method of the |vision.KalmanFilter| System object(TM). The   
% cost takes into account the Euclidean distance between the predicted  
% centroid of the track and the centroid of the detection. It also includes  
% the confidence of the prediction, which is maintained by the Kalman  
% filter. The results are stored in an MxN matrix, where M is the number of  
% tracks, and N is the number of detections.     
%  
% Step 2: Solve the assignment problem represented by the cost matrix using  
% the |assignDetectionsToTracks| function. The function takes the cost   
% matrix and the cost of not assigning any detections to a track.    
%  
% The value for the cost of not assigning a detection to a track depends on  
% the range of values returned by the |distance| method of the   
% |vision.KalmanFilter|. This value must be tuned experimentally. Setting   
% it too low increases the likelihood of creating a new track, and may  
% result in track fragmentation. Setting it too high may result in a single   
% track corresponding to a series of separate moving objects.     
%  
% The |assignDetectionsToTracks| function uses Munkres' version of the
% Hungarian algorithm to compute an assignment that minimizes the total
% cost. It returns an L x 2 matrix, where L is the number of assigned
% pairs, containing the corresponding indices of assigned tracks and
% detections in its two columns. It also returns the indices of tracks and
% detections that remained unassigned.
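%
% A small hand-made cost matrix shows the effect of the non-assignment
% cost. This is a sketch; the cost values are invented for illustration:
%
%   cost = [ 1, 50, 50; ...    % track 1 is close to detection 1 only
%           50,  2, 50];       % track 2 is close to detection 2 only
%   costOfNonAssignment = 20;
%   [assignments, unassignedTracks, unassignedDetections] = ...
%       assignDetectionsToTracks(cost, costOfNonAssignment);
%
% Here the two low-cost pairs (track 1 with detection 1, track 2 with
% detection 2) are assigned, and detection 3 is returned as unassigned, so
% it would start a new track.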
  
    function [assignments, unassignedTracks, unassignedDetections] = ...  
            detectionToTrackAssignment()  
          
        nTracks = length(tracks);  
        nDetections = size(centroids, 1);  
          
        % Compute the cost of assigning each detection to each track.  
        cost = zeros(nTracks, nDetections);  
        for i = 1:nTracks  
            cost(i, :) = distance(tracks(i).kalmanFilter, centroids);  
        end  
          
        % Solve the assignment problem.  
        costOfNonAssignment = 20;  
        [assignments, unassignedTracks, unassignedDetections] = ...  
            assignDetectionsToTracks(cost, costOfNonAssignment);  
    end  
  
%% Update Assigned Tracks  
% The |updateAssignedTracks| function updates each assigned track with the  
% corresponding detection. It calls the |correct| method of  
% |vision.KalmanFilter| to correct the location estimate. Next, it stores  
% the new bounding box, and increases the age of the track and the total  
% visible count by 1. Finally, the function sets the invisible count to 0.   
  
    function updateAssignedTracks()  
        numAssignedTracks = size(assignments, 1);  
        for i = 1:numAssignedTracks  
            trackIdx = assignments(i, 1);  
            detectionIdx = assignments(i, 2);  
            centroid = centroids(detectionIdx, :);  
            bbox = bboxes(detectionIdx, :);  
              
            % Correct the estimate of the object's location  
            % using the new detection.  
            correct(tracks(trackIdx).kalmanFilter, centroid);  
              
            % Replace predicted bounding box with detected  
            % bounding box.  
            tracks(trackIdx).bbox = bbox;  
              
            % Update track's age.  
            tracks(trackIdx).age = tracks(trackIdx).age + 1;  
              
            % Update visibility.  
            tracks(trackIdx).totalVisibleCount = ...  
                tracks(trackIdx).totalVisibleCount + 1;  
            tracks(trackIdx).consecutiveInvisibleCount = 0;  
        end  
    end  
  
%% Update Unassigned Tracks  
% Mark each unassigned track as invisible, and increase its age by 1.  
  
    function updateUnassignedTracks()  
        for i = 1:length(unassignedTracks)  
            ind = unassignedTracks(i);  
            tracks(ind).age = tracks(ind).age + 1;  
            tracks(ind).consecutiveInvisibleCount = ...  
                tracks(ind).consecutiveInvisibleCount + 1;  
        end  
    end  
  
%% Delete Lost Tracks  
% The |deleteLostTracks| function deletes tracks that have been invisible  
% for too many consecutive frames. It also deletes recently created tracks  
% that have been invisible for too many frames overall.   
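%
% The rule can be checked by hand on three hypothetical tracks. This is a
% sketch using plain arrays and the same thresholds as the function below:
%
%   ages                      = [3, 30, 50];
%   totalVisibleCounts        = [1, 28, 25];
%   consecutiveInvisibleCount = [2,  0, 20];
%   visibility = totalVisibleCounts ./ ages;      % about [0.33, 0.93, 0.50]
%   lostInds = (ages < 8 & visibility < 0.6) | ...
%       consecutiveInvisibleCount >= 20;          % [true, false, true]
%
% Track 1 is dropped because it is young and mostly invisible, and track 3
% because it has been invisible for 20 consecutive frames. Track 2 is kept.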
  
    function deleteLostTracks()  
        if isempty(tracks)  
            return;  
        end  
          
        invisibleForTooLong = 20;  
        ageThreshold = 8;  
          
        % Compute the fraction of the track's age for which it was visible.  
        ages = [tracks(:).age];  
        totalVisibleCounts = [tracks(:).totalVisibleCount];  
        visibility = totalVisibleCounts ./ ages;  
          
        % Find the indices of 'lost' tracks.  
        lostInds = (ages < ageThreshold & visibility < 0.6) | ...  
            [tracks(:).consecutiveInvisibleCount] >= invisibleForTooLong;  
          
        % Delete lost tracks.  
        tracks = tracks(~lostInds);  
    end  
  
%% Create New Tracks  
% Create new tracks from unassigned detections. Assume that any unassigned  
% detection is a start of a new track. In practice, you can use other cues  
% to eliminate noisy detections, such as size, location, or appearance.  
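%
% The arguments passed to |configureKalmanFilter| below are, in order, the
% motion model, the initial location, the initial estimate error (location
% and velocity variance), the motion noise, and the measurement noise. A
% minimal sketch of the predict/correct cycle on an invented straight-line
% path follows; the measurements are assumptions, not data from the video:
%
%   kf = configureKalmanFilter('ConstantVelocity', ...
%       [0, 0], [200, 50], [100, 25], 100);
%   for t = 1:5
%       predict(kf);                  % advance the state by one frame
%       correct(kf, [2 * t, t]);      % fuse the measured centroid
%   end
%   nextLocation = predict(kf);       % predicted centroid for frame 6
%   d = distance(kf, [12, 6; 40, 40]);    % cost of two candidate detections
%
% The candidate that continues the straight-line path should receive the
% smaller distance, which is what the assignment step relies on.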
  
    function createNewTracks()  
        centroids = centroids(unassignedDetections, :);  
        bboxes = bboxes(unassignedDetections, :);  
          
        for i = 1:size(centroids, 1)  
              
            centroid = centroids(i,:);  
            bbox = bboxes(i, :);  
              
            % Create a Kalman filter object.  
            kalmanFilter = configureKalmanFilter('ConstantVelocity', ...  
                centroid, [200, 50], [100, 25], 100);  
              
            % Create a new track.  
            newTrack = struct(...  
                'id', nextId, ...  
                'bbox', bbox, ...  
                'kalmanFilter', kalmanFilter, ...  
                'age', 1, ...  
                'totalVisibleCount', 1, ...  
                'consecutiveInvisibleCount', 0);  
              
            % Add it to the array of tracks.  
            tracks(end + 1) = newTrack;  
              
            % Increment the next id.  
            nextId = nextId + 1;  
        end  
    end  
  
%% Display Tracking Results  
% The |displayTrackingResults| function draws a bounding box and label ID   
% for each track on the video frame and the foreground mask. It then   
% displays the frame and the mask in their respective video players.   
  
    function displayTrackingResults()  
        % Convert the frame and the mask to uint8 RGB.  
        frame = im2uint8(frame);  
        mask = uint8(repmat(mask, [1, 1, 3])) .* 255;  
          
        minVisibleCount = 8;  
        if ~isempty(tracks)  
                
            % Noisy detections tend to result in short-lived tracks.  
            % Only display tracks that have been visible for more than   
            % a minimum number of frames.  
            reliableTrackInds = ...  
                [tracks(:).totalVisibleCount] > minVisibleCount;  
            reliableTracks = tracks(reliableTrackInds);  
              
            % Display the objects. If an object has not been detected  
            % in this frame, display its predicted bounding box.  
            if ~isempty(reliableTracks)  
                % Get bounding boxes.  
                bboxes = cat(1, reliableTracks.bbox);  
                  
                % Get ids.  
                ids = int32([reliableTracks(:).id]);  
                  
                % Create labels for objects indicating the ones for   
                % which we display the predicted rather than the actual   
                % location.  
                labels = cellstr(int2str(ids'));  
                predictedTrackInds = ...  
                    [reliableTracks(:).consecutiveInvisibleCount] > 0;  
                isPredicted = cell(size(labels));  
                isPredicted(predictedTrackInds) = {' predicted'};  
                labels = strcat(labels, isPredicted);  
                  
                % Draw the objects on the frame.  
                frame = insertObjectAnnotation(frame, 'rectangle', ...  
                    bboxes, labels);  
                  
                % Draw the objects on the mask.  
                mask = insertObjectAnnotation(mask, 'rectangle', ...  
                    bboxes, labels);  
            end  
        end  
          
        % Display the mask and the frame.  
        obj.maskPlayer.step(mask);          
        obj.videoPlayer.step(frame);  
    end  
  
%% Summary  
% This example created a motion-based system for detecting and  
% tracking multiple moving objects. Try using a different video to see if  
% you are able to detect and track objects. Try modifying the parameters  
% for the detection, assignment, and deletion steps.    
%  
% The tracking in this example was solely based on motion with the  
% assumption that all objects move in a straight line with constant speed.  
% When the motion of an object significantly deviates from this model, the  
% example may produce tracking errors. Notice the mistake in tracking the  
% person labeled #12, when he is occluded by the tree.   
%  
% The likelihood of tracking errors can be reduced by using a more complex  
% motion model, such as constant acceleration, or by using multiple Kalman  
% filters for every object. Also, you can incorporate other cues for  
% associating detections over time, such as size, shape, and color.   
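%
% As a sketch of the first suggestion, a constant-acceleration filter takes
% three-element error and noise vectors (location, velocity, acceleration).
% The values below are untuned assumptions, and |centroid| stands for the
% initial detection as in |createNewTracks|:
%
%   kalmanFilter = configureKalmanFilter('ConstantAcceleration', ...
%       centroid, [200, 50, 5], [100, 25, 1], 100);
%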
  
displayEndOfDemoMessage(mfilename)  
end  

 

posted @ 2017-01-15 22:06  航飞冲天