Geometrical and contextual scene analysis for object detection and tracking in intelligent vehicles

Bihao Wang

Oct. 2013 - Jul. 2015.
China Scholarship Council Grant
Reviewers: Samia Bouchafa and Michel Devy
Examiners: Pascal Vasseur, Dominique Gruyer and Véronique Cherfaoui

Download thesis manuscript

Abstract

For autonomous or semi-autonomous intelligent vehicles, perception is the first fundamental task to be performed before decision and action. Through the analysis of video, lidar and radar data, it provides a specific representation of the environment and of its state by extracting key properties from sensor data and integrating them over time. Compared to other perception modalities such as GPS, inertial units or range sensors (radar, lidar, ultrasonic), cameras offer the greatest amount of information. Thanks to their versatility, cameras allow intelligent systems to obtain both high-level contextual and low-level geometrical information about the observed scene, at high speed and low cost. Furthermore, their passive sensing technology enables low energy consumption and facilitates integration into small-size systems. The use of cameras is, however, not trivial and raises a number of theoretical issues related to how this sensor perceives its environment.

In this thesis, we propose a vision-only system for moving object detection. Within the natural and constrained environments observed by an intelligent vehicle, moving objects represent high-risk collision obstacles and must be handled robustly. We approach the problem of detecting moving objects by first extracting the local context using a color-based road segmentation. By transforming the color image into an illuminant-invariant image, shadows, as well as their negative influence on the detection process, can be removed. Then, from features automatically selected on the road, a region of interest (ROI), where moving objects may appear with a high collision risk, is extracted. Within this area, moving pixels are identified using a plane+parallax approach: potential moving and parallax pixels are first detected by a background subtraction method; three geometrical constraints (the epipolar constraint, the structural consistency constraint and the trifocal tensor) are then applied to these candidate pixels to filter out the parallax ones. Likelihood equations are also introduced to combine the constraints in a complementary and effective way. When stereovision is available, the road segmentation and on-road obstacle detection can be refined using the disparity map and geometrical cues. Moreover, in this case, a robust tracking algorithm combining image and depth information is proposed. If one of the two cameras fails, the system can fall back to a monocular operation mode, an important feature for the reliability and integrity of a perception system.
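To give an idea of how the epipolar constraint separates independently moving pixels from static structure, the following is a minimal NumPy sketch, not the thesis implementation: the fundamental matrix, point correspondences and geometry below are illustrative assumptions. A correspondence on a static scene point satisfies x2ᵀ F x1 = 0; a large point-to-epipolar-line distance flags a potentially moving pixel (parallax pixels still need the additional constraints mentioned above).

```python
import numpy as np

def epipolar_residual(F, pts1, pts2):
    """Distance (pixels) of each match in image 2 from its epipolar line.

    F:          3x3 fundamental matrix of the camera ego-motion (assumed known).
    pts1, pts2: (N, 2) arrays of matched pixel coordinates in frames 1 and 2.
    """
    n = pts1.shape[0]
    x1 = np.hstack([pts1, np.ones((n, 1))])   # homogeneous coordinates
    x2 = np.hstack([pts2, np.ones((n, 1))])
    lines = (F @ x1.T).T                      # epipolar lines l = F x1 in image 2
    num = np.abs(np.sum(lines * x2, axis=1))  # |x2 . l|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den                          # point-to-line distance

# Toy geometry: a pure sideways camera translation gives this fundamental matrix,
# whose epipolar lines are horizontal image rows.
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
pts1 = np.array([[100.0, 50.0], [200.0, 80.0]])
pts2_static = pts1 + np.array([5.0, 0.0])   # slides along the epipolar line
pts2_moving = pts1 + np.array([0.0, 7.0])   # violates the constraint

print(epipolar_residual(F, pts1, pts2_static))  # ~0 pixels: consistent with ego-motion
print(epipolar_residual(F, pts1, pts2_moving))  # ~7 pixels: candidate moving pixels
```

Thresholding this residual gives the per-pixel evidence that the likelihood equations can then fuse with the structural consistency and trifocal-tensor tests.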

The different proposed algorithms have been tested on public image datasets and evaluated against state-of-the-art approaches and ground-truth data. The obtained results are promising: the proposed methods are effective and robust across different traffic scenarios and achieve reliable detections in ambiguous situations.