Complex dynamic scene analysis through multi-body motion segmentation

Hernan Gonzalez

Sept. 2016 - Dec. 2019
Colciencias Grant
Reviewers: Vincent Frémont and Anne Verroust-Blondet
Examiners: Roland Chapuis, Abdelhafid and Roger Reynaud

Download thesis manuscript

Abstract

In the context of Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles, scene understanding is a fundamental inference process on which several servoing and decision-making functions depend. Such a process is intended to retrieve reliable information about the vehicle's surroundings, including static and dynamic objects (e.g. obstacles, pedestrians, vehicles), the scene structure (e.g. road, navigable space, lane markings) and ego-localization (e.g. odometry). All this information is essential for making crucial decisions in autonomous navigation and assistance maneuvers. To this end, perception systems are designed to provide redundant and reliable observations of the scene. This thesis focuses on image-based multi-body motion segmentation of dynamic scenes using only monocular vision systems.

The research starts by surveying state-of-the-art methods and contrasting their advantages and drawbacks in terms of performance indicators and computation time. After identifying a vision-only methodology, sparse optical flow methods are studied. As a proof of concept, an algorithm implementation exposes, in practice, the limits of this approach, which motivated and shaped our contributions.
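To illustrate what a sparse optical flow front end looks like, here is a minimal sketch using OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade tracker (standard OpenCV calls; the function name and parameter values are illustrative, not taken from the thesis):

    import cv2

    def track_sparse_flow(prev_frame, next_frame, max_corners=500):
        """Track sparse feature points between two consecutive BGR frames."""
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
        # Shi-Tomasi corners in the first frame: the sparse feature set.
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                     qualityLevel=0.01, minDistance=7)
        # Pyramidal Lucas-Kanade: per-point displacement to the next frame.
        p1, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, next_gray, p0, None, winSize=(21, 21), maxLevel=3)
        ok = status.ravel() == 1
        # Return matched point pairs; their differences are the sparse flow.
        return p0[ok].reshape(-1, 2), p1[ok].reshape(-1, 2)

Moving objects then appear as clusters of flow vectors inconsistent with the camera's ego-motion, which is exactly what multi-body motion segmentation must disentangle.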

Detecting and tracking objects in a classic processing chain may lead to a low-performance, time-consuming solution. Instead of segmenting moving objects and then tracking them independently, a Track-before-Detect framework for multi-body motion segmentation (namely TbD-SfM) was proposed. This method tightly couples detection and tracking in order to reduce the complexity of an existing multi-body Structure from Motion approach. Efforts were also devoted to reducing the computational cost without introducing kinematic model constraints, and to preserving the feature density on the observed motions. Further, an accelerated variant of TbD-SfM (namely ETbD-SfM) was proposed to limit the complexity with respect to the number of observed motions.
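As background on multi-body Structure from Motion (a standard formulation in the literature, not necessarily the exact cost used in the thesis): with n independent rigid motions, each fundamental matrix F_i encodes one motion, and every tracked correspondence (x, x') must satisfy the epipolar constraint of exactly one of them, which can be written as a single multibody constraint:

    \prod_{i=1}^{n} \left( \mathbf{x}'^{\top} F_i \, \mathbf{x} \right) = 0,
    \qquad F_i \in \mathbb{R}^{3 \times 3},\ \operatorname{rank}(F_i) = 2 .

Segmentation then amounts to assigning each feature trajectory to the motion whose constraint it best satisfies, while the F_i and the structure are estimated jointly.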

The proposed methods were extensively tested on publicly available datasets such as Hopkins155 and KITTI. The Hopkins dataset allows a comparison under ideal feature-tracking conditions, since it includes reference optical flow; KITTI provides image sequences of real road scenarios to evaluate the robustness of the method. Results on scenarios with multiple, simultaneously moving objects observed from a moving camera are analyzed and discussed.
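The usual Hopkins155 figure of merit is the misclassification rate over feature trajectories. A small sketch of that metric (the brute-force label-permutation handling is my own minimal version, assuming both labelings draw from the same small label set, as with the 2-3 motions in Hopkins155):

    import itertools
    import numpy as np

    def misclassification_rate(predicted, ground_truth):
        """Fraction of trajectories assigned to the wrong motion.

        Cluster labels are arbitrary, so score the best permutation.
        """
        predicted = np.asarray(predicted)
        ground_truth = np.asarray(ground_truth)
        labels = np.unique(ground_truth)
        best = 1.0
        for perm in itertools.permutations(labels):
            relabel = dict(zip(labels, perm))
            remapped = np.array([relabel[p] for p in predicted])
            best = min(best, float(np.mean(remapped != ground_truth)))
        return best

    # e.g. misclassification_rate([0, 0, 1, 1], [1, 1, 0, 0]) == 0.0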

In conclusion, the obtained results show that the TbD-SfM and ETbD-SfM methods can segment dynamic objects using a full 6DoF motion model, achieving a low image segmentation error without increasing the computational cost and while preserving the density of the feature points. Additionally, the 3D scene geometry and object trajectories are recovered by estimating the scale of the monocular system, and the estimated trajectories are compared to reference object trajectories.
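For reference, the 6DoF motion model above is the standard rigid-body (SE(3)) transform, with three rotational and three translational degrees of freedom:

    \mathbf{X}' = R\,\mathbf{X} + \mathbf{t}, \qquad R \in SO(3),\ \mathbf{t} \in \mathbb{R}^{3} .

A monocular camera observes the translation t only up to an unknown scale factor, which is why a scale estimation step is needed before the recovered trajectories can be compared against metric ground truth.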