Velocity and Shape from Tightly-Coupled LiDAR and Camera

M. Hossein Daraei, Anh Vu, and Roberto Manduchi


In this paper, we propose a multi-object tracking and reconstruction approach through measurement-level fusion of LiDAR and camera. The proposed method, regardless of object class, estimates 3D motion and structure for all rigid obstacles. Using an intermediate surface representation, measurements from both sensors are processed within a joint framework. We combine optical flow, surface reconstruction, and point-to-surface terms in a tightly-coupled non-linear energy function, which is minimized using Iteratively Reweighted Least Squares (IRLS). We demonstrate the performance of our model on different datasets (KITTI with Velodyne HDL-64E and our collected data with 4-layer ScaLa Ibeo), and show an improvement in velocity error and crispness over state-of-the-art trackers.


The proposed method feeds velocity measurements from raw LiDAR and camera data to a central Kalman-based tracker. LiDAR measurements contribute a geometric term, while camera images contribute a photometric term to the overall velocity estimate. We define an intermediate surface model that links the two modalities: we use it to match new LiDAR points and to project the object onto the image domain. This lets us relate image-plane optical flow displacement vectors to three-dimensional velocity vectors. When a new LiDAR point cloud is available, the algorithm minimizes the geometric distance of the points to the reconstructed surface. When a new image is available, the algorithm minimizes its photometric distance to the previous image warped by the estimated velocity. The videos below illustrate the minimization of the geometric and photometric errors.
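
The link between image-plane flow and 3D velocity mentioned above can be illustrated with a small sketch. It assumes an ideal pinhole camera with hypothetical focal lengths fx and fy, a purely translational object velocity, and a first-order (Jacobian) approximation of the projection; the function names are illustrative and this is not the implementation used in the paper.

```python
import numpy as np

def projection_jacobian(point, fx, fy):
    """Derivative of pinhole pixel coordinates with respect to the 3D point (X, Y, Z)."""
    X, Y, Z = point
    return np.array([[fx / Z, 0.0, -fx * X / Z**2],
                     [0.0, fy / Z, -fy * Y / Z**2]])

def predicted_flow(points, velocity, dt, fx, fy):
    """First-order image-plane displacement of each surface point moving with `velocity`."""
    return np.stack([projection_jacobian(p, fx, fy) @ velocity * dt for p in points])

def velocity_from_flow(points, flow, dt, fx, fy):
    """Least-squares 3D velocity that best explains the observed flow vectors."""
    A = np.vstack([projection_jacobian(p, fx, fy) * dt for p in points])
    b = flow.reshape(-1)  # ordered (u0, v0, u1, v1, ...) to match the rows of A
    velocity, *_ = np.linalg.lstsq(A, b, rcond=None)
    return velocity

# Synthetic object about 10 m in front of the camera (all values are assumptions).
rng = np.random.default_rng(0)
points = np.column_stack([rng.uniform(-2, 2, 50),
                          rng.uniform(-1, 1, 50),
                          rng.uniform(8, 12, 50)])
true_velocity = np.array([1.5, -0.2, 3.0])  # meters per second
flow = predicted_flow(points, true_velocity, dt=0.1, fx=700.0, fy=700.0)
estimated_velocity = velocity_from_flow(points, flow, dt=0.1, fx=700.0, fy=700.0)
```

The depth of each point comes from the surface model, which is why a surface estimate is needed before flow vectors can constrain all three velocity components.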

The reconstructed surface is computed from the points accumulated for an object. In this work we assume that data associations and object tracks are provided by either ground-truth annotations or a segmentation algorithm. Surfaces are estimated by minimizing the distance of the points to them, and they are regularized to encourage piecewise-planar shapes. This is achieved by penalizing second-order derivatives of the surface. To reduce the computational burden of dense reconstruction, we represent surfaces as a linear combination of two-dimensional basis functions. In addition, we reduce surface resolution for closer objects -- assuming they are piecewise planar. For more details about surface reconstruction, refer to the paper.
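
As a rough illustration of this kind of regularized fit, the sketch below represents a depth surface over a unit square as a linear combination of bilinear (tent) basis functions on a coarse grid and penalizes second differences of the coefficients. The choice of basis, the grid size n, and the regularization weight lam are assumptions made for the example; the paper should be consulted for the actual formulation.

```python
import numpy as np

def bilinear_design(u, v, n):
    """Each row holds the four bilinear basis weights of one sample on an n x n grid over [0, 1]^2."""
    A = np.zeros((u.size, n * n))
    gu, gv = u * (n - 1), v * (n - 1)
    i0 = np.clip(np.floor(gu).astype(int), 0, n - 2)
    j0 = np.clip(np.floor(gv).astype(int), 0, n - 2)
    a, b = gu - i0, gv - j0
    rows = np.arange(u.size)
    A[rows, i0 * n + j0] = (1 - a) * (1 - b)
    A[rows, (i0 + 1) * n + j0] = a * (1 - b)
    A[rows, i0 * n + j0 + 1] = (1 - a) * b
    A[rows, (i0 + 1) * n + j0 + 1] = a * b
    return A

def second_difference(n):
    """Finite-difference operator approximating a second derivative along one grid axis."""
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    return D

def fit_surface(u, v, depth, n=6, lam=1.0, weights=None):
    """Weighted least-squares fit with a second-difference smoothness penalty."""
    A = bilinear_design(u, v, n)
    w = np.ones(u.size) if weights is None else weights
    D, I = second_difference(n), np.eye(n)
    # Coefficient index is i * n + j, so kron(D, I) differences along i and kron(I, D) along j.
    R = np.vstack([np.kron(D, I), np.kron(I, D)])
    lhs = A.T @ (w[:, None] * A) + lam * R.T @ R
    rhs = A.T @ (w * depth)
    return np.linalg.solve(lhs, rhs).reshape(n, n)

# Synthetic planar patch: the penalty does not bias a planar surface.
rng = np.random.default_rng(0)
u, v = rng.uniform(0, 1, 300), rng.uniform(0, 1, 300)
depth = 10.0 + 2.0 * u - 1.0 * v
surface = fit_surface(u, v, depth)
nodes = np.linspace(0, 1, 6)
expected = 10.0 + 2.0 * nodes[:, None] - 1.0 * nodes[None, :]
```

Because the unknowns are only the n x n coefficients rather than one value per point, the linear system stays small even when many points have been accumulated.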

Experimental Results

Here we study a typical example involving a common driving scene. You can hover your mouse over the gray boxes to travel through time. Note that LiDAR measurements (top-right) are range measurements on an azimuth-elevation grid, encoded with color. Filled areas in the diagram represent the three-sigma uncertainty range associated with each velocity component. For each time instant, the velocity error magnitude (in meters per second) and the crispness score (defined in the paper) are computed.

In the example above, you can see the uncertainty associated with the velocity estimates decrease over time (shown as filled areas and encoded as arrow color in the top-left image), and the estimates generally become more accurate as more measurements arrive. The accumulated object point cloud also gets denser as more points arrive, and the improvement in crispness score indicates the consistency of the aligned (consecutive) point clouds. The appearance model is extracted by projecting the object mask (shown as the red region in the azimuth-elevation map) onto the image plane using the calibration parameters. It is then used to compute the optical flow term and minimize the photometric error. The reconstructed surface (bottom-left) is fit to the accumulated points in an uncertainty-aware fashion: fresh points (marked in green) have less uncertainty and are given larger weight in surface reconstruction, while old points (marked in red) are more ambiguous due to velocity error accumulation and are down-weighted in the reconstruction. In addition, robust estimation weights (Huber penalizers) are added to further down-weight outlier points. To achieve more robustness and better performance, we use a dynamic grid for objects.

This results in fewer surface cells to process and even more robustness against outliers. The same coarse-to-fine approach is employed in modeling and minimizing the photometric term: for farther objects (where velocity error results in small image-plane displacement) we use finer image pyramid levels, while for closer objects (where image-plane motion is more sensitive to slight errors in 3D velocity) we resort to coarser pyramid levels.
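
The uncertainty-aware and robust weighting of accumulated points described earlier can be sketched as an IRLS loop. The example below fits a single plane rather than a full surface, and it assumes a simple variance model in which a point's uncertainty grows with its age; the parameters sigma_meas, sigma_vel, and delta are hypothetical values chosen for illustration, not taken from the paper.

```python
import numpy as np

def age_weights(age, sigma_meas=0.05, sigma_vel=0.05):
    """Inverse-variance weight: older points accumulate more velocity-induced position error."""
    return 1.0 / (sigma_meas**2 + (age * sigma_vel)**2)

def huber_weights(residual, delta):
    """IRLS weight of the Huber penalty: 1 inside the threshold, delta / |r| outside."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta, 1.0, delta / np.maximum(abs_r, 1e-12))

def irls_plane_fit(xy, z, base_weights, delta=0.05, iterations=20):
    """Fit z = a * x + b * y + c by iteratively reweighted least squares."""
    A = np.column_stack([xy, np.ones(len(z))])
    params = np.zeros(3)
    w = base_weights.copy()
    for _ in range(iterations):
        Aw = A * w[:, None]
        params = np.linalg.solve(A.T @ Aw, Aw.T @ z)  # weighted normal equations
        residual = z - A @ params
        w = base_weights * huber_weights(residual, delta)
    return params

# Synthetic accumulated points: 10% are outliers lying 3 m off the plane.
rng = np.random.default_rng(1)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
true_params = np.array([0.5, -0.3, 2.0])
z = xy @ true_params[:2] + true_params[2] + rng.normal(0.0, 0.01, 200)
z[:20] += 3.0
age = rng.uniform(0.0, 2.0, 200)  # seconds since each point was measured
fitted_params = irls_plane_fit(xy, z, age_weights(age))
```

The two weights multiply: the age term encodes what is known about a point before fitting, while the Huber term reacts to how badly the point disagrees with the current estimate.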

To deal with changes in appearance, we add robust weights to the photometric term, derived from a t-distribution assumption on the optical flow (brightness constancy) error. They add robustness to motion estimation when the appearance changes suddenly -- as in the example below, where at some point a shadow falls on the cyclist's head. You can see how this shadow results in a large local photometric error -- but the algorithm is robust against it, identifies that region as an "outlier", and does not minimize its photometric error. You can hover your mouse over the gray squares to explore different frames. In the middle, the covariance matrix (in red) and the velocity error (black line) are shown for that particular frame. As you can see, both the error and its uncertainty decrease as more samples are processed.
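
A common way to obtain such weights is sketched below: residuals are modeled with a zero-mean Student t-distribution, the scale is estimated by a fixed-point iteration, and each residual receives the weight (dof + 1) / (dof + (r / sigma)^2). The degrees of freedom (dof = 5) and the synthetic residual values are assumptions for illustration; the exact form used in the paper may differ.

```python
import numpy as np

def t_distribution_scale(residual, dof=5.0, iterations=20):
    """Fixed-point estimate of the t-distribution scale from the residuals."""
    sigma2 = max(np.mean(residual**2), 1e-12)
    for _ in range(iterations):
        w = (dof + 1.0) / (dof + residual**2 / sigma2)
        sigma2 = max(np.mean(w * residual**2), 1e-12)
    return np.sqrt(sigma2)

def t_distribution_weights(residual, dof=5.0):
    """Per-pixel weights that shrink toward zero for large photometric residuals."""
    sigma = t_distribution_scale(residual, dof)
    return (dof + 1.0) / (dof + (residual / sigma)**2)

# Synthetic brightness residuals: mostly small noise, plus a dark "shadow" patch.
rng = np.random.default_rng(2)
residual = rng.normal(0.0, 2.0, 1000)   # intensity levels
shadow = np.zeros(1000, dtype=bool)
shadow[:50] = True
residual[shadow] = -60.0                # sudden darkening over 5% of the pixels
weights = t_distribution_weights(residual)
```

In this synthetic case the shadowed pixels receive weights close to zero, so they contribute almost nothing to the weighted photometric term, while typical pixels keep weights near one.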



  title={Velocity and Shape from Tightly-Coupled LiDAR and Camera},
  author={Daraei, M. Hossein and Vu, Anh and Manduchi, Roberto},
  booktitle={2017 IEEE Intelligent Vehicles Symposium},


(slides and demo code will be added soon)