3-D reconstruction, a popular topic in computer vision, has been researched extensively for more than three decades, and several image-based structure-from-motion algorithms have been proposed. However, 3-D reconstruction of indoor, human-made structures presents specific challenges, such as a lack of texture and dramatic changes in viewpoint. In this work, we propose a novel approach for 3-D reconstruction of indoor scenes under the “Manhattan world” constraint, which assumes that visible planes intersect at right angles. To recover such planes, our algorithm clusters chains of coplanar feature points and junctions matched over consecutive image pairs, then applies planar-constrained parameter optimization. Image patches are then retrieved, allowing for 3-D rendering of the resulting structure. Unlike traditional structure-from-motion algorithms, our algorithm estimates 3-D planes rather than only individual 3-D points, and renders the input images using the estimated planes in a planar-patch-wise fashion. This enables dense reconstructions from fewer images and facilitates real-world mobile applications.
The first step of this work is finding chains of coplanar feature points using a Manhattan world-constrained point-pair clustering algorithm. Suppose we take a pair of images of an indoor environment while walking through a corridor, as below.
Next, we detect feature points in each input image and match them across each consecutive pair of images using the original Scale Invariant Feature Transform (SIFT) algorithm. Then, we apply our clustering algorithm, followed by a camera parameter optimization step, to get the following result.
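To make the matching step concrete, here is a minimal sketch of nearest-neighbour descriptor matching with a distance-ratio test, the acceptance rule commonly associated with the original SIFT formulation. It is an illustration only, not our implementation: the function name `match_descriptors`, the 0.8 ratio threshold, and the toy 2-D descriptors (real SIFT descriptors are 128-dimensional) are assumptions made for the example.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only if the nearest distance is smaller than `ratio`
    times the second-nearest distance (distance-ratio test).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distances from descriptor i to every descriptor in desc_b.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Toy descriptors (hypothetical values chosen for illustration).
desc_a = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (10.0, 9.9), (4.0, 5.0), (6.0, 5.0)]
matches = match_descriptors(desc_a, desc_b)
print(matches)
```

In this toy example the third descriptor in `desc_a` is equally close to two candidates in `desc_b`, so the ratio test discards it as ambiguous. Matches that survive this test may still be wrong, which is why outliers are handled separately during clustering.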
In the above figure, each colored point is a detected and matched feature point, points of the same color belong to the same cluster, and yellow crosses mark outliers. Please refer to our paper, “Multi-Planar Fitting in an Indoor Manhattan World,” for more detail. At this point, we can already reconstruct the 3-D scene, because we know the parameters of the planar structures and the camera parameters between the two input images. The following figure shows an example of a 3-D reconstruction produced from only two images with our novel planar patch retrieval algorithm.
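The reason known plane and camera parameters suffice can be illustrated with standard ray-plane intersection: every pixel defines a viewing ray, and intersecting that ray with a recovered plane gives a 3-D point. The sketch below is not our planar patch retrieval algorithm. It assumes an undistorted pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)`, a plane written as `normal . X = offset` in the camera frame, and a camera whose y axis points down; the function name `backproject_to_plane` is likewise made up for the example.

```python
def backproject_to_plane(pixel, intrinsics, normal, offset):
    """Intersect the viewing ray of `pixel` with the plane normal . X = offset.

    pixel: (u, v) image coordinates.
    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera, no distortion.
    normal, offset: plane parameters expressed in the camera frame.
    Returns the 3-D point (X, Y, Z), or None when the ray is parallel to
    the plane or the intersection lies behind the camera.
    """
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    # Ray direction through the pixel, i.e. K^-1 [u, v, 1]^T.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the plane
    scale = offset / denom
    if scale <= 0:
        return None  # intersection is behind the camera
    return tuple(scale * r for r in ray)


# Hypothetical setup: a floor plane 1.5 units below the camera centre.
intrinsics = (500.0, 500.0, 320.0, 240.0)
floor_normal, floor_offset = (0.0, 1.0, 0.0), 1.5

# Corners of an image quadrilateral assumed to lie on the floor.
quad = [(220, 490), (420, 490), (380, 340), (260, 340)]
corners_3d = [
    backproject_to_plane(p, intrinsics, floor_normal, floor_offset) for p in quad
]
print(corners_3d)
```

Under the Manhattan world assumption, each plane normal is one of three mutually orthogonal directions, so a plane is effectively described by its axis and a single offset. Applying the same intersection to every pixel inside a patch is what makes a dense, plane-wise rendering possible from only a few matched points.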
Unlike point cloud-based reconstruction algorithms, we produce a dense reconstruction from only two images containing fewer than 250 matched feature points. If we allow user interaction, e.g., letting a user draw quadrilaterals and match them with one of the recovered planar structures, we can obtain an even denser reconstruction, as below.
Currently, we are working on a new plane-constrained bundle adjustment algorithm, which will enable 3-D reconstruction from more than two images and produce a robust 3-D scene.