3D object detection is an essential component of scene perception and motion prediction in autonomous driving. Previous methods represent objects as cuboids (3D bounding boxes), which provide only point-to-line geometric constraints. In this work, we define the object with a more compact representation, a quadric (ellipsoid) in the 3D scene and a conic (ellipse) in the image, which provides stronger surface-to-curve geometric constraints. Specifically, we estimate an ellipsoid from a conic fitted to a 2D bounding box to obtain 3D object localization and occupancy. We further formulate this constraint as a nonlinear optimization problem in dual space, which enables us to easily recover stable and accurate 3D object parameters by adding only three additional direction-aware branches to existing 2D detection networks. In addition, we decouple the dimensions of the object and update the length and orientation of objects in our iterative algorithm when the estimates from the 2D detection network have different deviations. The final detection results are obtained after passing through our geometry-related refinement network. We evaluate our method on the KITTI object detection benchmark and achieve the best performance among published monocular competitors.
This work is published in Neurocomputing 441 (2021): 151-160.
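
The quadric-to-conic relation that the abstract relies on can be illustrated with the standard projective-geometry identity C* = P Q* Pᵀ, where Q* is the 4x4 dual quadric of an ellipsoid, P is a 3x4 camera projection matrix, and C* is the 3x3 dual conic of the projected ellipse. The following is a minimal sketch of that forward relation only, not the authors' optimization or network. The function names, the camera intrinsics, and the example object are hypothetical values chosen for illustration.

```python
import numpy as np

def ellipsoid_dual_quadric(semi_axes, rotation, center):
    """Build the 4x4 dual quadric Q* of an ellipsoid (hypothetical helper).

    semi_axes: (a, b, c) half-lengths along the ellipsoid's own axes.
    rotation:  3x3 rotation from the ellipsoid frame to the camera frame.
    center:    ellipsoid centre in the camera frame.
    """
    Z = np.eye(4)
    Z[:3, :3] = rotation
    Z[:3, 3] = center
    a, b, c = semi_axes
    D = np.diag([a**2, b**2, c**2, -1.0])
    return Z @ D @ Z.T

def project_to_dual_conic(Q_star, P):
    """Project a dual quadric to a dual conic: C* = P Q* P^T."""
    C = P @ Q_star @ P.T
    # Dual conics are defined up to scale; normalise so that C[2, 2] == -1.
    return C / (-C[2, 2])

def conic_bounding_box(C):
    """Axis-aligned box tangent to the ellipse described by dual conic C.

    A line l is tangent to the ellipse when l^T C l = 0; solving this for
    the vertical lines (1, 0, -x) and horizontal lines (0, 1, -y) gives
    the box extents.
    """
    def extent(i):
        mid = C[i, 2] / C[2, 2]
        half = np.sqrt(C[i, 2]**2 - C[i, i] * C[2, 2]) / abs(C[2, 2])
        return mid - half, mid + half

    x_min, x_max = extent(0)
    y_min, y_max = extent(1)
    return x_min, y_min, x_max, y_max

# Hypothetical pinhole camera at the origin (values are illustrative only).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

# A car-sized ellipsoid 15 m ahead, rotated 30 degrees about the vertical axis.
yaw = np.deg2rad(30.0)
R = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
              [0.0, 1.0, 0.0],
              [-np.sin(yaw), 0.0, np.cos(yaw)]])
Q_star = ellipsoid_dual_quadric((2.0, 0.8, 0.9), R, (1.0, 1.0, 15.0))
C_star = project_to_dual_conic(Q_star, P)
print(conic_bounding_box(C_star))
```

The abstract describes the inverse direction: given a conic fitted to a detected 2D box, recover the ellipsoid parameters by nonlinear optimization over this relation in dual space. The sketch covers only the forward model that such an optimization would evaluate.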