Three-dimensional (3-D) object recognition focuses on detecting the objects in a scene and estimating their 6-DOF poses via effective feature extraction methods. Most recent feature extraction methods are based on deep neural networks and perform well. However, these methods require a rendering engine to generate large amounts of training data and take a long time to converge, which makes them a bottleneck on a fast-paced industrial production line. Moreover, for common hand-crafted features, the lack of discriminative feature points among texture-less, smooth-surfaced objects can cause ambiguity in feature-point matching. To address these challenges, this article proposes a hand-crafted 3-D feature descriptor with center-offset and pose annotations, called view-specific local projection statistics (VSLPS). Using these annotations as seeds, a voting strategy then transforms the feature-point matching problem into the problem of voting for an optimal model view in the 6-DOF space. In this way, the matching ambiguity caused by poor feature discrimination is eliminated. Finally, experiments on three public datasets and our self-built 3-D bin-picking dataset demonstrate that the proposed VSLPS method performs well in comparison with state-of-the-art methods.
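The voting idea can be illustrated with a minimal sketch. This is a generic Hough-style vote under simplifying assumptions, not the VSLPS pipeline itself: the function name `vote_for_view`, the `bin_size` parameter, and the tuple layout of a match are all hypothetical, and each center offset is assumed to be already expressed in the scene coordinate frame. Every matched feature point predicts an object center from its offset annotation and votes for the pair of model view and quantized center, so a few ambiguous matches are outvoted by the consistent ones.

```python
import math
from collections import Counter


def vote_for_view(matches, bin_size=0.01):
    """Pick the (model view, object center) hypothesis with the most votes.

    matches: iterable of (scene_point, center_offset, view_id), where
    scene_point and center_offset are 3-tuples in the scene frame
    (an assumption of this sketch) and view_id labels a model view.
    """
    votes = Counter()
    for point, offset, view_id in matches:
        # Predicted object center for this match.
        center = tuple(p + o for p, o in zip(point, offset))
        # Quantize the center so that nearby predictions share a cell.
        cell = tuple(math.floor(c / bin_size) for c in center)
        votes[(view_id, cell)] += 1

    (view_id, cell), count = votes.most_common(1)[0]
    # Report the middle of the winning cell as the center estimate.
    center = tuple((i + 0.5) * bin_size for i in cell)
    return view_id, center, count


if __name__ == "__main__":
    demo = [
        ((1.0, 2.0, 3.0), (0.5, 0.5, 0.5), 7),
        ((2.0, 2.0, 3.0), (-0.4, 0.4, 0.6), 7),
        ((1.0, 3.0, 4.0), (0.3, -0.7, -0.6), 7),
        ((9.0, 9.0, 9.0), (0.2, 0.2, 0.2), 3),  # inconsistent match
    ]
    print(vote_for_view(demo, bin_size=1.0))
```

In this sketch the view label stands in for the pose annotation; a fuller version would also accumulate votes over rotation, which is omitted here for brevity.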
This work was published in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 11, pp. 7109-7119, 2021.