Lifelong machine learning can learn a sequence of consecutive robotic perception tasks by transferring previously acquired experience. However, 1) most existing lifelong-learning-based perception methods exploit only visual information for robotic tasks, neglecting the tactile sensing modality that captures discriminative material properties; and 2) they cannot explore the intrinsic relationships across different modalities or the common characterization among the different tasks of each modality, due to the large divergence between heterogeneous feature distributions. To address these challenges, we propose a new Lifelong Visual-Tactile Learning (LVTL) model for continuous robotic visual-tactile perception tasks, which fully explores the latent correlations in both intra-modality and cross-modality aspects. Specifically, a modality-specific knowledge library is developed for each modality to explore common intra-modality representations across different tasks, while narrowing the intra-modality mapping divergence between semantic and feature spaces via an auto-encoder mechanism. Moreover, a sparse-constraint-based modality-invariant space is constructed to capture the underlying cross-modality correlations and identify the contribution of each modality to newly arriving visual-tactile tasks. We further propose a modality consistency regularizer to efficiently align the heterogeneous visual and tactile samples, which ensures semantic consistency between the different modality-specific knowledge libraries. After deriving an efficient model optimization strategy, we conduct extensive experiments on several representative datasets to demonstrate the superiority of our LVTL model. Evaluation results show that our model significantly outperforms existing state-of-the-art methods, with improvements of about 1.16%–15.36% under different lifelong visual-tactile perception scenarios.
This work is published in Pattern Recognition 121 (2022): 1-12.
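The sketch below is not the authors' implementation; it is a minimal, hypothetical PyTorch-style illustration of how the three ingredients named in the abstract (modality-specific auto-encoders, a sparse modality-invariant space, and a modality consistency regularizer) could be combined into one training loss. All class names, dimensions, and loss weights (`ModalityBranch`, `LVTLSketch`, `w_rec`, `w_sparse`, `w_consist`) are assumptions introduced for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityBranch(nn.Module):
    """Hypothetical modality-specific auto-encoder: feature -> latent code -> reconstruction."""
    def __init__(self, feat_dim, code_dim):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, code_dim)
        self.decoder = nn.Linear(code_dim, feat_dim)

    def forward(self, x):
        z = self.encoder(x)       # intra-modality code (stand-in for the knowledge library space)
        x_rec = self.decoder(z)   # reconstruction narrows the feature/semantic mapping divergence
        return z, x_rec

class LVTLSketch(nn.Module):
    """Two-branch sketch: visual and tactile codes projected into a shared
    (modality-invariant) space, with an L1 sparsity penalty and a consistency
    term aligning paired visual/tactile samples. Assumed structure, not the paper's."""
    def __init__(self, vis_dim, tac_dim, code_dim, shared_dim, num_classes):
        super().__init__()
        self.visual = ModalityBranch(vis_dim, code_dim)
        self.tactile = ModalityBranch(tac_dim, code_dim)
        self.to_shared = nn.Linear(code_dim, shared_dim)   # modality-invariant projection
        self.classifier = nn.Linear(shared_dim, num_classes)

    def loss(self, x_vis, x_tac, labels, w_rec=1.0, w_sparse=1e-3, w_consist=0.1):
        z_v, rec_v = self.visual(x_vis)
        z_t, rec_t = self.tactile(x_tac)
        s_v, s_t = self.to_shared(z_v), self.to_shared(z_t)

        cls = F.cross_entropy(self.classifier(s_v), labels) \
            + F.cross_entropy(self.classifier(s_t), labels)      # task supervision
        rec = F.mse_loss(rec_v, x_vis) + F.mse_loss(rec_t, x_tac)  # auto-encoder reconstruction
        sparse = s_v.abs().mean() + s_t.abs().mean()               # sparsity on shared codes
        consist = F.mse_loss(s_v, s_t)                             # modality consistency regularizer
        return cls + w_rec * rec + w_sparse * sparse + w_consist * consist
```

In this reading, lifelong behavior would come from keeping the branch and shared-space parameters across tasks while only the task-specific classifier grows; the paper's actual knowledge-library update and optimization strategy are not reproduced here.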