Semantic segmentation has a wide range of applications, including scene understanding, autonomous driving, and robot manipulation. While existing segmentation models achieve good performance using bottom-up deep neural processing alone, this paper describes a novel deep learning architecture that integrates top-down and bottom-up processing. The resulting model achieves higher accuracy at a relatively low computational cost. In the proposed model, higher-level top-down information is transmitted to the lower layers through recurrent connections in the encoder and the decoder, and the recurrent connection weights are trained using backpropagation. Experiments on the CamVid, SUN-RGBD, and PASCAL VOC 2012 benchmark datasets demonstrate that this use of top-down information improves the mean intersection over union by more than 3% compared with a state-of-the-art bottom-up-only network. Additionally, the proposed model is successfully applied to a dataset designed for robotic grasping tasks.
- Deep recurrent neural network
- Semantic segmentation
- Top-down and bottom-up
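
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how mean intersection over union can be computed from per-pixel class labels. The function name `mean_iou`, the flat-list input format, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration; they are not taken from the paper, and individual benchmarks may handle ignored labels and absent classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equal-length label sequences.

    Assumption: classes absent from both pred and target are skipped
    rather than counted as an IoU of 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both sequences.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one sequence.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    # With no valid class there is nothing to average; return 0.0 by convention.
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical four-pixel image with two classes.
predicted = [0, 0, 1, 1]
ground_truth = [0, 1, 1, 1]
score = mean_iou(predicted, ground_truth, num_classes=2)
```

In this toy case class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. A real evaluation would typically accumulate intersections and unions over the whole test set before dividing, rather than averaging per image.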