Text recognition approaches for indoor robotics : a comparison

Lam, Obadiah, Dayoub, Feras, Schulz, Ruth, & Corke, Peter (2014) Text recognition approaches for indoor robotics : a comparison. In 2014 Australasian Conference on Robotics and Automation, 2-4 December 2014, University of Melbourne, Melbourne, VIC.

This paper evaluates the performance of different text recognition techniques for a mobile robot in an indoor (university campus) environment. We compared four methods: our own approach, which combines existing text detection methods (the Maximally Stable Extremal Regions detector and the Stroke Width Transform) with a convolutional neural network; two modes of the open source program Tesseract; and the experimental mobile app Google Goggles. The results show that a convolutional neural network combined with the Stroke Width Transform gives the best performance in correctly matched text on images with single characters, whereas Google Goggles gives the best performance on images with multiple words. The dataset used for this work is also released.

Overall Process: the input image is cropped and fed into one of four text recognition algorithms. The results are shown in the bottom row of boxes.
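
The crop-then-recognise flow described above can be sketched roughly as follows. This is only an illustrative sketch, not the code used in the paper: the names `crop`, `run_pipeline` and `dummy_recognizer` are hypothetical, the image is assumed to be a plain list of pixel rows, and the placeholder recognizer stands in for any of the four real methods.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # assumed: grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # assumed: (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image that lies inside the box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def run_pipeline(image: Image, box: Box, method: str,
                 recognizers: Dict[str, Callable[[Image], str]]) -> str:
    """Crop the input image, then pass the crop to the chosen recognizer."""
    if method not in recognizers:
        raise KeyError(f"unknown method: {method}")
    return recognizers[method](crop(image, box))


def dummy_recognizer(cropped: Image) -> str:
    """Hypothetical placeholder: reports the crop size instead of reading text."""
    height = len(cropped)
    width = len(cropped[0]) if cropped else 0
    return f"{height}x{width}"
```

Keeping the recognizers in a dictionary keyed by method name means every method receives exactly the same cropped input, which is what makes a side-by-side comparison meaningful.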

Example images where only our method has succeeded:

During the development of the vision system for the project, we created a dataset of images of text around the QUT Gardens Point Campus, focusing on room and building labels that are useful for navigation.

Room Label Dataset (192MB)

Room Label Dataset (cropped) (86.6MB)

