Real-time scene text identification using mobile devices

Liu, Xinmin

Full metadata record

DC Field	Value	Language
dc.contributor	Faculty of Engineering	en_US
dc.contributor.advisor	Lun, Pak Kong Daniel (EIE)	-
dc.creator	Liu, Xinmin	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/8005	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	-
dc.rights	All rights reserved	en_US
dc.title	Real-time scene text identification using mobile devices	en_US
dcterms.abstract	Mobile devices have become ubiquitous in our daily lives. As part of an "Internet of Things" project, this study targets the detection and recognition of text in static images and live video streams on mobile devices. It is applied to a shopping mall environment for the real-time identification of shop trademarks from images or videos captured with a mobile phone. To achieve this, we integrated state-of-the-art algorithms into our system and added novel features. For instance, we adopted the linear-time Maximally Stable Extremal Region (MSER) estimation algorithm for extracting text candidates and designed a grouping classifier to build hypotheses about text lines. To recognize text content, we used Google's open-source OCR engine Tesseract and designed text similarity measurements for pattern matching against our database. As feedback, the OCR engine helps the algorithm further eliminate non-text candidates. Since a shop trademark can be a graphical logo, we extended our study to the identification of shop logos. We trained a boosting classifier for each logo template in the database using the HOG feature descriptor. The candidates are first verified by the difference of their color histograms. In addition, we designed a client-server architecture: the client uses the fast HOG classifier to extract candidates, and the server uses SIFT to verify them with high accuracy. Based on the motion model of the user, we adopted a frame-skipping strategy to satisfy the real-time requirements. The contribution of this thesis lies in three main aspects. First, we implemented the full functionality of an MSER extractor and MSER pruning methods running in linear time. Experiments show that our implementation runs faster and that its accuracy is competitive. Second, our system is much more time-efficient and user-friendly. Traditional approaches such as Google Goggles capture images and then upload them to a server. Their success relies largely on the power of server clusters and big data, and users have to upload photos and wait for results, so data traffic can severely limit performance. In our system, by contrast, all localization and recognition can be done natively and automatically, owing to prior knowledge of the shopping mall and the motion model of the camera. Third, we introduced a text similarity measurement that feeds back to improve the text localization and recognition results. State-of-the-art methods simply extract text regions aggressively, whereas our system makes use of the "meaningfulness" of the text to further filter out non-text candidates. This application therefore merges the advantages of machine learning and computer vision to benefit human users.	en_US
dcterms.extent	ix, 67 leaves : color illustrations ; 30 cm	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2015	en_US
dcterms.educationalLevel	All Master	en_US
dcterms.educationalLevel	M.Sc.	en_US
dcterms.LCSH	Signal processing -- Digital techniques.	en_US
dcterms.LCSH	Cell phone systems.	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	restricted access	en_US

Files in This Item:

File	Description	Size	Format
b28110584.pdf	For All Users (off-campus access for PolyU Staff & Students only)	3.58 MB	Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/8005