Detecting text in clutter scene

Cui, X

Publication Type: Thesis
Issue Date: 2014

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Cui, X
dc.date.accessioned	2015-09-01T23:08:26Z
dc.date.available	2015-09-01T23:08:26Z
dc.date.issued	2014
dc.identifier.uri	http://hdl.handle.net/10453/36998
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US
dc.format	Thesis (PhD)	en_US
dc.language.iso	en	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/36998/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Detecting text in clutter scene	en_US
dc.type	Thesis
utslib.copyright.status	open_access

Abstract:

We often encounter cluttered visual scenes and need to identify objects correctly in order to navigate and interact with the world. Because text is a typical human-designed, informative visual object, retrieving text in both indoor and outdoor environments is an important step towards providing contextual clues for a wide variety of vision tasks. It also plays an invaluable role in multimedia retrieval and location-based services. Detecting text against a cluttered background is nevertheless challenging because text, as a figure in an image, can be presented in many ways and with considerable uncertainty in size, scale, font type, font texture and colour, unpredictable decorative elements, and so on. The situation becomes even more complicated when the background contains non-text objects whose low-level features are similar to those of text. Moreover, these objects are composed of distinct geometric shapes that resemble the essential compositional elements of text. Pursuing a robust text feature descriptor is therefore always difficult, because any particular feature descriptor captures only a fragment of what constitutes text; a complete understanding of text is needed. Regarding the design, understanding, representation and computation of text as one unitary process of text perception, we address the complete understanding and representation of text in images from many aspects and at different levels. Rather than following the conventional feature-based approach, this research is motivated by perceptual image processing and by observation of the work of master painters. It explores a new solution by investigating the spatial structure of text and the compositional complexity of the visual object (i.e. text) in an image.
The research presents the composition granularity indicator and exposes novel discriminative attributes embedded in text objects, which can successfully differentiate text regions from non-text regions on cluttered backgrounds. For a figure in an image of a cluttered scene, it is the physical appearance of text alone that provides the perceptual content and plays the central role in text detection, i.e. location and coarse identification. In the view construction of text, the properties of individual characters and the textual organisation of characters build up the physical appearance. When observers see text in a cluttered scene, they describe the experience in terms of crowding and clutter. However, the appearance of text remains salient enough to convey an informative message. Accordingly, text not only exhibits crowding and clutter but also follows the principles of saliency. Significantly, the crowding effect of text derives from the spatial regularity of neighbouring letters, which share commonalities alongside their distinctiveness. In addition, the low-level features of individual letters contribute to these commonalities and distinctions from the moment the font is designed. The computational model of text appearance is therefore built to integrate properties at three levels: features of individual characters (low-level features), properties of spatial regularity (i.e. neighbourhood and appearance similarity), and crowding statistics obtained by spatial averaging over pooling regions. In image-processing terms, the features of individual characters are obtained from the properties of their construction, including mean intensity, local RMS contrast, shape, pixel density, edge density, stroke width, straight-line ratio, height-to-width ratio, stroke-width-to-height ratio, etc.
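As a rough illustration of the character-level features listed above, the following minimal sketch computes four of them (mean intensity, RMS contrast, pixel density and height-to-width ratio) for one candidate character region. It is not the thesis's implementation: the function name `region_features`, the use of the bounding box as the local window, and the normalisation of intensities to [0, 1] are all assumptions made for this example.

```python
import math

def region_features(gray, mask):
    """Illustrative low-level features for one candidate character region.

    gray: 2D list of intensities in 0..255.
    mask: 2D list of 0/1 flags marking the pixels of the candidate character.
    Assumption: the bounding box of the mask is used as the local window.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no foreground pixels")

    top = min(r for r, _ in coords)
    bottom = max(r for r, _ in coords)
    left = min(c for _, c in coords)
    right = max(c for _, c in coords)
    height = bottom - top + 1
    width = right - left + 1

    # Intensities inside the bounding box, normalised to [0, 1].
    values = [gray[r][c] / 255.0
              for r in range(top, bottom + 1)
              for c in range(left, right + 1)]
    mean = sum(values) / len(values)
    # RMS contrast: standard deviation of the normalised intensities.
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    return {
        "mean_intensity": mean,
        "rms_contrast": rms,
        # Fraction of the bounding box covered by character pixels.
        "pixel_density": len(coords) / (height * width),
        "height_to_width": height / width,
    }
```

Features such as stroke width or edge density would need further steps (for example a distance transform or an edge detector) and are deliberately left out of this sketch.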
To calculate the spatial regularity properties and the crowding (spatial averaging) property, the spatial elements and their relations are quantified; these involve space granularity and composition rules. Painters, especially the impressionists, use directional brushstrokes or colour patches as the space granularity to represent “formless” visual objects through spatial regularity instead of clear contour sketches. First, the spatial regularity of patches, i.e. repetitive patterns, offers a compositional format that expresses the artist’s feelings about an object rather than simply describing it. Second, it is the harmonious proportions among component parts that bind component space patches into objects. Under the painter’s harmonious proportions, the component parts of an object can be said to act simultaneously, so that they are seen both together and separately at one and the same time. Similarly, an image is described by a set of grey space patches at multiple grey levels. Each space patch groups pixels by positional proximity and similarity, just as the impressionists use colour patches. The spatial organisation of the patches is also quantified by measuring spatial relations, especially neighbourhood and the proportions among component parts. The harmonious proportions among space patches are captured by the geometric mean (GM), which is calculated over the space patches that share the same grey level and is treated as the space granularity that forms objects. Grey patches with the same GM make up GM regions, which are enlarged, extended pooling regions. Regions given by clusters that result from similarity and neighbourhood are direct, compact pooling regions.
The statistical properties of spatial averaging are therefore calculated over GM regions, and an image is represented as a set of GM regions over which text and other visual objects are analysed by GM indication. Finally, the image representation and the three-level computational text model are put into practice to develop a brand-new algorithm on a public benchmark dataset and to design and implement an automatic processing system for real, large-scale bank cheque data. The resulting performance of these tools and processes shows that they are highly competitive and effective.
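The idea of grey space patches and a per-grey-level geometric mean can be sketched roughly as follows. The abstract does not say which patch quantity the geometric mean is taken over, so this example assumes patch area (pixel count). The uniform quantisation, the 4-connectivity, and the names `grey_patches` and `geometric_mean` are likewise assumptions for illustration, not the thesis's method.

```python
import math
from collections import defaultdict, deque

def grey_patches(image, levels=4):
    """Quantise a greyscale image and collect patch areas per grey level.

    image: 2D list of intensities in 0..255.
    Returns {grey_level: [patch_area, ...]}, where a patch is a 4-connected
    set of pixels sharing one quantised grey level (an assumed definition).
    """
    h, w = len(image), len(image[0])
    quant = [[min(v * levels // 256, levels - 1) for v in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    areas = defaultdict(list)

    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            level = quant[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            # Breadth-first flood fill over pixels of the same grey level.
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and quant[ny][nx] == level):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas[level].append(area)
    return dict(areas)

def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def gm_per_level(image, levels=4):
    """Assumed GM indicator: geometric mean of patch areas at each grey level."""
    return {lvl: geometric_mean(a) for lvl, a in grey_patches(image, levels).items()}
```

Grouping patches whose grey levels share the same GM value would then give candidate GM regions over which averaged statistics could be computed; that grouping step is not shown here.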

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/36998