Leveraging visual features and hierarchical dependencies for conference information extraction

You, Y; Xu, G; Cao, J; Zhang, Y; Huang, G

Leveraging visual features and hierarchical dependencies for conference information extraction

You, Y Xu, G

Cao, J Zhang, Y Huang, G

Permalink

Publication Type:: Conference Proceeding
Citation:: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, 7808 LNCS pp. 404 - 416
Issue Date:: 2013-04-10

Closed Access

	Filename	Description	Size
	2012006404OK.pdf		753.01 kB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	You, Y	en_US
dc.contributor.author	Xu, G https://orcid.org/0000-0003-4493-6663	en_US
dc.contributor.author	Cao, J	en_US
dc.contributor.author	Zhang, Y	en_US
dc.contributor.author	Huang, G	en_US
dc.date.issued	2013-04-10	en_US
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, 7808 LNCS pp. 404 - 416	en_US
dc.identifier.isbn	9783642374005	en_US
dc.identifier.issn	0302-9743	en_US
dc.identifier.uri	http://hdl.handle.net/10453/33749
dc.identifier.uri	http://hdl.handle.net/10453/26586
dc.description.abstract	Traditional information extraction methods mainly rely on visual feature assisted techniques; but without considering the hierarchical dependencies within the paragraph structure, some important information is missing. This paper proposes an integrated approach for extracting academic information from conference Web pages. Firstly, Web pages are segmented into text blocks by applying a new hybrid page segmentation algorithm which combines visual feature and DOM structure together. Then, these text blocks are labeled by a Tree-structured Random Fields model, and the block functions are differentiated using various features such as visual features, semantic features and hierarchical dependencies. Finally, an additional post-processing is introduced to tune the initial annotation results. Our experimental results on real-world data sets demonstrated that the proposed method is able to effectively and accurately extract the needed academic information from conference Web pages. © 2013 Springer-Verlag.	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.relation.isbasedon	10.1007/978-3-642-37401-2_41	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Leveraging visual features and hierarchical dependencies for conference information extraction	en_US
dc.type	Conference Proceeding
utslib.citation.volume	7808 LNCS	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	7808 LNCS	en_US

Abstract:

Traditional information extraction methods mainly rely on visual feature assisted techniques; but without considering the hierarchical dependencies within the paragraph structure, some important information is missing. This paper proposes an integrated approach for extracting academic information from conference Web pages. Firstly, Web pages are segmented into text blocks by applying a new hybrid page segmentation algorithm which combines visual feature and DOM structure together. Then, these text blocks are labeled by a Tree-structured Random Fields model, and the block functions are differentiated using various features such as visual features, semantic features and hierarchical dependencies. Finally, an additional post-processing is introduced to tune the initial annotation results. Our experimental results on real-world data sets demonstrated that the proposed method is able to effectively and accurately extract the needed academic information from conference Web pages. © 2013 Springer-Verlag.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/26586