Discover semantic topics in patents within a specific domain

Ma, W; Luo, X; Xuan, J; Xue, R; Guo, Y

Publication Type: Journal Article
Citation: Journal of Web Engineering, 2017, 16 (7-8), pp. 653 - 675
Issue Date: 2017-12-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Ma, W	en_US
dc.contributor.author	Luo, X	en_US
dc.contributor.author	Xuan, J https://orcid.org/0000-0002-8367-6908	en_US
dc.contributor.author	Xue, R	en_US
dc.contributor.author	Guo, Y	en_US
dc.date.issued	2017-12-01	en_US
dc.identifier.citation	Journal of Web Engineering, 2017, 16 (7-8), pp. 653 - 675	en_US
dc.identifier.issn	1540-9589	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127214
dc.description.abstract	© Rinton Press. Patent topic discovery is critical for innovation-oriented enterprises that want to hedge patent application risks and raise the success rate of their applications. Researchers in both academia and industry commonly regard topic models as an efficient tool for this task. However, many well-known topic models, e.g., Latent Dirichlet Allocation (LDA), are designed for documents represented as word vectors and exhibit low accuracy and poor interpretability on the patent topic discovery task. The reasons are that 1) the semantics of documents within a specific domain remain under-explored, and 2) domain background knowledge is not successfully used to guide topic discovery. To improve accuracy and interpretability, we propose a new patent representation and organization with additional inter-word relationships mined from the title, abstract, and claims of patents. This representation endows each patent with more semantics than a word vector does. In addition, we build a Backbone Association Link Network (Backbone ALN) that incorporates domain background semantics to further enrich the semantics of patents. Using these semantically rich patent representations, we propose a Semantic LDA model to discover semantic topics from patents within a specific domain. It discovers semantic topics that carry association relations between words rather than a single word vector. Finally, the accuracy and interpretability of the proposed model are verified on real-world patent datasets from the United States Patent and Trademark Office. The experimental results show that the Semantic LDA model performs better than conventional models (e.g., LDA). Furthermore, the proposed model can be easily generalized to other related text mining corpora.	en_US
dc.relation.ispartof	Journal of Web Engineering	en_US
dc.title	Discover semantic topics in patents within a specific domain	en_US
dc.type	Journal Article
utslib.citation.volume	7-8	en_US
utslib.citation.volume	16	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	7-8	en_US
pubs.publication-status	Published	en_US
pubs.volume	16	en_US
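
The abstract contrasts a plain word-vector representation with one that adds inter-word relationships mined from the title, abstract, and claims of each patent. This record does not describe how the paper mines those relationships, so the following is only a minimal sketch under a stated assumption: sentence-level co-occurrence stands in for the association relation. The function names, the stopword list, and the sample patent text are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "with", "is", "on"}


def tokenize(sentence):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def represent_patent(fields):
    """Return (word counts, association-link counts) for one patent.

    `fields` maps a field name (title, abstract, claims) to its text.
    ASSUMPTION: a "link" is an unordered word pair that co-occurs in one
    sentence; the paper's actual mining procedure may differ.
    """
    words, links = Counter(), Counter()
    for text in fields.values():
        for sentence in re.split(r"[.;!?]+", text):
            tokens = tokenize(sentence)
            words.update(tokens)
            # sorted(set(...)) gives every unordered pair one canonical form
            links.update(combinations(sorted(set(tokens)), 2))
    return words, links


# Made-up patent text, used only to exercise the sketch.
patent = {
    "title": "Wireless charging pad",
    "abstract": "A charging pad transfers power to a battery. The pad uses a coil.",
    "claims": "A pad comprising a coil; the coil transfers power.",
}
words, links = represent_patent(patent)
```

Here `words` is the familiar bag-of-words view, while `links` records which words appear together, which is the kind of extra signal a word vector alone discards. A topic model could then be fitted over the link counts as well as the word counts; how Semantic LDA does this is specified in the paper, not here.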

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127214