Weighted kernel method for text categorization

Zhang, L

Publication Type: Thesis
Issue Date: 2011

Closed Access

Filename	Description	Size	Format
01Front.pdf	contents and abstract	2.48 MB	Adobe PDF
02Whole.pdf	thesis	27.53 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, L
dc.date.accessioned	2015-03-04T23:06:43Z
dc.date.available	2015-03-04T23:06:43Z
dc.date.issued	2011
dc.identifier.uri	http://hdl.handle.net/10453/34032
dc.description	University of Technology, Sydney. Faculty of Engineering and Information Technology.	en_US
dc.description	NO FULL TEXT AVAILABLE. This thesis contains 3rd party copyright material. The hardcopy may be available for consultation at the UTS Library.
dc.description.abstract	NO FULL TEXT AVAILABLE. This thesis contains 3rd party copyright material. ----- Text categorization (or classification) is the task of assigning natural-text or hypertext documents to a fixed number of predefined categories based on their content. In the 1990s, machine learning became the dominant approach to text categorization. Popular machine learning approaches relied on kernel methods to build an automatic text classifier by learning from a set of pre-classified documents. These methods operate with either word frequency or word sequences, but not both. This thesis presents a new kernel method that operates with both. The method comprises a new kernel model and a critical vector learning algorithm that works on the model to perform the text document classification task. The proposed model, called the Weighted Kernel Model (WKM), represents a text document by both its word-frequency and its word-sequence information, combined through a weighting algorithm. The motivation for the WKM is that text documents differ, for instance in length: traditional research focused on large collections of documents in which each document was itself large. The learning algorithm based on the proposed WKM demonstrates effectiveness, accuracy and computational efficiency for various types of text documents, especially short and medium-length ones. The thesis demonstrates this by applying the WKM to two text data sets: the Reuters news data set and the Enron email data set. The thesis concludes with the strengths and limitations of the proposed method.	en_US
dc.format	Thesis (MSc)	en_US
dc.language.iso	en	en_US
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Weighted kernel method for text categorization	en_US
dc.type	Thesis
utslib.copyright.status	closed_access
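
Illustrative sketch (not taken from the thesis): the abstract describes a kernel that uses both word frequency and word sequences, combined by a weighting algorithm, but the full text is closed access and the actual WKM formulation is not given here. The Python below only shows the general idea of mixing a frequency-based similarity with a sequence-based one. Every name in it (term_counts, bigram_counts, cosine, combined_kernel, seq_weight) is hypothetical, word bigrams are an assumed stand-in for "word sequences", and the fixed mixing weight is an assumption rather than the thesis's weighting algorithm.

```python
from collections import Counter
from math import sqrt


def term_counts(text):
    """Bag-of-words counts: word frequency, word order ignored."""
    return Counter(text.lower().split())


def bigram_counts(text):
    """Counts of adjacent word pairs (assumed stand-in for word sequences)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def cosine(a, b):
    """Normalized dot product of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_kernel(doc_a, doc_b, seq_weight=0.5):
    """Convex combination of a frequency kernel and a sequence kernel.

    seq_weight is a hypothetical mixing parameter in [0, 1]; it is NOT the
    weighting algorithm of the thesis, which the abstract does not describe.
    """
    k_freq = cosine(term_counts(doc_a), term_counts(doc_b))
    k_seq = cosine(bigram_counts(doc_a), bigram_counts(doc_b))
    return (1.0 - seq_weight) * k_freq + seq_weight * k_seq


if __name__ == "__main__":
    a = "oil prices rise"
    b = "prices rise oil"  # same words as a, different order
    print(combined_kernel(a, b, seq_weight=0.0))  # frequency information only
    print(combined_kernel(a, b, seq_weight=1.0))  # sequence information only
```

In this sketch the two example documents contain the same words in a different order, so the frequency-only setting treats them as essentially identical while the sequence-only setting does not. A convex combination of two valid (positive semi-definite) kernels is itself a valid kernel, which is why a simple weighted sum is used for illustration.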

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/34032