Nearest neighbor selection for iteratively kNN imputation

Zhang, S

Publication Type: Journal Article
Citation: Journal of Systems and Software, 2012, 85 (11), pp. 2541 - 2552
Issue Date: 2012-11-01

Closed Access

	Filename	Description	Size	Format
	2012001365OK.pdf		798.33 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, S	en_US
dc.date.issued	2012-11-01	en_US
dc.identifier.citation	Journal of Systems and Software, 2012, 85 (11), pp. 2541 - 2552	en_US
dc.identifier.issn	0164-1212	en_US
dc.identifier.uri	http://hdl.handle.net/10453/22844
dc.relation	http://purl.org/au-research/grants/arc/DP0985456
dc.relation.ispartof	Journal of Systems and Software	en_US
dc.relation.isbasedon	10.1016/j.jss.2012.05.073	en_US
dc.subject.classification	Software Engineering	en_US
dc.title	Nearest neighbor selection for iteratively kNN imputation	en_US
dc.type	Journal Article
utslib.citation.volume	11	en_US
utslib.citation.volume	85	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	0804 Data Format	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	11	en_US
pubs.publication-status	Published	en_US
pubs.volume	85	en_US

Abstract:

Existing kNN imputation methods for dealing with missing data are designed according to Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method for iteratively imputing missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than by using traditional distance metrics such as Euclidean distance. Such a distance metric can deal with both numerical and categorical attributes. To achieve better effectiveness, GkNN regards all the imputed instances (i.e., instances whose missing data have been imputed) as observed data and uses them, together with the complete instances (instances without missing values), to iteratively impute other missing data. We experimentally evaluate the proposed approach, and demonstrate that the gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes. Moreover, experimental results also show that the GkNN algorithm is much more efficient than existing kNN imputation methods. © 2012 Elsevier Inc. All rights reserved.
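
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the full text is closed access, so the details below are assumptions. The closeness measure is the gray relational grade from standard gray relational analysis with a distinguishing coefficient rho = 0.5, categorical attributes contribute a 0/1 mismatch, numerical attributes are assumed to be scaled to [0, 1], and missing values are filled with the neighbors' mean (numerical) or mode (categorical). The names MISSING, attribute_gap, gray_relational_grades and impute_with_gray_knn are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value


def attribute_gap(a, b):
    """Absolute difference for numbers, 0/1 mismatch for categories."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    return 0.0 if a == b else 1.0


def gray_relational_grades(target, candidates, rho=0.5):
    """Gray relational grade of each candidate relative to target.

    Only attributes observed in target are compared; a higher grade
    means the candidate is closer to the target.
    """
    observed = [p for p, v in enumerate(target) if v is not MISSING]
    gaps = [[attribute_gap(target[p], c[p]) for p in observed] for c in candidates]
    flat = [g for row in gaps for g in row]
    if not flat or max(flat) == 0:
        # Nothing to compare, or every candidate matches exactly.
        return [1.0] * len(candidates)
    lo, hi = min(flat), max(flat)
    return [
        sum((lo + rho * hi) / (g + rho * hi) for g in row) / len(row)
        for row in gaps
    ]


def impute_with_gray_knn(rows, k=3, rho=0.5):
    """Fill missing values; each imputed row joins the pool of observed rows."""
    pool = [list(r) for r in rows if MISSING not in r]
    if not pool:
        raise ValueError("at least one complete row is required")
    completed = []
    for row in rows:
        if MISSING not in row:
            completed.append(list(row))
            continue
        grades = gray_relational_grades(row, pool, rho)
        ranked = sorted(range(len(pool)), key=lambda i: grades[i], reverse=True)
        neighbors = [pool[i] for i in ranked[:k]]
        filled = list(row)
        for p, value in enumerate(row):
            if value is MISSING:
                values = [n[p] for n in neighbors]
                if all(isinstance(x, (int, float)) for x in values):
                    filled[p] = sum(values) / len(values)  # numerical: mean
                else:
                    filled[p] = Counter(values).most_common(1)[0][0]  # categorical: mode
        pool.append(filled)  # imputed row is treated as observed from now on
        completed.append(filled)
    return completed


rows = [
    [0.10, "a", 0.20],
    [0.20, "a", 0.30],
    [0.90, "b", 0.80],
    [0.80, "b", 0.90],
    [0.15, "a", MISSING],
    [0.85, MISSING, 0.85],
]
completed = impute_with_gray_knn(rows, k=2)
```

In this sketch a single pass is made over the incomplete rows, and each imputed row is appended to the pool before the next row is processed, which mirrors the abstract's statement that imputed instances are treated as observed data. Whether the paper repeats such passes until the imputed values stabilize is not stated in the abstract.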

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/22844