Bernoulli random forests: Closing the gap between theoretical consistency and empirical soundness

Yisen, W; Qingtao, T; Xia, ST; Wu, J; Zhu, X

Publication Type: Conference Proceeding
Citation: IJCAI International Joint Conference on Artificial Intelligence, 2016, 2016-January, pp. 2167-2173
Issue Date: 2016-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Yisen, W	en_US
dc.contributor.author	Qingtao, T	en_US
dc.contributor.author	Xia, ST	en_US
dc.contributor.author	Wu, J	en_US
dc.contributor.author	Zhu, X	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	IJCAI International Joint Conference on Artificial Intelligence, 2016, 2016-January pp. 2167 - 2173	en_US
dc.identifier.issn	1045-0823	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121835
dc.relation.ispartof	IJCAI International Joint Conference on Artificial Intelligence	en_US
dc.title	Bernoulli random forests: Closing the gap between theoretical consistency and empirical soundness	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2016-January	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2016-January	en_US

Abstract:

Random forests are among the most effective ensemble learning methods. Despite their sound empirical performance, the study of their theoretical properties has lagged far behind. Recently, several random forest variants with sound theoretical foundations have been proposed, but they all suffer from poor empirical performance. In this paper, we propose the Bernoulli random forests model (BRF), which aims to close the gap between the theoretical consistency and the empirical soundness of random forest classification. Compared to Breiman's original random forests, BRF makes two simplifications in tree construction by using two independent Bernoulli distributions. The first Bernoulli distribution controls the selection of candidate attributes at each node of the tree, and the second controls the splitting point used at each node. As a result, BRF is provably consistent, so its risk converges to the optimum (i.e., the Bayes risk) as the training set grows infinitely large. Empirically, BRF performs best among the theoretically grounded random forest variants and is comparable to Breiman's original random forests (whose consistency has not yet been proved). The theoretical and experimental studies take the research one step further towards closing the gap between the theory and the practical performance of random forest classification.
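
The two Bernoulli draws described in the abstract can be illustrated with a minimal Python sketch of how a single tree node might choose its split. The abstract does not give the details, so everything specific here is an assumption made for illustration: the function names (`choose_split`, `gini`, `weighted_impurity`), the probabilities `p1` and `p2` and their default values, the choice between one candidate attribute and roughly sqrt(D) candidates, the use of Gini impurity, and the use of midpoints between observed values as thresholds. It is not the authors' implementation.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_impurity(X, y, feature, threshold):
    """Size-weighted Gini impurity of the two children of a split."""
    left = [y[i] for i in range(len(y)) if X[i][feature] <= threshold]
    right = [y[i] for i in range(len(y)) if X[i][feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def choose_split(X, y, p1=0.05, p2=0.05, rng=random):
    """Pick (feature, threshold) for one node using two Bernoulli draws.

    p1 and p2 are hypothetical parameter names and defaults.
    Returns None when no candidate attribute can be split.
    """
    n_features = len(X[0])

    # Bernoulli draw 1 (assumed form): with probability p1 consider a
    # single random attribute, otherwise about sqrt(D) random attributes.
    if rng.random() < p1:
        k = 1
    else:
        k = max(1, int(math.sqrt(n_features)))
    candidates = rng.sample(range(n_features), k)

    # Bernoulli draw 2 (assumed form): with probability p2 use one random
    # threshold per candidate, otherwise search all thresholds.
    random_split = rng.random() < p2

    best = None
    for feature in candidates:
        values = sorted({row[feature] for row in X})
        if len(values) < 2:
            continue  # constant attribute, nothing to split on
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        if random_split:
            thresholds = [rng.choice(thresholds)]
        for threshold in thresholds:
            score = weighted_impurity(X, y, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)

    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    # With both probabilities at 0 the node searches exhaustively.
    print(choose_split(X, y, p1=0.0, p2=0.0))  # (0, 1.5)
```

Setting both probabilities to 0 makes the node behave like an ordinary greedy split search, while setting them to 1 makes it fully random. Intermediate values mix the two behaviours, which is one way to read the trade-off between consistency and empirical accuracy that the abstract describes.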

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121835