Learning from multiple teacher networks

You, S; Xu, C; Tao, D

Learning from multiple teacher networks

You, S Xu, C

Tao, D

Permalink

Publication Type:: Conference Proceeding
Citation:: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, Part F129685 pp. 1285 - 1294
Issue Date:: 2017-08-13

Closed Access

	Filename	Description	Size
	p1285-you.pdf	Published version	1.18 MB		View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	You, S	en_US
dc.contributor.author	Xu, C https://orcid.org/0000-0002-4756-0609	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.date.issued	2017-08-13	en_US
dc.identifier.citation	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, Part F129685 pp. 1285 - 1294	en_US
dc.identifier.isbn	9781450348874	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127222
dc.description.abstract	© 2017 ACM. Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.	en_US
dc.relation.ispartof	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining	en_US
dc.relation.isbasedon	10.1145/3097983.3098135	en_US
dc.title	Learning from multiple teacher networks	en_US
dc.type	Conference Proceeding
utslib.citation.volume	Part F129685	en_US
utslib.for	0804 Data Format	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	Part F129685	en_US

Abstract:

© 2017 ACM. Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127222