Approximate conditional gradient descent on multi-class classification
- Publication Type:
- Conference Proceeding
- Citation:
- 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2301 - 2307
- Issue Date:
- 2017-01-01
Closed Access
Filename | Description | Size
---|---|---
aaai17_grad.pdf | Published version | 676.16 kB
Copyright Clearance Process
This item is closed access and not available.
© Copyright 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Conditional gradient descent, also known as the Frank-Wolfe algorithm, has regained popularity in recent years. Its key advantage is that at each step the expensive projection is replaced with a much more efficient linear optimization step. As with gradient descent, however, the cost of evaluating the loss and its gradient in Frank-Wolfe scales with the size of the data, so training on big data poses a challenge. Stochastic Frank-Wolfe methods have recently been proposed to address this problem, but they do not perform well in practice. In this work, we study how to approximate the Frank-Wolfe algorithm on large-scale multi-class classification, a typical application of the Frank-Wolfe algorithm. We present a simple but effective method that exploits the internal structure of the data to approximate Frank-Wolfe on this problem. Empirical results verify that our method outperforms the state-of-the-art stochastic projection-free methods.
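The abstract's key point — replacing the projection with a linear optimization step — can be illustrated with a minimal sketch. This is a generic Frank-Wolfe loop over the probability simplex (not the paper's method): the linear subproblem min_s ⟨∇f(x), s⟩ over the simplex is solved in closed form by picking the vertex with the smallest gradient coordinate, so no projection is ever needed. The function name and the toy quadratic objective are illustrative choices, not from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=100):
    """Minimize a smooth convex function over the probability simplex.

    Instead of a gradient step followed by an expensive projection,
    each iteration solves a linear subproblem: over the simplex,
    min_s <grad(x), s> is attained at the basis vector whose gradient
    coordinate is smallest.
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # linear minimization oracle
        gamma = 2.0 / (t + 2.0)          # standard step-size schedule
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Toy example: minimize ||x - c||^2 over the simplex for a target c
# that already lies inside it, so the solution is c itself.
c = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda x: 2 * (x - c), np.array([1.0, 0.0, 0.0]), 2000)
```

Because each iterate is a convex combination of simplex vertices, the iterate remains feasible throughout; this is the property that lets Frank-Wolfe avoid projection entirely.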