Temporary campus guidelines for any gathering other than academic classes, professional education programs (GTPE), or department meetings are available at https://specialevents.gatech.edu/spring-campus-events-guidelines.

Thursday, December 10 2020
Add To My Calendar
Frank-Wolfe Methods for Optimization and Machine Learning

Title: Frank-Wolfe Methods for Optimization and Machine Learning

Date: Thursday, December 10, 2020

Time: 2pm EST

Location: https://bluejeans.com/982850197

Cyrille Combettes

Ph.D. student in Machine Learning

School of Industrial and Systems Engineering

Georgia Institute of Technology


Prof. Sebastian Pokutta (advisor), Institute of Mathematics, Technische Universität Berlin and Department for AI in Society, Science, and Technology, Zuse Institute Berlin

Prof. Swati Gupta, School of Industrial and Systems Engineering, Georgia Institute of Technology

Prof. Guanghui (George) Lan, School of Industrial and Systems Engineering, Georgia Institute of Technology


The Frank-Wolfe algorithm (FW) has become a popular constrained optimization algorithm for it is simple and projection-free, and it has been successfully applied to problems in traffic assignment, matrix completion, computer vision, optimal transport, and many others. Its main drawback however lies in its convergence rate, which can be excessively slow due to naive descent directions. In the first part of the proposal, we address this issue by designing a boosting procedure generating descent directions better aligned with the negative gradients while still preserving the projection-free property. Although our method is reasonably simple and intuitive, it produces very significant results. We derive sublinear to linear convergence rates and demonstrate computational speedups over the state of the art on a wide variety of experiments. 

In the second part of the proposal, we consider the large-scale finite-sum setting arising in many tasks of machine learning. We improve the quality of the first-order information accessed by stochastic FW algorithms by blending in adaptive gradients using a sliding technique. We derive sublinear convergence rates and demonstrate that the blend of the two methods is successful via computational experiments on convex and nonconvex objectives. In particular, our method is the only one to improve the performance of the vanilla stochastic FW on the training of neural networks.

Both previous contributions are motivated by the projection-free property of FW. In the last part of the proposal, we leverage the natural sparsity of the iterates of FW to study an application to the approximate Carathéodory problem. We show that FW generates a simple solution to the problem and that with no modification of the algorithm, better cardinality bounds can be established using existing convergence analyses of FW in different scenarios.