publications policy: I do my best to maintain updated versions with possible typo corrections and clarifications on arXiv (both are generally marked in bold and red for easy reference). Therefore, please prefer the arXiv versions over the officially published ones.

codes: see my GitHub profile for all my codes. The current version of the Performance EStimation TOolbox (PESTO) is available from here (user manual, conference proceeding). Numerical worst-case analyses based on PEPs can now be performed simply by writing the algorithms as you would implement them in Matlab. The new PEPit (performance estimation in Python) is available from here (thanks to the fabulous work of Baptiste Goujaud and Céline Moucer). It is easy to experiment with it using this notebook (see colab).

preprints

preprint

PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python

PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. For doing that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modelling parts, and the worst-case analysis is performed numerically via a standard solver.

preprint

Optimal first-order methods for convex functions with a quadratic upper bound

We analyze worst-case convergence guarantees of first-order optimization methods over a function class extending that of smooth and convex functions. This class contains convex functions that admit a simple quadratic upper bound. Its study is motivated by its stability under minor perturbations.
We provide a thorough analysis of first-order methods, including worst-case convergence guarantees for several algorithms, and demonstrate that some of them achieve the optimal worst-case guarantee over the class. We support our analysis by numerical validation of worst-case guarantees using performance estimation problems. A few observations can be drawn from this analysis, particularly regarding the optimality (resp. adaptivity) of the heavy-ball method (resp. heavy-ball with line-search). Finally, we show how our analysis can be leveraged to obtain convergence guarantees over more complex classes of functions. Overall, this study brings insights on the choice of function classes over which standard first-order methods have working worst-case guarantees.

preprint

A systematic approach to Lyapunov analyses of continuous-time models in convex optimization

First-order methods are often analyzed via their continuous-time models, where their worst-case convergence properties are usually approached via Lyapunov functions. In this work, we provide a systematic and principled approach to find and verify Lyapunov functions for classes of ordinary and stochastic differential equations. More precisely, we extend the performance estimation framework, originally proposed by Drori and Teboulle [10], to continuous-time models. We retrieve convergence results comparable to those of discrete methods using fewer assumptions and convexity inequalities, and provide new results for stochastic accelerated gradient flows.

preprint

Quadratic minimization: from conjugate gradient to an adaptive Heavy-ball method with Polyak step-sizes

In this work, we propose an adaptive variation on the classical Heavy-ball method for convex quadratic minimization. The adaptivity crucially relies on so-called “Polyak step-sizes”, which consist in using the knowledge of the optimal value of the optimization problem at hand instead of problem parameters such as a few eigenvalues of the Hessian of the problem. This method happens to also be equivalent to a variation of the classical conjugate gradient method, and thereby inherits many of its attractive features, including its finite-time convergence, instance optimality, and its worst-case convergence rates.
The classical gradient method with Polyak step-sizes is known to behave very well in situations in which it can be used, and the question of whether incorporating momentum in this method is possible and can improve the method itself appeared to be open. We provide a definitive answer to this question for minimizing convex quadratic functions, an arguably necessary first step for developing such methods in more general setups.

preprint

Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity

Algorithms for min-max optimization and variational inequalities are often studied under monotonicity assumptions. Motivated by non-monotone machine learning applications, we follow the line of works [Diakonikolas et al., 2021, Lee and Kim, 2021, Pethick et al., 2022, Böhm, 2022] aiming at going beyond monotonicity by considering the weaker negative comonotonicity assumption. In particular, we provide tight complexity analyses for the Proximal Point, Extragradient, and Optimistic Gradient methods in this setup, closing some questions on their working guarantees beyond monotonicity.

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method.
We discuss momentum methods in detail, starting with the seminal work of Nesterov [1] and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns.
Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.

journals

journal

An optimal gradient method for smooth strongly convex minimization

We present an optimal gradient method for smooth strongly convex optimization. The method is optimal in the sense that its worst-case bound on the distance to an optimal point exactly matches the lower bound on the oracle complexity for the class of problems, meaning that no black-box first-order method can have a better worst-case guarantee without further assumptions on the class of problems at hand. In addition, we provide a constructive recipe for obtaining the algorithmic parameters of the method and illustrate that it can be used for deriving methods for other optimality criteria as well.

journal

On the oracle complexity of smooth strongly convex minimization

We construct a family of functions suitable for establishing lower bounds on the oracle complexity of first-order minimization of smooth strongly-convex functions. Based on this construction, we derive new lower bounds on the complexity of strongly-convex minimization under various inaccuracy criteria. The new bounds match the known upper bounds up to a constant factor, and when the inaccuracy of a solution is measured by its distance to the solution set, the new lower bound exactly matches the upper bound obtained by the recent Information-Theoretic Exact Method by the same authors, thereby establishing the exact oracle complexity for this class of problems.

journal

Principled Analyses and Design of First-Order Methods with Inexact Proximal Operators

Proximal operations are among the most common primitives appearing in both practical and theoretical (or high-level) optimization methods. This basic operation typically consists in solving an intermediary (hopefully simpler) optimization problem. In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems. Then, we show that worst-case guarantees for algorithms relying on such inexact proximal operations can be systematically obtained through a generic procedure based on semidefinite programming. This methodology is primarily based on the approach introduced by Drori and Teboulle (2014) and on convex interpolation results, and allows producing non-improvable worst-case analyses. In other words, for a given algorithm, the methodology generates both worst-case certificates (i.e., proofs) and problem instances on which those bounds are achieved.
Relying on this methodology, we study numerical worst-case performances of a few basic methods relying on inexact proximal operations including accelerated variants, and design a variant with optimized worst-case behaviour. We further illustrate how to extend the approach to support strongly convex objectives by studying a simple relatively inexact proximal minimization method.

journal

A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives

In this short note, we provide a simple version of an accelerated forward-backward method (a.k.a. Nesterov’s accelerated proximal gradient method) possibly relying on approximate proximal operators and allowing to exploit strong convexity of the objective function. The method supports both relative and absolute errors, and its behavior is illustrated on a set of standard numerical experiments. Using the same developments, we further provide a version of the accelerated proximal hybrid extragradient method of Monteiro and Svaiter (2013) possibly exploiting strong convexity of the objective function.

journal

Convergence of a Constrained Vector Extrapolation Scheme

We prove non-asymptotic linear convergence rates for the constrained Anderson acceleration extrapolation scheme. These guarantees come from new upper bounds on the constrained Chebyshev problem, which consists in minimizing the maximum absolute value of a polynomial on a bounded real interval with ℓ1 constraints on its coefficient vector. Constrained Anderson acceleration has a numerical cost comparable to that of the original scheme.

journal

Optimal complexity and certification of Bregman first-order methods

We provide a lower bound showing that the O(1/k) convergence rate of the NoLips method (a.k.a. Bregman Gradient) is optimal for the class of functions satisfying the h-smoothness assumption. This assumption, also known as relative smoothness, appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. On the way, we show how to constructively obtain the corresponding worst-case functions by extending the computer-assisted performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods, and to handle the classes of differentiable and strictly convex functions.

journal

Efficient first-order methods for convex minimization: a constructive approach

We describe a novel constructive technique for devising efficient first-order methods for a wide range of large-scale convex minimization settings, including smooth, non-smooth, and strongly convex minimization. The technique builds upon a certain variant of the conjugate gradient method to construct a family of methods such that a) all methods in the family share the same worst-case guarantee as the base conjugate gradient method, and b) the family includes a fixed-step first-order method. We demonstrate the effectiveness of the approach by deriving optimal methods for the smooth and non-smooth cases, including new methods that forego knowledge of the problem parameters at the cost of a one-dimensional line search per iteration, and a universal method for the union of these classes that requires a three-dimensional search per iteration. In the strongly convex case, we show how numerical tools can be used to perform the construction, and show that the resulting method offers an improved worst-case bound compared to Nesterov’s celebrated fast gradient method.

We propose a methodology for studying the performance of common splitting methods through semidefinite programming. We prove tightness of the methodology and demonstrate its value by presenting two applications of it. First, we use the methodology as a tool for computer-assisted proofs to prove tight analytical contraction factors for Douglas–Rachford splitting that are likely too complicated for a human to find bare-handed. Second, we use the methodology as an algorithmic tool to computationally select the optimal splitting method parameters by solving a series of semidefinite programs.

journal

Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation

We provide new tools for worst-case performance analysis of the gradient (or steepest descent) method of Cauchy for smooth strongly convex functions, and Newton’s method for self-concordant functions, including the case of inexact search directions. The analysis uses semidefinite programming performance estimation, as pioneered by Drori and Teboulle [Mathematical Programming, 145(1-2):451–482, 2014], and extends recent performance estimation results for the method of Cauchy by the authors [Optimization Letters, 11(7), 1185-1199, 2017]. To illustrate the applicability of the tools, we demonstrate a novel complexity analysis of short step interior point methods using inexact search directions. As an example in this framework, we sketch how to give a rigorous worst-case complexity analysis of a recent interior point method by Abernethy and Hazan [PMLR, 48:2520–2528, 2016].

journal

Exact worst-case convergence rates of the proximal gradient method for composite convex minimization

We study the worst-case convergence rates of the proximal gradient method for minimizing the sum of a smooth strongly convex function and a non-smooth convex function whose proximal operator is available.
We establish the exact worst-case convergence rates of the proximal gradient method in this setting for any step size and for different standard performance measures: objective function accuracy, distance to optimality and residual gradient norm.
The proof methodology relies on recent developments in performance estimation of first-order methods based on semidefinite programming. In the case of the proximal gradient method, this methodology allows obtaining exact and non-asymptotic worst-case guarantees that are conceptually very simple, although apparently new.
On the way, we discuss how strong convexity can be replaced by weaker assumptions, while preserving the corresponding convergence rates. We also establish that the same fixed step size policy is optimal for all three performance measures. Finally, we extend recent results on the worst-case behavior of gradient descent with exact line search to the proximal case.
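As a rough illustration of the setting in this abstract (not code from the paper), the sketch below runs the proximal gradient method on a small separable problem: a strongly convex quadratic plus an ℓ1 term, whose proximal operator is soft-thresholding. The problem data, the names `soft_threshold` and `proximal_gradient`, and the step size 2/(L + μ) are all assumptions made for the example; the abstract itself does not spell out the optimal step size policy.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient(a, b, lam, step, iterations):
    """Minimize sum(0.5 * a_i * (x_i - b_i)**2) + lam * sum(|x_i|)."""
    x = [0.0] * len(a)
    for _ in range(iterations):
        # Gradient step on the smooth part ...
        grad = [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]
        # ... followed by the proximal step on the non-smooth part.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


# Hypothetical problem data: curvatures a_i lie in [mu, L] = [1, 10].
a = [1.0, 4.0, 10.0]
b = [3.0, -0.1, 0.5]
lam = 1.0
mu, L = min(a), max(a)
step = 2.0 / (L + mu)  # classical choice, assumed here for illustration

x_hat = proximal_gradient(a, b, lam, step, 200)
# The problem is separable, so the minimizer is available in closed form.
x_star = [soft_threshold(bi, lam / ai) for ai, bi in zip(a, b)]
```

Because the toy problem is separable, `x_star` gives an independent check of the iterates; on non-separable problems only the iteration itself would be available.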

journal

Exact worst-case performance of first-order methods for composite convex optimization

We provide a framework for computing the exact worst-case performance of any algorithm belonging to a broad class of oracle-based first-order methods for composite convex optimization, including those performing explicit, projected, proximal, conditional and inexact (sub)gradient steps. We simultaneously obtain tight worst-case guarantees and explicit instances of optimization problems on which the algorithm reaches this worst-case. We achieve this by reducing the computation of the worst-case to solving a convex semidefinite program, generalizing previous works on performance estimation by Drori and Teboulle [13] and the authors [43]. We use these developments to obtain a tighter analysis of the proximal point algorithm and of several variants of fast proximal gradient, conditional gradient, subgradient and alternating projection methods. In particular, we present a new analytical worst-case guarantee for the proximal point algorithm that is twice better than previously known, and improve the standard worst-case guarantee for the conditional gradient method by more than a factor of two. We also show how the optimized gradient method proposed by Kim and Fessler in [22] can be extended by incorporating a projection or a proximal operator, which leads to an algorithm that converges in the worst-case twice as fast as the standard accelerated proximal gradient method [2].

journal

On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions [Best paper award]

We consider the gradient (or steepest) descent method with exact line search applied to a strongly convex function with Lipschitz continuous gradient. We establish the exact worst-case rate of convergence of this scheme, and show that this worst-case behavior is exhibited by a certain convex quadratic function. We also give the tight worst-case complexity bound for a noisy variant of the gradient descent method, where exact line search is performed in a search direction that differs from the negative gradient by at most a prescribed relative tolerance. The proofs are computer-assisted, and rely on the resolutions of semidefinite programming performance estimation problems as introduced in the paper (Drori and Teboulle, Math Progr 145(1–2):451–482, 2014).
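The claim that a quadratic exhibits the worst case can be checked numerically on a toy instance. The sketch below is an illustration, not the paper's computer-assisted proof. It assumes a strong convexity parameter μ and a gradient Lipschitz constant L (notation introduced here), a two-dimensional quadratic with curvatures μ and L, and a hand-picked starting point. The abstract does not state the rate explicitly; the value ((L − μ)/(L + μ))² per iteration in function values used below is the bound I recall from the paper and should be checked against it.

```python
def f(x, curvatures):
    """Quadratic f(x) = 0.5 * sum(a_i * x_i**2), minimized at the origin."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))


def exact_line_search_step(x, curvatures):
    """One steepest-descent step with exact line search on the quadratic above."""
    g = [a * xi for a, xi in zip(curvatures, x)]
    num = sum(gi * gi for gi in g)
    den = sum(a * gi * gi for a, gi in zip(curvatures, g))
    step = num / den  # exact minimizer of t -> f(x - t * g) for a quadratic
    return [xi - step * gi for xi, gi in zip(x, g)]


mu, L = 1.0, 10.0
curvatures = [mu, L]
x = [1.0 / mu, 1.0 / L]  # hypothetical starting point chosen to hit the worst case
predicted = ((L - mu) / (L + mu)) ** 2

ratios = []
for _ in range(5):
    x_next = exact_line_search_step(x, curvatures)
    ratios.append(f(x_next, curvatures) / f(x, curvatures))
    x = x_next
```

On this instance every entry of `ratios` matches `predicted` up to rounding: the iterates zigzag between two directions and never improve on the worst-case contraction.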

journal

Smooth strongly convex interpolation and exact worst-case performance of first-order methods

We show that the exact worst-case performance of fixed-step first-order methods for unconstrained optimization of smooth (possibly strongly) convex functions can be obtained by solving convex programs.
Finding the worst-case performance of a black-box first-order method is formulated as an optimization problem over a set of smooth (strongly) convex functions and initial conditions. We develop closed-form necessary and sufficient conditions for smooth (strongly) convex interpolation, which provide a finite representation for those functions. This allows us to reformulate the worst-case performance estimation problem as an equivalent finite dimension-independent semidefinite optimization problem, whose exact solution can be recovered up to numerical precision. Optimal solutions to this performance estimation problem provide both worst-case performance bounds and explicit functions matching them, as our smooth (strongly) convex interpolation procedure is constructive.
Our work builds on that of Drori and Teboulle in [Math. Prog. 145 (1-2), 2014], who introduced and solved relaxations of the performance estimation problem for smooth convex functions.
We apply our approach to different fixed-step first-order methods with several performance criteria, including objective function accuracy and gradient norm. We conjecture several numerically supported worst-case bounds on the performance of the fixed-step gradient, fast gradient and optimized gradient methods, both in the smooth convex and the smooth strongly convex cases, and deduce tight estimates of the optimal step size for the gradient method.
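To make the idea of interpolation conditions concrete, here is a minimal sketch of a checker for a finite set of triples (x_i, g_i, f_i). It tests, for every ordered pair (i, j), the inequality f_i ≥ f_j + ⟨g_j, x_i − x_j⟩ + [ ‖g_i − g_j‖²/L + μ‖x_i − x_j‖² − 2(μ/L)⟨g_j − g_i, x_j − x_i⟩ ] / (2(1 − μ/L)), which is the form of the conditions as I recall them from the paper; it should be verified against the paper before being relied on. The function names, the tolerance and the sample data are assumptions for the example.

```python
def satisfies_interpolation(points, mu, L, tol=1e-9):
    """Check the pairwise inequality for triples (x, g, f); requires 0 <= mu < L."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    for i, (xi, gi, fi) in enumerate(points):
        for j, (xj, gj, fj) in enumerate(points):
            if i == j:
                continue
            dx = [a - b for a, b in zip(xi, xj)]
            dg = [a - b for a, b in zip(gi, gj)]
            lhs = fi - fj - dot(gj, dx)
            rhs = (dot(dg, dg) / L + mu * dot(dx, dx)
                   - 2.0 * mu / L * dot(dg, dx)) / (2.0 * (1.0 - mu / L))
            if lhs < rhs - tol:
                return False
    return True


def sample_quadratic(curvatures, xs):
    """Return (x, gradient, value) triples of f(x) = 0.5 * sum(a_i * x_i**2)."""
    triples = []
    for x in xs:
        grad = [a * xi for a, xi in zip(curvatures, x)]
        value = 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))
        triples.append((x, grad, value))
    return triples


mu, L = 0.1, 1.0
xs = [[0.0, 0.0], [1.0, -2.0], [-0.5, 3.0]]
inside = sample_quadratic([0.3, 0.7], xs)   # curvatures within [mu, L]
outside = sample_quadratic([0.3, 2.0], xs)  # one curvature exceeds L
```

Data sampled from a quadratic whose curvatures lie in [μ, L] passes the check, while data from a quadratic that is too steep fails it. In a performance estimation problem the same inequalities appear as constraints on unknown triples rather than as a test on known ones.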

conferences

conference

Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities

The Past Extragradient (PEG) [Popov, 1980] method, also known as the Optimistic Gradient method, has recently gained interest in the optimization community with the emergence of variational inequality formulations for machine learning. Recently, in the unconstrained case, Golowich et al. [2020] proved that an O(1/N) last-iterate convergence rate in terms of the squared norm of the operator can be achieved for Lipschitz and monotone operators with a Lipschitz Jacobian. In this work, by introducing a novel analysis through potential functions, we show that (i) this O(1/N) last-iterate convergence can be achieved without any assumption on the Jacobian of the operator, and (ii) it can be extended to the constrained case, which was not derived before even under Lipschitzness of the Jacobian. The proof is significantly different from the one known from Golowich et al. [2020], and its discovery was computer-aided. Those results close the open question of the last-iterate convergence of PEG for monotone variational inequalities.
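For readers unfamiliar with the method, the following sketch runs the unconstrained Past Extragradient iteration on the bilinear toy problem min_x max_y xy, whose operator F(x, y) = (y, −x) is monotone and 1-Lipschitz, and compares it with plain gradient descent-ascent. The toy problem, the step size 0.3, the initialization of the stored operator value and the function names are assumptions chosen for illustration; the example says nothing about the O(1/N) rate itself.

```python
import math


def operator(z):
    """F(x, y) = (y, -x), the operator associated with min_x max_y x*y."""
    x, y = z
    return (y, -x)


def norm(z):
    return math.hypot(z[0], z[1])


def gradient_descent_ascent(z, step, iterations):
    """Plain forward steps z <- z - step * F(z)."""
    for _ in range(iterations):
        fz = operator(z)
        z = (z[0] - step * fz[0], z[1] - step * fz[1])
    return z


def past_extragradient(z, step, iterations):
    """Popov's scheme: extrapolate with the stored evaluation, then update."""
    f_prev = operator(z)  # assumed initialization of the "past" evaluation
    for _ in range(iterations):
        # Extrapolated point, reusing the previous operator evaluation.
        w = (z[0] - step * f_prev[0], z[1] - step * f_prev[1])
        # Single new operator evaluation per iteration.
        f_prev = operator(w)
        z = (z[0] - step * f_prev[0], z[1] - step * f_prev[1])
    return z


z0 = (1.0, 1.0)
gda_final = gradient_descent_ascent(z0, 0.3, 200)
peg_final = past_extragradient(z0, 0.3, 200)
```

On this instance gradient descent-ascent spirals away from the solution (0, 0), while the last PEG iterate approaches it, using one operator evaluation per iteration.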

conference

Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization

We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an ε primal-dual gap (in expectation) in Õ(1/√ε) iterations, by only accessing gradients of the original function and a linear maximization oracle with O(1/√ε) computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.

conference

PROX-QP: Yet another Quadratic Programming Solver for Robotics and beyond

Quadratic programming (QP) has become a core modelling component in the modern engineering toolkit. This is particularly true for simulation, planning and control in robotics. Yet, modern numerical solvers have not reached the level of efficiency and reliability required in practical applications where speed, robustness, and accuracy are all necessary. In this work, we introduce a few variations of the well-established augmented Lagrangian method, specifically for solving QPs, which include heuristics for improving practical numerical performances. Those variants are embedded within an open-source software which includes an efficient C++ implementation, a modular API, as well as best-performing heuristics for our test-bed. Relying on this framework, we present a benchmark studying the practical performances of modern optimization solvers for convex QPs on generic and complex problems of the literature as well as on common robotic scenarios. This benchmark notably highlights that this approach outperforms modern solvers in terms of efficiency, accuracy and robustness for small to medium-sized problems, while remaining competitive for higher dimensions.

We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably faster than constant step-sizes. More precisely, we develop a convergence rate analysis for quadratic objectives that provides optimal parameters and shows that cyclical learning rates can improve upon traditional lower complexity bounds. We further propose a systematic approach to design optimal first order methods for quadratic minimization with a given spectral structure. Finally, we provide a local convergence rate analysis beyond quadratic minimization for the proposed methods and illustrate our findings through benchmarks on least squares and logistic regression problems.

conference

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip [Outstanding paper award]

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.

conference

Complexity guarantees for Polyak steps with momentum

In smooth strongly convex optimization, knowledge of the strong convexity parameter is critical for obtaining simple methods with accelerated rates. In this work, we study a class of methods, based on Polyak steps, where this knowledge is substituted by that of the optimal value, 𝑓∗. We first show slightly better convergence bounds than previously known for the classical case of simple gradient descent with Polyak steps; we then derive an accelerated gradient method with Polyak steps and momentum, along with convergence guarantees.
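A minimal sketch of plain gradient descent with Polyak steps (without the momentum term studied in the paper) may help fix ideas. It uses the textbook step (f(x) − f∗)/‖∇f(x)‖², which may differ by a constant factor from the variant analyzed in the paper. The quadratic test function, its curvatures and the function name are assumptions for the example.

```python
def polyak_gradient_descent(x, curvatures, f_star, iterations):
    """Gradient descent on f(x) = 0.5 * sum(a_i * x_i**2) with Polyak steps."""
    history = []
    for _ in range(iterations):
        value = 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))
        grad = [a * xi for a, xi in zip(curvatures, x)]
        grad_sq = sum(g * g for g in grad)
        history.append(value)
        if grad_sq == 0.0:
            break  # already at a stationary point
        # The step uses the optimal value f_star, not mu or L.
        step = (value - f_star) / grad_sq
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x, history


curvatures = [1.0, 10.0]  # hypothetical problem with mu = 1 and L = 10
x_final, history = polyak_gradient_descent([1.0, 1.0], curvatures, 0.0, 200)
```

The only problem-specific quantity the iteration needs is `f_star`; neither the strong convexity parameter nor the smoothness constant appears in the update.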

conference

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions

We provide a novel computer-assisted technique for systematically analyzing first-order methods for optimization. In contrast with previous works, the approach is particularly suited for handling sublinear convergence rates and stochastic oracles. The technique relies on semidefinite programming and potential functions. It allows simultaneously obtaining worst-case guarantees on the behavior of those algorithms, and assisting in choosing appropriate parameters for tuning their worst-case performances. The technique also benefits from comfortable tightness guarantees, meaning that unsatisfactory results can be improved only by changing the setting. We use the approach for analyzing deterministic and stochastic first-order methods under different assumptions on the nature of the stochastic noise. Among others, we treat unstructured noise with bounded variance, different noise models arising in over-parametrized expectation minimization problems, and randomized block-coordinate descent schemes.

conference

Lyapunov functions for first-order methods: Tight automated convergence guarantees

We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori and Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)).

We present a MATLAB toolbox that automatically computes tight worst-case performance guarantees for a broad class of first-order methods for convex optimization. The class of methods includes those performing explicit, projected, proximal, conditional and inexact (sub)gradient steps. The toolbox relies on the performance estimation (PE) framework, which recently emerged through works of Drori and Teboulle and the authors. The PE approach is a very systematic manner of obtaining non-improvable worst-case guarantees for first-order numerical optimization schemes. However, using the PE methodology requires modelling efforts from the user, along with some knowledge of semidefinite programming. The goal of this work is to ease the use of the performance estimation methodology, by providing a toolbox that implicitly does the modelling job. In short, its aim is to (i) let the user write the algorithm in a natural way, as he/she would have implemented it, and (ii) let the computer perform the modelling and worst-case analysis parts automatically.

The goal of this thesis is to show how to derive in a completely automated way exact and global worst-case guarantees for first-order methods in convex optimization. To this end, we formulate a generic optimization problem looking for the worst-case scenarios. The worst-case computation problems, referred to as performance estimation problems (PEPs), are intrinsically infinite-dimensional optimization problems formulated over a given class of objective functions. To render those problems tractable, we develop a (smooth and non-smooth) convex interpolation framework, which provides necessary and sufficient conditions to interpolate our objective functions. With this idea, we transform PEPs into solvable finite-dimensional semidefinite programs, from which one obtains worst-case guarantees and worst-case functions, along with the corresponding explicit proofs. PEPs have already proved themselves very useful as a tool for developing convergence analyses of first-order optimization methods. Among others, PEPs allow obtaining exact guarantees for gradient methods, along with their inexact, projected, proximal, conditional, decentralized and accelerated versions.