PROJECTED GRADIENT DESCENT


Recall:

Function $f(x)$ is strongly convex with parameter $\mu$:

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2}\|y - x\|^2 \quad \forall\, x, y \in \mathbb{R}^n$$

Function $f(x)$ is called smooth with parameter $L$:

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2}\|x - y\|^2 \quad \forall\, x, y \in \mathbb{R}^n$$
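As a quick numerical sanity check (my own illustration, not from the notes): for a quadratic $f(x) = \tfrac{1}{2}x^T A x$ with symmetric positive definite $A$, both inequalities hold with $\mu = \lambda_{\min}(A)$ and $L = \lambda_{\max}(A)$. A minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric positive definite A; f(x) = 0.5 x^T A x, grad f(x) = A x
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

evals = np.linalg.eigvalsh(A)
mu, L = evals[0], evals[-1]          # strong convexity / smoothness parameters

x, y = rng.standard_normal(5), rng.standard_normal(5)
gap = f(y) - f(x) - grad(x) @ (y - x)      # f(y) minus its linearization at x
dist2 = np.linalg.norm(y - x) ** 2

assert mu / 2 * dist2 <= gap <= L / 2 * dist2   # both bounds hold
```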

Constrained Minimization

Let $f : \operatorname{dom}(f) \to \mathbb{R}$ be convex and let $X \subseteq \operatorname{dom}(f)$ be a convex set. A point $x^* \in X$ is a minimizer of $f$ over $X$ if:

$$f(x^*) \le f(y) \quad \forall\, y \in X.$$

Lemma 1

If $f : \operatorname{dom}(f) \to \mathbb{R}$ is convex and differentiable over an open domain $\operatorname{dom}(f) \subseteq \mathbb{R}^n$ and $X \subseteq \operatorname{dom}(f)$ is a convex set, then a point $x^* \in X$ is a minimizer of $f$ over $X$ if and only if:

$$\nabla f(x^*)^T (x - x^*) \ge 0 \quad \forall\, x \in X.$$
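For intuition, a small one-dimensional example (illustrative, not from the notes): minimize $f(x) = x^2$ over $X = [1, 2]$. The minimizer is $x^* = 1$ even though $f'(1) = 2 \ne 0$; Lemma 1 still certifies optimality, since $f'(x^*)(x - x^*) = 2(x - 1) \ge 0$ for every $x \in [1, 2]$. The gradient may be nonzero at a constrained minimizer, but there is no feasible descent direction at $x^*$.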

![[Projected Gradient Descent.png|500]]

Recall:

Existence of a Minimum for constrained optimization.


Definition of Projected Gradient Descent

Consider the constrained optimization problem:

$$\min_{x \in X} f(x),$$

where X is a feasible convex set. The Projected Gradient Descent (PGD) iterates are given by:

$$x_{k+1} = P_X\big(x_k - \gamma_k \nabla f(x_k)\big),$$

where $P_X(\cdot)$ is the Euclidean projection onto $X$, i.e. $P_X(y) = \operatorname{arg\,min}_{x \in X} \|x - y\|$.
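Two feasible sets with simple closed-form projections, as a minimal sketch (my own illustration, assuming NumPy; the function names are not from the notes):

```python
import numpy as np

def project_box(y, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}: componentwise clipping."""
    return np.clip(y, lo, hi)

def project_l2_ball(y, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}: rescale if outside."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

project_l2_ball(np.array([3.0, 4.0]))   # -> array([0.6, 0.8])
```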


Projected Gradient Descent Algorithm

Initialization

Choose an initial point $x_0 \in X$.

Iterative Steps

For $k = 0, 1, 2, \dots$:

  1. Compute the gradient: $g_k = \nabla f(x_k)$
  2. Take a gradient step: $y_k = x_k - \gamma_k g_k$
  3. Project onto the feasible set $X$: $x_{k+1} = P_X(y_k)$

Repeat until convergence.
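A minimal sketch of this loop (my own illustration, assuming NumPy, a fixed step size $\gamma_k = \gamma$, and an illustrative box-constrained quadratic; names and data are not from the notes):

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, step, max_iter=1000, tol=1e-8):
    """Run PGD: gradient step followed by projection onto the feasible set X."""
    x = x0
    for _ in range(max_iter):
        y = x - step * grad(x)      # gradient step
        x_new = project(y)          # projection onto X
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Example: minimize ||x - c||^2 over the box [0, 1]^3
c = np.array([1.5, -0.3, 0.7])
grad = lambda x: 2.0 * (x - c)                 # L-smooth with L = 2, so step = 1/L = 0.5
project = lambda y: np.clip(y, 0.0, 1.0)
x_star = projected_gradient_descent(grad, project, np.zeros(3), step=0.5)
print(x_star)                                  # approx. [1.0, 0.0, 0.7]
```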


Theorem 1: Step Size Selection

If $f(x)$ is convex and $L$-smooth,

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|,$$

then the Projected Gradient Method converges for step sizes satisfying:

$$0 < \gamma_k \le \frac{1}{L}.$$

A common choice is:

$$\gamma_k = \frac{1}{L}.$$
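As an example of choosing this step size (my own illustration, assuming a quadratic objective and NumPy): for $f(x) = \tfrac{1}{2}x^T A x - b^T x$, the gradient $\nabla f(x) = Ax - b$ is Lipschitz with constant $L = \lambda_{\max}(A)$, so $\gamma_k = 1/\lambda_{\max}(A)$ satisfies the condition above:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite
L = np.linalg.eigvalsh(A).max()       # Lipschitz constant of grad f(x) = A x - b
step = 1.0 / L                        # gamma_k = 1 / L
print(L, step)                        # approx. 3.618 and 0.276
```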

Theorem 2: Convergence Rate

Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and differentiable, $X \subseteq \mathbb{R}^n$ closed and convex, and let $x^*$ be a minimizer of $f$ over $X$. Suppose that:

$$\|x_0 - x^*\| \le R \quad \text{and} \quad \|\nabla f(x)\| \le B \quad \forall\, x \in X,$$

where $R$ bounds the initial distance to $x^*$ and $B$ bounds the gradient norm over $X$ (i.e. $f$ is $B$-Lipschitz over $X$).

Choosing the constant step size:

$$\gamma := \frac{R}{B\sqrt{K}},$$

Projected Gradient Descent yields:

$$\frac{1}{K} \sum_{t=0}^{K-1} \big(f(x_t) - f(x^*)\big) \le \frac{RB}{\sqrt{K}}.$$

Looks the same as the Gradient Descent convergence rate!
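One routine consequence (my own rearrangement, not stated in the notes): to bring the average suboptimality below a target $\varepsilon > 0$, it suffices to run

$$K \ge \left(\frac{RB}{\varepsilon}\right)^2$$

iterations, i.e. the Lipschitz (non-smooth) setting needs $O(1/\varepsilon^2)$ iterations.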


Theorem 3: Convergence Rate (Smooth Case)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and differentiable, and let $X \subseteq \mathbb{R}^n$ be a closed convex set. Suppose there is a minimizer $x^*$ of $f$ over $X$ and that $f$ is smooth over $X$ with parameter $L$. Choosing the step size:

$$\gamma := \frac{1}{L},$$

Projected Gradient Descent yields:

$$f(x_K) - f(x^*) \le \frac{L}{2K}\|x_0 - x^*\|^2 \quad \forall\, K > 0.$$

Looks similar to the Gradient Descent smooth case!
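Likewise (my own rearrangement, not stated in the notes): achieving $f(x_K) - f(x^*) \le \varepsilon$ in the smooth setting requires only

$$K \ge \frac{L \|x_0 - x^*\|^2}{2\varepsilon}$$

iterations, i.e. $O(1/\varepsilon)$ instead of $O(1/\varepsilon^2)$.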


Theorem 4: Error Bound

If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, and $f$ is $L$-smooth and strongly convex with parameter $\mu > 0$, then choosing:

$$\gamma := \frac{1}{L},$$

Projected Gradient Descent satisfies:

  1. Geometric Decrease in Distance to $x^*$:

     $$\|x_{k+1} - x^*\|^2 \le \left(1 - \frac{\mu}{L}\right)\|x_k - x^*\|^2 \quad \forall\, k \ge 0.$$
  2. Exponential Decrease in Function Value:

     $$f(x_K) - f(x^*) \le \frac{L}{2}\left(1 - \frac{\mu}{L}\right)^K \|x_0 - x^*\|^2 \quad \forall\, K > 0.$$

Thus, error decreases exponentially as iterations proceed.
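For a rough iteration count (my own estimate from the bound above, using $1 - \mu/L \le e^{-\mu/L}$): reaching $f(x_K) - f(x^*) \le \varepsilon$ requires only

$$K \ge \frac{L}{\mu} \ln\!\left(\frac{L \|x_0 - x^*\|^2}{2\varepsilon}\right)$$

iterations, so the dependence on the accuracy $\varepsilon$ is only logarithmic; the ratio $L/\mu$ is the condition number.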


Summary:

PGD is widely used for constrained convex problems in optimization and machine learning. 🚀