PROJECTED GRADIENT DESCENT


Recall:

Function $f(x)$ is strongly convex with parameter $\mu$:

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2}\|y - x\|^2 \quad \forall\, x, y \in \mathbb{R}^n$$

Function $f(x)$ is called smooth with parameter $L$:

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2}\|x - y\|^2 \quad \forall\, x, y \in \mathbb{R}^n$$
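As a quick numerical sanity check (my own illustration, not from the notes): for a quadratic $f(x) = \tfrac{1}{2}x^T A x$ with symmetric positive definite $A$, both inequalities hold with $\mu = \lambda_{\min}(A)$ and $L = \lambda_{\max}(A)$. A minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric positive definite A; f(x) = 0.5 x^T A x, grad f(x) = A x
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

evals = np.linalg.eigvalsh(A)
mu, L = evals[0], evals[-1]          # strong convexity / smoothness parameters

x, y = rng.standard_normal(5), rng.standard_normal(5)
gap = f(y) - f(x) - grad(x) @ (y - x)      # f(y) minus its linearization at x
dist2 = np.linalg.norm(y - x) ** 2

assert mu / 2 * dist2 <= gap <= L / 2 * dist2   # both bounds hold
```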

Constrained Minimization

Let $f : \operatorname{dom}(f) \to \mathbb{R}$ be convex and let $X \subseteq \operatorname{dom}(f)$ be a convex set. A point $x^* \in X$ is a minimizer of $f$ over $X$ if:

$$f(x^*) \le f(y) \quad \forall\, y \in X.$$

Lemma 1

If $f : \operatorname{dom}(f) \to \mathbb{R}$ is convex and differentiable over an open domain $\operatorname{dom}(f) \subseteq \mathbb{R}^n$ and $X \subseteq \operatorname{dom}(f)$ is a convex set, then a point $x^* \in X$ is a minimizer of $f$ over $X$ if and only if:

$$\nabla f(x^*)^T (x - x^*) \ge 0 \quad \forall\, x \in X.$$
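For intuition, a small one-dimensional example (illustrative, not from the notes): minimize $f(x) = x^2$ over $X = [1, 2]$. The minimizer is $x^* = 1$ even though $f'(1) = 2 \ne 0$; Lemma 1 still certifies optimality, since $f'(x^*)(x - x^*) = 2(x - 1) \ge 0$ for every $x \in [1, 2]$. The gradient may be nonzero at a constrained minimizer, but there is no feasible descent direction at $x^*$.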

![[Projected Gradient Descent.png|500]]

Recall:

Existence of a Minimum for constrained optimization.


Definition of Projected Gradient Descent

Consider the constrained optimization problem:

$$\min_{x \in X} f(x),$$

where X is a feasible convex set. The Projected Gradient Descent (PGD) iterates are given by:

$$x_{k+1} = P_X\big(x_k - \gamma_k \nabla f(x_k)\big),$$

where $P_X(\cdot)$ is the Euclidean projection onto $X$, i.e. $P_X(y) = \operatorname{arg\,min}_{x \in X} \|x - y\|$.
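Two feasible sets with simple closed-form projections, as a minimal sketch (my own illustration, assuming NumPy; the function names are not from the notes):

```python
import numpy as np

def project_box(y, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}: componentwise clipping."""
    return np.clip(y, lo, hi)

def project_l2_ball(y, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}: rescale if outside."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

project_l2_ball(np.array([3.0, 4.0]))   # -> array([0.6, 0.8])
```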


Projected Gradient Descent Algorithm

Initialization

Choose an initial point $x_0 \in X$.

Iterative Steps

For $k = 0, 1, 2, \dots$:

  1. Compute the gradient: $g_k = \nabla f(x_k)$
  2. Take a gradient step: $y_k = x_k - \gamma_k g_k$
  3. Project onto the feasible set $X$: $x_{k+1} = P_X(y_k)$

Repeat until convergence.
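A minimal sketch of this loop (my own illustration, assuming NumPy, a fixed step size $\gamma_k = \gamma$, and an illustrative box-constrained quadratic; names and data are not from the notes):

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, step, max_iter=1000, tol=1e-8):
    """Run PGD: gradient step followed by projection onto the feasible set X."""
    x = x0
    for _ in range(max_iter):
        y = x - step * grad(x)      # gradient step
        x_new = project(y)          # projection onto X
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Example: minimize ||x - c||^2 over the box [0, 1]^3
c = np.array([1.5, -0.3, 0.7])
grad = lambda x: 2.0 * (x - c)                 # L-smooth with L = 2, so step = 1/L = 0.5
project = lambda y: np.clip(y, 0.0, 1.0)
x_star = projected_gradient_descent(grad, project, np.zeros(3), step=0.5)
print(x_star)                                  # approx. [1.0, 0.0, 0.7]
```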


Theorem 1: Step Size Selection

If $f(x)$ is convex and $L$-smooth,

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|,$$

then the Projected Gradient Method converges for step sizes satisfying:

$$0 < \gamma_k \le \frac{1}{L}.$$

A common choice is:

$$\gamma_k = \frac{1}{L}.$$
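As an example of choosing this step size (my own illustration, assuming a quadratic objective and NumPy): for $f(x) = \tfrac{1}{2}x^T A x - b^T x$, the gradient $\nabla f(x) = Ax - b$ is Lipschitz with constant $L = \lambda_{\max}(A)$, so $\gamma_k = 1/\lambda_{\max}(A)$ satisfies the condition above:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite
L = np.linalg.eigvalsh(A).max()       # Lipschitz constant of grad f(x) = A x - b
step = 1.0 / L                        # gamma_k = 1 / L
print(L, step)                        # approx. 3.618 and 0.276
```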

Theorem 2: Convergence Rate

Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and differentiable, $X \subseteq \mathbb{R}^n$ closed and convex, and let $x^*$ be a minimizer of $f$ over $X$. Suppose that:

$$\|x_0 - x^*\| \le R \quad \text{and} \quad \|\nabla f(x)\| \le B \quad \forall\, x \in X,$$

where $R$ bounds the initial distance to $x^*$ and $B$ bounds the gradient norm over $X$ (i.e. $f$ is $B$-Lipschitz over $X$).

Choosing the constant step size:

$$\gamma := \frac{R}{B\sqrt{K}},$$

Projected Gradient Descent yields:

$$\frac{1}{K} \sum_{t=0}^{K-1} \big(f(x_t) - f(x^*)\big) \le \frac{RB}{\sqrt{K}}.$$

Looks the same as the Gradient Descent convergence rate!
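One routine consequence (my own rearrangement, not stated in the notes): to bring the average suboptimality below a target $\varepsilon > 0$, it suffices to run

$$K \ge \left(\frac{RB}{\varepsilon}\right)^2$$

iterations, i.e. the Lipschitz (non-smooth) setting needs $O(1/\varepsilon^2)$ iterations.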


Theorem 3: Convergence Rate (Smooth Case)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and differentiable, and let $X \subseteq \mathbb{R}^n$ be a closed convex set. Suppose there is a minimizer $x^*$ of $f$ over $X$ and that $f$ is smooth over $X$ with parameter $L$. Choosing the step size:

$$\gamma := \frac{1}{L},$$

Projected Gradient Descent yields:

$$f(x_K) - f(x^*) \le \frac{L}{2K}\|x_0 - x^*\|^2 \quad \forall\, K > 0.$$

Looks similar to the Gradient Descent smooth case!
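Likewise (my own rearrangement, not stated in the notes): achieving $f(x_K) - f(x^*) \le \varepsilon$ in the smooth setting requires only

$$K \ge \frac{L \|x_0 - x^*\|^2}{2\varepsilon}$$

iterations, i.e. $O(1/\varepsilon)$ instead of $O(1/\varepsilon^2)$.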


Theorem 4: Error Bound

If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, and $f$ is $L$-smooth and strongly convex with parameter $\mu > 0$, then choosing:

$$\gamma := \frac{1}{L},$$

Projected Gradient Descent satisfies:

  1. Geometric Decrease in Distance to $x^*$:

     $$\|x_{k+1} - x^*\|^2 \le \left(1 - \frac{\mu}{L}\right)\|x_k - x^*\|^2 \quad \forall\, k \ge 0.$$
  2. Exponential Decrease in Function Value:

     $$f(x_K) - f(x^*) \le \frac{L}{2}\left(1 - \frac{\mu}{L}\right)^K \|x_0 - x^*\|^2 \quad \forall\, K > 0.$$

Thus, error decreases exponentially as iterations proceed.
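For a rough iteration count (my own estimate from the bound above, using $1 - \mu/L \le e^{-\mu/L}$): reaching $f(x_K) - f(x^*) \le \varepsilon$ requires only

$$K \ge \frac{L}{\mu} \ln\!\left(\frac{L \|x_0 - x^*\|^2}{2\varepsilon}\right)$$

iterations, so the dependence on the accuracy $\varepsilon$ is only logarithmic; the ratio $L/\mu$ is the condition number.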


Summary:

PGD is widely used for constrained convex problems in optimization and machine learning. 🚀