Subgradient and subdifferential


In convex optimization, the subgradient generalizes the concept of the derivative to non-differentiable functions. While smooth functions have unique gradients at every point, non-differentiable convex functions can have multiple subgradients at a given point.


Definition

A vector $g \in \mathbb{R}^d$ is a subgradient of a function $f$ at $x$ if:

$$f(y) \ge f(x) + g^\top (y - x) \quad \text{for all } y \in \mathrm{dom}(f).$$

The set of all subgradients at $x$ is called the subdifferential of $f$ at $x$ and is denoted as:

$$\partial f(x) = \{\, g \mid f(y) \ge f(x) + g^\top (y - x) \;\; \forall y \in \mathrm{dom}(f) \,\}, \qquad \partial f(x) \subseteq \mathbb{R}^d.$$

This means that every subgradient defines an affine underestimator of the function: a line (or hyperplane) that touches the graph at $x$ and stays on or below it everywhere else.
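As a sanity check, the defining inequality can be tested numerically. The sketch below (a toy helper of my own, not from these notes) verifies which slopes are subgradients of $f(x) = |x|$ at the kink $x = 0$, where the subdifferential is the interval $[-1, 1]$:

```python
# Numerically check the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f(x) = |x| at x = 0, where every g in [-1, 1] is a subgradient.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs
ys = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]

assert is_subgradient(f, 0.5, 0.0, ys)       # any g in [-1, 1] works
assert is_subgradient(f, -1.0, 0.0, ys)      # boundary slope g = -1
assert not is_subgradient(f, 1.5, 0.0, ys)   # g = 1.5 is too steep
```

The check is only over a finite grid, so it certifies nothing formally; it is just a way to see the inequality in action.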

![[Subderivative.png]]


Geometric Interpretation

![[Subgradients.png|600]]

In the diagram, several affine functions touch the graph at the same point of non-differentiability; the slope of each one is a valid subgradient.


Subgradient Characterization of Convexity

The subdifferential provides a characterization of convexity:

Lemma 1

If $f:\mathrm{dom}(f)\to\mathbb{R}$ is differentiable at $x \in \mathrm{dom}(f)$, then the subdifferential contains at most the gradient:

$$\partial f(x) \subseteq \{\nabla f(x)\}.$$
Note:

- The set of subgradients can be empty if the function is not convex; this is why the lemma states an inclusion rather than an equality.
- Accordingly, $\partial f(x)$ is either $\emptyset$ or $\{\nabla f(x)\}$.
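To see why the inclusion in Lemma 1 can be strict, consider the nonconvex function $f(x) = -x^2$ (my own example): it is differentiable at $0$ with $\nabla f(0) = 0$, yet no slope gives a global affine underestimator, so $\partial f(0) = \emptyset$. A quick numerical sketch:

```python
# For the nonconvex f(x) = -x**2 the subdifferential at 0 is empty:
# no slope g gives a global affine underestimator, which illustrates
# why Lemma 1 only guarantees a subset (empty or {grad f(x)}).

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = lambda x: -x * x
ys = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]

# The gradient at 0 is 0, yet 0 is not a subgradient: -y**2 >= 0 fails.
assert not is_subgradient(f, 0.0, 0.0, ys)
# No other candidate slope in [-2, 2] works either.
assert not any(is_subgradient(f, g / 10, 0.0, ys) for g in range(-20, 21))
```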

Lemma 2

A function $f:\mathrm{dom}(f)\to\mathbb{R}$ is convex if and only if:

$$\partial f(x) \neq \emptyset \quad \forall x \in \mathrm{dom}(f).$$

This means that a function is convex if and only if it has at least one supporting hyperplane at every point in its domain.


Convex and Lipschitz Functions Have Bounded Subgradients

For convex and Lipschitz functions, the subgradients are bounded by the Lipschitz constant.

Lemma 3

Let $f:\mathrm{dom}(f)\to\mathbb{R}$ be convex and let $B \ge 0$. Then the following two statements are equivalent:

  1. The norm of every subgradient is bounded:

     $$\|g\| \le B \quad \forall x \in \mathrm{dom}(f),\; g \in \partial f(x).$$
  2. The function is Lipschitz with parameter $B$:

     $$|f(x) - f(y)| \le B\,\|x - y\| \quad \forall x, y \in \mathrm{dom}(f).$$

This means that for Lipschitz-continuous convex functions, the subgradients cannot be arbitrarily large — they are always bounded by the Lipschitz constant.
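A small numerical illustration with the 1-Lipschitz convex function $f(x) = |x|$ (my own sketch, not from these notes): the Lipschitz condition holds with $B = 1$, and any candidate slope with $|g| > B$ fails the subgradient inequality somewhere:

```python
# Check Lemma 3 on the 1-Lipschitz convex function f(x) = |x|:
# the Lipschitz condition holds with B = 1, and a slope with |g| > 1
# is never a subgradient at any point.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs
B = 1.0
xs = [i / 10 for i in range(-30, 31)]  # candidate points x in [-3, 3]
ys = [i / 10 for i in range(-50, 51)]  # test grid y in [-5, 5]

# Statement 2: the Lipschitz condition holds with constant B = 1.
assert all(abs(f(x) - f(y)) <= B * abs(x - y) + 1e-12 for x in xs for y in xs)

# Statement 1: a slope with |g| = 1.2 > B is never a subgradient.
assert all(not is_subgradient(f, 1.2, x, ys) for x in xs)
```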

![[Lipschitz_Visualisierung.gif|500]]


Subgradient Optimality Condition

For convex functions, the subgradient can be used to determine optimality.

Lemma 4

If $x \in \mathrm{dom}(f)$ satisfies:

$$0 \in \partial f(x),$$

then x is a global minimum.

This means that if the zero vector belongs to the subdifferential, the function has no lower values, ensuring that x is an optimal solution.

$$0 \in \partial f(x) \;\Longrightarrow\; x \text{ is a global minimum.}$$
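A toy check of the optimality condition for $f(x) = |x - 2|$ (my own example): the slope $0$ is a subgradient exactly at the minimizer $x^\star = 2$, where the subdifferential is $[-1, 1]$:

```python
# Optimality condition 0 in subdiff(f)(x) for f(x) = |x - 2|:
# at the kink x* = 2 the subdifferential is [-1, 1], which contains 0,
# so x* is a global minimum; elsewhere 0 is not a subgradient.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = lambda x: abs(x - 2)
ys = [i / 10 for i in range(-50, 101)]  # grid on [-5, 10]

assert is_subgradient(f, 0.0, 2.0, ys)       # 0 in subdiff at the kink
assert not is_subgradient(f, 0.0, 3.0, ys)   # 0 not in subdiff elsewhere
assert all(f(y) >= f(2.0) for y in ys)       # hence x* = 2 is a global min
```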


Summary

| Property | Meaning |
| --- | --- |
| Subgradient condition | $f(y) \ge f(x) + g^\top (y - x)$ for all $y \in \mathrm{dom}(f)$. |
| Subdifferential | The set of all subgradients: $\partial f(x)$. |
| Differentiability and subgradients | If $f$ is convex and differentiable, then $\partial f(x) = \{\nabla f(x)\}$. |
| Characterization of convexity | A function is convex if and only if its subdifferential is nonempty everywhere. |
| Bounded subgradients | If $f$ is $B$-Lipschitz, then $\|g\| \le B$ for all $g \in \partial f(x)$. |
| Optimality condition | If $0 \in \partial f(x)$, then $x$ is a global minimum. |

Subgradients extend gradients to non-differentiable convex functions, making them a fundamental tool in non-smooth optimization. 🚀
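One reason subgradients matter in practice is the subgradient method, which replaces the gradient step with a step along any subgradient. Below is a minimal fixed-step sketch for $f(x) = |x - 3|$ (my own toy setup; note that with a constant step size the iterates generally only oscillate within one step of the minimizer rather than converge exactly):

```python
# Minimal subgradient method for the non-smooth convex f(x) = |x - 3|,
# using the subgradient g = sign(x - 3), with g = 0 at the kink.

def subgradient_method(x0, step, iters):
    x = x0
    for _ in range(iters):
        g = 0.0 if x == 3.0 else (1.0 if x > 3.0 else -1.0)
        if g == 0.0:       # 0 in the subdifferential: x is optimal, stop
            break
        x -= step * g      # step opposite the chosen subgradient
    return x

x_final = subgradient_method(x0=0.0, step=0.01, iters=10_000)
assert abs(x_final - 3.0) < 0.02  # within one oscillation of the minimizer
```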

What can we do next?