Subgradient and subdifferential


In convex optimization, the subgradient generalizes the concept of the derivative to non-differentiable functions. While smooth functions have unique gradients at every point, non-differentiable convex functions can have multiple subgradients at a given point.


Definition

A vector $g \in \mathbb{R}^d$ is a subgradient of a function $f$ at $x$ if:

$$f(y) \ge f(x) + g^\top (y - x) \quad \text{for all } y \in \mathrm{dom}(f).$$

The set of all subgradients at $x$ is called the subdifferential of $f$ at $x$ and is denoted as:

$$\partial f(x) = \{\, g \mid f(y) \ge f(x) + g^\top (y - x) \;\; \forall y \in \mathrm{dom}(f) \,\}, \qquad \partial f(x) \subseteq \mathbb{R}^d.$$

This means that every subgradient defines an affine underestimator of the function: a line (or hyperplane) that touches the graph at $x$ and stays on or below it everywhere else.
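As a sanity check, the defining inequality can be tested numerically. The sketch below (a toy helper of my own, not from these notes) verifies which slopes are subgradients of $f(x) = |x|$ at the kink $x = 0$, where the subdifferential is the interval $[-1, 1]$:

```python
# Numerically check the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f(x) = |x| at x = 0, where every g in [-1, 1] is a subgradient.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs
ys = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]

assert is_subgradient(f, 0.5, 0.0, ys)       # any g in [-1, 1] works
assert is_subgradient(f, -1.0, 0.0, ys)      # boundary slope g = -1
assert not is_subgradient(f, 1.5, 0.0, ys)   # g = 1.5 is too steep
```

The check is only over a finite grid, so it certifies nothing formally; it is just a way to see the inequality in action.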

![[Subderivative.png]]


Geometric Interpretation

![[Subgradients.png|600]]

In the diagram, several affine functions touch the graph at the same point of non-differentiability; the slope of each one is a valid subgradient.


Subgradient Characterization of Convexity

The subdifferential provides a characterization of convexity:

Lemma 1

If $f:\mathrm{dom}(f)\to\mathbb{R}$ is differentiable at $x \in \mathrm{dom}(f)$, then the subdifferential contains at most the gradient:

$$\partial f(x) \subseteq \{\nabla f(x)\}.$$
Note:

- The set of subgradients can be empty if the function is not convex; this is why the lemma states an inclusion rather than an equality.
- Accordingly, $\partial f(x)$ is either $\emptyset$ or $\{\nabla f(x)\}$.
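To see why the inclusion in Lemma 1 can be strict, consider the nonconvex function $f(x) = -x^2$ (my own example): it is differentiable at $0$ with $\nabla f(0) = 0$, yet no slope gives a global affine underestimator, so $\partial f(0) = \emptyset$. A quick numerical sketch:

```python
# For the nonconvex f(x) = -x**2 the subdifferential at 0 is empty:
# no slope g gives a global affine underestimator, which illustrates
# why Lemma 1 only guarantees a subset (empty or {grad f(x)}).

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = lambda x: -x * x
ys = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]

# The gradient at 0 is 0, yet 0 is not a subgradient: -y**2 >= 0 fails.
assert not is_subgradient(f, 0.0, 0.0, ys)
# No other candidate slope in [-2, 2] works either.
assert not any(is_subgradient(f, g / 10, 0.0, ys) for g in range(-20, 21))
```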

Lemma 2

A function $f:\mathrm{dom}(f)\to\mathbb{R}$ is convex if and only if:

$$\partial f(x) \neq \emptyset \quad \forall x \in \mathrm{dom}(f).$$

This means that a function is convex if and only if it has at least one supporting hyperplane at every point in its domain.


Convex and Lipschitz Functions Have Bounded Subgradients

For convex and Lipschitz functions, the subgradients are bounded by the Lipschitz constant.

Lemma 3

Let $f:\mathrm{dom}(f)\to\mathbb{R}$ be convex and let $B \ge 0$. Then the following two statements are equivalent:

  1. The norm of every subgradient is bounded:

     $$\|g\| \le B \quad \forall x \in \mathrm{dom}(f),\; g \in \partial f(x).$$
  2. The function is Lipschitz with parameter $B$:

     $$|f(x) - f(y)| \le B\,\|x - y\| \quad \forall x, y \in \mathrm{dom}(f).$$

This means that for Lipschitz-continuous convex functions, the subgradients cannot be arbitrarily large — they are always bounded by the Lipschitz constant.
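A small numerical illustration with the 1-Lipschitz convex function $f(x) = |x|$ (my own sketch, not from these notes): the Lipschitz condition holds with $B = 1$, and any candidate slope with $|g| > B$ fails the subgradient inequality somewhere:

```python
# Check Lemma 3 on the 1-Lipschitz convex function f(x) = |x|:
# the Lipschitz condition holds with B = 1, and a slope with |g| > 1
# is never a subgradient at any point.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs
B = 1.0
xs = [i / 10 for i in range(-30, 31)]  # candidate points x in [-3, 3]
ys = [i / 10 for i in range(-50, 51)]  # test grid y in [-5, 5]

# Statement 2: the Lipschitz condition holds with constant B = 1.
assert all(abs(f(x) - f(y)) <= B * abs(x - y) + 1e-12 for x in xs for y in xs)

# Statement 1: a slope with |g| = 1.2 > B is never a subgradient.
assert all(not is_subgradient(f, 1.2, x, ys) for x in xs)
```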

![[Lipschitz_Visualisierung.gif|500]]


Subgradient Optimality Condition

For convex functions, the subgradient can be used to determine optimality.

Lemma 4

If $x \in \mathrm{dom}(f)$ satisfies:

$$0 \in \partial f(x),$$

then x is a global minimum.

This means that if the zero vector belongs to the subdifferential, the function has no lower values, ensuring that x is an optimal solution.

$$0 \in \partial f(x) \;\Longrightarrow\; x \text{ is a global minimum.}$$
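A toy check of the optimality condition for $f(x) = |x - 2|$ (my own example): the slope $0$ is a subgradient exactly at the minimizer $x^\star = 2$, where the subdifferential is $[-1, 1]$:

```python
# Optimality condition 0 in subdiff(f)(x) for f(x) = |x - 2|:
# at the kink x* = 2 the subdifferential is [-1, 1], which contains 0,
# so x* is a global minimum; elsewhere 0 is not a subgradient.

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = lambda x: abs(x - 2)
ys = [i / 10 for i in range(-50, 101)]  # grid on [-5, 10]

assert is_subgradient(f, 0.0, 2.0, ys)       # 0 in subdiff at the kink
assert not is_subgradient(f, 0.0, 3.0, ys)   # 0 not in subdiff elsewhere
assert all(f(y) >= f(2.0) for y in ys)       # hence x* = 2 is a global min
```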


Summary

| Property | Meaning |
| --- | --- |
| Subgradient condition | $f(y) \ge f(x) + g^\top (y - x)$ for all $y \in \mathrm{dom}(f)$. |
| Subdifferential | The set of all subgradients: $\partial f(x)$. |
| Differentiability and subgradients | If $f$ is convex and differentiable, then $\partial f(x) = \{\nabla f(x)\}$. |
| Characterization of convexity | A function is convex if and only if its subdifferential is nonempty everywhere. |
| Bounded subgradients | If $f$ is $B$-Lipschitz, then $\|g\| \le B$ for all $g \in \partial f(x)$. |
| Optimality condition | If $0 \in \partial f(x)$, then $x$ is a global minimum. |

Subgradients extend gradients to non-differentiable convex functions, making them a fundamental tool in non-smooth optimization. 🚀
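One reason subgradients matter in practice is the subgradient method, which replaces the gradient step with a step along any subgradient. Below is a minimal fixed-step sketch for $f(x) = |x - 3|$ (my own toy setup; note that with a constant step size the iterates generally only oscillate within one step of the minimizer rather than converge exactly):

```python
# Minimal subgradient method for the non-smooth convex f(x) = |x - 3|,
# using the subgradient g = sign(x - 3), with g = 0 at the kink.

def subgradient_method(x0, step, iters):
    x = x0
    for _ in range(iters):
        g = 0.0 if x == 3.0 else (1.0 if x > 3.0 else -1.0)
        if g == 0.0:       # 0 in the subdifferential: x is optimal, stop
            break
        x -= step * g      # step opposite the chosen subgradient
    return x

x_final = subgradient_method(x0=0.0, step=0.01, iters=10_000)
assert abs(x_final - 3.0) < 0.02  # within one oscillation of the minimizer
```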

What can we do next?