Smooth Functions

A smooth function is a differentiable function that does not change too abruptly. It is controlled by a smoothness parameter $L$ , which bounds how fast the gradient can change. Unlike convexity, smoothness does not require the function to be convex, but it provides useful guarantees about how the function behaves.

Definition

A function $f : D (f) \to R$ is smooth with parameter $L$ over $Ω \subseteq D (f)$ if:

f (y) \leq f (x) + \nabla f (x)^{T} (y - x) + \frac{L}{2} ∥ x - y ∥^{2} \forall x, y \in D (f) .

This inequality ensures that the function is not too steep, meaning that the graph of $f$ is bounded above by a quadratic approximation at every point.

Note

This formula is derived from the second-order Taylor expansion of $f$ around $x$ , with an additional quadratic term $\frac{L}{2} ∥ x - y ∥^{2}$ that bounds the function's growth.

So, basically, instead of $\frac{1}{2} (y - x)^{T} H_{f} (x) (y - x)$ , where $H_{f} (x)$ is the Hessian matrix (second derivative for Taylor expansion), we use $\frac{L}{2} ∥ x - y ∥^{2}$ — the smoothness parameter.

The smoothness parameter $L$ controls how rapidly the function's gradient can change, ensuring it does not grow too steeply.

This quadratic term $\frac{L}{2} ∥ x - y ∥^{2}$ acts as a global upper bound on how much $f (y)$ can exceed the first-order approximation $f (x) + \nabla f (x)^{T} (y - x)$ .

Remark

This definition does not require convexity, meaning a function can be smooth even if it is not convex.

Geometric Intuition

At any point $x$ , the graph of $f$ lies below a quadratic upper bound (a paraboloid). Thus, the function does not grow too rapidly, as illustrated by the tangent paraboloid in the diagram.

Smooth Function.png|600

Operations That Preserve Smoothness

Just like convexity, smoothness is preserved under specific operations:

1. Linear Combination of Smooth Functions

If $f_{1}, \dots, f_{m}$ are smooth functions with parameters $L_{1}, L_{2}, \dots, L_{m}$ and scalars $λ_{1}, \dots, λ_{m} \in R_{+}$ , then:

f := \sum_{i = 1}^{k} λ_{i} f (x_{i})

is smooth with parameter:

L = \sum_{i = 1}^{m} λ_{i} L_{i} .

This means that taking a weighted sum of smooth functions preserves smoothness, and the new smoothness parameter is just the sum of the individual parameters.

2. Composition with an Affine Transformation

If $f$ is smooth with parameter $L$ , and:

g (x) = A x + b

where $A$ is an $n \times n$ matrix and $b$ is a vector, then the composite function $f \circ g$ is smooth with parameter:

L ∥ A ∥^{2}

where $∥ A ∥$ is the spectral norm of $A$ .

This means that applying a linear transformation to a smooth function still results in a smooth function, but the smoothness parameter gets scaled by the square of the spectral norm.

Characterization of Smooth Functions (Lemma)

If $f (x) : R^{n} \to R$ is convex and differentiable, then the following two statements are equivalent:

$f$ is smooth with parameter $L$ .
The gradient of $f$ is Lipschitz continuous:
$∥ \nabla f (x) - \nabla f (y) ∥ \leq L ∥ x - y ∥ \forall x, y \in R^{n} .$

This means that the gradient of a smooth function does not change too abruptly, making optimization methods like gradient descent more stable.

Strongly Convex Functions (Another Approach)

In addition to being smooth, a function can also be strongly convex. Strong convexity strengthens the usual convexity condition by ensuring that the function has a uniform curvature, which guarantees faster convergence in optimization algorithms.

A function that is both strongly convex and smooth enjoys better optimization properties compared to standard convex functions.

Recall from the image:

Strict convex.png

Definition (close to L-smooth)

A function $f : R^{n} \to R$ is called $μ$ -strongly convex (or strongly сonvex with parameter $μ$ ) if, for all $x, y$ :

f (y) \geq f (x) + (y - x)^{T} \nabla f (x) + \frac{μ}{2} ∥ y - x ∥^{2} .

This condition ensures that $f$ curves upward at a minimum rate of $μ$ , preventing flat regions that could slow down optimization.

Comparing this definition with the smoothness condition:

$f (y) \geq f (x) + (y - x)^{T} \nabla f (x) + \frac{μ}{2} ∥ y - x ∥^{2} .$ $f (y) \leq f (x) + \nabla f (x)^{T} (y - x) + \frac{L}{2} ∥ x - y ∥^{2},$

So, simply put, strong convexity is a lower bound, while smoothness is an upper bound on how much the function can grow.

Strong Convexity and Smoothness Together

If a function $f$ is both $μ$ -strongly convex and $L$ -smooth (as defined earlier), then it satisfies the following sandwich inequality 🥪:

f (x) + (y - x)^{T} \nabla f (x) + \frac{μ}{2} ∥ x - y ∥^{2} \leq f (y) \leq f (x) + (y - x)^{T} \nabla f (x) + \frac{L}{2} ∥ x - y ∥^{2}

This means that the function is trapped between two quadratic approximations — one defined by $l$ (strong convexity) and one by $L$ (smoothness).

Once again:

The lower bound (left inequality) guarantees that the function is at least as curved as a quadratic function with curvature $μ$ .
The upper bound (right inequality) guarantees that the function does not grow faster than a quadratic function with curvature $L$ .

Summary

Concept	Meaning
Smooth function	A differentiable function whose gradient does not change too fast.
Smoothness condition	$f (y) \leq f (x) + \nabla f (x)^{T} (y - x) + \frac{L}{2} \| x - y \|^{2}$ .
Smoothness does not require convexity	A function can be smooth without being convex.
Lipschitz gradient	The gradient must satisfy $\| \nabla f (x) - \nabla f (y) \| \leq L \| x - y \|$ .
Preservation	Smoothness is preserved under weighted sums and affine transformations.
$μ$ -strongly convex	The function curves upwards uniformly, ensuring no flat regions.
$μ$ -strong convexity + $L$ -smoothness	The function is bounded between two quadratic approximations.

Smooth functions are important in optimization because they ensure that gradients behave predictably, making them useful in gradient-based methods. 🚀

In short, a function that is both strongly convex and smooth is easier to optimize, enjoys better theoretical guarantees, and ensures efficient gradient-based optimization. 🚀