Smooth Functions


A smooth function is a differentiable function that does not change too abruptly. It is controlled by a smoothness parameter L, which bounds how fast the gradient can change. Unlike convexity, smoothness does not require the function to be convex, but it provides useful guarantees about how the function behaves.

Definition

A function f:D(f)R is smooth with parameter L over ΩD(f) if:

f(y)f(x)+f(x)T(yx)+L2xy2x,yD(f).

This inequality ensures that the function is not too steep, meaning that the graph of f is bounded above by a quadratic approximation at every point.

Note

This formula is derived from the second-order Taylor expansion of f around x, with an additional quadratic term L2xy2 that bounds the function's growth.

So, basically, instead of 12(yx)THf(x)(yx), where Hf(x) is the Hessian matrix (second derivative for Taylor expansion), we use L2xy2 — the smoothness parameter.

The smoothness parameter L controls how rapidly the function's gradient can change, ensuring it does not grow too steeply.

This quadratic term L2xy2 acts as a global upper bound on how much f(y) can exceed the first-order approximation f(x)+f(x)T(yx).

Remark

This definition does not require convexity, meaning a function can be smooth even if it is not convex.

Geometric Intuition

At any point x, the graph of f lies below a quadratic upper bound (a paraboloid). Thus, the function does not grow too rapidly, as illustrated by the tangent paraboloid in the diagram.

Smooth Function.png|600


Operations That Preserve Smoothness

Just like convexity, smoothness is preserved under specific operations:

1. Linear Combination of Smooth Functions

If f1,,fm are smooth functions with parameters L1,L2,,Lm and scalars λ1,,λmR+, then:

f:=i=1kλif(xi)

is smooth with parameter:

L=i=1mλiLi.

This means that taking a weighted sum of smooth functions preserves smoothness, and the new smoothness parameter is just the sum of the individual parameters.

2. Composition with an Affine Transformation

If f is smooth with parameter L, and:

g(x)=Ax+b

where A is an n×n matrix and b is a vector, then the composite function fg is smooth with parameter:

LA2

where A is the spectral norm of A.

This means that applying a linear transformation to a smooth function still results in a smooth function, but the smoothness parameter gets scaled by the square of the spectral norm.


Characterization of Smooth Functions (Lemma)

If f(x):RnR is convex and differentiable, then the following two statements are equivalent:

  1. f is smooth with parameter L.

  2. The gradient of f is Lipschitz continuous:

    f(x)f(y)Lxyx,yRn.

This means that the gradient of a smooth function does not change too abruptly, making optimization methods like gradient descent more stable.


Strongly Convex Functions (Another Approach)

In addition to being smooth, a function can also be strongly convex. Strong convexity strengthens the usual convexity condition by ensuring that the function has a uniform curvature, which guarantees faster convergence in optimization algorithms.

A function that is both strongly convex and smooth enjoys better optimization properties compared to standard convex functions.

Recall from the image:

Strict convex.png


Definition (close to L-smooth)

A function f:RnR is called μ-strongly convex (or strongly сonvex with parameter μ) if, for all x,y:

f(y)f(x)+(yx)Tf(x)+μ2yx2.

This condition ensures that f curves upward at a minimum rate of μ, preventing flat regions that could slow down optimization.

Comparing this definition with the smoothness condition:

f(y)f(x)+(yx)Tf(x)+μ2yx2.f(y)f(x)+f(x)T(yx)+L2xy2,

So, simply put, strong convexity is a lower bound, while smoothness is an upper bound on how much the function can grow.


Strong Convexity and Smoothness Together

If a function f is both μ-strongly convex and L-smooth (as defined earlier), then it satisfies the following sandwich inequality 🥪:

f(x)+(yx)Tf(x)+μ2xy2f(y)f(x)+(yx)Tf(x)+L2xy2

This means that the function is trapped between two quadratic approximations — one defined by l (strong convexity) and one by L (smoothness).

Once again:


Summary

Concept Meaning
Smooth function A differentiable function whose gradient does not change too fast.
Smoothness condition f(y)f(x)+f(x)T(yx)+L2|xy|2.
Smoothness does not require convexity A function can be smooth without being convex.
Lipschitz gradient The gradient must satisfy |f(x)f(y)|L|xy|.
Preservation Smoothness is preserved under weighted sums and affine transformations.
μ-strongly convex The function curves upwards uniformly, ensuring no flat regions.
μ-strong convexity + L-smoothness The function is bounded between two quadratic approximations.

Smooth functions are important in optimization because they ensure that gradients behave predictably, making them useful in gradient-based methods. 🚀

In short, a function that is both strongly convex and smooth is easier to optimize, enjoys better theoretical guarantees, and ensures efficient gradient-based optimization. 🚀