Proximal GD Idea Explained
Classical Gradient Step for Minimizing
If we were minimizing only
Here:
- The first two terms are the first-order Taylor expansion of
around . - The third term is a quadratic regularization (proximal term) that keeps updates close to
to prevent large jumps.
Actually, this term is the second-order Taylor expansion ofaround .
Adding the Term
When we introduce
Now, the function consists of:
- A linear approximation of
. - A quadratic proximity term.
- The non-smooth function
**.
Completing the Square
Since
This term is the first-order approximation of
Understanding the Completing-the-Square Step
We need to rewrite the quadratic expression:
into a squared norm form plus a constant term.
Step-by-Step Breakdown
1. Expand the squared norm
The Euclidean norm squared is given by:
So, we rewrite the term:
2. Introduce a shift using
We want to introduce a shifted term in the form
Observe that:
This suggests rewriting the term as:
3. Expand the squared norm
Consider the squared term:
Expanding it:
Breaking it down:
Expanding using the identity
Dividing everything by
Thus:
4. Rearrange the expression
Comparing with the original form:
We get:
This is the final transformed expression!
BUT! The final formula looks like this:
This is beacuse the term