STEEPEST GRADIENT DESCENT


Steepest gradient descent is a variation of the gradient descent method where the step size is not fixed but instead chosen dynamically at each iteration. This choice gives the largest possible decrease of the function along the negative gradient direction at each iteration.

Update Rule

The update rule for steepest gradient descent is given by:

$$x_{k+1} = x_k - \gamma_k \nabla f(x_k),$$

where the step size $\gamma_k$ is chosen to minimize the function along the descent direction:

$$\gamma_k = \operatorname*{arg\,min}_{\gamma > 0} f(x_k - \gamma \nabla f(x_k)).$$

This means that at each iteration, we find the optimal step size $\gamma_k$ by minimizing the function along the direction of the negative gradient; this one-dimensional subproblem is known as an exact line search.
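
Since this scalar subproblem rarely has a closed-form solution, it is usually solved numerically. The sketch below is illustrative rather than part of the source: it assumes SciPy is available, and the search interval $(0, 10)$ for the step size is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, n_iters=50, tol=1e-10):
    """Gradient descent with an (approximately) exact line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:           # gradient vanishes: stop
            break
        phi = lambda gamma: f(x - gamma * g)  # f restricted to the ray
        # solve gamma_k = argmin_{gamma > 0} f(x_k - gamma * grad f(x_k))
        gamma = minimize_scalar(phi, bounds=(0.0, 10.0), method="bounded").x
        x = x - gamma * g
    return x

# Example: ill-conditioned quadratic f(x) = x1^2 + 10 x2^2
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(steepest_descent(f, grad, [5.0, 1.0]))  # approaches [0, 0]
```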


Geometric Interpretation

![[Steepest gradient descent.png|700]]

The figure illustrates the difference between fixed step size gradient descent and steepest gradient descent:

  1. Fixed Step Size in Gradient Descent:

    • The step size is constant across all iterations, leading to potential inefficiencies in convergence.
  2. Adaptive Step Size in Steepest Gradient Descent:

    • The step size is chosen optimally at each iteration, ensuring more efficient movement towards the minimum.

Convergence Properties

Theorem:
Let $\{x_k\}$ be a convergent sequence generated by the steepest descent algorithm applied to a function $f$. Then, in the worst case, the order of convergence of $\{x_k\}$ is $1$.

This means that in the worst case, steepest gradient descent has a linear rate of convergence.

What Does This Mean?

Linear (order-1) convergence means that the error shrinks by at most a constant factor per iteration: there exists $c \in (0, 1)$ such that $\|x_{k+1} - x^*\| \le c\,\|x_k - x^*\|$ for all sufficiently large $k$, where $x^*$ is the limit point. Progress is steady but does not accelerate the way higher-order methods (e.g., Newton's method) can.

Modification: Step Size Reduction (Step Shredding)

A practical modification of steepest gradient descent involves step size reduction, also called step shredding. Here, the step size is adjusted dynamically at each iteration based on the following sufficient-decrease condition:

$$f(x_{k+1}) \le f(x_k) - \varepsilon \gamma \|\nabla f(x_k)\|^2,$$

where $\varepsilon \in (0, 1)$ is a pre-selected method parameter that controls the minimum required decrease in the function value.

Remark

Very often, $\varepsilon$ is set to a small value, such as $0.1$.

This condition ensures that each step results in sufficient function decrease.
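
To make the check concrete, it can be written as a small Python predicate (a sketch; the names are illustrative, not from the source):

```python
import numpy as np

def sufficient_decrease(f, x_old, x_new, g_old, gamma, eps):
    """True iff f(x_new) <= f(x_old) - eps * gamma * ||grad f(x_old)||^2."""
    return f(x_new) <= f(x_old) - eps * gamma * float(np.dot(g_old, g_old))
```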

If the condition is not satisfied, we reduce the step size $\gamma_k$ using a reduction factor $\delta \in (0, 1)$:

$$\gamma_k \leftarrow \delta \gamma_k.$$

The parameter $\delta$ determines how aggressively the step size is reduced when the descent condition is not met.

Remark

Very often, the reduction factor $\delta$ is set to a small value, such as $0.5$.

Why is this modification useful?

Solving the line-search problem $\arg\min_{\gamma > 0} f(x_k - \gamma \nabla f(x_k))$ exactly at every iteration can be expensive, and a closed-form solution rarely exists. Step shredding replaces it with a cheap test: start from a fixed trial step and shrink it only when the descent condition fails, so steps stay as large as possible while a decrease in function value is still guaranteed.

Algorithm for Modified Steepest Descent

  1. Initialize $x_0 \in \mathbb{R}^n$, an arbitrary base step size $\gamma$ (the same in all iterations), and parameters $\varepsilon, \delta \in (0, 1)$.

  2. For each iteration $k = 0, 1, 2, \dots$:

    Set the initial $\gamma_k = \gamma$.
    Compute the update:

    $$x_{k+1} = x_k - \gamma_k \nabla f(x_k).$$

    Check whether the inequality

    $$f(x_{k+1}) \le f(x_k) - \varepsilon \gamma_k \|\nabla f(x_k)\|^2$$

    is satisfied.
    If satisfied: accept $\gamma_k$ and move to the next iteration.
    If not satisfied: reduce the step size, $\gamma_k \leftarrow \delta \gamma_k$, and repeat the update.

This method keeps the step size as large as possible for fast progress while still guaranteeing a decrease in the function value at every iteration.
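
The loop can be written compactly in Python. This is a minimal sketch under the stated assumptions ($f$ differentiable, gradient supplied as a callable); all names are illustrative.

```python
import numpy as np

def modified_steepest_descent(f, grad, x0, gamma=1.0, eps=0.1, delta=0.5,
                              n_iters=100, tol=1e-8):
    """Steepest descent with step shredding (backtracking on the step size)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stationary point reached
            break
        gamma_k = gamma               # step 2: reset to the base step size
        x_new = x - gamma_k * g
        # Shred gamma_k until the sufficient-decrease condition holds; for a
        # differentiable f with g != 0 this terminates, because the condition
        # is satisfied for all small enough gamma_k (as eps < 1).
        while f(x_new) > f(x) - eps * gamma_k * np.dot(g, g):
            gamma_k *= delta
            x_new = x - gamma_k * g
        x = x_new
    return x
```

Note that $\gamma_k$ is reset to the base value $\gamma$ at the start of every iteration, exactly as in step 2 above.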


Practical Insights

Steepest gradient descent adapts the step size at every iteration instead of using one fixed value, which typically yields faster and more reliable convergence than plain gradient descent. The price is extra work per iteration: either a one-dimensional minimization (exact line search) or a few evaluations of the descent condition (step shredding).


Simple Example of the Algorithm

Finding the Minimum of $f(x) = x^2$

Let’s minimize:

$$f(x) = x^2$$

using Modified Steepest Descent.

Step 1: Initialize Parameters

Take $x_0 = 5$, base step size $\gamma = 1$, $\varepsilon = 0.1$, and $\delta = 0.5$.

Step 2: Iterate

  1. Compute Gradient:

    $$\nabla f(x) = 2x$$

    At $x_0 = 5$:

    $$\nabla f(5) = 10$$
  2. Compute Update:

    $$x_1 = x_0 - \gamma \nabla f(x_0) = 5 - 1 \cdot 10 = -5$$
  3. Check Descent Condition:

    $$f(x_1) \le f(x_0) - \varepsilon \gamma \|\nabla f(x_0)\|^2$$
    $$(-5)^2 \le 5^2 - 0.1 \cdot 1 \cdot 10^2$$
    $$25 \le 25 - 10$$
    $$25 \le 15 \quad \text{(false)}$$

Since the condition fails, we reduce the step size:

$$\gamma_0 = \delta \gamma = 0.5 \cdot 1 = 0.5$$

Then repeat the update with the smaller step size.


Step 3: New Update with Reduced γ

  1. Compute New Update:

    $$x_1 = 5 - 0.5 \cdot 10 = 0$$
  2. Check Descent Condition Again:

    $$f(0) \le 5^2 - 0.1 \cdot 0.5 \cdot 10^2$$
    $$0 \le 25 - 5$$
    $$0 \le 20 \quad \text{(true)}$$

Since the condition holds, we accept $x_1 = 0$ and move to the next iteration. (Here $x_1 = 0$ is in fact the exact minimizer of $f$, so $\nabla f(x_1) = 0$ and the algorithm stops.)
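
The two checks above can be reproduced in a few lines of Python, using the same parameters as the worked example:

```python
f = lambda x: x * x        # objective f(x) = x^2
grad = lambda x: 2.0 * x   # gradient f'(x) = 2x
x0, gamma, eps, delta = 5.0, 1.0, 0.1, 0.5

x1 = x0 - gamma * grad(x0)                           # 5 - 1 * 10 = -5
print(f(x1) <= f(x0) - eps * gamma * grad(x0) ** 2)  # False: 25 <= 15 fails

gamma *= delta                                       # shred: gamma = 0.5
x1 = x0 - gamma * grad(x0)                           # 5 - 0.5 * 10 = 0
print(f(x1) <= f(x0) - eps * gamma * grad(x0) ** 2)  # True: 0 <= 20 holds
```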