How to Visualize Gradient Descent | QuantumSketch

By Shihab
2 min read

Visualize gradient descent as a ball rolling downhill on a loss surface, taking steps proportional to the slope until it settles in a minimum. Steep slope → big step; flat bottom → tiny steps → it stops.

The core idea

Gradient descent minimizes a loss function by repeatedly stepping downhill:

$$\theta \leftarrow \theta - \eta \nabla L(\theta)$$

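Here ∇L(θ) is the gradient (the slope of the loss at θ) and η is the learning rate (the step size). The gradient points in the direction of steepest increase, so subtracting it moves θ toward lower loss, provided η is small enough.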
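
As a worked example, assume the loss is L(θ) = θ² (the curve used in the prompt further down), so the gradient is 2θ. The starting point θ₀ = 2 and the learning rate η = 0.25 are illustrative values chosen here, not taken from the article:

$$
\begin{aligned}
\nabla L(\theta) &= 2\theta \\
\theta_1 &= \theta_0 - \eta \cdot 2\theta_0 = 2 - 0.25 \cdot 4 = 1 \\
\theta_2 &= 1 - 0.25 \cdot 2 = 0.5 \\
\theta_3 &= 0.5 - 0.25 \cdot 1 = 0.25
\end{aligned}
$$

Each step is half the size of the previous one (1, then 0.5, then 0.25), because the slope shrinks as θ approaches the minimum at 0.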

The animation, beat by beat

  1. Draw the loss curve: a parabola for one parameter, or a bowl-shaped 3D surface for two.
  2. Drop the ball at a random start.
  3. Show the slope as a tangent arrow at the ball.
  4. Step downhill by −η·slope; repeat. Steps shrink as the slope flattens.
  5. Settle in the minimum.
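
The beats above can be sketched numerically before any drawing happens. This is a minimal sketch, assuming the loss L(θ) = θ²; the function names `grad` and `descend`, the start θ₀ = -3, and η = 0.25 are all hypothetical choices for illustration. It records the ball's position after each step and the size of each step.

```python
def grad(theta: float) -> float:
    """Gradient (slope) of the assumed loss L(theta) = theta**2."""
    return 2.0 * theta


def descend(theta0: float, eta: float, steps: int) -> list[float]:
    """Return the ball's positions: theta0 followed by `steps` updates."""
    path = [theta0]
    for _ in range(steps):
        theta = path[-1]
        path.append(theta - eta * grad(theta))  # step downhill by -eta * slope
    return path


# Illustrative start and learning rate (assumptions, not from the article).
path = descend(theta0=-3.0, eta=0.25, steps=5)
step_sizes = [abs(b - a) for a, b in zip(path, path[1:])]

print(path)        # [-3.0, -1.5, -0.75, -0.375, -0.1875, -0.09375]
print(step_sizes)  # [1.5, 0.75, 0.375, 0.1875, 0.09375]
```

The positions in `path` are the points where the ball would be drawn, and `step_sizes` shows the steps shrinking as the slope flattens.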

The learning-rate lesson

| Learning rate | Behavior |
|---|---|
| Too small | Slow crawl |
| Good | Smooth, fast convergence |
| Too large | Overshoots, oscillates or diverges |
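
A rough way to check the three regimes numerically, again assuming L(θ) = θ² so the gradient is 2θ. The helper `final_distance` and the three rates below are hypothetical values picked for this curve; what counts as "good" depends on the loss.

```python
def final_distance(eta: float, theta0: float = 2.0, steps: int = 20) -> float:
    """Distance from the minimum (theta = 0) after `steps` updates on L = theta**2."""
    theta = theta0
    for _ in range(steps):
        theta -= eta * 2.0 * theta  # gradient of theta**2 is 2 * theta
    return abs(theta)


# Illustrative rates for this particular curve (assumptions, not from the article).
rates = {"too small": 0.01, "good": 0.3, "too large": 1.1}

for label, eta in rates.items():
    distance = final_distance(eta)
    print(f"{label:>9}  eta={eta:<4}  |theta| after 20 steps = {distance:.4g}")
```

On this curve each update multiplies θ by (1 − 2η). After 20 steps the small rate has covered only part of the distance, the good rate is essentially at the minimum, and the too-large rate ends up farther away than where it started. With η = 1.0 the factor is exactly −1, so the ball bounces between θ₀ and −θ₀ without ever settling.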

Animating all three on the same curve makes the trade-off unforgettable — this is the intuition behind training every neural network. See Visualize a Neural Network.

Manim building blocks

Use axes.plot for the loss curve, a Dot for the ball, always_redraw for the tangent arrow, and a ValueTracker to step the parameter. For 3D, use a Surface inside a ThreeDScene.

The prompt

"Show gradient descent as a ball on the loss curve L(θ)=θ², stepping downhill by −η·slope, comparing small, good, and too-large learning rates."

→ Render it at quantumsketch.app. Related: Animate the Derivative.


Written by Shihab Shahriar Antor · Shahriar Labs

FAQ

Q. What's the simplest way to picture gradient descent?

Picture a ball rolling downhill on a curved surface. The surface is the loss function — height is the error for a given set of parameters — and gradient descent rolls the ball toward the lowest point. At each step the algorithm measures the slope (the gradient) and moves a step downhill proportional to that slope times the learning rate. Steep slopes give big steps; near the bottom the slope flattens and steps shrink, so the ball settles into a minimum. Animating the ball's path on a 2D parabola or a 3D bowl makes both the steps and the role of the learning rate visible.

Q. How do I show what the learning rate does in gradient descent?

Animate the same descent at different learning rates side by side. With a small rate, the ball takes tiny careful steps and converges slowly but smoothly. With a good rate, it descends efficiently. With too large a rate, it overshoots the minimum and bounces back and forth or even diverges up the walls. Showing these three behaviors on the same loss curve makes the learning-rate trade-off obvious. Describe it as a prompt and QuantumSketch renders all three as a narrated Manim comparison.

Tags: #math #animation #machine-learning

Shihab Shahriar

AI Engineer & Founder of Shahriar Labs. Exploring the intersection of design, cognition, and machine learning.