How LLMs Write Manim Code | QuantumSketch

LLMs write Manim code by turning a prompt into a storyboard of beats, then emitting Scene classes for each, validated by actually running the code rather than guessing.

By Shihab
2 min read

LLMs write Manim code by turning your prompt into a storyboard of beats, then emitting a Manim Scene class for each. Crucially, the code is run, not just generated. Execution makes the output real and deterministic: the video is whatever Manim actually rendered, and rerunning the same script reproduces it.

The pipeline

Prompt → LLM storyboard → Manim code per beat → execute/render → TTS → FFmpeg merge
  1. Plan. The model breaks the concept into 4–6 narrative beats.
  2. Write code. For each beat it picks the right Mobjects (Axes, MathTex) and animations (Transform, FadeIn).
  3. Execute. The Python runs in a sandbox with Manim, LaTeX, and FFmpeg installed → video chunks.
  4. Narrate. TTS voices the per-beat script (see How AI Narrates Math Videos).
  5. Merge. FFmpeg stitches chunks + audio into one MP4.
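A minimal Python sketch of how the execute and merge steps might be wired up, assuming the Manim Community `manim` CLI and `ffmpeg` are on the sandbox's PATH. The `Beat` dataclass and every helper name are hypothetical, the LLM and TTS calls are left out, and muxing narration into each chunk before concatenating is one possible ordering, not necessarily QuantumSketch's. The functions only build argument lists, which you would hand to `subprocess.run`.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Beat:
    """One storyboard beat (hypothetical structure)."""
    scene_name: str  # name of the Scene subclass the model emitted
    code: str        # generated Manim source for this beat
    narration: str   # per-beat script handed to the TTS step


def render_cmd(script: Path, scene_name: str) -> list[str]:
    # Render one Scene from a file at low quality (-ql) with the manim CLI.
    return ["manim", "-ql", str(script), scene_name]


def mux_audio_cmd(video: Path, audio: Path, out: Path) -> list[str]:
    # Copy the video stream as-is and encode the narration to AAC in one MP4.
    return ["ffmpeg", "-y", "-i", str(video), "-i", str(audio),
            "-c:v", "copy", "-c:a", "aac", str(out)]


def concat_list(chunks: list[Path]) -> str:
    # Input file for FFmpeg's concat demuxer: one "file '<path>'" line per chunk.
    # Assumes the paths contain no single quotes.
    return "".join(f"file '{c.as_posix()}'\n" for c in chunks)


def concat_cmd(list_file: Path, out: Path) -> list[str]:
    # Stitch the chunks in order without re-encoding them.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(out)]
```

Building argument lists instead of shell strings keeps paths with spaces safe. Concatenating with `-c copy` avoids a re-encode, which works as long as every chunk was rendered and muxed with the same settings.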

Why "execute, don't guess" matters

Asking a model to describe a video gives you hallucinated frames. Asking it to write code that runs gives you proof: if the MP4 renders, the code worked. A bad API call (for example, a misspelled animation class) raises an error and fails the render immediately, so the pipeline can feed that error back to the model, repair the code, and retry.
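The repair-and-retry loop could look like the following minimal sketch. It assumes that a failed `manim` render exits with a non-zero status and prints its traceback to stderr. `run_scene`, `render_with_repair`, and the `repair` callback (in practice an LLM call that receives the broken code plus the error text) are hypothetical names, not part of Manim or QuantumSketch.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable


def run_scene(code: str, scene_name: str, workdir: Path) -> tuple[bool, str]:
    """Write one beat to disk and render it with the manim CLI."""
    workdir.mkdir(parents=True, exist_ok=True)
    script = workdir / f"{scene_name}.py"
    script.write_text(code)
    proc = subprocess.run(["manim", "-ql", script.name, scene_name],
                          cwd=workdir, capture_output=True, text=True)
    # Assumes a failed render exits non-zero with the traceback on stderr.
    return proc.returncode == 0, proc.stderr


def render_with_repair(
    code: str,
    run: Callable[[str], tuple[bool, str]],
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Run the code; on failure, hand the error back for a fix and try again.

    Returns the code that rendered and the attempt number that succeeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        ok, err = run(code)
        if ok:
            return code, attempt
        if attempt < max_attempts:
            code = repair(code, err)  # e.g. prompt the model with code + traceback
    raise RuntimeError(f"render still failing after {max_attempts} attempts:\n{err}")
```

In use you would bind the runner to one beat, e.g. `render_with_repair(code, lambda c: run_scene(c, "Intro", Path("build")), repair=fix_with_llm)`, where `fix_with_llm` stands in for your model call. Capping attempts matters: an environment problem such as a missing LaTeX install can't be fixed by editing the scene, so the loop should give up and report instead of spinning.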

| Approach | Reliability |
|---|---|
| Model describes video | Low (hallucination) |
| Model writes + runs Manim | High (verified by render) |

What the model is good and bad at

  • Good: picking the right Manim primitives, sequencing beats, writing valid MathTex.
  • Needs iteration: layout and pacing, which get refined through the prompt rather than left to luck.
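To make the primitives concrete, here is the kind of Scene a model might emit for a single beat (an illustrative example assuming a recent Manim Community release, not actual QuantumSketch output), held as a string the way a pipeline receives it. The hypothetical `precheck` helper uses only Python's standard `ast` module to reject code that doesn't parse or doesn't define the expected Scene class before spending time on a render. It cannot catch a misspelled Manim name; that still needs the real render.

```python
from __future__ import annotations

import ast

# Illustrative output of the code-writing step for one beat (not real model output).
GENERATED = """from manim import *


class DerivativeBeat(Scene):
    def construct(self):
        axes = Axes(x_range=[-3, 3, 1], y_range=[-1, 9, 2])
        curve = axes.plot(lambda x: x**2, color=BLUE)
        label = MathTex(r"f(x) = x^2").to_corner(UL)
        self.play(Create(axes), Create(curve))
        self.play(FadeIn(label))
        slope = MathTex(r"f'(x) = 2x").to_corner(UL)
        self.play(Transform(label, slope))
        self.wait()
"""

# A few Manim scene base classes the check accepts.
MANIM_SCENE_BASES = {"Scene", "MovingCameraScene", "ThreeDScene"}


def precheck(code: str, scene_name: str) -> list[str]:
    """List problems found without running the code; an empty list means it passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    classes = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)}
    cls = classes.get(scene_name)
    if cls is None:
        return [f"no class named {scene_name}"]
    problems = []
    bases = {b.id for b in cls.bases if isinstance(b, ast.Name)}
    if not bases & MANIM_SCENE_BASES:
        problems.append(f"{scene_name} does not subclass a Manim Scene")
    methods = {n.name for n in cls.body if isinstance(n, ast.FunctionDef)}
    if "construct" not in methods:
        problems.append(f"{scene_name} has no construct() method")
    return problems
```

Calling `precheck(GENERATED, "DerivativeBeat")` returns an empty list. The check is deliberately shallow: it filters out the cheapest failures so the sandbox only spends render time on code that at least parses.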

See it in action

QuantumSketch runs this exact loop. The core is open source; read Inside the manim-coding-skill.

→ quantumsketch.app


Written by Shihab Shahriar Antor · Shahriar Labs

FAQ

Q. How does an AI turn a text prompt into a Manim animation?

It works in stages. First the language model reads your prompt and plans a storyboard, breaking the concept into a handful of narrative beats. Then, for each beat, it writes a Manim Scene class using the right Mobjects and animations (Axes, MathTex, Transform, FadeIn). That generated Python is executed in a sandbox with Manim, LaTeX, and FFmpeg installed, which renders each beat to a video chunk. A text-to-speech step voices the per-beat script, and FFmpeg merges everything into a final MP4. The key is that the code is actually run, so the output is real, deterministic Manim, not a guess at what the video should look like.

Q. What stops the AI from generating Manim code that doesn't run?

Execution and iteration. Because the generated code runs inside a real Manim environment, a syntax error or bad API call surfaces immediately as a failed render instead of silently shipping. Good pipelines catch that error and regenerate or repair the code until it runs and produces frames. This execute-and-verify loop is why a Manim-generating tool is more reliable than asking a model to describe a video: the rendered MP4 is proof the code worked. What a clean render cannot prove is good layout, since code can run without errors and still place elements badly; that remaining failure mode is refined through the prompt.

Tags: #ai #manim #llm

Shihab Shahriar

AI Engineer & Founder of Shahriar Labs. Exploring the intersection of design, cognition, and machine learning.