AlphaGo from Scratch: Lessons for Modern AI and LLMs

The foundational techniques that powered DeepMind's world-beating AlphaGo system are experiencing a renaissance in 2026, as AI researchers return to its architecture to understand the trajectory of modern artificial intelligence. In a project detailed by researcher Eric Jang, building AlphaGo from scratch with today's AI-assisted coding tools offers a unique lens on the "primitives of intelligence": search, learning from experience, and self-play. This retrospective engineering effort is not merely an academic exercise; it provides critical context for dissecting how reinforcement learning functions within large language models (LLMs) and how more general AI systems of the future might learn.

The Enduring Blueprint of AlphaGo's Architecture

AlphaGo's 2016 victory over Lee Sedol was a landmark, and its underlying architecture remains a masterclass in combining different AI paradigms. As analyzed in a review on GreaterWrong, the system integrated four key components:

Supervised learning policy networks trained on expert games
Reinforcement learning policy network refined through self-play
Value network to evaluate board positions
Monte Carlo Tree Search (MCTS) algorithm for planning

This hybrid approach allowed it to learn from human expertise and then surpass it through millions of self-play games.
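To make the interplay between these components concrete, here is a minimal sketch of how a policy prior and a value estimate can be combined when the tree search picks its next move to explore. It follows the PUCT-style selection rule described for AlphaGo, but the `Edge` class, the `select_action` helper, the `c_puct` default, and the `+ 1` under the square root are illustrative assumptions of this sketch, not details taken from DeepMind's or Jang's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one candidate move (hypothetical structure)."""
    prior: float             # P(s, a): probability assigned by the policy network
    visits: int = 0          # N(s, a): how often the search has tried this move
    value_sum: float = 0.0   # W(s, a): sum of value estimates backed up through it

    @property
    def q(self) -> float:
        # Mean value of the move so far; 0.0 for an unvisited move.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 1.0) -> str:
    """Pick the move maximizing Q + U, where U favors high-prior, rarely tried moves."""
    total_visits = sum(e.visits for e in edges.values())

    def score(edge: Edge) -> float:
        # Assumption: "+ 1" keeps the prior relevant before any visits exist.
        exploration = c_puct * edge.prior * math.sqrt(total_visits + 1) / (1 + edge.visits)
        return edge.q + exploration

    return max(edges, key=lambda action: score(edges[action]))


edges = {"a": Edge(prior=0.6), "b": Edge(prior=0.4)}
first_choice = select_action(edges)      # no visits yet, so the prior decides

# Pretend move "a" was explored ten times and mostly evaluated badly.
edges["a"].visits = 10
edges["a"].value_sum = -5.0
second_choice = select_action(edges)     # poor value estimates now outweigh the prior
```

The point of the sketch is the division of labor: the policy network proposes where to look, the value estimates report how those lines turned out, and the search arbitrates between the two.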

The Technical Balancing Act

Technical accounts of the original development, such as those shared by former DeepMind engineer Julian Schrittwieser, reveal the intricate balancing act required for performance. A core challenge was harmonizing the batch-processing efficiency of accelerators like TPUs with the sequential, adaptive nature of tree search.

"Ideally we would like to evaluate one position at a time, so that we can use the evaluation results in deciding which position to evaluate next. Accelerators perform best when evaluating a large batch of positions all at once," Schrittwieser notes. The solution was MCTS with virtual losses: each in-flight simulation is temporarily counted as a loss along its path, which steers concurrent simulations toward different positions so that a full batch can be evaluated at once.
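The virtual-loss idea can be sketched on a single tree node. In the example below, each selection temporarily records a pretend loss on the chosen move, so the next selection in the same batch drifts elsewhere; once the batch has been evaluated, the pretend loss is swapped for the real value. The names `Edge`, `puct`, `collect_batch`, and `backup_batch`, along with the scoring constants, are hypothetical and chosen for illustration; a real implementation applies the same bookkeeping along whole paths of a deep tree.

```python
import math


class Edge:
    """Per-move search statistics (hypothetical structure for this sketch)."""

    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0


def puct(edge, total_visits, c_puct=1.0):
    # Mean value plus an exploration bonus weighted by the policy prior.
    q = edge.value_sum / edge.visits if edge.visits else 0.0
    return q + c_puct * edge.prior * math.sqrt(total_visits + 1) / (1 + edge.visits)


def collect_batch(edges, batch_size, virtual_loss=1):
    """Select batch_size moves to evaluate together, applying a virtual loss per pick."""
    batch = []
    for _ in range(batch_size):
        total = sum(e.visits for e in edges.values())
        action = max(edges, key=lambda a: puct(edges[a], total))
        edge = edges[action]
        edge.visits += virtual_loss       # pretend the move was already visited...
        edge.value_sum -= virtual_loss    # ...and that every such visit was a loss
        batch.append(action)
    return batch


def backup_batch(edges, batch, values, virtual_loss=1):
    """Replace each virtual loss with one real visit carrying the evaluated value."""
    for action, value in zip(batch, values):
        edge = edges[action]
        edge.visits += 1 - virtual_loss
        edge.value_sum += value + virtual_loss


with_vl = {"a": Edge(0.5), "b": Edge(0.3), "c": Edge(0.2)}
spread_batch = collect_batch(with_vl, batch_size=3)                    # three distinct moves
backup_batch(with_vl, spread_batch, values=[0.2, -0.1, 0.5])

without_vl = {"a": Edge(0.5), "b": Edge(0.3), "c": Edge(0.2)}
naive_batch = collect_batch(without_vl, batch_size=3, virtual_loss=0)  # same move three times
```

Without the virtual loss, every slot in the batch goes to the same top-scoring move, which wastes the accelerator; with it, the batch covers different moves at the cost of slightly stale statistics during selection.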

From Go Boards to Language Models: A Shared Foundation

The connection between AlphaGo's mechanics and modern LLMs is more than metaphorical. The supervised learning policy networks in AlphaGo were trained to predict the next move from a dataset of expert games, a direct parallel to how LLMs are trained to predict the next token from vast corpora of text.

Parallel Learning Architectures

Both systems learn a probability distribution (a softmax over next moves or next tokens) from observed sequences. This shared foundation is why researchers are actively exploring ways to integrate planning techniques like MCTS into LLMs, a discussion fueled by rumors around projects like Google's Gemini and OpenAI's Q*.
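The shared objective is easy to see in code. The sketch below computes a softmax over raw scores and the cross-entropy loss against an observed next step; nothing in it knows whether the candidates are board moves or vocabulary tokens. The function names and the toy logits are assumptions made for illustration, not values from any real model.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over candidates."""
    peak = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the step that was actually observed."""
    return -math.log(softmax(logits)[target_index])


# Toy scores for three candidate Go moves; the expert played candidate 0.
move_logits = [2.0, 0.5, -1.0]
move_loss = cross_entropy(move_logits, target_index=0)

# The same toy scores read as three candidate tokens; the corpus continued with token 0.
token_logits = [2.0, 0.5, -1.0]
token_loss = cross_entropy(token_logits, target_index=0)
```

Training either kind of model means lowering this loss across many observed sequences, which is why the supervised stage of AlphaGo and the pretraining stage of an LLM look so alike.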

The Modern Reimplementation Process

Jang's modern re-implementation, conducted as part of a "Claude Code bender," symbolizes a broader shift in 2026. He describes grappling with the fact that he no longer needs to write code by hand and can instead rely on AI agents to handle infrastructure and implementation.

This meta-layer—using modern AI to rebuild historic AI—underscores the accelerating pace of the field. The project serves a dual purpose: to solidify understanding of foundational deep learning techniques and to master a new paradigm of programming with AI collaborators.

Beyond Game-Playing: Latent State Inference

The principles gleaned from such foundational work extend beyond game-playing. In earlier writings, Jang has explored the challenge of inferring latent states—like human intention or reasoning—from observed behavior alone, a problem central to imitation learning in robotics.

The question of whether a system can learn to "think" from behavioral data mirrors the leap AlphaGo made from imitating moves to developing superior strategic understanding through self-play.

Engineering Lessons for General AI Systems in 2026

Revisiting AlphaGo also reinforces enduring engineering principles for building generalizable systems. In a 2021 essay, Jang argued that "large amounts of diverse data are more important to generalization than clever model biases."

The AlphaGo Zero Validation

AlphaGo Zero, the successor that learned solely through self-play without human data, was a stunning validation of this principle, achieving superhuman performance using only the rules of the game and massive computation. The system's ability to generate its own diverse experience through self-play was its key to generalization.

Scale and Data Diversity Paradigm

This focus on scale and data diversity is now the dominant paradigm in LLMs. The narrative that progress comes from "pushing diverse data into a sufficiently high-capacity model" directly descends from the lessons of AlphaGo and its successors.

The project's clean demonstration of how search amplifies learning, and how learning guides search, provides a template for future systems that may need to reason, plan, and adapt in open-ended environments far more complex than a Go board.
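One way to see how search amplifies learning is the training target used in the AlphaGo Zero style of self-play: the visit counts produced by the search are normalized into a policy that the network is then trained to match, while the value output is trained toward the game outcome. The sketch below assumes the published form of that target, with probabilities proportional to visit counts raised to 1/temperature, and a loss of squared value error plus policy cross-entropy. The helper names are hypothetical, and the weight-regularization term is omitted.

```python
import math


def visit_count_policy(visits, temperature=1.0):
    """Normalize MCTS visit counts into a training target for the policy head."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    weights = [n ** (1.0 / temperature) for n in visits]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one move must have been visited")
    return [w / total for w in weights]


def self_play_loss(outcome, predicted_value, search_policy, network_policy):
    """Squared value error plus cross-entropy between search and network policies."""
    value_term = (outcome - predicted_value) ** 2
    policy_term = -sum(
        pi * math.log(p)
        for pi, p in zip(search_policy, network_policy)
        if pi > 0                                  # skip moves the search never visited
    )
    return value_term + policy_term


network_prior = [0.40, 0.35, 0.25]     # what the network believed before searching
visits = [10, 30, 60]                  # where the search actually spent its simulations
target = visit_count_policy(visits)    # improved policy the network should move toward
sharper = visit_count_policy(visits, temperature=0.5)
loss = self_play_loss(outcome=1.0, predicted_value=0.5,
                      search_policy=target, network_policy=network_prior)
```

Learning guides search through the prior, and search amplifies learning through the target: each round of self-play hands the network a slightly better policy than the one it started with.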

Looking Backward to See Forward: AlphaGo's Legacy in 2026

Ultimately, rebuilding AlphaGo from scratch is an act of looking backward to see forward. In a landscape now dominated by LLMs and generative AI, the elegant synthesis of search, learning, and self-play embodied by AlphaGo stands as a timeless reference model.

It reminds the field that intelligence, artificial or otherwise, may arise not from a single, monolithic technique, but from the intelligent integration of complementary capabilities. As researchers push the boundaries of what's possible with language models and agents, the core lessons from building AlphaGo remain more relevant than ever.

Sources: evjang.com • www.julian.ac • blog.evjang.com • www.greaterwrong.com
