Spaced Repetition Algorithms: From Ebbinghaus to FSRS — A Deep Dive


The Algorithm That Decides When You Forget

You open your flashcard app. A card appears. You answer. The app decides when to show it again -- tomorrow, next week, in three months. That scheduling decision is the entire game. Get it right and you retain 95% of everything you study with minimal effort. Get it wrong and you either forget (intervals too long) or waste hours reviewing things you already know (intervals too short).

Spaced repetition algorithms are the scheduling engines behind this decision. They have evolved from a German psychologist's self-experiments with nonsense syllables in 1885 to machine learning systems trained on hundreds of millions of review records. This is the history, the math, and the engineering of how we got here.

The Forgetting Curve: Where It All Started

In 1885, Hermann Ebbinghaus published Über das Gedächtnis (Memory: A Contribution to Experimental Psychology), one of the most consequential single-author psychology experiments ever conducted. Working alone in Berlin, he memorized lists of 13 nonsense syllables (CVC trigrams like WID, ZOF, BUP) and tested himself at intervals ranging from 20 minutes to 31 days. His metric was "savings" -- how much less time relearning took compared to original learning.

| Time Since Learning | Retention (%) | Lost (%) |
| --- | --- | --- |
| 20 minutes | 58.2 | 41.8 |
| 1 hour | 44.2 | 55.8 |
| 8.8 hours | 35.8 | 64.2 |
| 1 day | 33.7 | 66.3 |
| 2 days | 27.8 | 72.2 |
| 6 days | 25.4 | 74.6 |
| 31 days | 21.1 | 78.9 |

Source: Ebbinghaus (1885), Section 29. Replicated by Murre & Dros (2015) in PLOS ONE with modern controls.

[Figure: The classic shape of Ebbinghaus's forgetting curve -- rapid initial loss, then a long, slow tail. Source: Wikimedia Commons (public domain).]

The shape of this curve -- steep initial drop, long tail -- has been confirmed across thousands of studies over 140 years. Cepeda et al. (2006) meta-analyzed 184 articles containing 317 experiments on distributed practice and found that spaced study produced 10--30% better retention than massed study across virtually every condition tested.

But Ebbinghaus established something even more fundamental: each review resets the curve at a shallower slope. The interval before you forget to the same threshold grows after every successful retrieval. This is the spacing effect -- the oldest and most replicated finding in experimental psychology.

The Debate: Exponential vs. Power Law Forgetting

What mathematical function best describes the forgetting curve? This matters because your scheduling algorithm's entire prediction depends on it.

Ebbinghaus himself fitted his data to a power function: b = 100k / ((log t)^c + k). For decades, the field assumed forgetting was exponential: R(t) = e^(-t/S), where S is memory stability. SuperMemo used this model for over 30 years.

But in 2024, the FSRS team showed that a power function fits real-world data better:

R(t, S) = (1 + F · t/S)^C

where F = 19/81 and C = -0.5.

Why? Individual memories may decay exponentially, but when you aggregate across memories of different strengths -- which is what a real flashcard deck contains -- the mixture follows a power law. Ebbinghaus actually discovered this himself: his original data fits a power function better than an exponential. It took 139 years for spaced repetition software to catch up.

| Model | Equation | Tail Behavior | Fit to Real Data |
| --- | --- | --- | --- |
| Exponential | R = e^(-t/S) | Drops to near-zero quickly | Good for single items, poor for mixed decks |
| Power law | R = (1 + t/(9S))^(-1) | Heavy tail, slower decay | Better fit across 10K+ user collections |

Source: A technical explanation of FSRS, Expertium's Blog
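
To make the difference concrete, here is a minimal Python sketch comparing the two curve shapes. One note on the setup: the exponential is rescaled here so both curves agree that R = 0.9 at t = S; that normalization is a choice made for comparability, not part of either model.

```python
import math

def retention_exponential(t: float, S: float) -> float:
    """Exponential decay, rescaled so that R(S, S) = 0.9."""
    return math.exp((t / S) * math.log(0.9))

def retention_power(t: float, S: float) -> float:
    """FSRS-style power curve: R = (1 + F*t/S)^C with F = 19/81, C = -0.5."""
    return (1 + (19 / 81) * t / S) ** -0.5

S = 30.0  # stability of 30 days
for t in (1, 10, 30, 90, 365):
    print(f"t={t:>3}d  exponential={retention_exponential(t, S):.3f}  "
          f"power={retention_power(t, S):.3f}")
```

At one year out with S = 30 days, the exponential has decayed to roughly 0.28 while the power curve still sits near 0.51 -- exactly the heavy tail the table describes.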

1967: Pimsleur's Graduated Intervals

Before computers entered the picture, Paul Pimsleur published a graduated interval recall schedule in 1967, designed for audio language instruction:

5 sec → 25 sec → 2 min → 10 min → 1 hour → 5 hours → 1 day → 5 days → 25 days → 4 months → 2 years

Each interval is roughly 5x the previous. This was hand-tuned to the constraints of audio lessons where you cannot flip back. Pimsleur's system is still used in Pimsleur language courses today -- a fixed schedule, no adaptation to the learner, but remarkably effective for its simplicity.
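
In code, the whole "algorithm" is a geometric progression. A sketch -- the published schedule above was hand-rounded to convenient units, so this only approximates it:

```python
PIMSLEUR_FACTOR = 5
FIRST_INTERVAL_SECONDS = 5

def pimsleur_interval_seconds(n: int) -> float:
    """Interval after the n-th exposure: 5s, 25s, 125s, ..."""
    return FIRST_INTERVAL_SECONDS * PIMSLEUR_FACTOR ** (n - 1)

for n in range(1, 12):
    days = pimsleur_interval_seconds(n) / 86_400
    print(f"exposure {n:>2}: {days:12.5f} days")
```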

1972: Leitner's Box System

Sebastian Leitner published So lernt man lernen ("How to Learn to Learn") in 1972, introducing the Leitner system: five physical boxes of flashcards with increasing review intervals. Get a card right, it moves to the next box (reviewed less often). Get it wrong, it returns to Box 1.

| Box | Review Frequency |
| --- | --- |
| 1 | Every day |
| 2 | Every 2 days |
| 3 | Every 4 days |
| 4 | Every 9 days |
| 5 | Every 14 days |

Elegant, physical, zero computation required. The Leitner system is the only spaced repetition algorithm you can implement with cardboard. Its core insight -- focus study time on the items you know least -- remains the foundation of every algorithm that followed.

[Figure: The Leitner box system -- a correct answer promotes the card one box rightward (less frequent review); a wrong answer demotes it back to Box 1. Source: Wikimedia Commons (CC0).]
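
Because the system needs no computation, the code version is almost trivially short. A minimal sketch, using the box intervals from the table above:

```python
BOX_INTERVAL_DAYS = {1: 1, 2: 2, 3: 4, 4: 9, 5: 14}

def leitner_update(box: int, correct: bool) -> tuple[int, int]:
    """Promote one box on success (capped at 5), demote to Box 1 on failure;
    return the new box and the days until the next review."""
    box = min(box + 1, 5) if correct else 1
    return box, BOX_INTERVAL_DAYS[box]

box = 1
for correct in (True, True, False, True):  # right, right, wrong, right
    box, interval = leitner_update(box, correct)
    print(f"box {box}, next review in {interval} day(s)")
```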

1985--1987: Wozniak and SM-2 — The Algorithm That Conquered the World

On February 25, 1985, a 22-year-old molecular biology student named Piotr Wozniak began the experiment that would define spaced repetition software. Frustrated with forgetting English vocabulary and biochemistry facts, he started tracking optimal inter-repetition intervals by hand.

By December 13, 1987, he had coded SuperMemo 1.0 for DOS -- the first computer program to calculate spaced repetition schedules. The algorithm inside, SM-2, tracks three variables per card:

  • n: repetition number (how many times you've reviewed it)
  • EF: easiness factor (initialized to 2.5, adjusted by your responses)
  • I: inter-repetition interval in days

The interval schedule:

I(1) = 1 day
I(2) = 6 days
I(n) = I(n-1) × EF    for n > 2

After each review, EF is adjusted:

EF' = EF + (0.1 - (5-q) × (0.08 + (5-q) × 0.02))

where q is the user's quality-of-response rating (0--5). EF is clamped to a minimum of 1.3.

That is the entire algorithm. Two formulas, three variables, fits in a tweet. It was designed by one person from empirical self-observation, not from any formal memory model.
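
Here it is as a runnable Python sketch. One caveat: the handling of failed recalls (q < 3 resets the repetition count without touching EF) follows the common reading of Wozniak's original description; implementations vary on this detail.

```python
def sm2(n: int, ef: float, interval: float, q: int) -> tuple[int, float, float]:
    """One SM-2 review. n = repetition count, ef = easiness factor,
    interval = current interval in days, q = response quality (0-5)."""
    if q < 3:
        # Failed recall: repetitions restart from the beginning; the common
        # reading of the spec leaves EF unchanged on a lapse.
        return 0, ef, 1.0
    if n == 0:
        interval = 1.0            # I(1) = 1 day
    elif n == 1:
        interval = 6.0            # I(2) = 6 days
    else:
        interval = interval * ef  # I(n) = I(n-1) * EF
    ef = ef + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    ef = max(ef, 1.3)             # EF is clamped to a minimum of 1.3
    return n + 1, ef, interval

# A card reviewed four times with grades 5, 4, 3, 5:
n, ef, interval = 0, 2.5, 0.0
for q in (5, 4, 3, 5):
    n, ef, interval = sm2(n, ef, interval, q)
    print(f"n={n}  EF={ef:.2f}  next interval={interval:.1f}d")
```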

And it works. SM-2 has been running, in nearly unchanged form, inside Anki since 2006, inside Mnemosyne since 2003, and inside dozens of other apps. It is by far the most widely deployed spaced repetition algorithm in history.

But SM-2 has problems:

| Limitation | Consequence |
| --- | --- |
| No probability model | Cannot predict how likely you are to recall a card at any given moment |
| Fixed initial intervals (1, 6) | No adaptation to card difficulty before first review |
| Linear EF adjustment | Overreacts to single bad reviews; slow to recover |
| No per-user optimization | Same formula for a medical student and a casual hobbyist |
| No forgetting model | When you fail a card, it just resets -- no modeling of what went wrong |

1989--2016: The SuperMemo Divergence

After SM-2, Wozniak kept iterating. SM-4 (1989) introduced an optimization matrix. SM-5 (1989) made it converge faster. SM-8 through SM-18 added increasingly sophisticated models -- two-component memory (stability + retrievability), neural network optimization, and incremental reading.

But these advances stayed locked inside SuperMemo's proprietary codebase. The rest of the world kept using SM-2.

Piotr Wozniak's own history of spaced repetition is worth reading in full -- it is one of the most remarkable single-author research programs in software history, even if the commercial isolation meant the field as a whole stagnated for 20 years.

2016: Duolingo's Half-Life Regression

In 2016, Burr Settles and Brendan Meeder at Duolingo published A Trainable Spaced Repetition Model for Language Learning (ACL 2016), introducing Half-Life Regression (HLR).

HLR models each word's "half-life" in memory -- the time until recall probability drops to 50%. Unlike SM-2, it:

  • Uses logistic regression with psycholinguistic features (word frequency, cognate status, user history)
  • Trains on millions of real review records
  • Predicts actual recall probabilities

On Duolingo's data, HLR achieved 45%+ error reduction versus baselines. In A/B testing with live users:

| Metric | Improvement |
| --- | --- |
| Practice session retention | +9.5% |
| Lesson retention | +1.7% |
| Overall daily activity | +12% |

Source: Settles & Meeder (2016), ACL. Code: github.com/duolingo/halflife-regression

HLR proved the concept: machine learning on review data can substantially outperform hand-tuned heuristics. But it was designed for Duolingo's specific feature set and never gained adoption outside.
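
For the curious, the prediction step is tiny. A sketch following the paper's formulation -- half-life h = 2^(θ·x), recall probability p = 2^(-Δ/h) -- where the feature vector is the simple right/wrong-count variant and the weights are illustrative, not fitted (real HLR adds per-word lexical features):

```python
import math

def hlr_predict(theta: list[float], n_right: int, n_wrong: int,
                days_since_review: float) -> float:
    """Predicted recall probability `days_since_review` days after the
    last practice: p = 2^(-delta / h) with half-life h = 2^(theta . x)."""
    x = [1.0, math.sqrt(1 + n_right), math.sqrt(1 + n_wrong)]  # bias + counts
    half_life_days = 2.0 ** sum(t * xi for t, xi in zip(theta, x))
    return 2.0 ** (-days_since_review / half_life_days)

# Illustrative weights (bias, correct-count, incorrect-count) -- not fitted.
theta = [1.0, 0.8, -0.5]
print(hlr_predict(theta, n_right=4, n_wrong=1, days_since_review=7.0))
```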

2022--2025: FSRS — The Open Algorithm

In 2022, Jarrett Ye released FSRS (Free Spaced Repetition Scheduler), an open-source algorithm that brought modern machine learning to the Anki ecosystem. By November 2023, Anki shipped FSRS as a native option. By 2025, it had become the default for new users.

FSRS models memory with the DSR (Difficulty, Stability, Retrievability) framework:

| Variable | Symbol | Definition | Range |
| --- | --- | --- | --- |
| Difficulty | D | How hard it is to increase stability for this card | 1--10 |
| Stability | S | Days for retrievability to drop from 100% to 90% | 0.1--36,500 |
| Retrievability | R | Probability of successful recall right now | 0--1 |

The core equations:

Forgetting curve (power function):

R(t, S) = (1 + F · t/S)^C, where F = 19/81, C = -0.5

Stability after successful recall:

S'_r = S · (1 + e^(w₈) · (11 - D) · S^(-w₉) · (e^(w₁₀·(1-R)) - 1) · hard_penalty · easy_bonus)

Stability after forgetting (lapse):

S'_f = w₁₁ · D^(-w₁₂) · ((S+1)^(w₁₃) - 1) · e^(w₁₄·(1-R))

The 19 weights (w₀ through w₁₈) are optimized per-user via gradient descent on their review history. This is the key innovation: FSRS treats scheduling as a machine learning problem where the loss function is log loss between predicted and actual recall.

Source: The Algorithm (FSRS Wiki), ABC of FSRS
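
Putting the three equations together as a Python sketch: the w values below are illustrative stand-ins, not the shipped defaults, since the real weights are fitted per user and change between releases; hard_penalty and easy_bonus are left as plain multipliers.

```python
import math

F, C = 19 / 81, -0.5

# Illustrative weights for the indices used above (w8..w14) -- assumed values
# for demonstration only; FSRS normally fits all weights per user.
w = {8: 1.5, 9: 0.1, 10: 1.0, 11: 1.9, 12: 0.1, 13: 0.3, 14: 2.3}

def retrievability(t: float, S: float) -> float:
    """Power-law forgetting curve; F = 19/81 makes R(S, S) = 0.9 exactly."""
    return (1 + F * t / S) ** C

def stability_after_recall(D: float, S: float, R: float,
                           hard_penalty: float = 1.0,
                           easy_bonus: float = 1.0) -> float:
    """S'_r: new stability after a successful review."""
    return S * (1 + math.exp(w[8]) * (11 - D) * S ** -w[9]
                  * (math.exp(w[10] * (1 - R)) - 1)
                  * hard_penalty * easy_bonus)

def stability_after_lapse(D: float, S: float, R: float) -> float:
    """S'_f: new stability after a failed review."""
    return (w[11] * D ** -w[12] * ((S + 1) ** w[13] - 1)
            * math.exp(w[14] * (1 - R)))

def next_interval(S: float, desired_retention: float = 0.9) -> float:
    """Solve R(t, S) = r for t: t = (S/F) * (r^(1/C) - 1).
    At r = 0.9 this is exactly S -- which is why F = 19/81."""
    return S / F * (desired_retention ** (1 / C) - 1)

D, S = 5.0, 10.0
R = retrievability(7, S)  # recall probability 7 days after the last review
print(f"R(7d) = {R:.3f}")
print(f"pass -> S' = {stability_after_recall(D, S, R):.1f}d")
print(f"fail -> S' = {stability_after_lapse(D, S, R):.1f}d")
print(f"next interval at 90% retention: {next_interval(S):.1f}d")
```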

The Benchmark: How Do They Actually Compare?

The open-spaced-repetition/srs-benchmark project evaluates algorithms on real Anki review data across thousands of user collections. The metric is log loss -- cross-entropy between predicted recall probability and binary outcome (recalled or forgot). Lower is better.
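
Concretely, the metric is ordinary binary cross-entropy. A minimal sketch:

```python
import math

def log_loss(predicted: list[float], recalled: list[int],
             eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted recall probabilities
    and observed 0/1 outcomes. Lower is better."""
    total = 0.0
    for p, y in zip(predicted, recalled):
        p = min(max(p, eps), 1.0 - eps)  # clamp away from log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted)

# A scheduler that is confident and right scores far better than one that
# is confident and wrong:
print(log_loss([0.9, 0.9], [1, 1]))  # ~0.105
print(log_loss([0.9, 0.9], [1, 0]))  # ~1.204
```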

| Algorithm | Year | Model Type | Parameters | Log Loss ↓ | Notes |
| --- | --- | --- | --- | --- | --- |
| SM-2 (trainable) | 1987 | Linear EF | 2 | 0.346 | Added probability layer for benchmark |
| Leitner | 1972 | Fixed boxes | 0 | ~0.36 | No probability prediction natively |
| HLR (Duolingo) | 2016 | Logistic regression | 3+ | 0.327 | Feature-engineered |
| FSRS v3 | 2022 | DSR exponential | 13 | 0.332 | First release |
| FSRS v4 | 2023 | DSR power | 17 | 0.326 | Power curve, +4 params |
| FSRS-5 | 2024 | DSR power + same-day | 19 | 0.325 | Same-day review handling |
| FSRS-6 | 2025 | DSR power + flat curve | 21 | 0.324 | Optimizable curve flatness |

Source: Benchmark of Spaced Repetition Algorithms, Expertium's Blog. Dataset: 10,000+ Anki user collections.

The headline number: FSRS-5 outperforms SM-2 in 97.4% of user collections. Against SM-17, the proprietary algorithm in SuperMemo 17, FSRS-6 wins in 83.3% of collections.

In practical terms, users switching from SM-2 to FSRS report 20--30% fewer reviews for the same retention level. That is not a small efficiency gain -- for someone doing 200 reviews/day, it is 40--60 fewer cards per session, compounding over months and years.

What Makes FSRS Work: The Three Key Innovations

1. Per-user parameter optimization. SM-2 uses the same formula for everyone. FSRS trains 19 weights on your review history. If you consistently remember well at 30-day intervals, FSRS learns that your stability grows faster than average and gives you longer intervals. If you struggle with certain material, it adapts.

2. The difficulty-stability interaction. In FSRS, difficulty affects how much stability grows after a successful review, not just the base interval. A difficult card (D=8) with high stability (S=90 days) will have its stability increase less than an easy card (D=3) with the same stability. This models a real phenomenon: hard items need more reinforcement even after you "know" them.

3. The retrievability-aware scheduling. FSRS knows your exact probability of recall at any moment. If you review a card when R=0.70, the stability gain is larger than reviewing at R=0.95 -- because retrieving at lower confidence is a desirable difficulty that produces stronger memory encoding. This directly implements Robert Bjork's theory.
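
A quick worked example of that third point, reusing the S'_r equation from earlier with the same illustrative (assumed, not fitted) weights:

```python
import math

# Same illustrative weights as the FSRS sketch above -- assumed, not fitted.
w8, w9, w10 = 1.5, 0.1, 1.0
D, S = 5.0, 30.0  # a medium-difficulty card with 30 days of stability

for R in (0.95, 0.70):
    multiplier = (1 + math.exp(w8) * (11 - D) * S ** -w9
                    * (math.exp(w10 * (1 - R)) - 1))
    print(f"review at R={R:.2f}: stability multiplier = {multiplier:.2f}")
```

With these numbers, a review at R = 0.70 multiplies stability by roughly 7.7, versus roughly 2.0 for a review at R = 0.95 -- the harder retrieval buys far more durability.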

The Science Behind It All: Why Spacing Works

The spacing effect is not just an empirical regularity. It is backed by converging evidence from cognitive psychology, neuroscience, and now computational modeling.

The testing effect. Roediger & Karpicke (2006) showed that students who tested themselves three times (STTT) recalled 61% after one week, while students who studied four times (SSSS) recalled only 40%. Testing is not assessment -- it is the most powerful encoding event available. Rowland's 2014 meta-analysis of 159 effect sizes confirmed the effect at Hedges' g = 0.50.

Desirable difficulties. Robert Bjork coined this term in 1994: conditions that make learning harder in the short term -- spacing, interleaving, retrieval practice -- produce better long-term retention. The difficulty is the mechanism. FSRS encodes this directly: reviewing at lower retrievability yields larger stability gains.

Consolidation theory. Memory consolidation during sleep transfers labile hippocampal traces to stable neocortical representations. Spacing reviews across multiple sleep cycles gives the consolidation process time to operate. Massed practice (cramming) competes with itself for consolidation resources.

| Study | Year | N | Key Finding | Effect Size |
| --- | --- | --- | --- | --- |
| Ebbinghaus | 1885 | 1 | Forgetting follows power-law decay | -- |
| Cepeda et al. (meta) | 2006 | 184 articles | Spacing produces 10--30% better retention | d = 0.42--0.77 |
| Roediger & Karpicke | 2006 | 120 | Testing beats restudying at 1 week (61% vs 40%) | large |
| Rowland (meta) | 2014 | 61 studies (159 effects) | Testing effect robust across conditions | g = 0.50 |
| Kornell & Bjork | 2008 | 120 | Interleaving doubles classification accuracy | d = 0.99 |

The Essential Reading List

If you want to go deep, here is the canon -- the five texts that define the field.

| Text | Author(s) | Year | Why It Matters |
| --- | --- | --- | --- |
| Spaced Repetition for Efficient Learning | Gwern Branwen | 2009 | The definitive overview. 50,000+ words. Covers history, research, practice, software. If you read one thing, read this. |
| Augmenting Long-term Memory | Michael Nielsen | 2018 | A working scientist's account of using Anki daily for years. The "memory is a choice" framing that changed how people think about SRS. |
| Make It Stick | Brown, Roediger, McDaniel | 2014 | The science of learning distilled for practitioners. Covers spacing, testing, interleaving, and why most study habits are wrong. |
| Andy Matuschak's notes | Andy Matuschak | 2019-- | The frontier research on "mnemonic media" -- embedding spaced repetition inside reading, not as a separate activity. |
| A Three-Day Journey from Novice to Expert | Jarrett Ye | 2023 | The FSRS creator's own tutorial. Takes you from zero to understanding the DSR model in three days. |

Gwern's Spaced Repetition for Efficient Learning deserves special mention. Originally published in 2009 and continuously updated, it is arguably the most thorough single article ever written on the subject. Gwern provides a practical "5-minute rule" for deciding what to add to your deck: if you will spend more than 5 minutes over your lifetime looking something up or suffering from not knowing it, it is worth putting into SRS. That simple heuristic resolves most "what should I Anki?" debates.

Michael Nielsen's Augmenting Long-term Memory reframed the entire conversation in 2018: "The single biggest change that Anki brings about is that it means memory is no longer a haphazard event, to be left to chance. Rather, it guarantees I will remember something, with minimal effort. That is, Anki makes memory a choice."

The Algorithm Timeline: 1885--2025

| Year | Event | Innovation |
| --- | --- | --- |
| 1885 | Ebbinghaus publishes Über das Gedächtnis | Quantified forgetting for the first time |
| 1939 | H.F. Spitzer tests 3,605 students | First large-scale spacing effect study |
| 1967 | Pimsleur's graduated interval recall | Hand-tuned schedule for audio learning |
| 1972 | Leitner's box system | Physical spaced repetition without computation |
| 1985 | Wozniak begins self-experiments | Birth of computational spaced repetition |
| 1987 | SuperMemo 1.0 / SM-2 | First computer scheduling algorithm |
| 1989 | SM-4, SM-5 | First adaptive algorithms (optimization matrix) |
| 1991 | SuperMemo 2.0 released as freeware | SM-2 spreads globally |
| 1994 | Bjork coins "desirable difficulties" | Theoretical framework for why spacing works |
| 2003 | Mnemosyne released | First open-source SRS (uses SM-2) |
| 2006 | Anki released (Damien Elmes) | SM-2 goes mainstream; 10M+ users eventually |
| 2006 | Roediger & Karpicke testing effect paper | Landmark retrieval practice evidence |
| 2016 | Duolingo's HLR paper | ML-based scheduling enters the literature |
| 2022 | Jarrett Ye releases FSRS v3 | Open-source DSR model for Anki |
| 2023 | FSRS v4 (power curve) | Power function replaces exponential |
| 2023 | Anki 23.10 ships native FSRS | FSRS reaches millions of users |
| 2024 | FSRS-5 (same-day reviews, 19 params) | Handles short-term memory |
| 2025 | FSRS-6 (21 params) | Optimizable curve flatness |

Implementing FSRS: It Is Surprisingly Small

Fernando Borretti wrote Implementing FSRS in 100 Lines -- and he was not exaggerating. The core algorithm is compact enough to fit in a single file. The complexity is not in the math; it is in the optimizer that trains the 19 weights on your review history.

For reference, My Kanji implements FSRS-5 natively in Ruby. Our study session system uses the DSR model to schedule kanji reviews, with per-user weight optimization. The implementation lives in app/services/fsrs_scheduler.rb -- a single file, ~200 lines, no external dependencies.

The Unsolved Problems

Spaced repetition algorithms are good. They are not done.

1. The cold-start problem. FSRS needs review history to optimize weights. A brand-new user with zero reviews gets default parameters. Andy Matuschak notes that the first 100 reviews are essentially flying blind.

2. Interference between similar items. If you learn 待つ (wait) and 持つ (hold) on the same day, they interfere with each other. No production algorithm models inter-item interference. The FSRS team has discussed this but it remains an open research problem.

3. Beyond recall: understanding. Current SRS only asks "can you recall this?" It cannot test whether you understand it in context, can use it productively, or have integrated it with related knowledge. Matuschak's mnemonic medium research is the most promising work on this frontier.

4. Emotional engagement. Matuschak argues that "the critical thing to optimize in spaced repetition memory systems is emotional connection to the review session and its contents." No algorithm does this. The 200th card of a review session feels different from the 5th, and algorithms are blind to that.

5. Optimal retention target. FSRS lets you set a desired retention rate (default: 90%). But is 90% optimal? Higher retention means more reviews. Lower means more forgetting. The Expertium benchmark shows diminishing returns above 90%, but the optimal point depends on the learner's goals, time budget, and material type. No algorithm adapts this dynamically.
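
The trade-off is easy to see from the power curve itself. Solving R(t, S) = r for t gives t = (S/F) · (1/r² - 1), so the scheduled interval -- and with it the review load -- scales steeply with the retention target. A sketch:

```python
F = 19 / 81

def interval_for_retention(r: float) -> float:
    """Scheduled interval at target retention r, as a multiple of stability S:
    t/S = (1/r^2 - 1) / F, from solving R(t, S) = (1 + F*t/S)^(-0.5) = r."""
    return (1 / r ** 2 - 1) / F

for r in (0.80, 0.85, 0.90, 0.95, 0.97):
    print(f"target {r:.0%}: interval = {interval_for_retention(r):.2f} x S")
```

Moving the target from 90% to 95% shrinks the interval from 1.00·S to 0.46·S -- roughly twice the reviews for five points of retention, which is the diminishing-returns pattern the benchmark observes.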

The Bottom Line

Spaced repetition is the closest thing we have to a free lunch in learning. The spacing effect has been replicated for 140 years across every population, every material type, and every testing condition ever studied. The only question is how efficiently your algorithm exploits it.

SM-2 was a breakthrough in 1987 and remains serviceable today. FSRS is measurably better -- fewer reviews, more accurate predictions, per-user adaptation -- and it is open source.

If you are studying kanji, or anything else that requires long-term retention of thousands of discrete items, the choice of algorithm is not academic. Over a year of daily study, a 20% reduction in review load means dozens of hours reclaimed. That is time you can spend learning new material instead of re-reviewing what a better algorithm would have scheduled correctly.

The algorithm decides when you forget. Choose a good one.
