Spaced Repetition Algorithms: From Ebbinghaus to FSRS — A Deep Dive
The Algorithm That Decides When You Forget
You open your flashcard app. A card appears. You answer. The app decides when to show it again -- tomorrow, next week, in three months. That scheduling decision is the entire game. Get it right and you retain 95% of everything you study with minimal effort. Get it wrong and you either forget (intervals too long) or waste hours reviewing things you already know (intervals too short).
Spaced repetition algorithms are the scheduling engines behind this decision. They have evolved from a German psychologist's self-experiments with nonsense syllables in 1885 to machine learning systems trained on hundreds of millions of review records. This is the history, the math, and the engineering of how we got here.
The Forgetting Curve: Where It All Started
In 1885, Hermann Ebbinghaus published Über das Gedächtnis (Memory: A Contribution to Experimental Psychology), one of the most consequential single-author psychology experiments ever conducted. Working alone in Berlin, he memorized lists of 13 nonsense syllables (CVC trigrams like WID, ZOF, BUP) and tested himself at intervals ranging from 20 minutes to 31 days. His metric was "savings" -- how much less time relearning took compared to original learning.
| Time Since Learning | Retention (%) | Lost (%) |
|---|---|---|
| 20 minutes | 58.2 | 41.8 |
| 1 hour | 44.2 | 55.8 |
| 8.8 hours | 35.8 | 64.2 |
| 1 day | 33.7 | 66.3 |
| 2 days | 27.8 | 72.2 |
| 6 days | 25.4 | 74.6 |
| 31 days | 21.1 | 78.9 |
Source: Ebbinghaus (1885), Section 29. Replicated by Murre & Dros (2015) in PLOS ONE with modern controls.
The classic shape of Ebbinghaus's forgetting curve: rapid initial loss, then a long, slow tail. Source: Wikimedia Commons (public domain).
The shape of this curve -- steep initial drop, long tail -- has been confirmed across thousands of studies over 140 years. Cepeda et al. (2006) meta-analyzed 184 articles containing 317 experiments on distributed practice and found that spaced study produced 10--30% better retention than massed study across virtually every condition tested.
But Ebbinghaus established something even more fundamental: each review resets the curve at a shallower slope. The interval before you forget to the same threshold grows after every successful retrieval. This is the spacing effect -- the oldest and most replicated finding in experimental psychology.
The Debate: Exponential vs. Power Law Forgetting
What mathematical function best describes the forgetting curve? This matters because your scheduling algorithm's entire prediction depends on it.
Ebbinghaus himself fitted his data to a power function: b = 100k / ((log t)^c + k). For decades, the field assumed forgetting was exponential: R(t) = e^(-t/S), where S is memory stability. SuperMemo used this model for over 30 years.
But in 2024, the FSRS team showed that a power function fits real-world data better:
R(t, S) = (1 + F · t/S)^C
where F = 19/81 and C = -0.5.
Why? Individual memories may decay exponentially, but when you aggregate across memories of different strengths -- which is what a real flashcard deck contains -- the mixture follows a power law. Ebbinghaus actually discovered this himself: his original data fits a power function better than an exponential. It took 139 years for spaced repetition software to catch up.
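The gap between the two models is easy to see numerically. A minimal sketch in Python, with the exponential curve anchored so that both models agree on R = 0.9 at t = S (the anchoring choice is mine, made so the tails are directly comparable):

```python
def retention_power(t, S, F=19/81, C=-0.5):
    """FSRS power-law forgetting curve: R(t, S) = (1 + F*t/S)^C."""
    return (1 + F * t / S) ** C

def retention_exponential(t, S):
    """Exponential forgetting, rescaled so R(S) = 0.9 like the power model.
    (0.9^(t/S) is e^(-t/S') with S' chosen to hit the same 90%-at-S anchor.)"""
    return 0.9 ** (t / S)

# Both curves pass through R(S) = 0.9, but the power law's tail is heavier:
S = 10  # stability: days until retention drops to 90%
for t in [10, 50, 200]:
    print(t, round(retention_power(t, S), 3), round(retention_exponential(t, S), 3))
```

At t = 200 days the power model still predicts roughly 42% retention while the exponential has fallen near 12% -- exactly the heavy-tail behavior the table below describes.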
| Model | Equation | Tail Behavior | Fit to Real Data |
|---|---|---|---|
| Exponential | R = e^(-t/S) | Drops to near-zero quickly | Good for single-item, poor for mixed decks |
| Power law | R = (1 + t/(9S))^(-1) | Heavy tail, slower decay | Better fit across 10K+ user collections |
Source: A technical explanation of FSRS, Expertium's Blog
1967: Pimsleur's Graduated Intervals
Before computers entered the picture, Paul Pimsleur published a graduated interval recall schedule in 1967, designed for audio language instruction:
5 sec → 25 sec → 2 min → 10 min → 1 hour → 5 hours → 1 day → 5 days → 25 days → 4 months → 2 years
Each interval is roughly 5x the previous. This was hand-tuned to the constraints of audio lessons where you cannot flip back. Pimsleur's system is still used in Pimsleur language courses today -- a fixed schedule, no adaptation to the learner, but remarkably effective for its simplicity.
1972: Leitner's Box System
Sebastian Leitner published So lernt man lernen ("How to Learn to Learn") in 1972, introducing the Leitner system: five physical boxes of flashcards with increasing review intervals. Get a card right, it moves to the next box (reviewed less often). Get it wrong, it returns to Box 1.
| Box | Review Frequency |
|---|---|
| 1 | Every day |
| 2 | Every 2 days |
| 3 | Every 4 days |
| 4 | Every 9 days |
| 5 | Every 14 days |
Elegant, physical, zero computation required. The Leitner system is the only spaced repetition algorithm you can implement with cardboard. Its core insight -- focus study time on the items you know least -- remains the foundation of every algorithm that followed.
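The whole system really is computation-free, which is easiest to appreciate by writing it down. A minimal sketch using the box intervals from the table above:

```python
# Leitner box intervals from the table: box number -> days between reviews.
INTERVALS = {1: 1, 2: 2, 3: 4, 4: 9, 5: 14}

def leitner_step(box, correct):
    """Promote on a correct answer (capped at box 5); demote to box 1 on a miss."""
    return min(box + 1, 5) if correct else 1

# A card answered right, right, wrong, right ends up back in box 2:
box = 1
for answer in [True, True, False, True]:
    box = leitner_step(box, answer)
print(box, INTERVALS[box])  # -> 2 2
```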
The Leitner box system: a correct answer promotes the card one box rightward (less frequent review); a wrong answer demotes it back to box 1. Source: Wikimedia Commons (CC0).
1985--1987: Wozniak and SM-2 — The Algorithm That Conquered the World
On February 25, 1985, a 22-year-old molecular biology student named Piotr Wozniak began the experiment that would define spaced repetition software. Frustrated with forgetting English vocabulary and biochemistry facts, he started tracking optimal inter-repetition intervals by hand.
By December 13, 1987, he had coded SuperMemo 1.0 for DOS -- the first computer program to calculate spaced repetition schedules. The algorithm inside, SM-2, tracks three variables per card:
- n: repetition number (how many times you've reviewed it)
- EF: easiness factor (initialized to 2.5, adjusted by your responses)
- I: inter-repetition interval in days
The interval schedule:
I(1) = 1 day
I(2) = 6 days
I(n) = I(n-1) × EF for n > 2
After each review, EF is adjusted:
EF' = EF + (0.1 - (5-q) × (0.08 + (5-q) × 0.02))
where q is the user's quality-of-response rating (0--5). EF is clamped to a minimum of 1.3.
That is the entire algorithm. Two formulas, three variables, fits in a tweet. It was designed by one person from empirical self-observation, not from any formal memory model.
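Small enough, in fact, that a working sketch in Python is barely longer than the prose (variable names are mine; the reset-on-failure behavior, which leaves EF untouched on a lapse, follows Wozniak's published description):

```python
def sm2(n, ef, interval, q):
    """One SM-2 review step.
    n: repetitions so far, ef: easiness factor (min 1.3),
    interval: current interval in days, q: response quality 0-5."""
    if q >= 3:  # successful recall: grow the interval
        if n == 0:
            interval = 1
        elif n == 1:
            interval = 6
        else:
            interval = round(interval * ef)
        n += 1
        ef += 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)
    else:       # lapse: restart repetitions, leaving EF unchanged
        n, interval = 0, 1
    return n, max(ef, 1.3), interval

# A new card rated "4" three reviews in a row (q=4 leaves EF at exactly 2.5):
state = (0, 2.5, 0)
for q in [4, 4, 4]:
    state = sm2(*state, q)
print(state)  # -> (3, 2.5, 15): intervals of 1 day, 6 days, then 6 x 2.5 = 15
```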
And it works. SM-2 has been running, in nearly unchanged form, inside Anki since 2006, inside Mnemosyne since 2003, and inside dozens of other apps. It is by far the most widely deployed spaced repetition algorithm in history.
But SM-2 has problems:
| Limitation | Consequence |
|---|---|
| No probability model | Cannot predict how likely you are to recall a card at any given moment |
| Fixed initial intervals (1, 6) | No adaptation to card difficulty before first review |
| Linear EF adjustment | Overreacts to single bad reviews; slow to recover |
| No per-user optimization | Same formula for a medical student and a casual hobbyist |
| No forgetting model | When you fail a card, it just resets -- no modeling of what went wrong |
1989--2016: The SuperMemo Divergence
After SM-2, Wozniak kept iterating. SM-4 (1989) introduced an optimization matrix. SM-5 (1989) made it converge faster. SM-8 through SM-18 added increasingly sophisticated models -- two-component memory (stability + retrievability), neural network optimization, and incremental reading.
But these advances stayed locked inside SuperMemo's proprietary codebase. The rest of the world kept using SM-2.
Piotr Wozniak's own history of spaced repetition is worth reading in full -- it is one of the most remarkable single-author research programs in software history, even if the commercial isolation meant the field as a whole stagnated for 20 years.
2016: Duolingo's Half-Life Regression
In 2016, Burr Settles and Brendan Meeder at Duolingo published A Trainable Spaced Repetition Model for Language Learning (ACL 2016), introducing Half-Life Regression (HLR).
HLR models each word's "half-life" in memory -- the time until recall probability drops to 50%. Unlike SM-2, it:
- Uses logistic regression with psycholinguistic features (word frequency, cognate status, user history)
- Trains on millions of real review records
- Predicts actual recall probabilities
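The model itself is compact: HLR predicts a half-life h = 2^(Θ·x) from a feature vector x, then recall probability p = 2^(-Δ/h) after a lag of Δ days. A sketch with made-up features and weights (Duolingo's actual feature set is far richer, and Θ is fitted by regression on review logs):

```python
def hlr_predict(features, weights, delta_days):
    """Half-Life Regression sketch (after Settles & Meeder 2016):
    half-life h = 2^(theta . x), recall p = 2^(-delta / h).
    The feature vector and weights here are illustrative, not Duolingo's."""
    dot = sum(w * x for w, x in zip(weights, features))
    h = 2.0 ** dot                # predicted half-life in days
    p = 2.0 ** (-delta_days / h)  # recall probability after delta_days
    return h, p

# Hypothetical features: [times correct, times wrong, bias term]
h, p = hlr_predict([4.0, 1.0, 1.0], [0.5, -0.3, 1.0], delta_days=3.0)
# h is about 6.5 days, so 3 days out, recall is predicted above the 50% mark
```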
On Duolingo's data, HLR achieved 45%+ error reduction versus baselines. In A/B testing with live users:
| Metric | Improvement |
|---|---|
| Practice session retention | +9.5% |
| Lesson retention | +1.7% |
| Overall daily activity | +12% |
Source: Settles & Meeder (2016), ACL. Code: github.com/duolingo/halflife-regression
HLR proved the concept: machine learning on review data can substantially outperform hand-tuned heuristics. But it was designed for Duolingo's specific feature set and never gained adoption outside.
2022--2025: FSRS — The Open Algorithm
In 2022, Jarrett Ye released FSRS (Free Spaced Repetition Scheduler), an open-source algorithm that brought modern machine learning to the Anki ecosystem. By November 2023, Anki shipped FSRS as a native option. By 2025, it had become the default for new users.
FSRS models memory with the DSR (Difficulty, Stability, Retrievability) framework:
| Variable | Symbol | Definition | Range |
|---|---|---|---|
| Difficulty | D | How hard it is to increase stability for this card | 1--10 |
| Stability | S | Days for retrievability to drop from 100% to 90% | 0.1--36,500 |
| Retrievability | R | Probability of successful recall right now | 0--1 |
The core equations:
Forgetting curve (power function):
R(t, S) = (1 + F · t/S)^C, where F = 19/81, C = -0.5
Stability after successful recall:
S'_r = S · (1 + e^(w₈) · (11 - D) · S^(-w₉) · (e^(w₁₀·(1-R)) - 1) · hard/easy modifiers)
Stability after forgetting (lapse):
S'_f = w₁₁ · D^(-w₁₂) · ((S+1)^(w₁₃) - 1) · e^(w₁₄·(1-R))
The 19 weights (w₀ through w₁₈) are optimized per-user via gradient descent on their review history. This is the key innovation: FSRS treats scheduling as a machine learning problem where the loss function is log loss between predicted and actual recall.
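A sketch of the DSR update rules in Python, using the equations above. The w* values are illustrative stand-ins (real FSRS weights are fitted per user), and the hard/easy rating modifiers are omitted for brevity:

```python
import math

F, C = 19/81, -0.5  # FSRS forgetting-curve constants

def retrievability(t, S):
    """R(t, S) = (1 + F*t/S)^C: recall probability t days after a review."""
    return (1 + F * t / S) ** C

def stability_after_recall(S, D, R, w8, w9, w10):
    """S'_r from the success equation (hard/easy modifiers dropped)."""
    return S * (1 + math.exp(w8) * (11 - D) * S ** (-w9)
                * (math.exp(w10 * (1 - R)) - 1))

def stability_after_lapse(S, D, R, w11, w12, w13, w14):
    """S'_f from the lapse equation: post-forgetting stability."""
    return w11 * D ** (-w12) * ((S + 1) ** w13 - 1) * math.exp(w14 * (1 - R))

# A medium card (D=5, S=10) reviewed right at t = 10 days, where R = 0.9:
R = retrievability(10, 10)
S_pass = stability_after_recall(10, 5, R, w8=1.5, w9=0.1, w10=1.0)
S_fail = stability_after_lapse(10, 5, R, w11=2.0, w12=0.2, w13=0.3, w14=1.5)
# With these stand-in weights, a pass multiplies stability upward while a
# lapse collapses it well below the previous 10 days.
```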
Source: The Algorithm (FSRS Wiki), ABC of FSRS
The Benchmark: How Do They Actually Compare?
The open-spaced-repetition/srs-benchmark project evaluates algorithms on real Anki review data across thousands of user collections. The metric is log loss -- cross-entropy between predicted recall probability and binary outcome (recalled or forgot). Lower is better.
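Log loss is simple to compute, which is part of what makes it a clean benchmark metric. A minimal sketch:

```python
import math

def log_loss(predictions, outcomes):
    """Mean cross-entropy between predicted recall probability and the
    binary outcome (1 = recalled, 0 = forgot). Lower is better."""
    eps = 1e-15  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)

# Calibrated confidence is rewarded; confident misses are punished hard:
print(log_loss([0.9, 0.9], [1, 1]))  # ~0.105
print(log_loss([0.9, 0.9], [1, 0]))  # ~1.204
```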
| Algorithm | Year | Model Type | Parameters | Log Loss ↓ | Notes |
|---|---|---|---|---|---|
| SM-2 (trainable) | 1987 | Linear EF | 2 | 0.346 | Added probability layer for benchmark |
| Leitner | 1972 | Fixed boxes | 0 | ~0.36 | No probability prediction natively |
| HLR (Duolingo) | 2016 | Logistic regression | 3+ | 0.327 | Feature-engineered |
| FSRS v3 | 2022 | DSR exponential | 13 | 0.332 | First release |
| FSRS v4 | 2023 | DSR power | 17 | 0.326 | Power curve, +4 params |
| FSRS-5 | 2024 | DSR power + same-day | 19 | 0.325 | Same-day review handling |
| FSRS-6 | 2025 | DSR power + flat curve | 21 | 0.324 | Optimizable curve flatness |
Source: Benchmark of Spaced Repetition Algorithms, Expertium's Blog. Dataset: 10,000+ Anki user collections.
The headline number: FSRS-5 outperforms SM-2 in 97.4% of user collections. Against SM-17, one of SuperMemo's later proprietary algorithms, FSRS-6 wins in 83.3% of collections.
In practical terms, users switching from SM-2 to FSRS report 20--30% fewer reviews for the same retention level. That is not a small efficiency gain -- for someone doing 200 reviews/day, it is 40--60 fewer cards per session, compounding over months and years.
What Makes FSRS Work: The Three Key Innovations
1. Per-user parameter optimization. SM-2 uses the same formula for everyone. FSRS trains 19 weights on your review history. If you consistently remember well at 30-day intervals, FSRS learns that your stability grows faster than average and gives you longer intervals. If you struggle with certain material, it adapts.
2. The difficulty-stability interaction. In FSRS, difficulty affects how much stability grows after a successful review, not just the base interval. A difficult card (D=8) with high stability (S=90 days) will have its stability increase less than an easy card (D=3) with the same stability. This models a real phenomenon: hard items need more reinforcement even after you "know" them.
3. The retrievability-aware scheduling. FSRS knows your exact probability of recall at any moment. If you review a card when R=0.70, the stability gain is larger than reviewing at R=0.95 -- because retrieving at lower confidence is a desirable difficulty that produces stronger memory encoding. This directly implements Robert Bjork's theory.
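Knowing R at every moment also makes scheduling transparent: invert the forgetting curve to find the interval at which retrievability falls to your desired retention. A sketch (solving R = (1 + F·t/S)^C for t):

```python
def next_interval(S, desired_retention, F=19/81, C=-0.5):
    """Invert R(t, S) = (1 + F*t/S)^C: the interval t at which
    retrievability falls to the desired retention level."""
    return (S / F) * (desired_retention ** (1 / C) - 1)

# By construction, the interval at 90% retention equals the stability S:
S = 30
print(round(next_interval(S, 0.90), 1))  # -> 30.0
print(round(next_interval(S, 0.80), 1))  # a lower target buys a much longer gap
```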
The Science Behind It All: Why Spacing Works
The spacing effect is not just an empirical regularity. It is backed by converging evidence from cognitive psychology, neuroscience, and now computational modeling.
The testing effect. Roediger & Karpicke (2006) showed that students who tested themselves three times (STTT) recalled 61% after one week, while students who studied four times (SSSS) recalled only 40%. Testing is not assessment -- it is the most powerful encoding event available. Rowland's 2014 meta-analysis of 159 studies confirmed the effect at Hedges' g = 0.50.
Desirable difficulties. Robert and Elizabeth Bjork coined this framework in 1994: conditions that make learning harder in the short term -- spacing, interleaving, retrieval practice -- produce better long-term retention. The difficulty is the mechanism. FSRS encodes this directly: reviewing at lower retrievability yields larger stability gains.
Consolidation theory. Memory consolidation during sleep transfers labile hippocampal traces to stable neocortical representations. Spacing reviews across multiple sleep cycles gives the consolidation process time to operate. Massed practice (cramming) competes with itself for consolidation resources.
| Study | Year | N | Key Finding | Effect Size |
|---|---|---|---|---|
| Ebbinghaus | 1885 | 1 | Forgetting follows power law decay | -- |
| Cepeda et al. (meta) | 2006 | 184 articles | Spacing produces 10--30% better retention | d = 0.42--0.77 |
| Roediger & Karpicke | 2006 | 120 | Testing beats restudying at 1 week (61% vs 40%) | large |
| Rowland (meta) | 2014 | 159 studies | Testing effect robust across conditions | g = 0.50 |
| Kornell & Bjork | 2008 | 120 | Interleaving doubles classification accuracy | d = 0.99 |
The Essential Reading List
If you want to go deep, here is the canon -- the five texts that define the field.
| Text | Author(s) | Year | Why It Matters |
|---|---|---|---|
| Spaced Repetition for Efficient Learning | Gwern Branwen | 2009 | The definitive overview. 50,000+ words. Covers history, research, practice, software. If you read one thing, read this. |
| Augmenting Long-term Memory | Michael Nielsen | 2018 | A working scientist's account of using Anki daily for years. The "memory is a choice" framing that changed how people think about SRS. |
| Make It Stick | Brown, Roediger, McDaniel | 2014 | The science of learning distilled for practitioners. Covers spacing, testing, interleaving, and why most study habits are wrong. |
| Andy Matuschak's notes | Andy Matuschak | 2019-- | The frontier research on "mnemonic media" -- embedding spaced repetition inside reading, not as a separate activity. |
| A Three-Day Journey from Novice to Expert | Jarrett Ye | 2023 | The FSRS creator's own tutorial. Takes you from zero to understanding the DSR model in three days. |
Gwern's Spaced Repetition for Efficient Learning deserves special mention. Originally published in 2009 and continuously updated, it is arguably the most thorough single article ever written on the subject. Gwern provides a practical "5-minute rule" for deciding what to add to your deck: if you will spend more than 5 minutes over your lifetime looking something up or suffering from not knowing it, it is worth putting into SRS. That simple heuristic resolves most "what should I Anki?" debates.
Michael Nielsen's Augmenting Long-term Memory reframed the entire conversation in 2018: "The single biggest change that Anki brings about is that it means memory is no longer a haphazard event, to be left to chance. Rather, it guarantees I will remember something, with minimal effort. That is, Anki makes memory a choice."
The Algorithm Timeline: 1885--2025
| Year | Event | Innovation |
|---|---|---|
| 1885 | Ebbinghaus publishes Über das Gedächtnis | Quantified forgetting for the first time |
| 1939 | H.F. Spitzer tests 3,605 students | First large-scale spacing effect study |
| 1967 | Pimsleur's graduated interval recall | Hand-tuned schedule for audio learning |
| 1972 | Leitner's box system | Physical spaced repetition without computation |
| 1985 | Wozniak begins self-experiments | Birth of computational spaced repetition |
| 1987 | SuperMemo 1.0 / SM-2 | First computer scheduling algorithm |
| 1989 | SM-4, SM-5 | First adaptive algorithms (optimization matrix) |
| 1991 | SuperMemo 2.0 released as freeware | SM-2 spreads globally |
| 1994 | Bjork coins "desirable difficulties" | Theoretical framework for why spacing works |
| 2003 | Mnemosyne released | First open-source SRS (uses SM-2) |
| 2006 | Anki released (Damien Elmes) | SM-2 goes mainstream; 10M+ users eventually |
| 2006 | Roediger & Karpicke testing effect paper | Landmark retrieval practice evidence |
| 2016 | Duolingo's HLR paper | ML-based scheduling enters the literature |
| 2022 | Jarrett Ye releases FSRS v3 | Open-source DSR model for Anki |
| 2023 | FSRS v4 (power curve) | Power function replaces exponential |
| 2023 | Anki 23.10 ships native FSRS | FSRS reaches millions of users |
| 2024 | FSRS-5 (same-day reviews, 19 params) | Handles short-term memory |
| 2025 | FSRS-6 (21 params) | Optimizable curve flatness |
Implementing FSRS: It Is Surprisingly Small
Fernando Borretti wrote Implementing FSRS in 100 Lines -- and he was not exaggerating. The core algorithm is compact enough to fit in a single file. The complexity is not in the math; it is in the optimizer that trains the 19 weights on your review history.
For reference, My Kanji implements FSRS-5 natively in Ruby. Our study session system uses the DSR model to schedule kanji reviews, with per-user weight optimization. The implementation lives in app/services/fsrs_scheduler.rb -- a single file, ~200 lines, no external dependencies.
The Unsolved Problems
Spaced repetition algorithms are good. They are not done.
1. The cold-start problem. FSRS needs review history to optimize weights. A brand-new user with zero reviews gets default parameters. Andy Matuschak notes that the first 100 reviews are essentially flying blind.
2. Interference between similar items. If you learn 待 (wait) and 持 (hold) on the same day, they interfere with each other. No production algorithm models inter-item interference. The FSRS team has discussed this but it remains an open research problem.
3. Beyond recall: understanding. Current SRS only asks "can you recall this?" It cannot test whether you understand it in context, can use it productively, or have integrated it with related knowledge. Matuschak's mnemonic medium research is the most promising work on this frontier.
4. Emotional engagement. Matuschak argues that "the critical thing to optimize in spaced repetition memory systems is emotional connection to the review session and its contents." No algorithm does this. The 200th card of a review session feels different from the 5th, and algorithms are blind to that.
5. Optimal retention target. FSRS lets you set a desired retention rate (default: 90%). But is 90% optimal? Higher retention means more reviews. Lower means more forgetting. The Expertium benchmark shows diminishing returns above 90%, but the optimal point depends on the learner's goals, time budget, and material type. No algorithm adapts this dynamically.
The Bottom Line
Spaced repetition is the closest thing we have to a free lunch in learning. The spacing effect has been replicated for 140 years across every population, every material type, and every testing condition ever studied. The only question is how efficiently your algorithm exploits it.
SM-2 was a breakthrough in 1987 and remains serviceable today. FSRS is measurably better -- fewer reviews, more accurate predictions, per-user adaptation -- and it is open source.
If you are studying kanji, or anything else that requires long-term retention of thousands of discrete items, the choice of algorithm is not academic. Over a year of daily study, a 20% reduction in review load means dozens of hours reclaimed. That is time you can spend learning new material instead of re-reviewing what a better algorithm would have scheduled correctly.
The algorithm decides when you forget. Choose a good one.
References
- Bjork, R.A. & Bjork, E.L. (2011). Creating desirable difficulties to enhance learning. In Psychology and the Real World. Worth Publishers.
- Branwen, G. (2009--). Spaced Repetition for Efficient Learning. gwern.net.
- Brown, P.C., Roediger, H.L., & McDaniel, M.A. (2014). Make It Stick: The Science of Successful Learning. Harvard University Press.
- Cepeda, N.J., Pashler, H., Vul, E., Wixted, J.T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354--380.
- Ebbinghaus, H. (1885). Über das Gedächtnis. English trans.: Memory: A Contribution to Experimental Psychology (1913).
- Expertium. (2024). Benchmark of Spaced Repetition Algorithms. GitHub Pages.
- Expertium. (2024). A technical explanation of FSRS. GitHub Pages.
- Leitner, S. (1972). So lernt man lernen. Herder.
- Matuschak, A. (2019--). Spaced repetition memory system notes. andymatuschak.org.
- Murre, J.M.J. & Dros, J. (2015). Replication and analysis of Ebbinghaus' forgetting curve. PLOS ONE, 10(7), e0120644.
- Nielsen, M. (2018). Augmenting Long-term Memory. augmentingcognition.com.
- Roediger, H.L. & Karpicke, J.D. (2006). Test-enhanced learning. Psychological Science, 17(3), 249--255.
- Settles, B. & Meeder, B. (2016). A Trainable Spaced Repetition Model for Language Learning. Proc. ACL 2016.
- Wozniak, P. (1990--). The true history of spaced repetition. supermemo.com.
- Wozniak, P. (1987). Algorithm SM-2. supermemo.guru.
- Ye, J. (2022--). FSRS4Anki. GitHub.
- Ye, J. (2023). Spaced Repetition Algorithm: A Three-Day Journey from Novice to Expert. FSRS Wiki.