# The Architecture of Kanji: Components, Positions, and Composition Rules

## Kanji as a Compression Algorithm

As of version 15.1, Unicode encodes 97,680 CJK unified ideographs across its ideograph blocks. The KanjiJump decomposition project reduces the 3,500 most important Japanese kanji to just 281 atomic components -- 200 if you merge positional variants. That is a compression ratio of roughly 12:1 (3,500 / 281 ≈ 12.5) from a component alphabet of only a few hundred shapes. The *Cihai* dictionary identifies 675 primitive components across 16,339 characters; a 2009 Chinese national standard narrows it to 514 for common use.

This is not metaphor. Kanji are a combinatorial writing system with a formal grammar, positional constraints, and phonetic encoding -- and Unicode has codified the structural part of that grammar into twelve composition operators.

### The Twelve Composition Operators: Unicode's Kanji Grammar

The Unicode block at U+2FF0--U+2FFF defines Ideographic Description Characters (IDCs) -- prefix operators that describe how components combine into characters. The original twelve occupy U+2FF0--U+2FFB. Together they form a context-free grammar for glyph structure:

| Symbol | Code Point | Name | Example | Decomposition |
|:------:|------------|------|---------|---------------|
| ⿰ | U+2FF0 | Left to right | [相](/kanjis/76f8) | ⿰木目 |
| ⿱ | U+2FF1 | Above to below | 杏 | ⿱木口 |
| ⿲ | U+2FF2 | Left-middle-right | 衍 | ⿲彳氵亍 |
| ⿳ | U+2FF3 | Above-middle-below | [京](/kanjis/4eac) | ⿳亠口小 |
| ⿴ | U+2FF4 | Full surround | [回](/kanjis/56de) | ⿴囗口 |
| ⿵ | U+2FF5 | Surround from above | 凰 | ⿵几皇 |
| ⿶ | U+2FF6 | Surround from below | [凶](/kanjis/51f6) | ⿶凵㐅 |
| ⿷ | U+2FF7 | Surround from left | 匠 | ⿷匚斤 |
| ⿸ | U+2FF8 | Above-left surround | [病](/kanjis/75c5) | ⿸疒丙 |
| ⿹ | U+2FF9 | Above-right surround | [戒](/kanjis/6212) | ⿹戈廾 |
| ⿺ | U+2FFA | Below-left surround | [超](/kanjis/8d85) | ⿺走召 |
| ⿻ | U+2FFB | Overlaid | 巫 | ⿻工从 |

![Examples of Ideographic Description Sequences: 字 decomposes as ⿱宀子, 匠 as ⿷匚斤, 京 as ⿳亠口小, 米 as ⿻八木](https://upload.wikimedia.org/wikipedia/commons/b/bf/Ideographic_description_sequences.png)
*Worked decomposition examples: each character on the left is rewritten as an IDC operator (the dashed box) followed by its component characters. Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Ideographic_description_sequences.png).*

Ten of the twelve are binary operators (two operands); ⿲ and ⿳ are ternary (three). Unicode 15.1 added four more to the block (U+2FFC--U+2FFF): surround from right, surround from lower right, horizontal reflection, and rotation, the last two taking a single operand. That brings the block to 16 operators, but the original twelve handle the vast majority of characters.
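Because every operator has a fixed operand count, an Ideographic Description Sequence can be parsed with a few lines of recursive descent. The following is a minimal sketch, not a reference implementation: it covers only the original twelve operators, treats any other character as a leaf, and the names `Node`, `ARITY`, and `parse_ids` are invented here for illustration.

```python
from dataclasses import dataclass

# Operand count for the original twelve IDCs (U+2FF0..U+2FFB):
# ten are binary, U+2FF2 and U+2FF3 are ternary.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY[chr(0x2FF2)] = 3
ARITY[chr(0x2FF3)] = 3


@dataclass
class Node:
    op: str          # a single IDC operator
    children: list   # each child is a Node or a one-character leaf


def _parse(s):
    """Consume one operand from the front of s; return (tree, remainder)."""
    if not s:
        raise ValueError("unexpected end of sequence")
    head, rest = s[0], s[1:]
    if head not in ARITY:
        return head, rest          # leaf: any non-operator character
    children = []
    for _ in range(ARITY[head]):
        child, rest = _parse(rest)
        children.append(child)
    return Node(head, children), rest


def parse_ids(ids):
    """Parse a complete IDS in prefix notation into a tree."""
    tree, rest = _parse(ids)
    if rest:
        raise ValueError(f"trailing characters: {rest!r}")
    return tree


print(parse_ids("⿰木目"))
print(parse_ids("⿲彳氵亍"))
```

Nested sequences work the same way: an operand may itself begin with an operator, in which case the parser recurses before returning to the outer level.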

![The U+2FF0 through U+2FFF Unicode block: sixteen Ideographic Description Characters as code-point cells](https://upload.wikimedia.org/wikipedia/commons/c/c5/UCB_Ideographic_Description_Characters.png)
*The full Ideographic Description Characters Unicode block (U+2FF0--U+2FFF), including the four operators added in Unicode 15.1 (U+2FFC--U+2FFF). Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:UCB_Ideographic_Description_Characters.png).*

The CHISE project and the cjkvi-ids database on GitHub have applied IDS decomposition to over 75,000 CJK ideographs, producing a machine-readable structural atlas of the character space. The distribution is heavily skewed: Gao and Kao (2002) found that over 60% of high-frequency characters use ⿰ (left-right) and roughly 20% use ⿱ (top-bottom), leaving less than 20% for all enclosure and overlay patterns combined. Left-right dominance reflects the phono-semantic architecture: the semantic radical typically sits on the left and the phonetic component on the right.
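A skew like this can be measured by tallying the outermost operator of each sequence, which in prefix notation is simply its first character. The sketch below runs on a small hand-entered sample of characters mentioned in this article; it is far too small and too hand-picked to reproduce the published percentages, and `SAMPLE_IDS` and `operator_shares` are illustrative names only.

```python
from collections import Counter

# Hand-entered sample for illustration; not drawn from any database.
SAMPLE_IDS = {
    "相": "⿰木目", "海": "⿰氵毎", "休": "⿰亻木", "持": "⿰扌寺",
    "河": "⿰氵可", "格": "⿰木各", "判": "⿰半刂",
    "花": "⿱艹化", "安": "⿱宀女", "思": "⿱田心", "雲": "⿱雨云",
    "店": "⿸广占", "病": "⿸疒丙",
    "道": "⿺辶首", "建": "⿺廴聿",
    "国": "⿴囗玉", "間": "⿵門日",
}


def operator_shares(ids_by_char):
    """Share of characters per top-level operator, most common first."""
    counts = Counter(ids[0] for ids in ids_by_char.values())
    total = sum(counts.values())
    return {op: n / total for op, n in counts.most_common()}


for op, share in operator_shares(SAMPLE_IDS).items():
    print(f"{op} {share:.0%}")
```

Pointed at a full decomposition file instead of this toy dictionary, the same function would give a corpus-wide distribution.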

### The Seven Positional Slots

Japanese pedagogy formalizes component placement into seven named positions (部首の位置). These are not arbitrary labels -- they are structural constraints that determine which shape variant a component takes and which slot it occupies.

| Position | Japanese | Reading | Location | Examples |
|----------|----------|---------|----------|---------|
| Hen | 偏 | へん | Left side | 氵 in [海](/kanjis/6d77), 亻 in [休](/kanjis/4f11), 扌 in [持](/kanjis/6301) |
| Tsukuri | 旁 | つくり | Right side | 刂 in [判](/kanjis/5224), 攵 in 教, 頁 in 頭 |
| Kanmuri | 冠 | かんむり | Top (crown) | 艹 in [花](/kanjis/82b1), 宀 in [安](/kanjis/5b89), 雨 in [雲](/kanjis/96f2) |
| Ashi | 脚 | あし | Bottom (legs) | 灬 in [然](/kanjis/7136), 心 in [思](/kanjis/601d), 皿 in [盤](/kanjis/76e4) |
| Tare | 垂 | たれ | Top-left drape | 广 in [店](/kanjis/5e97), 疒 in [病](/kanjis/75c5), 尸 in [届](/kanjis/5c4a) |
| Nyou | 繞 | にょう | Bottom-left wrap | 辶 in [道](/kanjis/9053), 廴 in [建](/kanjis/5efa), 走 in [超](/kanjis/8d85) |
| Kamae | 構 | かまえ | Full/partial surround | 門 in [間](/kanjis/9593), 囗 in [国](/kanjis/56fd), 行 in [術](/kanjis/8853) |

The hen (left) and tsukuri (right) positions dominate, accounting for over 60% of all component placements -- a direct consequence of the ⿰ operator's prevalence. Among the 2,136 joyo kanji, just 6 radicals account for 25% of all characters, and 50 radicals cover 75%. Nearly all appear in the hen or kanmuri slots. Many radical forms are constrained to a single position: 氵 is always hen, 刂 is always tsukuri, 艹 is always kanmuri. When a component moves position, it often changes shape -- [水](/kanjis/6c34) becomes 氵 on the left, [心](/kanjis/5fc3) becomes 忄 on the left but stays 心 on the bottom, and [火](/kanjis/706b) usually becomes 灬 at the bottom (though it stays 火 in 災).
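The link between operators, slots, and shape variants can be written down as two lookup tables. This is a simplified sketch under stated assumptions: it handles only flat two-part sequences, lumps every enclosure operator under kamae, and uses the placeholder label `"remainder"` for the second operand when pedagogy gives that position no name. The variant table contains only the examples from the paragraph above.

```python
# Slot taken by the first operand of each two-part operator (simplified).
SLOT_OF_FIRST = {
    "⿰": "hen", "⿱": "kanmuri", "⿸": "tare", "⿺": "nyou",
    "⿴": "kamae", "⿵": "kamae", "⿶": "kamae", "⿷": "kamae", "⿹": "kamae",
}
# Only left-right and top-bottom layouts name the second position.
SLOT_OF_SECOND = {"⿰": "tsukuri", "⿱": "ashi"}

# (base component, slot) -> written form, from the examples in the text.
VARIANT = {
    ("水", "hen"): "氵",
    ("心", "hen"): "忄",
    ("心", "ashi"): "心",
    ("火", "ashi"): "灬",
}


def slot_labels(ids):
    """Label both parts of a flat two-part IDS such as '⿰氵毎'."""
    op, first, second = ids  # raises ValueError unless exactly 3 characters
    return [
        (first, SLOT_OF_FIRST[op]),
        (second, SLOT_OF_SECOND.get(op, "remainder")),  # placeholder label
    ]


def written_form(base, slot):
    """Look up the positional variant; fall back to the base shape."""
    return VARIANT.get((base, slot), base)


print(slot_labels("⿰氵毎"))
print(written_form("水", "hen"))
```

A fuller table would need one entry per radical and slot, plus exceptions for components that keep their base shape in some characters.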

### Phonetic Components: The Sound Encoding Layer

Approximately 67--82% of kanji are phono-semantic compounds (形声文字), depending on the analysis. The phonetic component (声符 *seifu*) encodes the on'yomi while the semantic radical signals the meaning domain. The EDRDG project catalogs 150 phonetic components; KanjiJump documents 808 sound components across the broader set, noting that 74% of the 3,500 most important kanji either include or serve as a sound component.

Reliability varies dramatically. Some phonetic series achieve 100% consistency. Others degrade through centuries of sound change between Old Chinese and modern Japanese on'yomi. The ten most productive phonetic components:

| Component | On'yomi | Derivatives | Reliability | Example Series |
|:---------:|---------|:-----------:|:-----------:|----------------|
| 匕 | ヒ | ~30 | Medium | [比](/kanjis/6bd4), 匙, [旨](/kanjis/65e8), [尼](/kanjis/5c3c), [北](/kanjis/5317) |
| [者](/kanjis/8005) | シャ | ~23 | Medium | [暑](/kanjis/6691), [署](/kanjis/7f72), [諸](/kanjis/8af8), [緒](/kanjis/7dd2), [都](/kanjis/90fd) |
| [生](/kanjis/751f) | セイ | ~21 | Medium | [性](/kanjis/6027), [星](/kanjis/661f), [姓](/kanjis/59d3), [牲](/kanjis/7272), [産](/kanjis/7523) |
| 勹 | ホウ | ~20 | Medium | [包](/kanjis/5305), [抱](/kanjis/62b1), [泡](/kanjis/6ce1), [砲](/kanjis/7832), [飽](/kanjis/98fd) |
| 隹 | スイ | ~19 | Medium | [推](/kanjis/63a8), [維](/kanjis/7dad), [雄](/kanjis/96c4), [集](/kanjis/96c6), [準](/kanjis/6e96) |
| [可](/kanjis/53ef) | カ | ~17 | High | [何](/kanjis/4f55), [河](/kanjis/6cb3), [荷](/kanjis/8377), [歌](/kanjis/6b4c), [苛](/kanjis/82db) |
| 圭 | ケイ | ~17 | High | [掛](/kanjis/639b), 畦, 桂, 蛙, [街](/kanjis/8857) |
| [方](/kanjis/65b9) | ホウ | ~16 | High | [放](/kanjis/653e), [防](/kanjis/9632), [紡](/kanjis/7d21), [坊](/kanjis/574a), [芳](/kanjis/82b3) |
| [白](/kanjis/767d) | ハク | ~15 | High | [伯](/kanjis/4f2f), [拍](/kanjis/62cd), [泊](/kanjis/6cca), [迫](/kanjis/8feb), [舶](/kanjis/8236) |
| [各](/kanjis/5404) | カク | ~15 | High | [格](/kanjis/683c), [閣](/kanjis/95a3), [額](/kanjis/984d), [客](/kanjis/5ba2), [略](/kanjis/7565) |

*High reliability: >80% of derivatives share the predicted on'yomi. Medium: 50--80%. Data from EDRDG, The Kanji Code, and KanjiJump.*

The "perfect series" are the highest-leverage components for learners. [票](/kanjis/7968) (ヒョウ) generates 12 derivatives -- [標](/kanjis/6a19), [漂](/kanjis/6f02), 瓢, 剽, and others -- all reading ヒョウ with zero exceptions. 冓 (コウ) yields 10 derivatives (including [構](/kanjis/69cb), [溝](/kanjis/6e9d), [講](/kanjis/8b1b), [購](/kanjis/8cfc)), all reading コウ. [包](/kanjis/5305) (ホウ) gives 6 (including [抱](/kanjis/62b1), [泡](/kanjis/6ce1), [砲](/kanjis/7832), 胞, [飽](/kanjis/98fd)), all ホウ. These 100%-reliable series mean that learning a single component lets you predict the on'yomi of every character in the family on sight.
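The reliability bands used in the table can be computed directly: take the share of derivatives whose on'yomi matches the component's reading and compare it with the 80% and 50% thresholds. The sketch below does that for two of the example series. The readings are primary on'yomi entered by hand for illustration, and the `"Low"` label for anything under 50% is an assumption, since the table note names only High and Medium.

```python
def reliability(predicted, readings):
    """Return (share, band) for one phonetic series.

    readings maps each derivative to the on'yomi chosen for comparison.
    """
    matches = sum(1 for r in readings.values() if r == predicted)
    share = matches / len(readings)
    if share > 0.8:
        band = "High"
    elif share >= 0.5:
        band = "Medium"
    else:
        band = "Low"   # assumed label; the table note does not name this band
    return share, band


# Hand-entered primary on'yomi for the example series in the table.
HAKU_SERIES = {"伯": "ハク", "拍": "ハク", "泊": "ハク", "迫": "ハク", "舶": "ハク"}
SEI_SERIES = {"性": "セイ", "星": "セイ", "姓": "セイ", "牲": "セイ", "産": "サン"}

print(reliability("ハク", HAKU_SERIES))
print(reliability("セイ", SEI_SERIES))
```

On these five-character samples the 白 series matches in every case, while the 生 series sits exactly at 80% and therefore falls into the Medium band, since High requires strictly more than 80%.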

### Computational Decomposition: CHISE and the IDS Tree

The CHISE project (Character Information Service Environment), based at Kyoto University, maintains an IDS decomposition database serialized in RDF and queryable via SPARQL. Each character is represented as a tree of composition operators and atomic components -- essentially an abstract syntax tree for glyphs. The cjk-decomp project provides decomposition data for 75,000 ideographs, identifying approximately 10,000 intermediate composite components between the atomic primitives and the final characters.

This hierarchy mirrors how compilers represent expressions: terminals (atomic strokes and components), non-terminals (composite sub-components), and production rules (the IDS operators). The implication is that kanji are not 50,000 independent symbols. They are 50,000 *strings* generated by a grammar with roughly 300--700 terminals and 12 production rules. Viewed this way, the writing system is less a dictionary than a codebase -- and component analysis is the decompiler.
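The grammar view can be made concrete by treating a decomposition table as a set of rewrite rules and expanding a character until only operators and terminals remain. The sketch below uses a two-entry toy table entered by hand; `DECOMP`, `expand`, and `terminals` are illustrative names, and a real table would come from a resource such as cjkvi-ids or CHISE.

```python
# The original twelve IDC operators (U+2FF0..U+2FFB).
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Toy rewrite rules: composite character -> its IDS (hand-entered).
DECOMP = {
    "想": "⿱相心",
    "相": "⿰木目",
}


def expand(text, table=DECOMP):
    """Recursively replace every decomposable character with its IDS.

    Prefix notation makes plain substitution safe: swapping an operand
    for a complete IDS always yields another well-formed IDS.
    """
    out = []
    for ch in text:
        ids = table.get(ch)
        if ids is None or ids == ch:   # terminal, operator, or self-mapping
            out.append(ch)
        else:
            out.append(expand(ids, table))
    return "".join(out)


def terminals(ids):
    """Leaf components of an expanded IDS, in reading order."""
    return [ch for ch in ids if ch not in IDC]


full = expand("想")
print(full)             # ⿱⿰木目心
print(terminals(full))  # ['木', '目', '心']
```

Here 相 plays the role of a non-terminal: it appears as a component of 想 and is itself rewritten into 木 and 目.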

We've made our own decompiler: the [Kanji Atlas](/components) renders the full component graph for the 2,136 joyo characters. [Atlas Grade 1](/components/grade/1) is the place to see the principle in action -- every Grade 1 character is decomposed into its kanji, [radicals](/radicals), and [graphemes](/graphemes).

### References

- Unicode Consortium. "Ideographic Description Characters." *The Unicode Standard*, [Chapter 18](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-18/).
- CHISE Project. [chise.org](https://www.chise.org/)
- cjkvi-ids. IDS Data for CJK Unified Ideographs. [github.com/cjkvi/cjkvi-ids](https://github.com/cjkvi/cjkvi-ids)
- amake/cjk-decomp. Decomposition data for 75,000 CJK ideographs. [github.com/amake/cjk-decomp](https://github.com/amake/cjk-decomp)
- KanjiJump. "The 281 Atomic Kanji Components." [kanjijump.com](https://www.kanjijump.com/browse/atomic)
- Gao, D.G. & Kao, H.S.R. (2002). Chinese character structure analysis. *Acta Psychologica Sinica*.
- EDRDG. Kanji Phonetic Components. [edrdg.org](https://www.edrdg.org/~jwb/kanjiphonetics/)
- Millen, A. *The Kanji Code*. [thekanjicode.com](https://thekanjicode.com/)
- Wikipedia. "[Ideographic Description Characters](https://en.wikipedia.org/wiki/Ideographic_Description_Characters)," "[Chinese Character Components](https://en.wikipedia.org/wiki/Chinese_character_components)," "[List of Kanji Radicals by Frequency](https://en.wikipedia.org/wiki/List_of_kanji_radicals_by_frequency)."

