Authoring registry data
Hand-authored YAML lives in the separate textrefs/registry repository, mounted in this repo as a git submodule at data/ (run git submodule update --init --recursive after cloning; see CONTRIBUTING.md for the full workflow). Contributors edit data/works/{work_key}.yaml and data/systems/{system_key}.yaml there. The compiled registry — Works, CitationSystems, CanonicalReferences, MappingAssertions — is produced by npm run build:data from the pinned submodule pointer and written as JSONL resources plus datapackage.json under dist/dump/. Published dumps are attached to TextRefs Standard/site releases and long-term archived in the TextRefs Zenodo community with citable DOIs.
This page documents the YAML format.
The two source directories
    data/
    ├── works/{work_key}.yaml     # one file per Work
    └── systems/{system_key}.yaml # one file per CitationSystem

A Work source file declares the work itself, references the citation system it uses, lists references, optionally lists resolver templates, and optionally lists work-level mappings. A CitationSystem source file declares the locator regex and examples.
A worked example
The Dhammapada has 423 verses across 26 chapters and is hosted on four different reading platforms with four different URL patterns. The whole work, with chapter 1 fully wired up, is roughly 60 lines of YAML:
    work:
      key: dhammapada
      preferred_label: Dhammapada
      status: candidate
      created: 2026-05-31
      modified: 2026-05-31

    citation_system: dhammapada-chapter-verse

    mappings:
      - relation: exactMatch
        target_kind: wikidata
        identifier: 'https://www.wikidata.org/entity/Q220114'
        source: manual-curation
        status: candidate
        created: 2026-05-31
        modified: 2026-05-31

    resolvers:
      - provider: Project Gutenberg
        edition: 'Müller (1881), Gutenberg ebook #2017'
        language: en
        access: open
        url: 'https://www.gutenberg.org/files/2017/2017-h/2017-h.htm#link2HCH{chapter04}'

      - provider: Wikisource
        language: en
        access: open
        url_by:
          chapter:
            1: 'https://en.wikisource.org/wiki/Dhammapada_(Muller)#Chapter_I:_The_Twin-Verses'
            2: 'https://en.wikisource.org/wiki/Dhammapada_(Muller)#Chapter_II:_On_Earnestness'
            # … one per chapter you have registered references for

      - provider: palikanon.com
        language: de
        access: open
        url: 'https://palikanon.com/khuddaka/dhp/dhp.html#dhp_{verse}'

    references:
      - '1.1'
      - '1.2'
      - '1.3'
      # …

Each reference gets one entry per resolver. The compiler expands {chapter04} and {verse} from the named capture groups in the citation system’s locator_regex, and looks up url_by.chapter[N] for providers whose URL structure isn’t templatable.
Optional: creators
Works MAY carry an optional creators array under work: for citation rendering. Two entry shapes:

    work:
      key: plato.republic
      preferred_label: Republic
      creators:
        - kind: person
          family: Plato # mononyms use `family` alone (CSL convention)

    work:
      key: aristotle.de-mundo
      preferred_label: De mundo
      creators:
        - kind: literal
          name: '[Pseudo-]Aristotle' # institutions, collective, or pseudonymous attribution

Omit creators entirely for anonymous or non-authored works (e.g. the Dhammapada, the Bible). For attributed-but-disputed works, record the traditional attribution for citation rendering and document uncertainty through mappings or review notes. The field is purely optional; nothing in the registry depends on it.
Naming and identity
Get work.key, work.preferred_label, and work.creators right on the first commit: renaming a key after publication is a tombstone event that mints new reference IRIs.
Shape of work.key: {author-slug}.{work-slug} for attributed works; bare {work-slug} for anonymous, collective, or canonical corpora.

- author-slug: lowercased family name (or single mononym for antiquity); ASCII-folded; a hyphen (-) for spaces; no initials. E.g. homer, plato, aristotle, wittgenstein, confucius, laozi, murasaki-shikibu.
- work-slug: the short form readers actually use, such as iliad, republic, tractatus, analects, daodejing. Avoid cryptic initialisms (eth-nic) and avoid full Latin titles unless that is the short form.
- Bare slug for unattributed corpora: tanakh, dhammapada, new-testament, quran.
- Multiple works per author with the same short title: disambiguate inside the work-slug, not by promoting the author. E.g. aristotle.nicomachean-ethics, aristotle.eudemian-ethics.
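The slug rules above can be sketched as a small helper. This is only an illustration: `slugify` and `workKey` are hypothetical names, not part of the registry tooling, and reading "ASCII-folded" as "strip combining accents after Unicode NFD normalization" is an assumption.

```typescript
// Hypothetical helper: illustrates the slug rules, not the registry's actual code.
function slugify(name: string): string {
  return name
    .normalize("NFD")                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks (assumed ASCII folding)
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")      // spaces and punctuation become a single hyphen
    .replace(/^-+|-+$/g, "");         // no leading or trailing hyphen
}

// Attributed works get {author-slug}.{work-slug}; unattributed corpora a bare {work-slug}.
function workKey(workShortTitle: string, authorName?: string): string {
  const work = slugify(workShortTitle);
  return authorName ? `${slugify(authorName)}.${work}` : work;
}

console.log(workKey("Nicomachean Ethics", "Aristotle")); // aristotle.nicomachean-ethics
console.log(workKey("New Testament"));                   // new-testament
```

A helper like this cannot pick the short form readers actually use; that remains an editorial decision made before the key is committed.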
preferred_label
The display title. Do not add a parenthetical disambiguator: the author goes in creators, the edition (SBLGNT, OCT, …) goes on the resolver target, and alternative names belong in a future alt_labels field.

- Attributed: just the title, e.g. Iliad, Republic, Tractatus Logico-Philosophicus.
- Anonymous / collective: the conventional English name, e.g. Tanakh, Dhammapada, New Testament.
creators
Follow CSL-JSON conventions so citeproc-js / Zotero render correctly.

- Standard names: use kind: person with family and given. E.g. { kind: person, family: Wittgenstein, given: Ludwig }.
- Mononyms (Homer, Plato, Confucius, Laozi, Murasaki Shikibu, …): use kind: person with family only and no given. This is the CSL convention for single-name authors and matches Chicago’s “Homer, Iliad 1.1.” output.
- Anonymous / collective: omit creators entirely. Don’t write a literal “Anonymous”; absence is the correct CSL signal.
- Reserve kind: literal for names that genuinely should not decompose: corporate/institutional authors (“World Health Organization”) or pseudonymous attribution strings (“[Pseudo-]Aristotle”).
- Attributed-but-disputed (e.g. Laozi for Daodejing): record the traditional attribution as kind: person, family: Laozi; encode uncertainty via a closeMatch mapping, not in the name string.
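To make the CSL mapping concrete, here is a minimal sketch of how creators entries might be turned into a CSL-JSON author array. The `family`, `given`, and `literal` keys are standard CSL-JSON name fields; the function name `toCslAuthors` and the type names are hypothetical, and the registry's real renderer may work differently.

```typescript
// Shapes of the two creator entries described above.
type Creator =
  | { kind: "person"; family: string; given?: string }
  | { kind: "literal"; name: string };

// CSL-JSON name object: either family/given parts or a single literal string.
type CslName = { family: string; given?: string } | { literal: string };

// Hypothetical converter: maps a work's creators to a CSL-JSON `author` array.
// Returns undefined when creators is absent, so the field is omitted entirely.
function toCslAuthors(creators?: Creator[]): CslName[] | undefined {
  if (!creators || creators.length === 0) return undefined;
  return creators.map((c): CslName => {
    if (c.kind === "literal") return { literal: c.name };
    // Mononyms carry `family` only; `given` is left out rather than set to "".
    return c.given ? { family: c.family, given: c.given } : { family: c.family };
  });
}

console.log(JSON.stringify(toCslAuthors([{ kind: "person", family: "Wittgenstein", given: "Ludwig" }])));
// [{"family":"Wittgenstein","given":"Ludwig"}]
console.log(JSON.stringify(toCslAuthors([{ kind: "literal", name: "[Pseudo-]Aristotle" }])));
// [{"literal":"[Pseudo-]Aristotle"}]
```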
How URL templates work
The compiler treats every resolver url as an RFC 6570 Level 1 template. Variables are drawn from the following sources:

- Named capture groups in the citation system’s locator_regex. For example, a regex like ^(?<chapter>\d+)\.(?<verse>\d+)$ exposes {chapter} and {verse} to every template.
- Zero-padded variants of any numeric capture, generated automatically: {chapter02}, {chapter03}, {chapter04}, {verse02}, {verse03}. Use the padding width that matches the target site’s URL.
- Roman-numeral variants of any numeric capture in 1..3999, generated automatically: {chapterRoman} produces I, VIII, XXVI. Useful for sites that anchor sections by Roman chapter (e.g. Wikisource’s #I:8 anchors for Dhammapada verses).
- Cumulative {verseGlobal}: for systems whose locators have numeric chapter and verse groups and declare chapter_sizes: (see below), the compiler exposes a global 1..N verse counter. Useful for single-page resolvers (e.g. palikanon.com’s #dhp_8, #dhp_102) whose anchors use one running index across all chapters.
If a template references a variable that doesn’t exist for a given reference, the compiler skips that resolver entry for that reference and warns. Empty resolver_targets arrays are valid; references stay registered.
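The variable sources and the skip-on-missing rule can be sketched roughly as follows. This is not the compiler's code: `toRoman`, `templateVars`, and `expand` are hypothetical names, the padding widths 2 to 4 are an assumption, `encodeURIComponent` only approximates RFC 6570 Level 1 percent-encoding, and the example.org URLs are placeholders.

```typescript
// Convert 1..3999 to a Roman numeral; returns undefined outside that range.
function toRoman(n: number): string | undefined {
  if (!Number.isInteger(n) || n < 1 || n > 3999) return undefined;
  const table: Array<[number, string]> = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"], [100, "C"], [90, "XC"],
    [50, "L"], [40, "XL"], [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let out = "";
  for (const [value, symbol] of table) {
    while (n >= value) { out += symbol; n -= value; }
  }
  return out;
}

// Build the variable table for one locator from the regex's named groups.
function templateVars(locatorRegex: RegExp, locator: string): Record<string, string> | undefined {
  const groups = locatorRegex.exec(locator)?.groups;
  if (!groups) return undefined;
  const vars: Record<string, string> = {};
  for (const [name, value] of Object.entries(groups)) {
    if (value === undefined) continue;
    vars[name] = value;
    if (/^\d+$/.test(value)) {
      // Assumed widths: {name02}, {name03}, {name04}.
      for (const width of [2, 3, 4]) vars[`${name}0${width}`] = value.padStart(width, "0");
      const roman = toRoman(Number(value));
      if (roman !== undefined) vars[`${name}Roman`] = roman;
    }
  }
  return vars;
}

// Expand {name} placeholders; undefined signals "skip this resolver entry".
function expand(template: string, vars: Record<string, string>): string | undefined {
  let missing = false;
  const url = template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    const value = vars[name];
    if (value === undefined) { missing = true; return ""; }
    return encodeURIComponent(value); // approximates Level 1 percent-encoding
  });
  return missing ? undefined : url;
}

const regex = /^(?<chapter>\d+)\.(?<verse>\d+)$/;
const vars = templateVars(regex, "8.3");
if (vars) {
  console.log(expand("https://example.org/dhp#ch{chapter04}", vars));          // https://example.org/dhp#ch0008
  console.log(expand("https://example.org/dhp#{chapterRoman}:{verse}", vars)); // https://example.org/dhp#VIII:3
  console.log(expand("https://example.org/dhp#{line}", vars));                 // undefined
}
```

Returning undefined instead of throwing mirrors the behavior described above: the resolver entry is dropped for that reference while the reference itself stays registered.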
When a URL pattern isn’t templatable
Some providers use chapter or section names that don’t fit a formula (e.g. 01-Pairs.htm, 02-Heedfulness.htm). For those, replace url: with a per-key map:

    - provider: ancient-buddhist-texts.net
      language: en
      access: open
      url_by:
        chapter:
          1: 'https://ancient-buddhist-texts.net/Texts-and-Translations/Dhammapada/01-Pairs.htm'
          2: 'https://ancient-buddhist-texts.net/Texts-and-Translations/Dhammapada/02-Heedfulness.htm'

The compiler looks up the value of the chosen variable (chapter) in the map and uses the matching URL. References for chapters not in the map are silently skipped for this provider, so you can finish the map at your own pace.
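The lookup described above amounts to a keyed map access. A minimal sketch, assuming the YAML has already been parsed into plain objects (so the integer keys 1 and 2 have become the strings "1" and "2"); `UrlBy` and `lookupUrlBy` are hypothetical names:

```typescript
// Parsed form of a `url_by:` block: variable name -> (variable value -> URL).
type UrlBy = Record<string, Record<string, string>>;

// Hypothetical lookup: returns the mapped URL, or undefined when the key is absent,
// in which case the reference is skipped for this provider.
function lookupUrlBy(urlBy: UrlBy, vars: Record<string, string>): string | undefined {
  for (const [variable, table] of Object.entries(urlBy)) {
    const value = vars[variable];
    if (value !== undefined && table[value] !== undefined) return table[value];
  }
  return undefined;
}

const abt: UrlBy = {
  chapter: {
    "1": "https://ancient-buddhist-texts.net/Texts-and-Translations/Dhammapada/01-Pairs.htm",
    "2": "https://ancient-buddhist-texts.net/Texts-and-Translations/Dhammapada/02-Heedfulness.htm",
  },
};

console.log(lookupUrlBy(abt, { chapter: "2", verse: "5" })); // the 02-Heedfulness.htm URL
console.log(lookupUrlBy(abt, { chapter: "3", verse: "1" })); // undefined
```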
When even that isn’t enough
For one-off URLs that don’t fit any pattern (the typical case for older citation systems like Stephanus or Bekker pagination), put the URL directly on the reference using extra_resolvers:

    references:
      - locator: '514a'
        extra_resolvers:
          - provider: Perseus Digital Library
            edition: 'Plato, Republic'
            language: grc-Grek
            access: open
            url: 'https://www.perseus.tufts.edu/...'
            last_checked: '2026-01-01'

Top-level resolvers: and per-reference extra_resolvers: both contribute to the final resolver_targets array.
Enumerating canonical reference sets
Hand-listing every verse of Genesis or every line of the Iliad is not the right shape for a YAML file. For works whose reference set is regular enough to describe in a few numbers, use references_range: instead of (or alongside) references:. Each entry is one named expander; the compiler concatenates every expansion with the explicit references: list, de-dupes, and validates each generated locator against the citation system’s regex.
    # 81 references: '1', '2', …, '81'
    references_range:
      - kind: integer
        from: 1
        to: 81

    # Iliad: 15,693 references from per-book line counts (Allen OCT)
    # '1.1', '1.2', …, '24.804'
    references_range:
      - kind: book_line
        counts: [
          611, 877, 461, 544, 909, 529, 482, 565, 713, 579, 848, 471,
          837, 522, 746, 867, 761, 617, 424, 503, 611, 515, 897, 804,
        ]

    # Analects: 517 references from per-book chapter counts
    # '1.1', …, '20.5'
    references_range:
      - kind: book_chapter
        counts: [16, 24, 26, 26, 28, 30, 38, 21, 31, 27, 26, 24, 30, 47, 42, 14, 26, 11, 25, 5]

    # Genesis: 1,533 references from per-chapter verse counts
    # 'Genesis.1.1', …, 'Genesis.50.26'
    references_range:
      - kind: book_chapter_verse
        book: Genesis
        counts: [
          31, 25, 24, 26, 32, 22,
          # …, 26
        ]

    # Dhammapada: 423 references from per-chapter verse counts (no book prefix)
    # '1.1', …, '26.41'
    references_range:
      - kind: chapter_verse
        counts: [20, 12, 11, 16, 16, 14, 10, 16, 13, 17, 11, 10, 12, 18, 12, 12, 14, 21, 17, 17, 16, 14, 14, 26, 23, 41]

    # Bekker: page × {a,b} × lines 1..N, with explicit per-book page ranges
    # '1094a1', '1094a2', …, '1181b30'
    references_range:
      - kind: bekker
        page_ranges:
          - [1094, 1103]
          # …one entry per Aristotelian book
        lines_per_column: 30

    # Stephanus: page × sections {a..e}: '327a', '327b', …, '621e'
    references_range:
      - kind: stephanus
        page_range: [327, 621]

Multiple references_range entries on one work are concatenated. Combine with explicit references: entries for one-off locators that don’t fit any range.
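The expand, concatenate, de-dupe, validate pipeline can be sketched for three of the expander kinds. This is an illustration under assumptions, not the compiler's implementation: `RangeSpec`, `expandRange`, and `buildReferences` are hypothetical names, and throwing on an invalid locator is one possible policy.

```typescript
// Three of the expander kinds described above; the others follow the same pattern.
type RangeSpec =
  | { kind: "integer"; from: number; to: number }
  | { kind: "chapter_verse"; counts: number[] }
  | { kind: "stephanus"; page_range: [number, number] };

// Hypothetical expander: turns one references_range entry into locator strings.
function expandRange(spec: RangeSpec): string[] {
  const out: string[] = [];
  if (spec.kind === "integer") {
    for (let n = spec.from; n <= spec.to; n++) out.push(String(n));
  } else if (spec.kind === "chapter_verse") {
    spec.counts.forEach((count, i) => {
      for (let v = 1; v <= count; v++) out.push(`${i + 1}.${v}`);
    });
  } else {
    const [first, last] = spec.page_range;
    for (let p = first; p <= last; p++) {
      for (const section of ["a", "b", "c", "d", "e"]) out.push(`${p}${section}`);
    }
  }
  return out;
}

// Concatenate explicit and generated locators, de-dupe, then validate against the regex.
function buildReferences(explicit: string[], ranges: RangeSpec[], locatorRegex: RegExp): string[] {
  const generated = ranges.flatMap((r) => expandRange(r));
  const unique = Array.from(new Set([...explicit, ...generated]));
  const invalid = unique.filter((loc) => !locatorRegex.test(loc));
  if (invalid.length > 0) {
    throw new Error(`locators rejected by regex: ${invalid.slice(0, 5).join(", ")}`);
  }
  return unique;
}

const dhpCounts = [20, 12, 11, 16, 16, 14, 10, 16, 13, 17, 11, 10, 12,
                   18, 12, 12, 14, 21, 17, 17, 16, 14, 14, 26, 23, 41];
const dhpRegex = /^(?<chapter>[1-9]|1[0-9]|2[0-6])\.(?<verse>[1-9][0-9]*)$/;
const refs = buildReferences(["1.1"], [{ kind: "chapter_verse", counts: dhpCounts }], dhpRegex);
console.log(refs.length, refs[0], refs[refs.length - 1]); // 423 1.1 26.41
```

The explicit '1.1' and the generated '1.1' collapse into one entry, which is why the total stays at 423.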
Citation system files
A citation system declares its locator format once and is reused by every work that cites it.

    key: dhammapada-chapter-verse
    preferred_label: Dhammapada chapter-and-verse
    normalization_version: 1.0.0
    locator_regex: '^(?<chapter>[1-9]|1[0-9]|2[0-6])\.(?<verse>[1-9][0-9]*)$'
    # Optional: per-chapter verse counts. When present, the compiler exposes
    # `{verseGlobal}` (cumulative 1..N) to URL templates of works using this system.
    chapter_sizes: [
      20, 12, 11, 16, 16, 14, 10, 16, 13, 17, 11, 10, 12,
      18, 12, 12, 14, 21, 17, 17, 16, 14, 14, 26, 23, 41,
    ]
    examples:
      valid: ['1.1', '1.20', '8.3', '26.41']
      invalid: ['0.1', '27.1', '1', '1.0']
    status: candidate
    created: 2026-05-31
    modified: 2026-06-01

Name your capture groups deliberately: every URL template in every work that uses this system can refer to them. Add chapter_sizes: only when (a) the locator has numeric chapter and verse groups and (b) at least one resolver actually needs the global counter.
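As a rough illustration of what chapter_sizes enables, the sketch below computes the cumulative verse index and checks the declared examples against the regex. The values are copied from the citation system file above; `verseGlobal` as a function is a hypothetical stand-in for whatever the compiler does internally.

```typescript
// Values copied from the citation system file above.
const chapterSizes = [20, 12, 11, 16, 16, 14, 10, 16, 13, 17, 11, 10, 12,
                      18, 12, 12, 14, 21, 17, 17, 16, 14, 14, 26, 23, 41];
const locatorRegex = /^(?<chapter>[1-9]|1[0-9]|2[0-6])\.(?<verse>[1-9][0-9]*)$/;

// Cumulative index: verses in all earlier chapters plus the verse number.
function verseGlobal(locator: string): number | undefined {
  const groups = locatorRegex.exec(locator)?.groups;
  if (!groups) return undefined;
  const chapter = Number(groups["chapter"]);
  const verse = Number(groups["verse"]);
  // The regex bounds the chapter (1..26) but not the verse, so check it here.
  if (verse > chapterSizes[chapter - 1]) return undefined;
  const before = chapterSizes.slice(0, chapter - 1).reduce((sum, n) => sum + n, 0);
  return before + verse;
}

console.log(verseGlobal("1.8"));   // 8
console.log(verseGlobal("8.3"));   // 102
console.log(verseGlobal("26.41")); // 423
console.log(verseGlobal("1.21"));  // undefined

// The declared examples double as a quick self-test of the regex.
const examples = { valid: ["1.1", "1.20", "8.3", "26.41"], invalid: ["0.1", "27.1", "1", "1.0"] };
const examplesOk =
  examples.valid.every((loc) => locatorRegex.test(loc)) &&
  examples.invalid.every((loc) => !locatorRegex.test(loc));
console.log(examplesOk); // true
```

Locators 1.8 and 8.3 land on 8 and 102, the same running indexes as the palikanon.com anchors #dhp_8 and #dhp_102 mentioned earlier.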
Building, validating, and previewing
    npm run compile:data   # expand YAML → JSONL resources plus datapackage.json under dist/dump/
    npm run validate:data  # check every record against the canonical Zod schemas
    npm run build:data     # both, in order
    npm run dev            # browse at http://localhost:4321/reg/ ; records live under /id/

The compiler is deterministic: re-running compile:data against unchanged source produces zero diff. MappingAssertion and CanonicalReference UUIDs are derived from content per Identifier syntax, so the same YAML always produces the same identifiers.
What lives where
- /id/work/{key}/: a Work’s canonical landing page (mappings, references, citation systems). A sibling /id/work/{key}.json serves the same record as JSON-LD.
- /id/system/{key}/: a CitationSystem’s canonical landing page (regex, examples, references). Plus /id/system/{key}.json.
- /id/ref/{uuid}/: a CanonicalReference page with every resolver URL grouped by language. Plus /id/ref/{uuid}.json.
- /id/mapping/{uuid}/: a MappingAssertion page. Plus /id/mapping/{uuid}.json.
- /reg/: the human registry browser (filter works and citation systems, then browse paginated reference lists from work/system pages).
- /cite/{work_key}/{locator}/: short alias that redirects to the canonical reference page.
A reader who types https://textrefs.org/cite/dhammapada/1.1 lands on the canonical reference page; the alias index is generated alongside the records by the compiler. See URL layout for the full four-prefix model.