The Spine 4.3 release brought a quiet but significant improvement to the /markdown endpoint: it now returns clean, semantic Markdown rather than the previous output riddled with inline HTML tags and style attributes. This change dramatically simplified the document manager toolchain — no HTML parsing or stripping required, just well-structured Markdown ready for processing from the start.
The tool itself is built around two core ideas: TOC-driven splitting and content-hash-based UID tracking. When you're dealing with the Spine API Reference — a single Markdown file spanning hundreds of class entries — translating the entire document in one piece is impractical for collaboration and makes it impossible to know which sections are done, which are drafts, and which haven't been touched. Here's how the manager addresses this:
1. Heading-level splitting with tunable granularity
The tool splits the source document along ## (or ###, controlled by --level) heading boundaries into independent sub-documents. The default --level 2 produces 134 individual files; if shorter sections like single-line enums don't warrant standalone files, --level 3 cuts only at ### and deeper, yielding 119 files with those brief ## blocks kept within their parent. Split files are auto-sorted into directories — animation/, timeline/, attachments/, and so on — based on class naming rules.
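The following is a hypothetical sketch of that splitting rule, not the tool's real implementation. It cuts before every ATX heading whose depth is at least `level`, which reproduces the "--level 3 cuts only at ### and deeper" behaviour described above. The function name `split_by_level` and the handling of ``` fences (so a `#` comment inside a code sample is not mistaken for a section boundary) are assumptions; directory sorting by class name is left out because those rules aren't spelled out here.

```python
import re

# Simplified ATX heading pattern: 1-6 '#' characters, whitespace, then text.
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")


def split_by_level(markdown: str, level: int = 2) -> list[str]:
    """Cut `markdown` before every heading whose depth is >= `level`.

    Text before the first cut (e.g. a top-level # title) stays in the first
    chunk. Lines inside ``` fences are never treated as headings.
    """
    chunks: list[list[str]] = [[]]
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match and len(match.group(1)) >= level and chunks[-1]:
            chunks.append([])  # start a new sub-document at this heading
        chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks if chunk]


doc = "# Spine API\n## Animation\ntext\n### Sub\nmore\n## Bone\n```\n## not a heading\n```\n"
print(len(split_by_level(doc, level=2)))  # 4
print(len(split_by_level(doc, level=3)))  # 2
```

No lines are dropped when cutting, so `"".join(split_by_level(doc, level))` returns the original text for any level, which is the property a lossless round trip depends on.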
2. UID markers — rename, move, reorganize; merge still works
During splitting, the tool appends an HTML comment marker to the end of every sub-document:
<!-- spine-doc:uid=524b0fcd name=Animation status=untranslated -->
The uid is a hex string taken from the MD5 digest of the content (8 characters in the example above, whereas a full MD5 hex digest is 32). Because the hash is deterministic, identical content always produces the identical UID. At merge time, the tool scans all .md files for these markers to locate fragments, rather than relying on file paths recorded in manifest.json. This is the mechanism's key value: fragment location is fully decoupled from filesystem path. Translators can rename files, move them across directories, or reorganize the folder structure, and the merge step will still find every fragment.
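To make the UID mechanism concrete, here is a hedged sketch of generating a marker and then locating fragments by scanning for markers instead of trusting recorded paths. `make_uid`, `make_marker`, `index_fragments` and the regex are hypothetical names; the marker layout is copied from the example line, and because that example carries 8 hex characters, `length` defaults to 8 (pass 32 to keep the whole MD5 digest).

```python
import hashlib
import re
from pathlib import Path

# Marker layout copied from the example line; the regex itself is an assumption.
MARKER_RE = re.compile(
    r"^<!-- spine-doc:uid=(?P<uid>[0-9a-f]+) name=(?P<name>\S+) "
    r"status=(?P<status>\w+) -->[ \t]*$",
    re.MULTILINE,
)


def make_uid(content: str, length: int = 8) -> str:
    """Deterministic UID: the first `length` hex characters of the MD5 digest."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()[:length]


def make_marker(content: str, name: str) -> str:
    """Build the marker line appended to a freshly split sub-document."""
    return f"<!-- spine-doc:uid={make_uid(content)} name={name} status=untranslated -->"


def index_fragments(root: Path) -> dict[str, Path]:
    """Map each UID found in any .md file under `root` to that file.

    Paths are discovered at scan time, not recorded, so renamed or moved
    files still resolve to their UID.
    """
    found: dict[str, Path] = {}
    for path in root.rglob("*.md"):
        for match in MARKER_RE.finditer(path.read_text(encoding="utf-8")):
            found[match.group("uid")] = path
    return found
```

Since `index_fragments` walks the whole tree with `Path.rglob`, a file renamed or moved into another folder is still found by the UID in its marker.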
3. Translators change exactly one field
The only action required after translating a sub-document is editing the marker line: change status=untranslated to status=translated. No CLI invocation, no manifest editing — just type it in the editor. During merge, the tool automatically scans all status fields, warns about any remaining untranslated sections, and strips every UID marker line so the final output is pristine Markdown, identical in structure to the source.
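The merge pass can be sketched along the same lines. This is an illustration under stated assumptions, not the tool's code: `merge_fragments` is a hypothetical name, `fragments` maps each UID to the text of the file carrying it, `order` is the original UID sequence (assumed to be kept by the tool, for example in manifest.json), and each marker sits on its own final line, so deleting that line restores the fragment text.

```python
import re
import sys

# Matches a whole marker line, including its trailing newline, so sub() deletes it.
MARKER_LINE_RE = re.compile(
    r"^<!-- spine-doc:uid=(?P<uid>[0-9a-f]+) name=(?P<name>\S+) "
    r"status=(?P<status>\w+) -->[ \t]*\n?",
    re.MULTILINE,
)


def merge_fragments(fragments: dict[str, str], order: list[str]) -> tuple[str, list[str]]:
    """Join fragments in `order`, strip marker lines, and report unfinished ones.

    Returns the merged Markdown and the names of fragments whose status is
    anything other than "translated".
    """
    merged: list[str] = []
    pending: list[str] = []
    for uid in order:
        text = fragments[uid]
        for match in MARKER_LINE_RE.finditer(text):
            if match.group("status") != "translated":
                pending.append(match.group("name"))
        merged.append(MARKER_LINE_RE.sub("", text))
    for name in pending:
        print(f"warning: {name} is still untranslated", file=sys.stderr)
    return "".join(merged), pending
```

Flipping `status=untranslated` to `status=translated` in an editor is all it takes for a fragment to drop out of the `pending` list.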
The full round-trip:
Source Markdown → split (insert UID markers) → translate & update status →
merge (locate by UID, strip markers) → output Markdown
The merged result is Markdown all the way through, directly diff-able against the source. The entire toolchain uses only the Python standard library — zero third-party dependencies. A companion doc_fetcher.py script handles pulling the latest API Reference from the official site, with MD5 comparison to detect content changes before overwriting.
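For the fetcher, the change check reduces to comparing digests before writing. The sketch below covers only that step: `save_if_changed` is a hypothetical helper, and the download itself (for example via `urllib.request`) is left out because the source URL isn't given here.

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def save_if_changed(path: Path, new_content: bytes) -> bool:
    """Overwrite `path` only when the new download's MD5 differs from the file on disk.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    if path.exists() and md5_hex(path.read_bytes()) == md5_hex(new_content):
        return False  # identical content: keep the local copy and its mtime
    path.write_bytes(new_content)
    return True
```

Keeping the last digest somewhere (instead of re-reading the local file) would work the same way; either way, an unchanged upstream reference never overwrites the local copy.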
I'd like to proceed with the API reference translation using the document manager tool I mentioned. The idea is to work through it chapter by chapter — splitting off one module at a time (e.g., animation/, timeline/, then attachments/, etc.), translating each batch of sub-documents, and submitting them as they're ready rather than waiting for the entire document to be done.
The tool's UID tracking means you can rename or reorganize files on your end without breaking the merge, so I think it should be fairly flexible. But I wanted to check: does submitting in chunks like this work well with your workflow? Any preference on batch size, review cadence, or how you'd like to receive them? Keen to hear your thoughts before I dive in.