**天空вℓσи∂** @skyblond@qoto.org · 2026-03-07T17:43:07Z


The general process is as follows:

1. Unpack the epub file, clean up any remnants of the Kobo crack, and change the layout from Japanese vertical to a standard left-to-right horizontal format.

2. Scan for every `p` tag, generate a unique XPath for each one, then extract its HTML and store it in a JSON array.

3. Feed the HTML to the LLM chapter by chapter to identify entities and generate chapter summaries.

4. Manually review the entity list to determine the names of key characters and items, ensuring consistency in key character names.

5. Translate sentence by sentence based on the above information.
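Step 2 could look roughly like the sketch below. It is only an illustration using Python's standard library, not the actual script: the function name `extract_paragraphs`, the record keys `xpath` and `html`, and the positional path format (`/html/body[1]/p[3]`, local names only) are all assumptions.

```python
import copy
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix (this setting is global).
ET.register_namespace("", XHTML_NS)


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xhtml_text):
    """Return [{'xpath': ..., 'html': ...}] for every <p>, in document order."""
    root = ET.fromstring(xhtml_text)
    records = []

    def walk(elem, path):
        if local(elem.tag) == "p":
            clone = copy.copy(elem)
            clone.tail = None  # do not serialize the text that follows </p>
            records.append({"xpath": path,
                            "html": ET.tostring(clone, encoding="unicode")})
        counts = {}
        for child in elem:
            name = local(child.tag)
            counts[name] = counts.get(name, 0) + 1  # 1-based index per tag name
            walk(child, f"{path}/{name}[{counts[name]}]")

    walk(root, "/" + local(root.tag))
    return records


if __name__ == "__main__":
    sample = (f'<html xmlns="{XHTML_NS}"><body>'
              "<p>一</p><div><p>二</p></div></body></html>")
    print(json.dumps(extract_paragraphs(sample), ensure_ascii=False, indent=2))
```

Because each path is positional, it stays valid as long as the document structure is not changed before the translated text is written back.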

Afterwards, I'm planning to implement automated QA using the LLM to check for translations that do not conform to the XHTML format, and to find mistranslations, omissions, fabrications, and errors in terminology and entity naming.
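The XHTML part of that check does not necessarily need the LLM. As a rough sketch (my assumption, not something already implemented), a plain XML parser can flag fragments that are not well-formed or that lost inline tags; `check_translation` and its messages are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(fragment):
    """Parse a fragment and count how often each tag occurs in it."""
    return Counter(el.tag for el in ET.fromstring(fragment).iter())


def check_translation(source_html, translated_html):
    """Return a list of problems; an empty list means the check passed."""
    try:
        translated = tag_counts(translated_html)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    # Compare counts rather than order, since word order changes in translation.
    if translated != tag_counts(source_html):
        return ["tag structure differs from the source"]
    return []


if __name__ == "__main__":
    print(check_translation("<p>猫<em>だ</em></p>", "<p>A <em>cat</em></p>"))
    print(check_translation("<p>猫</p>", "<p>cat"))
```

Mistranslations, omissions, and naming errors still need the LLM or a human; this only catches the mechanical failures cheaply.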

This way, I don't need to read it from beginning to end; I only need to correct the key errors. Finally, restore the XHTML file based on the XPath, repackage the epub file, and the translation is complete.
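The restore step might be sketched like this, again with the standard library only. It assumes positional paths of the form `/html/body[1]/p[2]` and that each translated fragment keeps the same root tag and namespace declaration as the original; `replace_at` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Matches steps such as "body[1]"; the root step has no index and is skipped.
STEP = re.compile(r"([^\[\]/]+)\[(\d+)\]")


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def replace_at(root, path, new_html):
    """Swap the element at a positional path for the parsed new_html fragment."""
    parent, target = None, root
    for name, index in STEP.findall(path):
        same_name = [c for c in target if local(c.tag) == name]
        parent, target = target, same_name[int(index) - 1]
    new_elem = ET.fromstring(new_html)
    new_elem.tail = target.tail  # keep whatever followed the original element
    parent[list(parent).index(target)] = new_elem


if __name__ == "__main__":
    ns = "http://www.w3.org/1999/xhtml"
    ET.register_namespace("", ns)
    doc = ET.fromstring(
        f'<html xmlns="{ns}"><body><p>一</p><p>二</p></body></html>')
    replace_at(doc, "/html/body[1]/p[2]", f'<p xmlns="{ns}">Two</p>')
    print(ET.tostring(doc, encoding="unicode"))
```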

By the way, the novel being translated is Chou KaguyaHime.

**天空Blond** @skyblond@m.skyblond.info


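The general process is: 1. Unpack the epub, clean up the remnants of the Kobo crack, and change the layout from Japanese vertical to ordinary left-to-right horizontal 2. Scan for p tags, generate a unique XPath from each tag, then extract the HTML and store it in a JSON array 3. Feed it to the LLM chapter by chapter to identify the entities in it and generate chapter summaries 4. Manually go through the entity list once to determine the names of key characters and items...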

