The general process is as follows:

1. Unpack the epub file, clean up any remnants of the Kobo crack, and change the layout from Japanese vertical to a standard left-to-right horizontal format.

2. Scan for every 'p' tag, generate a unique XPath for each one, then extract its HTML and store it in a JSON array.

3. Feed the HTML to the LLM chapter by chapter to identify entities and generate chapter summaries.

4. Manually review the entity list to determine the names of key characters and items, ensuring consistency in key character names.

5. Translate sentence by sentence based on the above information.
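
Step 2 could look roughly like the sketch below. This is only an illustration using Python's standard library, not the actual script: the function names, the sample document, and the record layout (`xpath` and `html` keys) are all assumptions. The path is built from local tag names plus a 1-based position among same-named siblings, so it works as a unique key but ignores XML namespaces.

```python
import copy
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample chapter, only here to make the sketch runnable.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<div><p>一</p></div>"
    "<div><p>二<ruby>字<rt>じ</rt></ruby></p><p>三</p></div>"
    "</body></html>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xhtml_text):
    """Return [{'xpath': ..., 'html': ...}] for every <p>, in document order."""
    ET.register_namespace("", XHTML_NS)  # serialize without ns0: prefixes
    root = ET.fromstring(xhtml_text)
    records = []

    def walk(elem, path):
        if local_name(elem.tag) == "p":
            clean = copy.copy(elem)
            clean.tail = None  # tostring() would otherwise append the tail text
            records.append(
                {"xpath": path, "html": ET.tostring(clean, encoding="unicode")}
            )
        counts = {}
        for child in elem:
            name = local_name(child.tag)
            counts[name] = counts.get(name, 0) + 1
            walk(child, f"{path}/{name}[{counts[name]}]")

    walk(root, "/" + local_name(root.tag) + "[1]")
    return records


records = extract_paragraphs(SAMPLE)
as_json = json.dumps(records, ensure_ascii=False, indent=2)
```

Each record keeps the paragraph's markup (ruby tags included), so the later steps can work on the JSON array alone and still put every paragraph back where it came from.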

Afterwards, I'm planning to implement automated QA using the LLM to check for translations that do not conform to the XHTML format, and to find mistranslations, omissions, fabrications, and errors in terminology and entity naming.
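
The format part of that QA does not necessarily need the LLM. A minimal sketch, assuming each translated paragraph is kept as a standalone XHTML fragment next to its source: parse both with the standard library and compare the tag structure. The function names and the "same tags in the same order" rule are assumptions, and that rule is probably too strict wherever ruby annotations are dropped on purpose.

```python
import xml.etree.ElementTree as ET


def tag_sequence(fragment):
    """Parse an XHTML fragment and list its element tags in document order."""
    root = ET.fromstring(fragment)  # raises ET.ParseError if not well-formed
    return [elem.tag for elem in root.iter()]


def check_fragment(source_html, translated_html):
    """Return a list of problem descriptions; an empty list means none found."""
    try:
        translated_tags = tag_sequence(translated_html)
    except ET.ParseError as exc:
        # Also triggered by named HTML entities such as &nbsp;, which plain
        # XML does not define.
        return [f"not well-formed XML: {exc}"]
    problems = []
    if translated_tags != tag_sequence(source_html):
        problems.append("tag structure differs from the source")
    return problems
```

For example, `check_fragment('<p>a</p>', '<p>Hello<br></p>')` reports a well-formedness problem because the `br` is never closed. Mistranslations, omissions, and terminology errors would still need the LLM pass.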

This way, I don't need to read it from beginning to end; I only need to correct the key errors. Finally, restore the XHTML file based on the XPath, repackage the epub file, and the translation is complete.
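
The restore and repackage step might be sketched like this, again with the standard library only. It assumes positional paths of the form `/html[1]/body[1]/p[2]` built from local tag names, and translated fragments that still carry the XHTML namespace declaration; `find_by_path`, `restore`, and `repackage_epub` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

XHTML_NS = "http://www.w3.org/1999/xhtml"
STEP = re.compile(r"([^/\[\]]+)\[(\d+)\]")


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def find_by_path(root, path):
    """Resolve a positional path such as /html[1]/body[1]/p[2]."""
    steps = STEP.findall(path)
    if not steps or steps[0] != (local_name(root.tag), "1"):
        raise KeyError(path)
    elem = root
    for name, index in steps[1:]:
        same_name = [c for c in elem if local_name(c.tag) == name]
        elem = same_name[int(index) - 1]  # IndexError if the path is stale
    return elem


def restore(xhtml_text, records):
    """Swap each addressed <p> for its translated fragment; return new XHTML."""
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for rec in records:
        target = find_by_path(root, rec["xpath"])
        new = ET.fromstring(rec["html"])
        tail = target.tail  # clear() also wipes the tail, so keep it
        target.clear()
        target.attrib.update(new.attrib)
        target.text = new.text
        target.extend(list(new))
        target.tail = tail
    return ET.tostring(root, encoding="unicode")


def repackage_epub(src_dir, out_path):
    """Zip an unpacked epub; 'mimetype' goes first and uncompressed."""
    src_dir = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        zf.write(src_dir / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for file in sorted(src_dir.rglob("*")):
            rel = file.relative_to(src_dir).as_posix()
            if file.is_file() and rel != "mimetype":
                zf.write(file, rel, compress_type=zipfile.ZIP_DEFLATED)
```

One limitation of this sketch: `ET.tostring` drops the XML declaration and DOCTYPE of the original file, so those would have to be re-added before writing the chapter back into the unpacked directory.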

By the way, the book being translated is the Chou KaguyaHime novel.

天空Blond  
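The rough process is: 1. Unpack the epub, clean up the residue of the Kobo crack, and switch from Japanese vertical layout to ordinary left-to-right horizontal layout 2. Scan the p tags, generate a unique XPath from each tag, then pull the HTML out and store it in a JSON array 3. Hand it to the LLM chapter by chapter to identify the entities in it and generate chapter summaries at the same time 4. Manually go over the entity list once to settle the names of key characters and items...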