The general process is as follows:
1. Unpack the EPUB file, clean up any remnants of the Kobo crack, and change the layout from Japanese vertical writing to standard left-to-right horizontal text.
2. Scan for every 'p' tag, generate a unique XPath for each one, then extract its HTML and store it in a JSON array.
3. Feed the HTML to the LLM chapter by chapter to identify entities and generate chapter summaries.
4. Manually review the entity list to settle the names of key characters and items, so that key names stay consistent throughout the book.
5. Translate sentence by sentence, using the entity list and chapter summaries as context.
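
Step 2 could look roughly like the sketch below. It is only an illustration, not my actual script: it uses Python's standard-library ElementTree, the function names (`extract_paragraphs`, `to_json`) are made up for this example, and the "XPath" is a simplified positional path built from local tag names, so a namespace-aware XPath engine would need a prefix mapping to evaluate it. It also assumes the XHTML parses as plain XML (no undeclared entities such as `&nbsp;`).

```python
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "html:" prefix.
ET.register_namespace("", XHTML_NS)


def local(tag):
    """Strip the {namespace} part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xhtml_text):
    """Return a list of {"xpath", "html"} records, one per <p> element."""
    root = ET.fromstring(xhtml_text)
    records = []

    def walk(elem, path):
        if local(elem.tag) == "p":
            # Temporarily drop the tail so text after </p> is not captured.
            tail, elem.tail = elem.tail, None
            html = ET.tostring(elem, encoding="unicode")
            elem.tail = tail
            records.append({"xpath": path, "html": html})
        counts = {}
        for child in elem:
            name = local(child.tag)
            counts[name] = counts.get(name, 0) + 1
            # The position among same-named siblings makes the path unique.
            walk(child, f"{path}/{name}[{counts[name]}]")

    walk(root, "/" + local(root.tag))
    return records


def to_json(records):
    """Serialize the records as a JSON array, keeping Japanese text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Each record keeps the inner markup (ruby, span, and so on) together with the path, so the translated fragment can later be put back in exactly the same place.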
Afterwards, I'm planning to implement automated QA using the LLM to check for translations that do not conform to the XHTML format, and to find mistranslations, omissions, fabrications, and errors in terminology and entity naming.
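
The format half of that QA probably does not need the LLM at all. Here is a minimal sketch, assuming each translated paragraph is stored as a standalone XHTML fragment next to its source; `check_fragment` is a hypothetical helper name, not something that exists yet. It only checks well-formedness and compares tag counts, so a difference (for example, dropped ruby annotations) is a flag for review rather than a definite error.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(fragment):
    """Count how often each tag occurs in a well-formed fragment."""
    root = ET.fromstring(fragment)
    return Counter(el.tag for el in root.iter())


def check_fragment(source_html, translated_html):
    """Return a list of problem descriptions; an empty list means no flags."""
    try:
        translated = tag_counts(translated_html)
    except ET.ParseError as exc:
        # The translation is not well-formed XML, so it cannot be restored.
        return [f"not well-formed: {exc}"]

    source = tag_counts(source_html)
    problems = []
    missing = source - translated  # tags the translation lost
    extra = translated - source    # tags the translation added
    if missing:
        problems.append(f"missing tags: {dict(missing)}")
    if extra:
        problems.append(f"unexpected tags: {dict(extra)}")
    return problems
```

Mistranslations, omissions, and terminology errors would still go to the LLM; this just filters out fragments that would break the file before spending tokens on them.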
This way, I don't need to read it from beginning to end; I only need to correct the key errors. Finally, I write the translated HTML back into the XHTML files at the stored XPaths and repackage the EPUB file, and the translation is complete.
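
The restore and repackage step might be sketched as follows. Again this is an assumption-laden illustration: `resolve`, `restore`, and `pack_epub` are hypothetical names, and the path format is the simplified positional one (`/html/body[1]/p[2]`, local tag names only), not full XPath. The one firm requirement is from the EPUB container format: the `mimetype` file must be the first entry in the archive and must be stored uncompressed.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # avoid "html:" prefixes on output


def local(tag):
    return tag.rsplit("}", 1)[-1]


def resolve(root, path):
    """Follow a positional path; return (parent, index of target in parent)."""
    steps = path.strip("/").split("/")[1:]  # the first step is the root itself
    parent, node = None, root
    for step in steps:
        name, _, pos = step.rstrip("]").partition("[")
        matches = [c for c in node if local(c.tag) == name]
        parent, node = node, matches[int(pos) - 1]
    return parent, list(parent).index(node)


def restore(xhtml_text, translations):
    """Swap each original <p> for its translated fragment."""
    root = ET.fromstring(xhtml_text)
    for rec in translations:
        parent, idx = resolve(root, rec["xpath"])
        new = ET.fromstring(rec["html"])
        new.tail = parent[idx].tail  # keep the whitespace after the element
        parent[idx] = new            # same position, so other paths stay valid
    # Note: ElementTree drops the XML declaration and DOCTYPE; re-add if needed.
    return ET.tostring(root, encoding="unicode")


def pack_epub(src_dir, out_path):
    """Zip an unpacked EPUB directory; out_path should be outside src_dir."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # mimetype goes first and uncompressed, as the container format requires.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for folder, _, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(folder, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if rel != "mimetype":
                    zf.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Because each paragraph is replaced one-for-one, the element positions never shift, which is why the paths recorded during extraction remain usable at the end.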
By the way, the book being translated is the Chou KaguyaHime novel.