I'm currently working on an OCR/HTR "editor-in-the-loop" browser tool.
It has rule-based and LLM-based validation recommendations. You can load PAGE XML, IIIF and images into it, transcribe them with Gemini 3 Flash (or whatever you want to use, e.g. your local DeepSeek OCR 2 via Ollama), and then export the result in different formats. HTR is going to be trickier. But for OCR, DeepSeek OCR 2 is very good.
@chpollin interested to know what tool you’re using. Is it possible to share a link to information about it?
@chpollin found it, for other curious eyes. 👀 https://github.com/DigitalHumanitiesCraft/co-ocr-htr
@ingridbmason Yes, that is the repository. It is a work in progress and is built using Claude Code. I am using a specific way of representing context information in an Obsidian-like structure (= knowledge folder). :)
@chpollin @ingridbmason "integrating domain experts". I'd suggest reconsidering this formulation to better reflect what roles you would like to give to people and agents. It might not sound like a big thing, but human-in-the-loop is kinda the opposite of computer-assisted.
@mapto @ingridbmason Thank you very much! Yes, absolutely you're right and this is also what I was thinking about. Putting the editor/expert in the center. It's just me alone working on this from home, so I very much appreciate such feedback. I will adapt this! :)
@chpollin @ingridbmason I have found the persuasive technology triad to be quite relevant: https://en.wikipedia.org/wiki/Persuasive_technology#Functional_triad . With some wishful thinking this could be translated to computer-aided/-assisted (as in CAD, CAM, CALL, etc), computer-supported (as in CSCW/CSCL) and computer-generated as in GenAI.
@mapto @ingridbmason thank you very much! :)
I like it!