I'm currently working on an OCR/HTR "editor-in-the-loop" browser tool.
It offers rule-based and LLM-based validation recommendations. You can load PAGE XML, IIIF, and images into it, transcribe them with Gemini 3 Flash (or whatever model you prefer, including a local DeepSeek OCR 2 via Ollama), and then export the result in different formats. HTR will be trickier, but for OCR, DeepSeek OCR 2 is very good.
@chpollin interested to know what tool you’re using. Is it possible to share a link to information about it?
@chpollin found it, for other curious eyes. 👀 https://github.com/DigitalHumanitiesCraft/co-ocr-htr
@chpollin @ingridbmason "integrating domain experts". I'd suggest reconsidering this formulation to better reflect what roles you would like to give to people and agents. It might not sound like a big thing, but human-in-the-loop is kinda the opposite of computer-assisted.
@mapto @ingridbmason Edit: Maybe "computer-assisted" is too broad as well? Do you have any resources on framing this better? I find naming and framing these human/AI relationships an important topic. :)
@chpollin @ingridbmason I have found the persuasive technology triad to be quite relevant: https://en.wikipedia.org/wiki/Persuasive_technology#Functional_triad . With some wishful thinking, this could be translated to computer-aided/-assisted (as in CAD, CAM, CALL, etc.), computer-supported (as in CSCW/CSCL), and computer-generated (as in GenAI).
@mapto @ingridbmason thank you very much! :)
I like it!
@mapto @ingridbmason Thank you very much! Yes, you're absolutely right, and this is also what I was thinking about: putting the editor/expert in the center. I'm working on this alone from home, so I very much appreciate such feedback. I will adapt this! :)