I'm currently working on an OCR/HTR "editor-in-the-loop" browser tool.

It offers rule-based and LLM-based validation recommendations. You can load PAGE XML, IIIF, and images into it, transcribe them with Gemini 3 Flash (or whatever model you prefer, including a local DeepSeek OCR 2 via Ollama), and then export the result in different formats. HTR will be trickier, but for OCR, DeepSeek OCR 2 is very good.

@chpollin interested to know what tool you’re using. Is it possible to share a link to information about it?

@ingridbmason Yes, that is the repository. It is a work in progress and is built using Claude Code. I am using a specific way of representing context information in an Obsidian-like structure (= knowledge folder). :)

@chpollin @ingridbmason "integrating domain experts". I'd suggest reconsidering this formulation to better reflect what roles you would like to give to people and agents. It might not sound like a big thing, but human-in-the-loop is kinda the opposite of computer-assisted.

@mapto @ingridbmason Thank you very much! Yes, you're absolutely right, and this is also what I was thinking about: putting the editor/expert in the center. It's just me working on this alone from home, so I very much appreciate such feedback. I will adapt this! :)

@mapto @ingridbmason Edit: Maybe "computer-assisted" is too broad as well? Do you have any resources on framing this better? I find naming and framing these human/AI relationships an important topic. :)

@chpollin @ingridbmason I have found the persuasive technology triad to be quite relevant: en.wikipedia.org/wiki/Persuasi . With some wishful thinking, this could be translated to computer-aided/-assisted (as in CAD, CAM, CALL, etc.), computer-supported (as in CSCW/CSCL), and computer-generated (as in GenAI).
