Just published the preliminary tool #pdf4anki on #codeberg
https://codeberg.org/barefootstache/pdf4anki
It mainly documents the process and serves as a semi-automated tool for getting PDFs into #anki.
In the current version, you still need to modify the pattern constant in the clean-html.js file to match the PDF in use.
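As a rough illustration of what adjusting such a pattern might involve, here is a minimal sketch. The names `PATTERN` and `extractCards` and the "Q: ... A: ..." layout are hypothetical and are not taken from the repository's clean-html.js; the real constant depends entirely on the PDF being processed.

```javascript
// Hypothetical pattern: assumes each card appears in the extracted text as
// "Q: <question>" followed by "A: <answer>". Adjust it to match the PDF in use.
const PATTERN = /Q:\s*(.+?)\s*A:\s*(.+?)(?=\s*Q:|\s*$)/gs;

// Return one { front, back } object per match of PATTERN in the text.
function extractCards(text) {
  const cards = [];
  for (const match of text.matchAll(PATTERN)) {
    cards.push({ front: match[1].trim(), back: match[2].trim() });
  }
  return cards;
}

// Made-up sample text standing in for the cleaned PDF content.
const sample = "Q: What is 2 + 2? A: 4\nQ: Capital of France? A: Paris";
console.log(extractCards(sample));
```

A PDF with a different layout (numbered questions, bold terms, tables) would need a different expression, which is presumably why the constant has to be edited by hand for now.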
After struggling to get #python #PyMuPDF to work and being close to the deadline, I shifted to using a combination of other commands.
First, I used the #linux #pdftohtml command, which is much faster than PyMuPDF and packages the result much like a saved web page.
Next, with #NeoVim and #RegEx, I formatted the #HTML file so that it could be quickly processed with #NodeJs #cheerio and eventually saved, via #json, in #sqlite.
Is it elegant and automatic? No, though it works!
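For anyone curious what the HTML-to-JSON step could look like, here is a minimal sketch under stated assumptions. The post uses cheerio for this step; the sketch swaps in plain regular expressions so it runs without dependencies. The sample markup, the function names `decodeEntities` and `paragraphsToRecords`, and the record shape are all hypothetical; real pdftohtml output varies with the PDF and the options used.

```javascript
// Made-up sample resembling cleaned-up HTML with one <p> per text run
// (an assumption, not actual output from the post's PDF).
const html = `
<p class="ft00">First entry &amp; details</p>
<p class="ft01">Second<br/>entry</p>
`;

// Decode the few entities this sketch expects to meet.
// &amp; is handled last so that "&amp;lt;" is not decoded twice.
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#160;|&nbsp;/g, " ")
    .replace(/&amp;/g, "&");
}

// Collect the text of every <p> element as one record per paragraph.
function paragraphsToRecords(source) {
  const records = [];
  const paragraph = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  for (const match of source.matchAll(paragraph)) {
    const text = decodeEntities(
      match[1].replace(/<br\s*\/?>/gi, " ").replace(/<[^>]+>/g, "")
    ).replace(/\s+/g, " ").trim();
    if (text) records.push({ id: records.length + 1, text });
  }
  return records;
}

// Serialize the records; each one could later become a row in a database.
const json = JSON.stringify(paragraphsToRecords(html), null, 2);
console.log(json);
```

Regular expressions are fragile on nested or irregular markup, so a real HTML parser such as cheerio is the safer choice once the file gets messy.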