barefootstache

While working on #pdf4anki, I integrated a feature that automatically adds opinionated #CSS with #cheerio to the initial #HTML file. This had the unexpected side effect of destroying all the self-closing HTML tags, thus breaking the pattern matching.
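
A minimal sketch of one way to keep pattern matching stable after such a round trip, assuming the breakage comes from tags like `<br/>` being re-serialized as `<br>`. The helper name `normalizeVoidTags` and the tag list are illustrative and not taken from pdf4anki.

```javascript
// Void elements that HTML serializers commonly emit without a trailing slash.
const VOID_TAGS = ["br", "hr", "img", "input", "meta", "link"];

// Matches e.g. <br>, <br/>, <br />, <img src="a.png"> and <img src="a.png"/>.
// Simplification: attribute values that contain ">" are not handled.
const VOID_TAG_RE = new RegExp(
  `<(${VOID_TAGS.join("|")})\\b([^>]*?)\\s*/?>`,
  "gi"
);

// Rewrite every void tag to the self-closing form so that later patterns
// only have to deal with one spelling.
function normalizeVoidTags(html) {
  return html.replace(VOID_TAG_RE, (_match, name, attrs) => `<${name}${attrs}/>`);
}
```

Running the normalization once, right after the cheerio step, means the existing patterns can keep expecting the self-closing form.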

The fun of taking apart the code to #debug this issue.

barefootstache

As neat as #jquery or #cheerio is, I miss the abilities of #VanillaJavaScript in the browser.

I don't remember how many times I tried to grab certain properties, which would have been available in the browser, but don't exist in cheerio.

And it is a bit annoying to constantly put various HTML elements into the cheerio wrapper class to get access to the functionality it offers. So instead I grabbed the minimal viable data and just worked further with arrays.
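
A rough sketch of that approach: wrap each node once, copy out the few values needed, and continue with plain arrays. The function names and the chosen fields (`text`, `className`) are assumptions for illustration; `$` is expected to be the function returned by `cheerio.load(html)`.

```javascript
// `$` is passed in so this helper stays independent of how the page was loaded.
function extractPlainData($, selector) {
  return $(selector)
    .toArray() // raw nodes instead of a cheerio collection
    .map((node) => {
      const wrapped = $(node); // wrap once, read what is needed, then drop it
      return {
        text: wrapped.text().trim(),
        className: wrapped.attr("class") ?? "",
      };
    });
}

// From here on only plain arrays and objects are involved.
function nonEmptyTexts(rows) {
  return rows.map((row) => row.text).filter((text) => text.length > 0);
}
```

With cheerio installed, usage would look roughly like `nonEmptyTexts(extractPlainData(require("cheerio").load(html), "p"))`.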

#javascript

barefootstache

After struggling to get #python #PyMuPDF to work and being close to the deadline, I shifted to using a combination of other commands.

First, I used the #linux #pdftohtml command, which is so much faster than PyMuPDF and packages the result much like a browser does when saving a website.

Next, with #NeoVim and #RegEx, I formatted the #HTML file so it could be quickly processed with #NodeJs #cheerio and eventually saved, via #json, in #sqlite.
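
The last hop of that pipeline (extracted records to JSON, then to rows for SQLite) could be sketched like this. The record shape, the table name `blocks`, and its columns are hypothetical; no SQLite driver is called here, the code only prepares the parameters a prepared statement would receive.

```javascript
// Hypothetical schema: one row per extracted text block.
const INSERT_SQL = "INSERT INTO blocks (page, text) VALUES (?, ?)";

// Serialize records to JSON so the intermediate result can be inspected or saved.
function toJson(records) {
  return JSON.stringify(records, null, 2);
}

// Read the JSON back and produce one parameter array per row,
// in the same order as the placeholders in INSERT_SQL.
function toInsertParams(json) {
  return JSON.parse(json).map((record) => [record.page, record.text]);
}
```

Keeping the JSON file as an intermediate step makes it possible to rerun only the database import when the schema changes.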

Is it elegant and automatic? No, though it works!

#JavaScript

sodslawyer

Look, if you click *follow* on my account and then just ignore a good-faith attempt at engagement on my part, then clearly we've got different expectations of social media #Cheerio

EikeZ

#NeuHier
I'm looking forward to exchanging ideas on many socio-political topics, but especially in the area of #ökologischeBaustoffe (ecological building materials), #Cradle2Cradle (not dogmatically, just as an approach), climate-friendly heating and air-conditioning technology, and architecture and home furnishing in general with #HighLowTech.

Energy storage and energy supply systems also interest me.

#Cheerio

Jeremia Kimelman

And here's the list!

General use tools:

1. #d3
2. #lodash

Web scraping tools

1. #p-queue
2. #cheerio
3. #puppeteer

Geospatial tools

1. #mapshaper
2. #turfjs
3. #qgis

Website tools

1. #sveltekit

Tools that are also companies

1. @observablehq
2. #Github actions
3. #netlify

Tykayn

A #dataScraping exercise with #nodejs and #cheerio yesterday: here is a JSON file with the information published by the #CHATONS on www.chatons.org,
in case you want to compute stats or reuse it some other way o/
You can also regenerate the file whenever you like; the docs are included.

forge.chapril.org/tykayn/frama

Tykayn

After using #cheerio with #nodejs to build a CSV of the world days, I am officially a data scientist.
You too can conquer the world in no time.
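
For the CSV step, a small sketch of turning scraped rows into CSV text without extra dependencies. The column names in the usage note are made up; quoting follows the usual RFC 4180 convention of doubling embedded quotes.

```javascript
// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the CSV text: one header line, then one line per row object.
function toCsv(rows, columns) {
  const header = columns.map(csvField).join(",");
  const lines = rows.map((row) =>
    columns.map((column) => csvField(row[column])).join(",")
  );
  return [header, ...lines].join("\r\n");
}
```

Calling `toCsv(rows, ["date", "name"])` on an array of scraped objects returns a string that can be written out with `fs.writeFileSync`.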