Very nice work!
I see the engine is a single #python script that connects to a #PostgreSQL db.
Both are cool technologies but require quite a bit of technical expertise to be self-hosted.
Over the years I've seen that projects based on less cool technologies (PHP, cgi-bin, SQLite...) enable both #selfhosting and #SAFEhosting, that is, hosting with cheap, local, non-#BigTech providers.
It's something I realized while reading free software built on #permacomputing values.
Not really a suggestion or a feature request (maybe a note to myself, for a fork when I have more free time), but something I think you might consider.
Another one: it would be cool to enable a sort of federation among the instances, either by simply proxying the trusted instances (and excluding duplicate URLs) on a user's search, or by enabling trusted #fediverse users to add websites to be crawled.
"Trust" here is a key concept: federation should be optional and disabled by default.
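To make the proxying idea a bit more concrete, here is a rough sketch in Python. It is not how the engine works today: `federated_search`, `normalize_url`, the `local_search` / `remote_search` callables and the `{"url", "title"}` result shape are all hypothetical names made up for illustration. It only shows the two points above: federation is off unless explicitly enabled, and duplicate URLs from trusted instances are dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def federated_search(query, local_search, remote_search,
                     trusted_instances=(), federation_enabled=False):
    """Merge local results with results proxied from trusted instances.

    local_search(query) and remote_search(instance, query) are hypothetical
    callables returning lists of dicts with at least a "url" key.
    Federation is opt-in: it stays off unless federation_enabled is True.
    """
    results = list(local_search(query))

    if federation_enabled:
        for instance in trusted_instances:
            try:
                results.extend(remote_search(instance, query))
            except Exception:
                # Assumption: an unreachable peer should not break the search.
                continue

    # Keep the first occurrence of each URL, so local results win over remote.
    seen = set()
    merged = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            merged.append(result)
    return merged
```

Local results come first, so when a trusted instance returns a page that is already indexed locally, the local entry is the one that survives.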
Anyway: good luck and good work!
And thanks for using a network #copyleft!
(I prefer the #HackingLicense over #AGPLv3 in the age of #GitHubCopilot/CopyALot, but at least AGPL protects the work you donated to the world from direct privatization...)
@Shamar @selea
Thanks for the thoughts, they're good ones!
I mostly picked Postgres because it's the database I've heard the highest praise for from everyone I've talked to who works with databases daily. Myself, I had something of a database phobia (😬) before starting work on this project, so I was working entirely off other people's recommendations on that front.
As for Python, it was mostly for access to a really great natural language processing library, Polyglot, which works well cross-language. Most of the heavy work is done by a SQL query, so it likely wouldn't be hard to reimplement in another backend language if someone so desired.
And as for hosting apart from big tech, I'm currently running the live version on Uberspace, which I highly recommend and which I expect will continue being able to host something like clew for the foreseeable future (storage space perhaps being the main tricky bit).