@hansw if someone replies to your post, your original post is not indexed; only their reply is. If you reply to their post, your reply will not be indexed either, because you as the author chose not to be indexed.
Here's some background on the scalability issue, in case you're interested.
Let's say someone who has been indexed for a long time decides that they no longer want to be indexed. We would have to search for their presence in every post they are mentioned in. We can't do this at scale without an index, as it would otherwise mean scanning through millions of rows. We can't really justify creating an index on mentions either, as its only use case would be speeding up the unindexing of users' posts in mentions.
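To make the trade-off concrete, here is a minimal in-memory sketch in Python. It is not Fedisearch's actual storage or schema: the `posts` dictionary, the example account URIs and the function names are all hypothetical stand-ins. It only shows why finding mentions without an index touches every post, while a dedicated mention index answers the same question with one lookup at the cost of maintaining an extra structure.

```python
from collections import defaultdict

# Hypothetical stand-in for the posts table:
# post id -> (author URI, list of mentioned account URIs)
posts = {
    1: ("https://a.example/users/alice", ["https://b.example/users/bob"]),
    2: ("https://b.example/users/bob", []),
    3: ("https://c.example/users/carol",
        ["https://b.example/users/bob", "https://a.example/users/alice"]),
}


def find_mentions_by_scan(posts, account_uri):
    """Without an index: inspect every post (linear in the number of posts)."""
    return sorted(
        post_id
        for post_id, (_author, mentions) in posts.items()
        if account_uri in mentions
    )


def build_mention_index(posts):
    """Build a mapping from account URI to the ids of posts that mention it."""
    index = defaultdict(set)
    for post_id, (_author, mentions) in posts.items():
        for uri in mentions:
            index[uri].add(post_id)
    return index


def find_mentions_by_index(index, account_uri):
    """With the index: a single dictionary lookup."""
    return sorted(index.get(account_uri, ()))


bob = "https://b.example/users/bob"
mention_index = build_mention_index(posts)
print(find_mentions_by_scan(posts, bob))
print(find_mentions_by_index(mention_index, bob))
```

Both lookups return the same post ids; the difference is that the index has to be stored and kept up to date on every write, which is the cost being weighed above for a rarely used operation.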
@hansw @fedisearch_com we are committed to working on that, but there isn't yet a solution that scales to the size of our data. Committing to an arbitrary date of your choice seems unrealistic. However, if you can DM me all the account URIs that you own, I can do a one-time search and delete for you. I hope this addresses the immediate concern.
We are launching https://fedisearch.com, a website for full-text search on fediverse content.
This could be very useful for users of small instances that lack full-text search, and for just about anyone else who is looking for a blazing-fast fediverse search engine.
We have a well-behaved indexer that honors users' preference to opt out of status indexing.
Please give it a try and let us know what you think.
True, true. To add some behind-the-scenes insight: our crawler only visits Mastodon, Pleroma and Misskey instances, because it only knows how to parse content from those instances. Some Hubzilla posts get indexed because they appeared on the federated timelines of those three types of instances.
Yup, this is indeed the approach we are aiming for. However, we are still facing some technical difficulties with performance when applying this method to mega threads where many people are mentioned.
Just found this at another instance -- and I think this is a really important development.
Fedisearch.com allows Full Text Searches on fedinet posted content -- from Mastodon, Pleroma, Misskey instances.
In Mastodon instances, Full Text Search is VERY RARE in my experience. It's available at qoto.org, which is one reason I have recommended that people keep an account there if their main instance does not provide it.
Even Elite instances have this option turned OFF (to save processor cycles and memory). Silly, because they could support that easily.
Now we have an external search engine, and it works. I tested it briefly and found results I knew would be there, shown as expected.
Kudos for a much needed service and thank you to @fedisearch !
I do get your point, and we take users' privacy very, very seriously. I think the main problem here is that Pixelfed doesn't support the semantics that allow users to opt out of search indexing. In the case of Mastodon, we support this feature.
The same set of settings that blocks Google from indexing you also blocks Fedisearch from doing so. We have an insignificant amount of traffic compared to Google... so if being indexed is the main worry, we feel we shouldn't be the main antagonist 😅
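As a rough illustration of what honoring those settings can look like, here is a minimal Python sketch. It assumes the opt-out is exposed as a standard `robots` meta tag containing `noindex` in the page's HTML, which is the signal general-purpose search engines read; it is not a description of how the Fedisearch indexer is actually implemented, and `RobotsMetaParser` and `page_opts_out` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        # "none" is shorthand for "noindex, nofollow" in the robots meta convention.
        if "noindex" in directives or "none" in directives:
            self.noindex = True


def page_opts_out(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


profile = '<html><head><meta name="robots" content="noindex"></head></html>'
print(page_opts_out(profile))
```

A crawler following this convention would skip any account whose page returns `True` here, which is the same check that keeps the account out of general web search results.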
@hansw I understand the concern, and I personally hate spammers a lot too.
We try to strike a balance between out-of-the-box utility and respect for people's desire to stay out of the spotlight, and it has indeed been hard.
If we search for Pixelfed plus a username on Google, we'd find many results. It wouldn't be fair for Google to grab all the search results without opt-in while a smaller, niche-focused project has to knock on doors, right?
I'm not saying that defaulting to opt-out is right, but we would just like some extra leniency as we try to do something good for this community.
@namark I don't see Mastodon and other communities publicly denying closed-source software access to the network. Would you care to substantiate that?
@hansw I think this is a very valid concern. Sorry we let this one slip through. We will resolve this.
Hi, I'm Justin, admin of fedisearch.com.
Fedisearch is a search engine for fediverse (Mastodon, Misskey and Pleroma) content. Fedisearch respects privacy and robots noindex directives.