@fedisearch Do you honor the robots.txt file and other means for a server or individual to opt out of indexing?
@fedisearch Yeah, I get that (I read your about page). You might want to make it clear there that you honor robots.txt. The text isn't explicit about that and actually makes it sound like maybe you ignore it entirely.
@freemo good point
@freemo
The crawler honors robots.txt. However, because of how federation works, even if an instance A has a robots.txt that blocks all crawler access, content from A can still appear on the federated timelines of other instances. For this reason, we also check instance-domain-name.tld/@username for a noindex meta tag.
Hence, using the noindex meta tag is the only reliable way to opt out of indexing. [1]
Mastodon has this meta tag option built in, and instance admins can override it for every user on their site.
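
For anyone curious what the two checks look like, here is a rough sketch in Python using only the standard library. It is not the crawler's actual code: the function names, the `example-bot` user agent, and the `a.example` URLs are all made up for illustration, and it assumes the opt-out is expressed as a `<meta name="robots">` tag whose content includes `noindex`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the given robots.txt text lets user_agent fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def may_index(robots_txt: str, profile_html: str,
              user_agent: str, profile_url: str) -> bool:
    """Hypothetical combined check: both opt-out signals must be absent."""
    return (robots_allows(robots_txt, user_agent, profile_url)
            and not has_noindex(profile_html))


if __name__ == "__main__":
    # Hypothetical profile page of a user who opted out via the meta tag.
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(may_index("", page, "example-bot", "https://a.example/@alice"))
```

The point of checking both is the one made above: a post can reach the index through another instance's federated timeline without the crawler ever visiting the origin instance, so the robots.txt on the origin alone cannot cover that path, while the meta tag on the user's own page can still be looked up.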
[1]: https://fedisearch.com/about