@fedisearch Do you honor the robots.txt file and other means for a server or individual to opt out of indexing?
The crawler honors robots.txt. However, because of how federation works, even if instance A has a robots.txt that blocks all crawler access, content from A can still appear on the federated timelines of other instances. For this reason, we also check instance-domain-name.tld/@username for a noindex meta tag.
Hence, the noindex meta tag is the only reliable way to opt out of indexing. [1]
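To illustrate the two checks described above, here is a minimal Python sketch of how a crawler could combine them. It is not fedisearch's actual code. The function names, the `ExampleBot` user agent and the `*.example` URLs are hypothetical, and the sketch works on robots.txt text and profile HTML that have already been fetched, so it does no networking itself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(profile_html: str) -> bool:
    """True if the profile page carries a robots meta tag with 'noindex'."""
    parser = _RobotsMetaParser()
    parser.feed(profile_html)
    return "noindex" in parser.directives


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the given robots.txt text lets user_agent fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


def may_index_post(robots_txt: str, page_url: str, author_profile_html: str,
                   user_agent: str = "ExampleBot") -> bool:
    """Hypothetical policy: index a post only if robots.txt of the instance
    where it was found allows the fetch AND the author's profile page on
    their home instance has no noindex directive."""
    return (robots_allows(robots_txt, user_agent, page_url)
            and not has_noindex(author_profile_html))
```

Passing the robots.txt of the instance where a post was found, together with the profile HTML from the author's home instance, mirrors the point above: a permissive robots.txt on the relaying instance does not override the author's own noindex tag.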
Mastodon has this meta tag option built in, and instance admins can override it for every user on their instance.