@macinchem Nice idea. I see how this solves the bit collision problem, but I'm not sure this methodology doesn't introduce other significant limitations.
If we are clustering a large set of molecules with very different ring systems, will this work, or will small clusters be discarded because their ring systems are not considered significant?
If I have a large set of molecules annotated with their fingerprints and I need to add new ones to the set, can I just compute the fingerprints for the new ones, or do I have to recompute all of them?
If I'm performing predictions, am I able to evaluate the applicability domain or will everything be considered applicable because only accepted substructures have a weight?

I would have liked to see specific datasets used to evaluate these properties in real-life scenarios.
After all, the largest dataset they used has fewer than 10,000 molecules; I do not believe this covers the real applications of ECFP outside of QSAR. And even then, when molecular descriptors are readily available along with efficient feature selection algorithms, I'm not sure ECFP is a very good representation for QSAR tasks.
