Now that I have the opportunity, let me #introduce our quality-diversity #compchem algorithm to Mastodon. It's based on @janhjensen 's generative algorithm. It avoids stagnation issues, illuminates opportunities across chemical space, and outperforms #deeplearning approaches!
Code: https://github.com/Jonas-Verhellen/Argenomic
Paper: https://pubs.rsc.org/en/content/articlelanding/2020/sc/d0sc03544k
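For readers new to quality-diversity search, here is a minimal sketch of the general idea in the MAP-Elites style: candidates are sorted into niches by a descriptor, and each candidate only competes with the current elite of its own niche, so one dominant solution cannot crowd out the rest of the archive. This is not taken from the Argenomic repository. The candidates are plain floats rather than molecular graphs, and `describe`, `fitness`, and `illuminate` are hypothetical names with toy definitions.

```python
import random


def describe(x):
    # Hypothetical 1-D descriptor: index of the equal-width bin (0..9) containing x.
    return min(int(x * 10), 9)


def fitness(x):
    # Hypothetical toy objective with a single peak at x = 0.5.
    return 1.0 - abs(x - 0.5)


def illuminate(generations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # niche index -> (fitness, candidate)
    for _ in range(generations):
        if archive:
            # Mutate a parent drawn uniformly from the occupied niches.
            _, parent = rng.choice(list(archive.values()))
            child = min(max(parent + rng.gauss(0, 0.1), 0.0), 1.0)
        else:
            # Bootstrap the archive with a random candidate.
            child = rng.random()
        niche, score = describe(child), fitness(child)
        # A candidate only has to beat the elite of its own niche.
        if niche not in archive or score > archive[niche][0]:
            archive[niche] = (score, child)
    return archive
```

Calling `illuminate()` returns one elite per occupied niche. A low-fitness niche near `x = 0` keeps its elite even though candidates near `x = 0.5` score higher, which is the "illumination" behaviour in miniature.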
@bonifartius @janhjensen Yes, the fitness evaluation and descriptor calculations are parallelised.
@jverhell nice! i think there is much to gain with specialized parallel algorithms instead of using parallelism for unspecific deep learning models.
@janhjensen
@jverhell @janhjensen This is quite interesting! I have a few questions if you don't mind (and please pardon the American spelling conventions lol):
1. How did you deal with the possibility of the graph-isomorphism problem appearing during your memoization step? That is, it seems that you're storing the result of the objective function(s) on a specific molecular graph, but is it possible for your genetic algorithm to accidentally generate a graph that is isomorphic but not identical to a previous input? If so, is there a handy way to detect that with this graph-based data format?
2. In your algorithm efficiency comparison, is the GB-GA algorithm also parallelized like your GB-EPI? Of course, I can see that the number of evaluations and the success ratio are still better than the alternative, but since you listed it as a metric, I would like to know whether it's an apples-to-apples comparison. The reference you cited (50) didn't seem to mention parallelization in the article.
3. I liked your conclusion section a lot, particularly the nod to active learning endeavors. Personally, I suspect much of small-molecule drug research is limited by non-native protein conformation data collected via crystallography (in conjunction with the limitations of current solvent force fields), rather than the native conformations apparent in cryo-EM data. Do you think such an active learning approach could help remediate this issue (if it is, in fact, an issue)?
My hunch for 3 is that it could help but may not generalize across different classes of target proteins. It's been years since I've looked at this, though, so who knows.
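To make question 1 concrete, here is a toy sketch of memoization keyed on a canonical form, so that two differently ordered encodings of the same molecule share one cache entry. It is an illustration under stated assumptions, not a description of what Argenomic does. `canonical_key`, `expensive_objective`, and `memoized_objective` are hypothetical names, bond orders and hydrogens are ignored, and the brute-force search over atom orderings is only feasible for tiny graphs. In practice a toolkit's canonical SMILES (for example RDKit's `Chem.MolToSmiles`, which canonicalises by default) could play the role of the key.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Return a key that is identical for isomorphic labelled graphs.

    atoms: list of element symbols; bonds: list of (i, j) atom-index pairs.
    Brute force over every atom ordering, so only usable for tiny graphs.
    """
    best = None
    for order in permutations(range(len(atoms))):
        # order[new] = old; invert the mapping to relabel the bonds.
        new_index = {old: new for new, old in enumerate(order)}
        labels = tuple(atoms[old] for old in order)
        edges = tuple(sorted(
            tuple(sorted((new_index[i], new_index[j]))) for i, j in bonds
        ))
        candidate = (labels, edges)
        # Keep the lexicographically smallest relabelling as the canonical one.
        if best is None or candidate < best:
            best = candidate
    return best


cache = {}


def expensive_objective(atoms, bonds):
    # Hypothetical stand-in for a costly fitness evaluation.
    return len(bonds) / len(atoms)


def memoized_objective(atoms, bonds):
    key = canonical_key(atoms, bonds)
    if key not in cache:
        cache[key] = expensive_objective(atoms, bonds)
    return cache[key]


# Ethanol's heavy atoms written with two different atom orderings.
ethanol_a = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = (["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether: same atoms, different connectivity.
ether = (["C", "O", "C"], [(0, 1), (1, 2)])
```

Here `ethanol_a` and `ethanol_b` map to the same key and therefore hit the same cache entry, while `ether` gets its own.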
@jverhell really nice, _especially_ because it isn't just a statistical model nobody understands (aka deep learning) but a sound stochastic algorithm. is the implementation parallelized as described in the paper?
@janhjensen