@jverhell @janhjensen This is quite interesting! I have a few questions if you don't mind (and please pardon the American spelling conventions lol):
1. How did you deal with the possibility of the graph-isomorphism problem appearing during your memoization step? I.e., it seems that you're storing the result of the objective function(s) for a specific molecular graph, but could your genetic algorithm accidentally generate a graph that is isomorphic but not identical to a previous input? If so, is there a handy way to detect that with this graph-based data format?
2. In your algorithm efficiency comparison, is GB-GA also parallelized like your GB-EPI? Of course, I can see that the number of evaluations and the success ratio are still better than the alternative, but since you listed it as a metric, I would like to know whether it's an apples-to-apples comparison. The reference you cited (50) didn't seem to mention parallelization in the article.
3. I liked your conclusion section a lot, particularly the nod to active learning endeavors. Personally, I suspect much of small-molecule drug research is limited by non-native protein conformations captured by crystallography (in conjunction with the limitations of current solvent force fields), rather than the native conformations apparent in cryo-EM data. Do you think such an active learning approach could help remedy this issue (if it is, in fact, an issue)?
My hunch for 3 is that it could help but may not generalize across different classes of target proteins. It's been years since I've looked at this, though, so who knows.
@johnabs @jverhell
1. You can check for graph isomorphism by converting both molecules to canonical SMILES and comparing the strings.
2. GB-GA is parallelized.
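
For point 1, here is a minimal sketch of how the canonical SMILES check could be used as a memoization key. It assumes RDKit is installed; `canonical_key`, `memoized_score`, and the cache layout are hypothetical names for illustration, not code from the paper.

```python
from rdkit import Chem

# Hypothetical cache: canonical SMILES -> objective value
_cache = {}

def canonical_key(smiles):
    """Return RDKit's canonical SMILES for a molecule given as any valid SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # MolToSmiles produces a canonical string by default, so two atom
    # orderings of the same molecular graph give the same key.
    return Chem.MolToSmiles(mol)

def memoized_score(smiles, objective):
    """Evaluate `objective` once per distinct molecule, keyed by canonical SMILES."""
    key = canonical_key(smiles)
    if key not in _cache:
        _cache[key] = objective(key)
    return _cache[key]
```

With this, `memoized_score("OCC", f)` and `memoized_score("C(O)C", f)` should hit the same cache entry, since both strings describe ethanol. Note that the key is only as coarse as the SMILES itself: different tautomers of the same compound still get different keys.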