@thalesdisciple @luna@mathstodon.xyz It would be cool, though I don't know what he means by advanced chemistry.
Since this is quite a bold claim, I decided to test it and see whether ChatGPT could generate a valid molecular structure that was not present in its training set, and whether it could propose a credible synthetic pathway to it.
I had to use a prompt injection to make it work, as it would otherwise tell me that it's not able to do original research. Thanks to https://scribe.nixnet.services/seeds-for-the-future/tricking-chatgpt-do-anything-now-prompt-injection-a0f65c307f6b?sk=9a87ef2e08fa6777d92afe0fd025af8e I found a simple way to do that, and adapted it to chemistry.
I asked ChatGPT to generate a substituted coumarin that had never been published before, to provide a synthetic pathway to it, and to give a SMILES representation of each compound involved in the reactions.
The molecule it output was 7-methoxy-4-phenyl-6-(2,2,2-trifluoroethoxy)coumarin. This is a chemically valid molecule that is not present in chemical databases, so I'm pretty confident it was not in its training set. It appears that ChatGPT is somehow able to produce novel valid molecules when representing them as IUPAC names.
The synthetic pathway is definitely not correct. It uses reactions that are in fact used to synthesize coumarins, but the reagents it lists would not work. It thus fails to provide a valid synthetic pathway to this molecule (don't get me wrong, that would be quite a complicated task even for a trained chemist; as far as I know, substitutions at the 4-position of coumarins are not the easiest thing to achieve).
The SMILES strings are the funniest part. It does get the first ones correct for the basic reagents; those are definitely in the training set. It also puts in this "-->" though, and I'm not really sure what that is supposed to stand for.
However, the SMILES strings for the products are just plain invalid. In the final product it leaves the silicon atom in place, which it apparently forgot to remove after the TBAF deprotection.
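For anyone who wants to repeat this kind of check, here is a rough sketch in plain Python of the two things I looked for: obviously broken SMILES syntax and a leftover silicon atom. It is not a real SMILES parser (a proper cheminformatics toolkit would do a full parse and valence check), and the function names `smiles_sanity_issues` and `has_silicon` are hypothetical, made up for this illustration. The silyl ether in the usage lines is also just a made-up example, not ChatGPT's actual output.

```python
import re


def smiles_sanity_issues(smiles):
    """Return a list of obvious syntax problems; an empty list means none found.

    Only a rough heuristic: it checks for a stray '-->' arrow, balanced
    branches, closed bracket atoms and paired ring-closure labels.
    """
    issues = []
    if "-->" in smiles:
        issues.append("contains '-->', which is not SMILES syntax")

    depth = 0           # current nesting depth of '(' branches
    open_rings = set()  # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom; digits inside are not ring labels.
            end = smiles.find("]", i)
            if end == -1:
                issues.append("unclosed '[' bracket atom")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                issues.append("')' without matching '('")
                depth = 0
        elif ch == "%":
            # Two-digit ring-closure label such as %10.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and label.isdigit():
                open_rings ^= {label}
                i += 3
                continue
        elif ch.isdigit():
            open_rings ^= {ch}
        i += 1

    if depth > 0:
        issues.append("unclosed '(' branch")
    if open_rings:
        issues.append("unpaired ring-closure label(s): " + ", ".join(sorted(open_rings)))
    return issues


def has_silicon(smiles):
    """True if the string contains a bracketed silicon atom such as [Si]."""
    return re.search(r"\[\d*Si[^\]]*\]", smiles) is not None


# Plain coumarin passes the syntax check and has no silicon.
print(smiles_sanity_issues("O=C1C=Cc2ccccc2O1"))
print(has_silicon("O=C1C=Cc2ccccc2O1"))
# A hypothetical silyl-protected phenol, the kind of group TBAF should remove.
print(has_silicon("C[Si](C)(C)Oc1ccccc1"))
```

A string passing these checks can still be chemically invalid, so this only catches the crudest mistakes, like the ones described above.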
All in all, I'm quite surprised. It did manage to produce a valid novel molecule. That's cool, something you can definitely talk about at the bar with your friends when you run out of other things.
However, there are much better methodologies for this kind of thing. I would not rely at all on such a model to generate new molecules, and I definitely would not use it to predict synthetic procedures.