Here's one for my #cheminformatics buddies. Has anyone looked into applying Data Format Description Language (DFDL) to the problem of ever-multiplying chemical file formats?
I'm only just starting to learn about DFDL, but it seems like it could be a game changer.


@deadlyvices I feel like this would fail to describe data formats whose information content isn't fixed. For example, the mmCIF format for crystallographic structures of macromolecules contains a lot of fields that are optional and can appear in different positions, each providing different information.

Anyways, I don't think this would solve the issue, it would just let us describe the specific properties of each file format in a standardized way.

I don't think the problem of infinite chemical file formats will be solved any time soon. Each one is based on a different way of looking at chemical entities, and often on different theories.
It would be nice to have a standard file format for small organic molecules containing 3D information. It would also be great if it wasn't based (at least not completely) on valence bond theory.

@rastinza I'm not trying to solve the *fundamental* issue of multiplying file formats. I'm just trying to find the easiest way to make sense of what we do have. Being able at least to standardise on an information model, and having automatic two-way translation between that and all the file formats would be a huge step forward.
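
To make the "information model plus two-way translation" idea concrete, here is a rough Python sketch. It is only an illustration, not how DFDL or any existing toolkit does it: `Atom`, `Molecule`, `from_xyz` and `to_xyz` are made-up names, and the only format handled is the plain XYZ layout (atom count, comment line, then one `element x y z` row per atom). The point is that each format needs just one reader and one writer against the shared model, instead of a converter for every pair of formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom in the shared (hypothetical) information model."""
    element: str
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Molecule:
    """Minimal hub model: a name plus an ordered tuple of atoms."""
    name: str
    atoms: tuple


def from_xyz(text: str) -> Molecule:
    """Read plain XYZ text into the shared model."""
    lines = text.strip().splitlines()
    count = int(lines[0])          # first line: number of atoms
    name = lines[1].strip()        # second line: free-text comment
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()
        atoms.append(Atom(element, float(x), float(y), float(z)))
    return Molecule(name, tuple(atoms))


def to_xyz(mol: Molecule) -> str:
    """Write the shared model back out as plain XYZ text."""
    rows = [str(len(mol.atoms)), mol.name]
    rows += [f"{a.element} {a.x:.4f} {a.y:.4f} {a.z:.4f}" for a in mol.atoms]
    return "\n".join(rows) + "\n"
```

Round-tripping a file through `from_xyz` and `to_xyz` should give back the same model, which is the property you would want every format adapter to satisfy. Anything the model can't hold (bond orders, optional mmCIF-style fields) is exactly where the hard design decisions sit.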

Possibly the best stab so far at the small molecule problem is Chemical Markup Language (CML). We use it as our primary format in Chem4Word.
