Universal Chemical Markup (UCM) - A new format for common chemical data
Background: We introduce UCM (Universal Chemical Markup), a new chemical data format. The format is based on XML (Extensible Markup Language), and its first version focuses on recording chemical structures and their properties.

Results: UCM currently supports structures containing isotopes, ions and various types of bonding, including delocalized bonds. Properties can be expressed by combining UCM with UnitsML (Units Markup Language): one defines quantities with scientific units in UnitsML and then refers to them in UCM when recording property values. Users can also add literature references with BibTeXML (BibTeX Markup Language) and annotate the recorded data using plain text or XHTML (Extensible Hypertext Markup Language) descriptions. In contrast to presently available general-purpose chemical formats, UCM offers built-in validation that combines grammar-based and pattern-based XML schema languages. Thus, all recorded data can be precisely validated against the UCM schemas in standard XML validators.

Conclusions: We developed the structure of UCM from scratch on the basis of an analysis described in our previous article. Starting from scratch allowed us to integrate BibTeXML, UnitsML and XHTML, as well as chemical line notations and identifiers, into UCM. It also helped us to avoid redundant parts and to create an implementation that aims to minimize ambiguity and is designed to be easily extensible in the future.
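
The define-then-refer pattern mentioned in the Results (units are declared once and property values point at them) can be illustrated with a minimal Python sketch. It is not the UCM or UnitsML vocabulary: the element and attribute names (`record`, `units`, `unit`, `compound`, `property`, `unitRef`) and both helper functions are hypothetical, chosen only for illustration. The second helper mimics the kind of cross-reference rule that a pattern-based schema language such as Schematron can express, as opposed to the purely structural checks of a grammar-based schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all element and attribute names are illustrative
# assumptions, not the real UCM or UnitsML vocabulary.
DOC = """
<record>
  <units>
    <unit id="u_kelvin" symbol="K"/>
  </units>
  <compound name="water">
    <property name="melting point" unitRef="u_kelvin">273.15</property>
    <property name="boiling point" unitRef="u_celsius">100</property>
  </compound>
</record>
"""


def resolve_properties(xml_text):
    """Return (name, value, unit symbol) for properties whose unit is defined."""
    root = ET.fromstring(xml_text)
    # Map each declared unit id to its symbol.
    symbols = {u.get("id"): u.get("symbol") for u in root.iter("unit")}
    resolved = []
    for prop in root.iter("property"):
        ref = prop.get("unitRef")
        if ref in symbols:
            resolved.append((prop.get("name"), float(prop.text), symbols[ref]))
    return resolved


def unresolved_unit_refs(xml_text):
    """Return unitRef values that do not match any declared unit id."""
    root = ET.fromstring(xml_text)
    defined = {u.get("id") for u in root.iter("unit")}
    return [p.get("unitRef") for p in root.iter("property")
            if p.get("unitRef") not in defined]


if __name__ == "__main__":
    print(resolve_properties(DOC))
    print(unresolved_unit_refs(DOC))
```

In this sketch the second property refers to a unit that was never declared, so it is reported as unresolved rather than silently accepted. A real UCM document would instead be checked against the published UCM schemas in a standard XML validator.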