<p>Recognizing
substructures and their relations embedded in a molecular structure
representation is a key process for <a>structure-activity</a>
or structure-property relationship (SAR/SPR) studies. A molecular structure can
be represented either explicitly as a connection table (CT) or as a linear notation,
such as SMILES, a language that describes the connectivity of the atoms in a
molecular structure. Conventional SAR/SPR approaches rely on partitioning the
CT into a set of predefined substructures as structural descriptors. In this
work, we propose a new method for identifying SAR/SPR through linear-notation
(for example, SMILES) syntax analysis with a self-attention mechanism, an
interpretable deep learning architecture. The method has been evaluated by
predicting chemical properties, toxicity, and bioactivity
from experimental data sets. Our results demonstrate that the method yields superior performance
compared with state-of-the-art methods. Moreover, the method can produce
chemically interpretable results, which chemists can use to
design and synthesize compounds with improved activity or properties.</p>