Conformational analysis from crystallographic data using conceptual clustering

1996 ◽  
Vol 52 (3) ◽  
pp. 535-549 ◽  
Author(s):  
D. Conklin ◽  
S. Fortier ◽  
J. I. Glasgow ◽  
F. H. Allen

The rapid growth of crystallographic databases has created a demand for novel and efficient techniques for the analysis of molecular conformations, in order to derive new concepts and rules and to generate useful classifications of the available data. This paper presents a conceptual clustering approach, termed IMEM (image memory), which discovers the conformational diversity present in a dataset of crystal structures. In contrast to numerical clustering methods, IMEM views a molecular structure as comprising qualitative relationships among its parts, i.e. the structure is viewed as a molecular scene. In addition, IMEM does not require the user to have any a priori knowledge of an expected number of conformational classes within a given dataset. The IMEM approach is applied to several datasets derived from the Cambridge Structural Database and, in all cases, chemically correct and sensible conformational classifications were discovered. This is confirmed by a rigorous comparison of IMEM results with published conformational data obtained by energy-minimization and numerical clustering methods. Conformational analysis tools have an important part to play in the conversion of raw molecular databases to knowledge bases.

2019 ◽  
pp. 257-261
Author(s):  
Vladimir Laryukhin ◽  
Petr Skobelev ◽  
Oleg Lakhin ◽  
Sergey Grachev ◽  
Vladimir Yalovenko ◽  
...  

The paper presents a multi-agent approach for developing a cyber-physical system for managing precise farms with digital twins of plants. It discusses the complexity of the problem, which is caused by a priori incompleteness of knowledge about the factors of plant growth and development, the high uncertainty of crop cultivation, the variety of weather, business and technical requirements, etc. The approach proposes knowledge bases and multi-agent technology in combination with machine learning methods for designing such systems. The digital twin of a plant is specified as an agent based on an ontology model of the objects relevant to plant cultivation (the specific sort of plant, soil, etc.), associated with the history of operations and environmental conditions. The architecture and functions of the system components are designed. The expected results of system implementation and the benefits for farmers are discussed.


2004 ◽  
Vol 01 (04) ◽  
pp. 647-680 ◽  
Author(s):  
STERGIOS PAPADIMITRIOU ◽  
SPIRIDON D. LIKOTHANASSIS

Self-Organized Maps (SOMs) are a popular approach for analyzing genome-wide expression data. However, most SOM-based approaches ignore prior knowledge about functional gene categories. In addition, SOM-based approaches usually develop topographic maps with disjoint and uniform activation regions that correspond to a hard clustering of the patterns at their nodes. We present a novel Self-Organized Map, the Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). This model adapts its parameters in a kernel space. Gaussian kernels are used, and their mean and variance components are adapted in order to optimize the fit to the input density. The KSDG-SOM also grows dynamically up to a size defined by statistical criteria. It is capable of incorporating a priori information about the known functional characteristics of genes. This information forms a supervised bias in cluster formation, and the model is able to revise incorrect functional labels. The new method overcomes a main drawback of most existing clustering methods, which lack a mechanism for dynamic extension based on a balance between unsupervised and supervised drives.


Author(s):  
Min Joong Jeong ◽  
Brian H. Dennis ◽  
Shinobu Yoshimura

Data clustering methods can be a useful tool for engineering design that is based on numerical optimization. The clustering method is an effective way of producing representative designs, or clusters, from a large set of potential designs. These methods have recently been applied to the clustering of Pareto-optimal solutions from multi-objective optimization. The results presented here focus on the application of clustering to single objective optimization results. In the case of single objective optimization, the method is used to determine the clusters in a set of quasi-optimal feasible solutions generated by an optimizer. A data clustering procedure based on an evolutionary method is briefly described. The number of clusters is determined automatically and need not be known a priori. The method is demonstrated by application to the results of a turbine blade coolant passage shape optimization problem. The solutions are transformed to a lower-dimensional space for better understanding of their variance and character. Engineering information, such as the shapes and locations of the internal passages, is supported by the visualization of clustered solutions. The clustering, transformation, and visualization methods presented in this study might be applicable to the increasing interpretation demands of design optimization.


1967 ◽  
Vol 20 (11) ◽  
pp. 2395 ◽  
Author(s):  
JR Gollogly ◽  
CJ Hawkins

The stereospecificity of the ligand, R-N,N,N′,N′-tetrakis(2′-aminoethyl)-1,2-diaminopropane, when coordinated as a sexadentate chelate to cobalt(III), has been investigated by an a priori calculation of the conformational energy difference between the various possible absolute configurations of the complex. It has been shown that the L isomer is more stable than the D isomer by an extremely large energy difference which is due mainly to van der Waals interactions. Some of the terms which contribute to conformational energy differences between metal complexes have not been considered previously.


2019 ◽  
Vol 26 (3) ◽  
pp. 293-318 ◽  
Author(s):  
R. Silveira ◽  
V. Furtado ◽  
V. Pinheiro

Keyphrase extraction systems traditionally use classification algorithms and do not consider the fact that part of the keyphrases may not be found in the text, which reduces the accuracy of such algorithms a priori. In this work, we propose to improve the accuracy of these systems with inferential mechanisms that use a knowledge representation model, including symbolic models of knowledge bases and distributional semantics, to expand the set of keyphrase candidates submitted to the classification algorithm with terms that are not in the text (not-in-text terms). Our basic assumption is that not-in-text terms have a semantic relationship with terms that are in the text. To represent this relationship, we have defined two new features to be used as input to the classification algorithms. The first feature refers to the power of discrimination of the inferred not-in-text terms. The intuition behind this is that good candidates for a keyphrase are those that are deduced from various textual terms in a specific document and that are not often deduced in other documents. The other feature represents the descriptive strength of a not-in-text candidate. We argue that not-in-text keyphrases must have a strong semantic relationship with the text and that the power of this semantic relationship can be measured in a similar way to popular metrics like TFxIDF. The method proposed in this work was compared with state-of-the-art systems using five corpora, and the results show that it significantly improves automatic keyphrase extraction by dealing with the limitation of extracting keyphrases absent from the text.
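
For readers unfamiliar with the TFxIDF metric mentioned in the abstract above, the following is a minimal sketch of the standard weighting: term frequency within a document multiplied by the logarithm of the inverse document frequency. It is not the paper's not-in-text features, which the abstract does not define. The function name `tf_idf` and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumes every document is a non-empty list of tokens and uses the
    plain (unsmoothed) formula tf * ln(N / df).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [
    "keyphrase extraction uses classification".split(),
    "classification uses features".split(),
    "semantic features improve keyphrase extraction".split(),
]
weights = tf_idf(docs)
```

A term that is frequent in one document but rare across the corpus receives a high weight, which is the same intuition the abstract gives for its discrimination feature.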


1994 ◽  
Vol 266 (5) ◽  
pp. R1697-R1704 ◽  
Author(s):  
R. Benigni ◽  
A. Giuliani

Even though elegant examples of mathematical modeling of biological problems exist, such approaches still remain outside the domain of most biologists. It is proposed that, for a wider and more systematic use of mathematical models in biology, the soft modeling approaches, which are applicable to phenomena with a limited level of definition, should be investigated and preferred. In particular, multivariate data analysis (MDA) is indicated as an important tool toward fulfilling this goal. This paper reviews the general principles of MDA and examines in detail principal component analysis and cluster analysis, which are two of the most important MDA techniques. A number of applications to real biological problems are presented. These examples show how the construction of classifications corresponds to the generation of new knowledge and new concepts, which are hierarchically on a higher level than the initial information. This new form of knowledge is obtained without superimposing a priori theories on the data. It is demonstrated how the MDA can lead to the identification of biological systems; also shown is their ability to describe multiple scale phenomena, a typical feature of biological systems. Moreover, the multivariate analyses provide new descriptors for a given biological system; these descriptors are quantitative, thus allowing the system to be described in a "metric space," where it then becomes possible to use any other mathematical tool.
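
As an illustration of the principal component analysis reviewed in the abstract above, the following sketch finds the first principal component of a small dataset by power iteration on the sample covariance matrix. The abstract gives no procedure, so the choice of power iteration, the function name `first_principal_component`, and the toy data are all assumptions made for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the leading component.

    Assumes at least two rows and a non-zero covariance matrix.
    """
    n = len(rows)
    dim = len(rows[0])

    # Center the data on the mean of each variable.
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]

    # Sample covariance matrix.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

    # Power iteration: repeated multiplication converges to the
    # eigenvector with the largest eigenvalue.
    vec = [1.0] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]

    # Rayleigh quotient gives the variance explained by this direction.
    variance = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
        for i in range(dim)
    )
    return vec, variance


# Hypothetical measurements lying roughly along the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
direction, variance = first_principal_component(data)
```

For these data the returned direction points approximately along (1, 2), so a single new descriptor captures nearly all of the variation in the two original variables.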


Author(s):  
S. S. Patra ◽  
S. R. Dash

Cluster analysis is the term applied to a group of analyses that seek to divide a set of objects into a number of homogeneous groups or clusters when there is no a priori information about the group structure of the data. Clustering is an active research topic in data mining, and different methods have been proposed in the literature. Most of these methods are based on the use of a distance measure defined either on numerical attributes or on categorical attributes. There are three basic categories of clustering methods: partitional methods, hierarchical methods and density-based methods. This paper proposes an iterative algorithm for partitional clustering.
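
The abstract above does not specify its iterative algorithm. As a generic illustration of iterative partitional clustering on numerical attributes, the following is a minimal sketch of k-means (Lloyd's algorithm) with Euclidean distance. The function name `kmeans`, the naive initialization from the first `k` points, and the sample points are assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=100):
    """Partition points into k clusters; return (labels, centroids).

    Assumes len(points) >= k. Centroids start at the first k points,
    which is a simplification; practical code would use a better seeding.
    """
    centroids = list(points[:k])
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return assignment, centroids


# Hypothetical data: two well-separated groups of 2D points.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
labels, centers = kmeans(points, k=2)
```

Unlike the methods in some of the abstracts above, this sketch requires the number of clusters `k` to be chosen in advance.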

