Cocrystals in the Cambridge Structural Database: a network approach

Author(s):  
Jan-Joris Devogelaer ◽  
Hugo Meekes ◽  
Elias Vlieg ◽  
René de Gelder

To obtain a better understanding of which coformers to combine for the successful formation of a cocrystal, techniques from data mining and network science are used to analyze the data contained in the Cambridge Structural Database (CSD). A network of coformers is constructed based on cocrystal entries present in the CSD and its properties are analyzed. From this network, clusters of coformers with a similar tendency to form cocrystals are extracted. The popularity of the coformers in the CSD is unevenly distributed: a small group of coformers is responsible for most of the cocrystals, hence resulting in an inherently biased data set. The coformers in the network are found to behave primarily in a bipartite manner, demonstrating the importance of combining complementary coformers for successful cocrystallization. Based on our analysis, it is demonstrated that the CSD coformer network is a promising source of information for knowledge-based cocrystal prediction.
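The network construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the coformer names and pairs below are hypothetical toy data, and the helpers `build_network`, `degrees`, and `two_colour` are names introduced only for this example. The sketch links two coformers whenever they share a cocrystal entry, counts how many partners each coformer has, and checks whether the graph can be split into two complementary groups (a bipartite structure).

```python
from collections import defaultdict, deque

# Hypothetical toy data: each tuple stands for one cocrystal entry.
COCRYSTALS = [
    ("acid_A", "base_X"),
    ("acid_A", "base_Y"),
    ("acid_A", "base_Z"),
    ("acid_B", "base_X"),
]


def build_network(pairs):
    """Build an undirected adjacency map from (coformer, coformer) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degrees(graph):
    """Number of distinct cocrystal partners per coformer."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def two_colour(graph):
    """Return a node -> 0/1 colouring if the graph is bipartite, else None."""
    colour = {}
    for start in graph:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return None  # an odd cycle breaks bipartiteness
    return colour


network = build_network(COCRYSTALS)
# Coformers sorted by popularity (most partners first).
print(sorted(degrees(network).items(), key=lambda kv: -kv[1]))
print(two_colour(network) is not None)
```

In this toy network one coformer accounts for most of the links, which mirrors the uneven popularity noted in the abstract. A real coformer network would only be approximately bipartite, so a strict two-colouring test like this one is a simplification.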

CrystEngComm ◽  
2020 ◽  
Vol 22 (43) ◽  
pp. 7290-7297 ◽  
Author(s):  
Jen E. Werner ◽  
Jennifer A. Swift

A search method based on SMILES string matching was developed to identify hydrate–anhydrate structure pairs in the Cambridge Structural Database.
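One way such string matching might work is sketched below. This is an assumption-laden illustration, not the published method: it assumes every entry carries a canonical SMILES string in which components are separated by `.` and water is written as `O`. The reference codes, the `WATER` constant, and the helpers `split_water` and `find_pairs` are hypothetical.

```python
from collections import defaultdict

WATER = "O"  # assumed SMILES for a water component

# Hypothetical entries: reference code -> canonical SMILES.
ENTRIES = {
    "HYPO01": "CC(=O)O",
    "HYPO02": "CC(=O)O.O",
    "HYPO03": "c1ccccc1.O",
    "HYPO04": "CCO",
}


def split_water(smiles):
    """Return (sorted non-water components, number of water components)."""
    parts = smiles.split(".")
    non_water = tuple(sorted(p for p in parts if p != WATER))
    n_water = sum(1 for p in parts if p == WATER)
    return non_water, n_water


def find_pairs(entries):
    """Pair each hydrate with anhydrates sharing the same non-water components."""
    groups = defaultdict(lambda: {"hydrate": [], "anhydrate": []})
    for refcode, smiles in entries.items():
        key, n_water = split_water(smiles)
        if not key:
            continue  # skip entries that contain only water
        kind = "hydrate" if n_water else "anhydrate"
        groups[key][kind].append(refcode)
    pairs = []
    for group in groups.values():
        for hydrate in group["hydrate"]:
            for anhydrate in group["anhydrate"]:
                pairs.append((hydrate, anhydrate))
    return sorted(pairs)


print(find_pairs(ENTRIES))
```

Because the key is the full list of non-water components, entries that differ in the number of host molecules per formula unit would not be matched. A production search would need to normalize stoichiometry and SMILES canonicalization first.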


2018 ◽  
Vol 58 (3) ◽  
pp. 615-629 ◽  
Author(s):  
Jason C. Cole ◽  
Oliver Korb ◽  
Patrick McCabe ◽  
Murray G. Read ◽  
Robin Taylor

2019 ◽  
Vol 8 (S2) ◽  
pp. 83-87
Author(s):  
S. Peerbasha ◽  
M. Mohamed Surputheen

The development of many educational institutions depends on their students' performance in learning and understanding. Here, we analyze students' academic profiles, including their grades and various cumulative attributes. Academic performance could be improved by a motivational approach. Student performance is analyzed through a knowledge-based data mining process, but predictions made from student data sets are not sufficiently accurate. We propose a novel machine learning algorithm based on subspace clustering and multi-perspective classification techniques to identify students who require psychological motivation. Relational patterns are also extracted to form enhanced clustering classes. This uncovers new relations between students and their educational performance across the various attributes, using a surf scale nested clustering approach based on an intelligent prediction system built from soft computing tasks. The approach improves the prediction rate by taking time factor analysis and complexity into account, yielding an efficient clustering algorithm that maximizes clustering and classification accuracy for improving academic performance.


Energies ◽  
2020 ◽  
Vol 13 (10) ◽  
pp. 2559
Author(s):  
Marian B. Gorzałczany ◽  
Jakub Piekoszewski ◽  
Filip Rudziński

The main objective and contribution of this paper is the application of our knowledge-based data-mining approach (a fuzzy rule-based classification system), characterized by a genetically optimized interpretability-accuracy trade-off (by means of multi-objective evolutionary optimization algorithms), to the transparent and accurate prediction of decentral smart grid control (DSGC) stability. In particular, we aim at uncovering the hierarchy of influence of particular input attributes upon DSGC stability. Moreover, we analyze the effect of possible "overlapping" of some input attributes over others from the DSGC-stability perspective. The recently published Electrical Grid Stability Simulated Data Set, available at the UCI Database Repository, and its input-aggregate-based concise version were used in our experiments. A comparison with 39 alternative approaches was also performed, demonstrating the advantages of our approach in terms of: (i) interpretable and accurate fuzzy rule-based DSGC-stability prediction and (ii) uncovering the hierarchy of the DSGC system's attribute significance.
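The interpretability-accuracy trade-off mentioned above can be pictured as a Pareto selection over candidate classifiers. The sketch below is only an illustration of that idea and not the authors' evolutionary algorithm: it assumes interpretability is measured by the number of rules (fewer is better), and the candidate names and figures are hypothetical.

```python
# Hypothetical candidates: (name, number of fuzzy rules, accuracy).
CANDIDATES = [
    ("A", 3, 0.80),
    ("B", 5, 0.85),
    ("C", 5, 0.82),
    ("D", 10, 0.85),
]


def pareto_front(candidates):
    """Keep candidates that no other candidate beats on both objectives.

    Objectives: minimise the rule count, maximise the accuracy.
    """
    front = []
    for name, rules, accuracy in candidates:
        dominated = any(
            other_rules <= rules
            and other_accuracy >= accuracy
            and (other_rules < rules or other_accuracy > accuracy)
            for _, other_rules, other_accuracy in candidates
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(CANDIDATES))
```

Candidates left on the front represent different compromises: a user who values transparency would pick the one with the fewest rules, while a user who values accuracy would pick the most accurate one.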


Author(s):  
V. Rajni Swamy ◽  
P. Müller ◽  
N. Srinivasan ◽  
S. Perumal ◽  
R. V. Krishnakumar

The two new isomorphous structures [3-methyl-4-(4-methylphenyl)-1-phenyl-6-trifluoromethyl-1H-pyrazolo[3,4-b]pyridin-5-yl](thiophen-2-yl)methanone, C26H18F3N3OS, (I), and [4-(4-chlorophenyl)-3-methyl-1-phenyl-6-trifluoromethyl-1H-pyrazolo[3,4-b]pyridin-5-yl](thiophen-2-yl)methanone, C25H15ClF3N3OS, (II), are shown to obey the chlorine–methyl exchange rule. Both structures show extensive disorder, treatment of which greatly improves the quality of the description of the structures. In addition, it is worth noting that the presence of extensive disorder may make it difficult to detect the isomorphism automatically during data-mining procedures (such as searches of the Cambridge Structural Database).


Author(s):  
Michal Kaźmierczak ◽  
Ewa Patyk-Kaźmierczak

The Cambridge Structural Database (CSD) is the largest repository of crystal structures of organic and metal–organic compounds, containing over 1.1 million entries. Over 3300 of the deposits are structures determined under high pressure, with the number being strongly affected by the experimental requirements of the high-pressure techniques. Nevertheless, it still presents a population sufficiently representative for statistical data mining. In this work, an in-depth analysis of this population is presented, showing where contributors of high-pressure depositions come from, which journals high-pressure structures are published in, and also providing information on some trends in high-pressure crystallography and how they have changed over the years elucidated from data collected in the CSD. The ultimate goal of this article is to bring the high-pressure crystallography content in the CSD to a wider audience of scientists.


1996 ◽  
Vol 35 (01) ◽  
pp. 41-51 ◽  
Author(s):  
F. Molino ◽  
D. Furia ◽  
F. Bar ◽  
S. Battista ◽  
N. Cappello ◽  
...  

The study reported in this paper is aimed at evaluating the effectiveness of a knowledge-based expert system (ICTERUS) in diagnosing jaundiced patients, compared with a statistical system based on probabilistic concepts (TRIAL). The performances of both systems have been evaluated using the same set of data in the same number of patients. Both systems are spin-off products of the European project Euricterus, an EC-COMACBME Project designed to document the occurrence and diagnostic value of clinical findings in the clinical presentation of jaundice in Europe, and have been developed as decision-making tools for the identification of the cause of jaundice based only on clinical information and routine investigations. Two groups of jaundiced patients were studied, including 500 (retrospective sample) and 100 (prospective sample) subjects, respectively. All patients were independently submitted to both decision-support tools. The input of both systems was the data set agreed within the Euricterus Project. The performances of both systems were evaluated with respect to the reference diagnoses provided by experts on the basis of the full clinical documentation. Results indicate that both systems are clinically reliable, although the diagnostic prediction provided by the knowledge-based approach is slightly better.


2014 ◽  
Vol 6 (1) ◽  
pp. 15-20 ◽  
Author(s):  
David Hartanto Kamagi ◽  
Seng Hansun

Graduation information is important for Universitas Multimedia Nusantara, which is engaged in education. The data on graduated students from each academic year is an important source of information for decision making at BAAK (Bureau of Academic and Student Administration). With this information and the application of data mining, a prediction can be made as to whether students who are still active will graduate early, on time, or late, or will drop out. The purpose of this study is to predict students' graduation with the C4.5 algorithm, as a reference for academic policies and actions (BAAK) aimed at reducing the number of students who graduate late or do not graduate. The research shows that the IPS category for semesters one to six, gender, high school of origin, and number of credits can predict whether a student graduates early, graduates on time, graduates late, or drops out, using data mining with the C4.5 algorithm. The semester-six category is the most influential on the predicted graduation outcome. In the application test, the accuracy of the graduation prediction is 87.5%. Index Terms: data mining, C4.5 algorithm, Universitas Multimedia Nusantara, prediction.
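C4.5 chooses the attribute to split on by its gain ratio, that is, the information gain divided by the split information. The sketch below computes that criterion on hypothetical toy records. The attribute names, values, and the helpers `entropy` and `gain_ratio` are assumptions for illustration and are not taken from the study's data set or code.

```python
import math
from collections import Counter, defaultdict

# Hypothetical student records.
ROWS = [
    {"credits": "high", "gender": "F", "outcome": "on_time"},
    {"credits": "high", "gender": "M", "outcome": "on_time"},
    {"credits": "low", "gender": "F", "outcome": "late"},
    {"credits": "low", "gender": "M", "outcome": "late"},
]


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on `attribute` with respect to `target`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    total = len(rows)
    # Weighted entropy of the target after the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many small branches.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


for name in ("credits", "gender"):
    print(name, gain_ratio(ROWS, name, "outcome"))
```

In this toy data the credit category separates the outcomes perfectly while gender carries no information, so a C4.5-style tree would split on credits first. The full algorithm also handles continuous attributes, missing values, and pruning, none of which is shown here.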

