Iterative Supervised Principal Component Analysis-Driven Ligand Design for Regioselective Ti-Catalyzed Pyrrole Synthesis

Author(s):  
Xin Yi See ◽  
Benjamin Reiner ◽  
Xuelan Wen ◽  
T. Alexander Wheeler ◽  
Channing Klein ◽  
...  

Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via Ti-catalyzed formal [2+2+1] cycloaddition of phenyl propyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space along with k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and only optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
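
The regression step described in this abstract can be illustrated with a minimal sketch: PCA on a matrix of catalyst descriptors, followed by least-squares regression of selectivity against the scores of the top three components. Everything specific here is an assumption for illustration: the data are synthetic, and the names `fit_pcr` and `predict_pcr` are hypothetical. The authors' actual descriptors, their supervision step, and the k-means clustering are not reproduced.

```python
import numpy as np

def fit_pcr(X, y, n_components=3):
    """Principal component regression: PCA on X, then least squares on the scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # standardized descriptors
    # Rows of Vt are the principal directions (component loadings).
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T            # shape (n_features, n_components)
    scores = Z @ loadings                     # shape (n_samples, n_components)
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "loadings": loadings, "coef": coef}

def predict_pcr(model, X_new):
    """Project new samples into the fitted PCA space and apply the regression."""
    Z = (X_new - model["mean"]) / model["std"]
    scores = Z @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic stand-in data (hypothetical): 20 training catalysts, 6 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 6))
y_train = X_train @ rng.normal(size=6) + rng.normal(scale=0.1, size=20)

model = fit_pcr(X_train, y_train, n_components=3)

# Rank a prospective set of candidates by predicted selectivity.
X_candidates = rng.normal(size=(5, 6))
predicted = predict_pcr(model, X_candidates)
best = int(np.argmax(predicted))
```

In an iterative workflow of the kind the abstract describes, the best-ranked candidates would be tested experimentally and added to the training set before refitting.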


2022 ◽  
pp. 146808742110707
Author(s):  
Aran Mohammad ◽  
Reza Rezaei ◽  
Christopher Hayduk ◽  
Thaddaeus Delebinski ◽  
Saeid Shahpouri ◽  
...  

The development of internal combustion engines is shaped by exhaust gas emissions legislation and the drive to increase performance. This calls for engine-out emission models that can be used to optimize engines for real driving emission controls. The predictive capability of physics-based and data-driven engine-out emission models depends on the system inputs, which are specified by the user; accuracy can improve as the number of inputs increases. At the same time, irrelevant inputs become more likely. Such inputs have only a weak functional relation to the emissions and can lead to overfitting. Data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This makes it possible to evaluate how the dimensionality reduction affects modeling accuracy. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm allow the regression models to learn more accurately than the features extracted through principal component analysis and factor analysis. The resulting test errors (RMSE_Te) for modeling NOx, CO, HC, and soot emissions are 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN, respectively.
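
The contrast between feature selection and feature extraction drawn in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the regularization strength `alpha` is arbitrary, and `lasso_coordinate_descent` is a hypothetical helper that solves the standard lasso objective by cyclic coordinate descent.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_norm = (X ** 2).sum(axis=0) / n_samples
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(n_features):
            residual += X[:, j] * w[j]        # remove feature j's contribution
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_norm[j]
            residual -= X[:, j] * w[j]        # add the updated contribution back
    return w

# Synthetic stand-in data (hypothetical): 10 inputs, only inputs 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize inputs
yc = y - y.mean()                             # center the target

# Selection: lasso keeps a subset of the original inputs.
w = lasso_coordinate_descent(Xs, yc, alpha=0.1)
selected = np.flatnonzero(w != 0)

# Extraction: PCA builds new features that mix all original inputs.
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
extracted = Xs @ Vt[:2].T
```

The selected inputs remain physically interpretable measured or modeled variables, whereas each extracted component is a linear combination of all inputs and is computed without reference to the target.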


2018 ◽  
Vol 857 (1) ◽  
pp. 55 ◽  
Author(s):  
Gergely Hajdu ◽  
István Dékány ◽  
Márcio Catelan ◽  
Eva K. Grebel ◽  
Johanna Jurcsik

2020 ◽  
Vol 23 (11) ◽  
pp. 2414-2430
Author(s):  
Khaoula Ghoulem ◽  
Tarek Kormi ◽  
Nizar Bel Hadj Ali

In the general framework of data-driven structural health monitoring, principal component analysis has been applied successfully to the continuous monitoring of complex civil infrastructures. When the relationship between monitored variables is linear or polynomial, principal component analysis allows structured residuals to be generated from measurement outputs without an a priori structural model. Principal component analysis has been widely used for system monitoring because it can handle high-dimensional, noisy, and highly correlated data by projecting the data onto a lower-dimensional subspace that contains most of the variance of the original data. For nonlinear systems, however, it is easily demonstrated that linear principal component analysis cannot reveal nonlinear relationships between variables. This has motivated various developments of nonlinear principal component analysis to tackle damage diagnosis of complex structural systems, especially those with nonlinear behavior. In this article, a data-driven technique for damage detection in nonlinear structural systems is presented. The proposed method is based on kernel principal component analysis. Two case studies involving nonlinear cable structures are presented to show the effectiveness of the proposed methodology. The validity of the kernel principal component analysis-based monitoring technique is shown in terms of its ability to detect damage. Robustness to environmental effects and disturbances is also studied.
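
One common way to use kernel principal component analysis for monitoring is to fit it on baseline (healthy) data and flag new samples whose squared prediction error (SPE) in feature space exceeds a baseline threshold. The sketch below follows that idea under loud assumptions: the abstract does not state which damage index, kernel, or threshold the authors use, so the RBF kernel, the SPE index, the 99th-percentile threshold, and the synthetic two-variable data are all illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kpca(X, gamma, n_components):
    """Fit kernel PCA on baseline data via the centered kernel matrix."""
    K = rbf_kernel(X, X, gamma)
    col_mean = K.mean(axis=0, keepdims=True)
    total_mean = K.mean()
    Kc = K - col_mean - col_mean.T + total_mean       # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]    # largest eigenvalues first
    alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx])  # unit-norm feature-space axes
    return {"X": X, "gamma": gamma, "col_mean": col_mean,
            "total_mean": total_mean, "alphas": alphas}

def spe(model, X_new):
    """Squared prediction error: feature-space norm not captured by the components."""
    k = rbf_kernel(X_new, model["X"], model["gamma"])
    row_mean = k.mean(axis=1, keepdims=True)
    kc = k - row_mean - model["col_mean"] + model["total_mean"]
    scores = kc @ model["alphas"]
    # For an RBF kernel k(x, x) = 1, so the centered squared norm is:
    norm_sq = 1.0 - 2.0 * row_mean[:, 0] + model["total_mean"]
    return norm_sq - (scores ** 2).sum(axis=1)

# Synthetic stand-in data (hypothetical): two monitored variables linked by a
# nonlinear relation; "damage" is modeled as an offset in that relation.
rng = np.random.default_rng(0)

def sample(n, offset=0.0):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x ** 2 + offset + rng.normal(scale=0.02, size=n)
    return np.column_stack([x, y])

X_train = sample(100)
model = fit_kpca(X_train, gamma=5.0, n_components=10)
threshold = np.percentile(spe(model, X_train), 99)    # baseline alarm level

spe_healthy = spe(model, sample(50))
spe_damaged = spe(model, sample(50, offset=1.0))
flagged = spe_damaged > threshold
```

A linear PCA model would need the relation between the two variables to be linear for its residuals to stay small on healthy data; the kernel version captures the curved baseline, so departures from it show up as large SPE values.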


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 2695-2695
Author(s):  
Silvia Bresolin ◽  
Luca Trentin ◽  
Geertruy Kronnie ◽  
Laura Sainati ◽  
Marco Zecca ◽  
...  

Myelodysplastic syndromes (MDS) are rare malignant haematopoietic stem cell disorders in children. They have the propensity to transform into acute myeloid leukemia and, for this reason, can be considered a pre-leukemic condition. In this study we analyzed the gene expression profile (GEP) of a large cohort of pediatric patients: 14 MDS [6 refractory cytopenias (RC), 6 refractory anemias with excess blasts (RAEB) and 2 refractory anemias with excess blasts in transformation (RAEB-t)], 50 de novo acute myeloid leukemia (AML) and 6 normal bone marrow (BM) aspirates. Furthermore, in 5 cases, samples were available for analysis both at diagnosis and at the time of progression to secondary AML. Gene expression analysis was performed on Affymetrix HG U133 Plus 2.0 oligonucleotide microarrays using Partek software packages and the leukemia classifier version 7 (LCver7). Statistical analyses were performed to determine the correlation between the gene expression signature and MDS subtype. Unsupervised hierarchical clustering analysis separated the majority of MDS cases from the diagnostic AML samples and placed the normal BM specimens in the MDS cluster. Remarkably, all but one of the MDS cases that evolved into AML within one year clustered together inside the diagnostic AML group. Using principal component analysis (PCA), we observed that MDS samples clustered between the group of normal BM samples and the AML samples. Moreover, the RC samples were located close to the cluster of normal BM samples, while RAEB and RAEB-t specimens were nearest to the AML cluster (Fig. 1). Further, we classified the MDS samples using the LCver7 classifier, an algorithm developed within the MILE (Microarray Innovation In LEukemia) study that gives an overall cross-validation accuracy of >95% for distinct sub-classes of pediatric and adult leukemias using gene expression profiles.
Of the 14 MDS samples, 57.2% had an AML-like signature and 42.8% a non-AML-like signature. The Fisher exact test showed a statistically significant concordance (p = 0.008) between the FAB classification and the gene expression signature. In fact, 83% of RC patients had a non-AML-like signature whereas only 17% had an AML-like signature. In contrast, 85% of the RAEB and RAEB-t patients had an AML-like signature. In conclusion, the results of unsupervised analysis not only demonstrated that gene expression technology is able to distinguish between MDS, AML and normal BM samples but also showed that GEP can identify an AML-like signature in samples at diagnosis of MDS with a higher risk of evolution to AML, making it possible to identify a group of patients who could be eligible for more intensive treatment. Fig. 1. Principal component analysis (PCA) of MDS, AML and normal BM samples. Red: MDS samples, Blue: de novo AML, Green: secondary AML, Violet: normal BM. RC samples are placed closer to the normal BM specimens. The three MDS patients who go on to develop secondary AML are enclosed in the green ellipsoids.

