Iterative Supervised Principal Component Analysis-Driven Ligand Design for Regioselective Ti-Catalyzed Pyrrole Synthesis

Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via Ti-catalyzed formal [2+2+1] cycloaddition of phenyl propyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space along with k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and only optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
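
The regression step described in the abstract (performance regressed against the scores of the top three principal components, then used to rank untested candidates) can be illustrated with a minimal sketch. This is plain principal component regression on synthetic data, not the authors' ISPCA procedure: the supervised descriptor screening and the k-means clustering are omitted, and the descriptor matrix, the response values, and the function names `fit_pc_regression` and `predict` are all hypothetical.

```python
import numpy as np

def fit_pc_regression(X, y, n_components=3):
    """Regress y on the scores of the leading principal components of X."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std                       # standardize each descriptor
    # Rows of Vt are the principal directions (component loadings).
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T             # (n_descriptors, n_components)
    scores = Z @ loadings                      # (n_samples, n_components)
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # intercept + one slope per PC
    return {"mean": mean, "std": std, "loadings": loadings, "coef": coef}

def predict(model, X_new):
    """Project new samples into the fitted PC space and apply the regression."""
    Z = (X_new - model["mean"]) / model["std"]
    scores = Z @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): 12 training catalysts, 6 descriptors each.
X_train = rng.normal(size=(12, 6))
y_train = rng.normal(size=12)                  # placeholder for measured selectivity
model = fit_pc_regression(X_train, y_train)

# Rank a hypothetical prospective test set and pick the top candidate.
X_candidates = rng.normal(size=(5, 6))
predicted = predict(model, X_candidates)
best = int(np.argmax(predicted))
print("predicted selectivities:", predicted)
print("candidate to synthesize first:", best)
```

In an iterated workflow of the kind the abstract describes, the measured result for the chosen candidate would presumably be appended to the training set before refitting; that loop is left out here.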

Erratum to “Data‐driven dimension reduction in functional principal component analysis identifying the change‐point in functional data”

Statistical Analysis and Data Mining: The ASA Data Science Journal ◽

10.1002/sam.11510 ◽

2021 ◽

Keyword(s):

Principal Component Analysis ◽

Dimension Reduction ◽

Functional Data ◽

Change Point ◽

Principal Component ◽

Component Analysis ◽

Functional Principal Component Analysis ◽

Data Driven ◽

Functional Principal Component

Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression

International Journal of Engine Research ◽

10.1177/14680874211070736 ◽

2022 ◽

pp. 146808742110707

Author(s):

Aran Mohammad ◽

Reza Rezaei ◽

Christopher Hayduk ◽

Thaddaeus Delebinski ◽

Saeid Shahpouri ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Factor Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Support Vector ◽

Emission Models ◽

Emission Modeling

The development of internal combustion engines is shaped by exhaust gas emissions legislation and the drive to increase performance. This calls for engine-out emission models that can be used for engine optimization under real driving emission controls. The prediction capability of physical and data-driven engine-out emission models depends on the system inputs, which are specified by the user; accuracy can improve as the number of inputs increases. However, irrelevant inputs, which have a weak functional relation to the emissions and can lead to overfitting, then become more likely. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSETe) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for modeling NOx, CO, HC, and soot emissions, respectively.
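
To make the idea of lasso-based input selection concrete, the following is a minimal sketch on synthetic data, assuming a standard coordinate-descent solver for the objective (1/2n)·||y − Xb||² + α·||b||₁. It is not the authors' pipeline: the data, the penalty value `alpha`, and the helper names `lasso_coordinate_descent` and `rmse` are hypothetical, and only 10 inputs are used instead of 37.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, alpha) / (X[:, j] @ X[:, j] / n)
    return b

def rmse(y_true, y_pred):
    """Root-mean-square error between two vectors."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize inputs
# Synthetic target (assumption): only inputs 0 and 1 carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)
y = y - y.mean()                               # center so no intercept is needed

coef = lasso_coordinate_descent(X, y, alpha=0.3)
selected = np.flatnonzero(coef != 0)           # inputs kept by the lasso
print("selected inputs:", selected)
print("training RMSE:", rmse(y, X @ coef))
```

The L1 penalty sets the coefficients of weakly related inputs exactly to zero, which is what makes the lasso a feature selector, whereas principal component analysis builds new features as combinations of all inputs.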

Data-Driven Incipient Fault Detection via Canonical Variate Dissimilarity and Mixed Kernel Principal Component Analysis

IEEE Transactions on Industrial Informatics ◽

10.1109/tii.2020.3029900 ◽

2020 ◽

pp. 1-1

Author(s):

Ping Wu ◽

Riccardo Ferrari ◽

Yichao Liu ◽

Jan-Willem van Wingerden

Keyword(s):

Principal Component Analysis ◽

Fault Detection ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Kernel Principal Component Analysis ◽

Canonical Variate ◽

Incipient Fault ◽

Incipient Fault Detection

A Data-driven Study of RR Lyrae Near-IR Light Curves: Principal Component Analysis, Robust Fits, and Metallicity Estimates

The Astrophysical Journal ◽

10.3847/1538-4357/aab4fd ◽

2018 ◽

Vol 857 (1) ◽

pp. 55 ◽

Cited By ~ 10

Author(s):

Gergely Hajdu ◽

István Dékány ◽

Márcio Catelan ◽

Eva K. Grebel ◽

Johanna Jurcsik

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Light Curves ◽

Data Driven ◽

RR Lyrae ◽

Near IR ◽

IR Light

Damage detection in nonlinear civil structures using kernel principal component analysis

Advances in Structural Engineering ◽

10.1177/1369433220913207 ◽

2020 ◽

Vol 23 (11) ◽

pp. 2414-2430

Author(s):

Khaoula Ghoulem ◽

Tarek Kormi ◽

Nizar Bel Hadj Ali

Keyword(s):

Principal Component Analysis ◽

Damage Detection ◽

Structural Model ◽

Principal Component ◽

Original Data ◽

Component Analysis ◽

Correlated Data ◽

Data Driven ◽

Kernel Principal Component Analysis ◽

Structural Systems

In the general framework of data-driven structural health monitoring, principal component analysis has been applied successfully in continuous monitoring of complex civil infrastructures. In the case of a linear or polynomial relationship between monitored variables, principal component analysis allows generation of structured residuals from measurement outputs without an a priori structural model. Principal component analysis has been widely used for system monitoring based on its ability to handle high-dimensional, noisy, and highly correlated data by projecting the data onto a lower dimensional subspace that contains most of the variance of the original data. However, for nonlinear systems, it can be easily demonstrated that linear principal component analysis is unable to disclose nonlinear relationships between variables. This has naturally motivated various developments of nonlinear principal component analysis to tackle damage diagnosis of complex structural systems, especially those characterized by nonlinear behavior. In this article, a data-driven technique for damage detection in nonlinear structural systems is presented. The proposed method is based on kernel principal component analysis. Two case studies involving nonlinear cable structures are presented to show the effectiveness of the proposed methodology. The validity of the kernel principal component analysis–based monitoring technique is shown in terms of its ability to detect damage. Robustness to environmental effects and disturbances is also studied.
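
As an illustration of the projection step behind kernel principal component analysis, here is a minimal sketch on synthetic data. It assumes an RBF (Gaussian) kernel with a hand-picked `gamma`, which the abstract does not specify, and it covers only the computation of nonlinear component scores, not the article's damage-detection statistic. The function names and the noisy-circle data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def kernel_pca_scores(X, n_components=2, gamma=1.0):
    """Return training-sample scores on the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = J @ K @ J                               # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[idx], eigvecs[:, idx]
    scores = V * np.sqrt(np.maximum(lam, 0.0))   # scale unit eigenvectors to scores
    return scores, lam

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): two monitored variables related nonlinearly,
# here points scattered around a unit circle.
t = rng.uniform(0.0, 2.0 * np.pi, size=100)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(100, 2))

scores, lam = kernel_pca_scores(X, n_components=2, gamma=1.0)
print("leading eigenvalues of the centered kernel:", lam)
```

A monitoring scheme would typically compare new measurements against this baseline model, for example through a residual or distance statistic in the kernel feature space; that step depends on choices the abstract does not state and is left out.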

Data-Driven Fault Classification for Non-Inverting Buck–Boost DC–DC Power Converters Based on Expectation Maximisation Principal Component Analysis and Support Vector Machine Approaches

10.1109/peas53589.2021.9628697 ◽

2021 ◽

Author(s):

Yichuan Fu ◽

Zhiwei Gao ◽

Haimeng Wu ◽

Xiuxia Yin ◽

Aihua Zhang

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Power Converters ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Fault Classification ◽

Support Vector ◽

DC Power ◽

Expectation Maximisation

Event-by-event non-rigid data-driven PET respiratory motion correction methods: comparison of principal component analysis and centroid of distribution

Physics in Medicine and Biology ◽

10.1088/1361-6560/ab0bc9 ◽

2019 ◽

Vol 64 (16) ◽

pp. 165014 ◽

Cited By ~ 1

Author(s):

Silin Ren ◽

Yihuan Lu ◽

Ottavia Bertolli ◽

Kris Thielemans ◽

Richard E Carson

Keyword(s):

Principal Component Analysis ◽

Motion Correction ◽

Respiratory Motion ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Methods Comparison ◽

Respiratory Motion Correction

Gene Expression Profile Analysis of Pediatric MDS Patients Correlates with FAB Classification and Has Prognostic Relevance

Blood ◽

10.1182/blood.v112.11.2695.2695 ◽

2008 ◽

Vol 112 (11) ◽

pp. 2695-2695

Author(s):

Silvia Bresolin ◽

Luca Trentin ◽

Geertruy Kronnie ◽

Laura Sainati ◽

Marco Zecca ◽

...

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Myeloid Leukemia ◽

De Novo ◽

Gene Expression Signature ◽

Principal Component ◽

Component Analysis ◽

Secondary AML ◽

FAB Classification ◽

De Novo AML

Myelodysplastic syndromes (MDS) are rare malignant haematopoietic stem cell disorders in children. They have the propensity to transform into acute myeloid leukemia and, for this reason, can be considered a pre-leukemia condition. In this study we analyzed the gene expression profile (GEP) of a large cohort of pediatric patients: 14 MDS [6 refractory cytopenias (RC), 6 refractory anemias with excess blasts (RAEB) and 2 refractory anemias with excess blasts in transformation (RAEB-t)], 50 de novo acute myeloid leukemia (AML) and 6 normal bone marrow (BM) aspirates. Furthermore, in 5 cases, samples were available for analysis both at diagnosis and at the time of secondary AML progression. Gene expression analysis was performed on Affymetrix HG U133 Plus 2.0 oligonucleotide microarrays using Partek software packages and the leukemia classifier version 7 (LCver7). Statistical analyses were performed to determine the correlation between the gene expression signature and MDS subtype. Unsupervised hierarchical clustering analysis separated the majority of MDS cases from the diagnostic AML samples and placed the normal BM specimens into the MDS cluster. Remarkably, all but one of the MDS cases that evolved into AML within one year clustered together inside the diagnostic AML group. Performing principal component analysis (PCA), we observed that MDS samples clustered between the groups of normal BM and AML samples. Moreover, the RC samples were located proximal to the cluster of normal BM samples, while RAEB and RAEB-t specimens were nearest to the AML sample cluster (Fig. 1). Further, we classified the MDS samples using the LCver7 classifier, an algorithm developed within the MILE (Microarray Innovation In LEukemia) study that gives an overall cross-validation accuracy of >95% for distinct sub-classes of pediatric and adult leukemias using gene expression profiles. Of the 14 MDS samples, 57.2% had AML-like and 42.8% had non-AML-like signatures.
The Fisher exact test showed that there was a statistical concordance (p=0.008) between the FAB classification and the gene expression signature. In fact, 83% of RC patients had a non-AML-like signature whereas only 17% had an AML-like signature. In contrast, 85% of the RAEB and RAEB-t patients had an AML-like signature. In conclusion, the results of unsupervised analysis not only demonstrated that gene expression technology is able to distinguish between MDS, AML and normal BM samples but also showed that GEP can identify an AML-like signature in samples at diagnosis of MDS, with a higher risk of AML evolution, allowing identification of a group of patients who could be eligible for more intensive treatment.

Fig. 1. Principal component analysis (PCA) of MDS, AML and normal BM samples. Red: MDS samples, Blue: de novo AML, Green: secondary AML, Violet: normal BM. RC samples are placed closer to the normal BM specimens. The three MDS patients that will evolve into secondary AML are included in the green ellipsoids.
