Continuous flexibility analysis of SARS-CoV-2 spike prefusion structures

IUCrJ ◽  
2020 ◽  
Vol 7 (6) ◽  
pp. 1059-1069 ◽  
Author(s):  
Roberto Melero ◽  
Carlos Oscar S. Sorzano ◽  
Brent Foster ◽  
José-Luis Vilas ◽  
Marta Martínez ◽  
...  

Using a new consensus-based image-processing approach together with principal component analysis, the flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state have been analysed. These studies revealed concerted motions involving the receptor-binding domain (RBD), N-terminal domain, and subdomains 1 and 2 around the previously characterized 1-RBD-up state, which have been modeled as elastic deformations. It is shown that in this data set there are no well-defined, stable spike conformations, but virtually a continuum of states. An ensemble map was obtained with minimum bias, from which the extremes of the change along the direction of maximal variance were modeled by flexible fitting. The results provide a warning of the potential image-processing classification instability of these complicated data sets, which has a direct impact on the interpretability of the results.

Author(s):  
Roberto Melero ◽  
Carlos Oscar S. Sorzano ◽  
Brent Foster ◽  
José-Luis Vilas ◽  
Marta Martínez ◽  
...  

Abstract. With the help of novel processing workflows and algorithms, we have obtained a better understanding of the flexibility and conformational dynamics of the SARS-CoV-2 spike in the prefusion state. We have re-analyzed previous cryo-EM data, combining 3D clustering approaches with ways to explore a continuous flexibility space based on 3D principal component analysis. These advanced analyses revealed a concerted motion involving the receptor-binding domain (RBD), N-terminal domain (NTD), and subdomains 1 and 2 (SD1 and SD2) around the previously characterized 1-RBD-up state, which has been modeled as elastic deformations. We show that in this dataset there are no well-defined, stable spike conformations, but virtually a continuum of states moving in a concerted fashion. We obtained an improved-resolution ensemble map with minimum bias, from which we modeled, by flexible fitting, the extremes of the change along the direction of maximal variance. Moreover, a high-resolution structure of a recently described biochemically stabilized form of the spike is shown to greatly reduce the dynamics observed for the wild-type spike. Our results provide new detailed avenues to potentially restrain the spike dynamics for structure-based drug and vaccine design and, at the same time, give a warning of the potential image-processing classification instability of these complicated datasets, which has a direct impact on the interpretability of the results.
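A minimal sketch of the continuous-flexibility idea described above, assuming a set of aligned 3D reconstructions is already available as a NumPy array; this is illustrative only and is not the authors' actual processing workflow (names such as `principal_motion` are placeholders).

```python
# Hedged sketch: 3D PCA over aligned cryo-EM maps to find the direction of
# maximal variance and the extreme maps along it (not the published pipeline).
import numpy as np
from sklearn.decomposition import PCA

def principal_motion(volumes, n_components=3):
    """volumes: (n_maps, nx, ny, nz) array of aligned density maps."""
    n = volumes.shape[0]
    X = volumes.reshape(n, -1)                  # flatten each map to a vector
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)               # per-map coordinates in PC space
    mean_map = pca.mean_.reshape(volumes.shape[1:])
    pc1 = pca.components_[0].reshape(volumes.shape[1:])
    # Extremes of the change along the direction of maximal variance
    amp = 2.0 * scores[:, 0].std()
    low, high = mean_map - amp * pc1, mean_map + amp * pc1
    return pca.explained_variance_ratio_, low, high
```

The two extreme maps (`low`, `high`) would then be the targets of a flexible-fitting step such as the one described in the abstract.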


2007 ◽  
Vol 56 (6) ◽  
pp. 75-83 ◽  
Author(s):  
X. Flores ◽  
J. Comas ◽  
I.R. Roda ◽  
L. Jiménez ◽  
K.V. Gernaey

The main objective of this paper is to present the application of selected multivariable statistical techniques to the analysis of plant-wide wastewater treatment plant (WWTP) control strategies. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques allow one to i) determine natural groups or clusters of control strategies with similar behaviour, ii) find and interpret hidden, complex and causal relations in the data set, and iii) identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
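A minimal sketch of the CA + PCA part of this workflow, under the assumption that the evaluation matrix has control strategies as rows and performance criteria as columns; the BSM2 data and the paper's exact settings are not reproduced here.

```python
# Illustrative sketch: hierarchical cluster analysis plus PCA on a
# control-strategy evaluation matrix (rows = strategies, columns = criteria).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def group_strategies(evaluation_matrix, n_groups=3, n_pcs=2):
    X = StandardScaler().fit_transform(evaluation_matrix)  # put criteria on equal footing
    # i) natural groups of strategies with similar behaviour
    labels = fcluster(linkage(X, method="ward"), n_groups, criterion="maxclust")
    # ii) relations between criteria, read off the PC loadings
    pca = PCA(n_components=n_pcs).fit(X)
    return labels, pca.components_, pca.explained_variance_ratio_
```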


Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
...  

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the complexity as well as the size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques related to ideas introduced earlier in the context of Gaussian distributions, density estimation, and information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques that address some of the weaknesses of PCA.
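A brief PCA sketch in the spirit of this chapter, not taken from the book's accompanying code; `spectra` is assumed to be an (n_objects, n_features) array such as a set of object spectra.

```python
# Hedged sketch: keep enough principal components to explain a chosen fraction
# of the variance, then reconstruct the data from that low-dimensional basis.
import numpy as np
from sklearn.decomposition import PCA

def reduce_dimensionality(spectra, variance_kept=0.99):
    pca = PCA(n_components=variance_kept, svd_solver="full")  # fraction of variance to keep
    coefficients = pca.fit_transform(spectra)      # low-dimensional representation
    reconstructed = pca.inverse_transform(coefficients)
    return coefficients, reconstructed, pca.components_
```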


2018 ◽  
Vol 17 ◽  
pp. 117693511877108 ◽  
Author(s):  
Min Wang ◽  
Steven M Kornblau ◽  
Kevin R Coombes

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
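A hedged sketch of one simple criterion for the number of significant PCs, the broken-stick rule, followed by clustering of the loadings in PC space. This stands in for, and is not, the Auer-Gervini-based procedure implemented in the PCDimension package; the matrix `X` is assumed to be samples by proteins.

```python
# Illustrative sketch: broken-stick estimate of the PC dimension, then k-means
# clustering of the protein loadings to form candidate "biological components".
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pc_dimension_and_clusters(X, n_clusters=6):
    pca = PCA().fit(X)
    ratios = pca.explained_variance_ratio_
    p = ratios.size
    # Expected variance fractions under the broken-stick null model
    expected = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    above = ratios > expected
    d = p if above.all() else int(np.argmax(~above))   # length of the leading run
    d = max(d, 1)                                      # keep at least one PC
    loadings = pca.components_[:d].T                   # one row of loadings per protein
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(loadings)
    return d, labels
```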


2013 ◽  
Vol 13 (6) ◽  
pp. 3133-3147 ◽  
Author(s):  
Y. L. Roberts ◽  
P. Pilewskie ◽  
B. C. Kindel ◽  
D. R. Feldman ◽  
W. D. Collins

Abstract. The Climate Absolute Radiance and Refractivity Observatory (CLARREO) is a climate observation system that has been designed to monitor the Earth's climate with unprecedented absolute radiometric accuracy and SI traceability. Climate Observation System Simulation Experiments (OSSEs) have been generated to simulate CLARREO hyperspectral shortwave imager measurements to help define the measurement characteristics needed for CLARREO to achieve its objectives. To evaluate how well the OSSE-simulated reflectance spectra reproduce the Earth's climate variability at the beginning of the 21st century, we compared the variability of the OSSE reflectance spectra to that of the reflectance spectra measured by the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY). Principal component analysis (PCA) is a multivariate decomposition technique used to represent and study the variability of hyperspectral radiation measurements. Using PCA, between 99.7% and 99.9% of the total variance of the OSSE and SCIAMACHY data sets can be explained by subspaces defined by six principal components (PCs). To quantify how much information is shared between the simulated and observed data sets, we spectrally decomposed the intersection of the two data set subspaces. The results from four cases in 2004 showed that the two data sets share eight (January and October) and seven (April and July) dimensions, which correspond to about 99.9% of the total SCIAMACHY variance for each month. The spectral nature of these shared spaces, understood by examining the transformed eigenvectors calculated from the subspace intersections, exhibits similar physical characteristics to the original PCs calculated from each data set, such as water vapor absorption, vegetation reflectance, and cloud reflectance.
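A hedged sketch of the subspace-comparison step, assuming both data sets are sampled on a common spectral grid; counting near-zero principal angles between the two PC subspaces stands in for the paper's intersection analysis, and the function names are placeholders.

```python
# Illustrative sketch: compare the leading PC subspaces of two hyperspectral
# data sets via principal angles and count how many dimensions they share.
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA

def shared_dimensions(osse_spectra, sciamachy_spectra, n_pcs=10, tol_deg=5.0):
    """Both inputs: (n_spectra, n_wavelengths) arrays on the same wavelength grid."""
    A = PCA(n_components=n_pcs).fit(osse_spectra).components_.T       # wavelengths x PCs
    B = PCA(n_components=n_pcs).fit(sciamachy_spectra).components_.T
    angles = np.degrees(subspace_angles(A, B))
    return int(np.sum(angles < tol_deg)), angles
```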


2012 ◽  
Vol 12 (10) ◽  
pp. 28305-28341
Author(s):  
Y. L. Roberts ◽  
P. Pilewskie ◽  
B. C. Kindel ◽  
D. R. Feldman ◽  
W. D. Collins

Abstract. The Climate Absolute Radiance and Refractivity Observatory (CLARREO) is a climate observation system that has been designed to monitor the Earth's climate with unprecedented absolute radiometric accuracy and SI traceability. Climate Observation System Simulation Experiments (OSSEs) have been generated to simulate CLARREO hyperspectral shortwave imager measurements to help define the measurement characteristics needed for CLARREO to achieve its objectives. To evaluate how well the OSSE-simulated reflectance spectra reproduce the Earth's climate variability at the beginning of the 21st century, we compared the variability of the OSSE reflectance spectra to that of the reflectance spectra measured by the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY). Principal component analysis (PCA) is a multivariate spectral decomposition technique used to represent and study the variability of hyperspectral radiation measurements. Using PCA, between 99.7% and 99.9% of the total variance of the OSSE and SCIAMACHY data sets can be explained by subspaces defined by six principal components (PCs). To quantify how much information is shared between the simulated and observed data sets, we spectrally decomposed the intersection of the two data set subspaces. The results from four cases in 2004 showed that the two data sets share eight (January and October) and seven (April and July) dimensions, which correspond to about 99.9% of the total SCIAMACHY variance for each month. The spectral nature of these shared spaces, understood by examining the transformed eigenvectors calculated from the subspace intersections, exhibits similar physical characteristics to the original PCs calculated from each data set, such as water vapor absorption, vegetation reflectance, and cloud reflectance.


2020 ◽  
Vol 498 (3) ◽  
pp. 3440-3451
Author(s):  
Alan F Heavens ◽  
Elena Sellentin ◽  
Andrew H Jaffe

ABSTRACT Bringing a high-dimensional data set into science-ready shape is a formidable challenge that often necessitates data compression. Compression has accordingly become a key consideration for contemporary cosmology, affecting public data releases and reanalyses searching for new physics. However, data compression optimized for a particular model can suppress signs of new physics, or even remove them altogether. We therefore provide a solution for exploring new physics during data compression. In particular, we store additional agnostic compressed data points, selected to enable precise constraints on non-standard physics at a later date. Our procedure is based on the maximal compression of the MOPED algorithm, which optimally filters the data with respect to a baseline model. We select additional filters, based on a generalized principal component analysis, which are carefully constructed to scout for new physics at high precision and speed. We refer to the augmented set of filters as MOPED-PC. They enable an analytic computation of Bayesian Evidence that may indicate the presence of new physics, and fast analytic estimates of best-fitting parameters when adopting a specific non-standard theory, without further expensive MCMC analysis. As there may be large numbers of non-standard theories, the speed of the method becomes essential. Should no new physics be found, then our approach preserves the precision of the standard parameters. As a result, we achieve very rapid and maximally precise constraints on standard and non-standard physics, with a technique that scales well to high-dimensional data sets.
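A much-simplified sketch of MOPED-style compression plus extra agnostic filters; this is not the authors' MOPED-PC implementation, and the PCA-of-residuals construction below is an assumption standing in for their generalized PCA.

```python
# Hedged sketch: one MOPED-like filter per baseline parameter, built from the
# derivative of the model mean, plus extra PCA filters kept to scout for
# departures from the baseline model.
import numpy as np

def moped_filters(cov, mean_derivs):
    """cov: (n, n) data covariance; mean_derivs: (n_params, n) rows of d<mu>/d theta."""
    Cinv = np.linalg.inv(cov)
    filters = []
    for dmu in mean_derivs:
        b = Cinv @ dmu
        for prev in filters:                        # make b C-orthogonal to earlier filters
            b -= (prev @ cov @ b) * prev
        filters.append(b / np.sqrt(b @ cov @ b))    # normalize so that b^T C b = 1
    return np.array(filters)

def extra_pca_filters(residuals, n_extra=3):
    # Additional agnostic filters from a PCA of model residuals (an assumption,
    # not the paper's construction); residuals: (n_realizations, n) array.
    residuals = residuals - residuals.mean(axis=0)
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt[:n_extra]

# Compressed statistics: t = np.vstack([moped, extra]) @ data_vector
```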


2020 ◽  
Author(s):  
Anna Morozova ◽  
Rania Rebbah ◽  
M. Alexandra Pais

Geomagnetic field (GMF) variations from external sources are classified as regular diurnal variations or variations occurring during periods of disturbances. The most significant regular variations are the quiet solar daily variation (Sq) and the disturbance daily variation (SD). These variations have well recognized daily cycles and need to be accounted for before analysis of the disturbed field. Preliminary analysis of the GMF variations shows that principal component analysis (PCA) is a useful tool for the extraction of regular GMF variations; however, the requirements on data-set length, geomagnetic activity level, etc. still need to be established.

Here we present preliminary results of the PCA-based Sq extraction procedure, based on the analysis of Coimbra Geomagnetic Observatory (COI) measurements of the geomagnetic field components H, X, Y and Z between 2007 and 2015. The PCA-based Sq curves are compared with the standard ones obtained using 5 IQDs per month. PCA was applied to data sets of different lengths: either a one-month-long data set from one of the years 2007-2015, or data series for the same month from different years (2007-2015) combined together. For most of the analyzed years, the first PCA mode (PC1) was identified as the SD variation and the second mode (PC2) as the Sq variation.
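A minimal sketch of the PCA step described above, assuming one month of hourly values of a single field component; it is not the COI processing chain, and the identification of the modes with SD and Sq would still need to be checked as in the study.

```python
# Illustrative sketch: arrange one month of hourly values as a days x 24 matrix
# and take the first two PCA modes as candidate SD and Sq daily curves.
import numpy as np
from sklearn.decomposition import PCA

def daily_modes(hourly_values):
    """hourly_values: 1-D array of hourly means for one month (length = n_days * 24)."""
    days = hourly_values.reshape(-1, 24)             # one row per day
    days = days - days.mean(axis=1, keepdims=True)   # remove each day's baseline level
    pca = PCA(n_components=2).fit(days)
    mode1, mode2 = pca.components_                   # candidate SD and Sq curves
    return mode1, mode2, pca.explained_variance_ratio_
```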


2018 ◽  
Vol 74 (5) ◽  
pp. 411-421 ◽  
Author(s):  
Michael C. Thompson ◽  
Duilio Cascio ◽  
Todd O. Yeates

Real macromolecular crystals can be non-ideal in a myriad of ways. This often creates challenges for structure determination, while also offering opportunities for greater insight into the crystalline state and the dynamic behavior of macromolecules. To evaluate whether different parts of a single crystal of a dynamic protein, EutL, might be informative about crystal and protein polymorphism, a microfocus X-ray synchrotron beam was used to collect a series of 18 separate data sets from non-overlapping regions of the same crystal specimen. A principal component analysis (PCA) approach was employed to compare the structure factors and unit cells across the data sets, and it was found that the 18 data sets separated into two distinct groups, with large R values (in the 40% range) and significant unit-cell variations between the members of the two groups. This categorization mapped the different data-set types to distinct regions of the crystal specimen. Atomic models of EutL were then refined against two different data sets obtained by separately merging data from the two distinct groups. A comparison of the two resulting models revealed minor but discernable differences in certain segments of the protein structure, and regions of higher deviation were found to correlate with regions where larger dynamic motions were predicted to occur by normal-mode molecular-dynamics simulations. The findings emphasize that large spatially dependent variations may be present across individual macromolecular crystals. This information can be uncovered by simultaneous analysis of multiple partial data sets and can be exploited to reveal new insights about protein dynamics, while also improving the accuracy of the structure-factor data ultimately obtained in X-ray diffraction experiments.
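An illustrative sketch of the grouping step, assuming the 18 data sets have already been scaled onto a common set of reflections; this is not the authors' pipeline, and `group_datasets` is a placeholder name.

```python
# Hedged sketch: treat each data set as a vector of scaled structure-factor
# amplitudes, then look for distinct groups in the leading PC scores.
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

def group_datasets(amplitudes, n_groups=2):
    """amplitudes: (n_datasets, n_reflections) array on a common reflection set."""
    X = amplitudes - amplitudes.mean(axis=0)
    scores = PCA(n_components=3).fit_transform(X)          # per-data-set coordinates
    labels = fcluster(linkage(scores, method="ward"), n_groups, criterion="maxclust")
    return labels, scores
```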


2017 ◽  
Vol 10 (13) ◽  
pp. 355 ◽  
Author(s):  
Reshma Remesh ◽  
Pattabiraman. V

Dimensionality reduction techniques are used to reduce the complexity of the analysis of high-dimensional data sets. The raw input data set may have many dimensions, and analysis may be time-consuming and lead to wrong predictions if unnecessary data attributes are considered. Using dimensionality reduction techniques, one can reduce the dimensions of the input data and obtain accurate predictions at lower cost. In this paper, the different machine learning approaches used for dimensionality reduction, such as PCA, SVD, LDA, kernel principal component analysis and artificial neural networks, have been studied.
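A short hedged sketch comparing two of the surveyed techniques with scikit-learn; the array `X` (rows = samples) and the function name are placeholders, not the paper's code.

```python
# Illustrative sketch: linear PCA versus kernel PCA on the same data matrix.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

def linear_vs_kernel(X, n_components=2):
    linear = PCA(n_components=n_components).fit_transform(X)
    nonlinear = KernelPCA(n_components=n_components, kernel="rbf").fit_transform(X)
    return linear, nonlinear
```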

