Multivariate Analysis as a Tool for Quantification of Conformational Transitions in DNA Thin Films

2021 ◽  
Vol 11 (13) ◽  
pp. 5895
Author(s):  
Kristina Serec ◽  
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in a series addressing conformational transitions in DNA thin films using FTIR spectroscopy, we exploit popular chemometric methods, namely principal component analysis (PCA), the support vector machine (SVM) learning algorithm, and principal component regression (PCR), to quantify and categorize DNA conformation in thin films in different hydration states. By complementing the FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films from the vibrational signatures in the 1800–935 cm−1 range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and show how to avoid discrepancies through careful sampling.
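A minimal sketch of the kind of chemometric pipeline described above, with stand-in data: PCA scores of FTIR absorbance spectra feed an SVM for conformation classification and a principal component regression for quantification. The array shapes, labels, and hyperparameters are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: PCA + SVM classification and PCR on FTIR-like spectra.
# `spectra`, `conformation`, and `a_fraction` are synthetic stand-ins.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
spectra = rng.normal(size=(60, 900))        # stand-in absorbance values over 1800-935 cm^-1
conformation = rng.integers(0, 2, size=60)  # 0 = A-form, 1 = B-form (illustrative labels)
a_fraction = rng.uniform(size=60)           # illustrative A-form fraction for PCR

# PCA + SVM: categorize conformation from vibrational signatures
clf = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf", C=1.0))
print("classification CV accuracy:", cross_val_score(clf, spectra, conformation, cv=5).mean())

# Principal component regression: regress a conformational fraction on PCA scores
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(spectra, a_fraction)
```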

2015 ◽  
Vol 28 (3) ◽  
pp. 1016-1030 ◽  
Author(s):  
Erik Swenson

Abstract Various multivariate statistical methods exist for analyzing covariance and isolating linear relationships between datasets. The most popular linear methods are based on singular value decomposition (SVD) and include canonical correlation analysis (CCA), maximum covariance analysis (MCA), and redundancy analysis (RDA). In this study, continuum power CCA (CPCCA) is introduced as one extension of continuum power regression for isolating pairs of coupled patterns whose temporal variation maximizes the squared covariance between partially whitened variables. Similar to the whitening transformation, the partial whitening transformation acts to decorrelate individual variables but only to a partial degree with the added benefit of preconditioning sample covariance matrices prior to inversion, providing a more accurate estimate of the population covariance. CPCCA is a unified approach in the sense that the full range of solutions bridges CCA, MCA, RDA, and principal component regression (PCR). Recommended CPCCA solutions include a regularization for CCA, a variance bias correction for MCA, and a regularization for RDA. Applied to synthetic data samples, such solutions yield relatively higher skill in isolating known coupled modes embedded in noise. Provided with some crude prior expectation of the signal-to-noise ratio, the use of asymmetric CPCCA solutions may be justifiable and beneficial. An objective parameter choice is offered for regularization with CPCCA based on the covariance estimate of O. Ledoit and M. Wolf, and the results are quite robust. CPCCA is encouraged for a range of applications.
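A rough numerical illustration of the idea behind CPCCA, under stated assumptions: each dataset is partially whitened with a power parameter in [0, 1] (1 approximating CCA-style full whitening, 0 recovering MCA), and coupled patterns come from the SVD of the resulting cross-covariance. The regularization and bias-correction choices of the paper are not reproduced.

```python
# Sketch of SVD-based coupled-pattern analysis with partial whitening (CPCCA idea).
import numpy as np

def partial_whiten(X, alpha, eps=1e-10):
    """Decorrelate the columns of X to a partial degree set by alpha in [0, 1]."""
    C = X.T @ X / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(C)
    evals = np.clip(evals, eps, None)
    W = evecs @ np.diag(evals ** (-alpha / 2)) @ evecs.T
    return X @ W

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 10))
Y = 0.5 * X[:, :4] + rng.normal(size=(n, 4))   # synthetic coupled fields

Xw = partial_whiten(X - X.mean(0), alpha=0.5)
Yw = partial_whiten(Y - Y.mean(0), alpha=0.5)

# Coupled patterns maximize squared covariance between partially whitened variables
U, s, Vt = np.linalg.svd(Xw.T @ Yw / (n - 1), full_matrices=False)
leading_x_pattern, leading_y_pattern = U[:, 0], Vt[0]
print("leading squared covariance:", s[0] ** 2)
```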


Author(s):  
L. Nirmala Devi ◽  
A. Nageswar Rao

Human action recognition (HAR) is one of the most significant research topics and has attracted the attention of many researchers. Automatic HAR systems are applied in several fields such as visual surveillance, data retrieval, and healthcare. Motivated by this, in this chapter the authors propose a new HAR model that takes an image as input, analyzes it, and identifies the action present in it. In the analysis phase, they implement two different feature extraction methods based on a rotation-invariant Gabor filter and an edge-adaptive wavelet filter. For every action image, a composite feature vector is formed and then subjected to dimensionality reduction through principal component analysis (PCA). Finally, the authors employ the most popular supervised machine learning algorithm, the support vector machine (SVM), for classification. Simulations are performed on two standard datasets, KTH and Weizmann, and performance is measured using an accuracy metric.
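An illustrative sketch of the feature extraction, PCA, SVM pipeline described above, using Gabor responses averaged over orientations as a crude rotation-invariant feature. The edge-adaptive wavelet features, the exact composite vector construction, and the KTH/Weizmann loading code are assumptions standing in for the chapter's method.

```python
# Gabor features (orientation-averaged) -> PCA -> SVM on synthetic action frames.
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

def gabor_features(image, frequencies=(0.1, 0.2, 0.3), n_orient=8):
    feats = []
    for f in frequencies:
        responses = [np.abs(gabor(image, frequency=f, theta=t)[0])
                     for t in np.linspace(0, np.pi, n_orient, endpoint=False)]
        # averaging over orientations gives approximate rotation invariance
        feats.append(np.mean(responses, axis=0).ravel())
    return np.concatenate(feats)

rng = np.random.default_rng(2)
images = rng.random(size=(20, 32, 32))   # stand-in for KTH/Weizmann action frames
labels = rng.integers(0, 3, size=20)     # stand-in action classes

X = np.array([gabor_features(img) for img in images])
model = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
model.fit(X, labels)
print("training accuracy:", model.score(X, labels))
```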


2020 ◽  
pp. 1-11
Author(s):  
Chuanxin Fang

English online teaching quality evaluation refers to the process of using effective technical means to comprehensively collect, organize, and analyze the teaching status and to make value judgments in order to improve teaching activities and teaching quality. The research work of this paper centers on the design of a teaching quality evaluation model based on machine learning theory, with in-depth work on the preprocessing of evaluation indicators and the construction of a support vector machine teaching quality evaluation model. Moreover, this study uses an improved principal component analysis to reduce the dimensionality of the evaluation indicators, thereby avoiding the adverse effect of an overly complicated network model on prediction performance. In addition, to verify that the proposed model has advantages over other shallow models in evaluating teaching quality, the model parameters are tuned and a control experiment is designed to verify its performance. The results show that the model is effective for evaluating school teaching quality and can be applied in practice.
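A hedged sketch of the indicator preprocessing and SVM evaluation step: standard PCA (not the paper's improved variant) reduces the evaluation indicators, and a grid search tunes the SVM parameters. The indicator matrix and grade labels are placeholders.

```python
# PCA dimensionality reduction of evaluation indicators + grid-searched SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
indicators = rng.normal(size=(100, 20))   # stand-in teaching-quality indicators
quality = rng.integers(0, 3, size=100)    # stand-in quality grades

pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA()),
                 ("svm", SVC())])
grid = {"pca__n_components": [5, 8, 10],
        "svm__C": [0.1, 1, 10],
        "svm__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(indicators, quality)
print(search.best_params_, search.best_score_)
```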


2020 ◽  
Vol 16 (1) ◽  
pp. 155014772090363 ◽  
Author(s):  
Ying Liu ◽  
Lihua Huang

Recently, support vector machines, a supervised learning algorithm, have been widely used in credit risk management. However, noise may increase the complexity of model building and degrade classifier performance. In this work, we propose an ensemble support vector machine model, combined with a noise-reduction method, for risk assessment in supply chain finance. The main characteristics of this approach are that (1) a novel noise filtering scheme based on fuzzy clustering and principal component analysis is proposed to remove both attribute noise and class noise and obtain an optimally clean set, and (2) support vector machine classifiers, tuned with an improved particle swarm optimization algorithm, are used as component classifiers. The final classification results are then obtained by combining the individual predictions through the AdaBoost algorithm on the new sample set. Experiments are conducted on supply chain financial data of China's listed companies. The results indicate that credit assessment accuracy can be increased by applying this approach.
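A minimal sketch, not the authors' exact pipeline: a crude PCA reconstruction-error filter stands in for the fuzzy clustering / PCA noise-filtering scheme, and AdaBoost combines SVM component classifiers; the particle swarm optimization tuning step is omitted. The data are synthetic and the sketch assumes scikit-learn 1.2 or later.

```python
# PCA-based noise filtering followed by an AdaBoost ensemble of SVM classifiers.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 12))   # stand-in financial indicators
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Attribute-noise filter: keep samples well explained by the leading components
pca = PCA(n_components=6).fit(X)
recon_err = np.linalg.norm(X - pca.inverse_transform(pca.transform(X)), axis=1)
keep = recon_err < np.quantile(recon_err, 0.9)

# Ensemble of SVM component classifiers combined by AdaBoost
ensemble = AdaBoostClassifier(estimator=SVC(kernel="rbf", probability=True),
                              n_estimators=10)
ensemble.fit(X[keep], y[keep])
print("training accuracy on cleaned set:", ensemble.score(X[keep], y[keep]))
```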


2017 ◽  
Vol 14 (S339) ◽  
pp. 345-348
Author(s):  
H. Yuan ◽  
Y. Zhang ◽  
Y. Lei ◽  
Y. Dong ◽  
Z. Bai ◽  
...  

Abstract With so many spectroscopic surveys, both past and upcoming, such as SDSS and LAMOST, the number of accessible stellar spectra is continuously increasing. There is therefore a great need for automated procedures that derive estimates of stellar parameters. Working with spectra from SDSS and LAMOST, we put forward a hybrid approach of Kernel Principal Component Analysis (KPCA) and Support Vector Machine (SVM) to determine the stellar atmospheric parameters effective temperature, surface gravity and metallicity. For stars with both APOGEE and LAMOST spectra, we adopt the LAMOST spectra and APOGEE parameters, and then use KPCA to reduce dimensionality and SVM to measure the parameters. Our method provides reliable and precise results; for example, the standard deviations of effective temperature, surface gravity and metallicity for the test sample come to approximately 47–75 K, 0.11–0.15 dex and 0.06–0.075 dex, respectively. The impact of the signal-to-noise ratio of the observations upon the accuracy of the results is also investigated.
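An illustrative sketch of the KPCA + SVM regression idea on synthetic "spectra": kernel PCA compresses the flux vectors and support vector regression maps the scores to effective temperature. The LAMOST/APOGEE data handling, wavelength coverage, and kernel settings of the paper are not reproduced.

```python
# Kernel PCA dimensionality reduction + SVR for one stellar parameter (Teff).
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
flux = rng.normal(size=(300, 500))   # stand-in for normalized spectra
teff = 5000 + 50 * flux[:, :10].sum(axis=1) + rng.normal(scale=30, size=300)

X_train, X_test, y_train, y_test = train_test_split(flux, teff, random_state=0)
model = make_pipeline(KernelPCA(n_components=20, kernel="rbf", gamma=1e-3),
                      SVR(kernel="rbf", C=100.0))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("Teff residual std (K):", np.std(pred - y_test))
```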


2018 ◽  
Vol 171 ◽  
pp. 1577-1592 ◽  
Author(s):  
Han Li ◽  
Shijun You ◽  
Huan Zhang ◽  
Wandong Zheng ◽  
Wai-ling Lee ◽  
...  

Energies ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 3408 ◽  
Author(s):  
Muhammad Ahmad ◽  
Anthony Mouraud ◽  
Yacine Rezgui ◽  
Monjur Mourshed

Predictive analytics play a significant role in ensuring optimal and secure operation of power systems, reducing energy consumption, fault detection and diagnosis, and improving grid resilience. However, due to system nonlinearities, delays, and the complexity introduced by many influencing factors (e.g., climate, occupants’ behaviour, occupancy pattern, building type), obtaining accurate energy consumption predictions is a challenging task. This paper investigates the accuracy and generalisation capabilities of deep highway networks (DHN) and extremely randomized trees (ET) for predicting hourly heating, ventilation and air conditioning (HVAC) energy consumption of a hotel building. Their performance was compared with support vector regression (SVR), one of the most widely used supervised machine learning algorithms. Results showed that both the ET and DHN models marginally outperform the SVR algorithm. The paper also details the impact of increasing the deep highway network’s complexity on its performance. The paper concludes that all developed models are equally applicable for predicting hourly HVAC energy consumption. Possible reasons for the minimal impact of DHN complexity and directions for future research are also highlighted.
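A hedged sketch comparing extremely randomized trees with SVR on synthetic hourly features (hour of day, outdoor temperature, occupancy); the deep highway network and the hotel dataset from the paper are not reproduced.

```python
# Cross-validated comparison of extremely randomized trees vs. SVR on synthetic HVAC data.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
hours = rng.integers(0, 24, size=500)
temp = rng.normal(25, 5, size=500)
occupancy = rng.uniform(size=500)
X = np.column_stack([hours, temp, occupancy])
hvac_kwh = (10 + 0.8 * temp + 5 * occupancy
            + 2 * np.sin(hours / 24 * 2 * np.pi)
            + rng.normal(scale=1.0, size=500))

for name, model in [("ET", ExtraTreesRegressor(n_estimators=200, random_state=0)),
                    ("SVR", SVR(kernel="rbf", C=10.0))]:
    score = cross_val_score(model, X, hvac_kwh, cv=5, scoring="r2").mean()
    print(name, "cross-validated R^2:", round(score, 3))
```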


2008 ◽  
Vol 21 (17) ◽  
pp. 4384-4398 ◽  
Author(s):  
Michael K. Tippett ◽  
Timothy DelSole ◽  
Simon J. Mason ◽  
Anthony G. Barnston

Abstract There are a variety of multivariate statistical methods for analyzing the relations between two datasets. Two commonly used methods are canonical correlation analysis (CCA) and maximum covariance analysis (MCA), which find the projections of the data onto coupled patterns with maximum correlation and covariance, respectively. These projections are often used in linear prediction models. Redundancy analysis and principal predictor analysis construct projections that maximize the explained variance and the sum of squared correlations of regression models. This paper shows that the above pattern methods are equivalent to different diagonalizations of the regression between the two datasets. The different diagonalizations are computed using the singular value decomposition of the regression matrix developed using data that are suitably transformed for each method. This common framework for the pattern methods permits easy comparison of their properties. Principal component regression is shown to be a special case of CCA-based regression. A commonly used linear prediction model constructed from MCA patterns does not give a least squares estimate since correlations among MCA predictors are neglected. A variation, denoted least squares estimate (LSE)-MCA, is suggested that uses the same patterns but minimizes squared error. Since the different pattern methods correspond to diagonalizations of the same regression matrix, they all produce the same regression model when a complete set of patterns is used. Different prediction models are obtained when an incomplete set of patterns is used, with each method optimizing different properties of the regression. Some key points are illustrated in two idealized examples, and the methods are applied to statistical downscaling of rainfall over the northeast of Brazil.
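A minimal numerical sketch of two of the SVD-based pattern methods named above, on synthetic fields: MCA from the SVD of the cross-covariance, and CCA from the SVD of the cross-covariance of whitened data. The full regression-diagonalization framework, LSE-MCA variant, and downscaling application of the paper are not reproduced.

```python
# MCA and CCA coupled patterns via SVD of suitably transformed cross-covariances.
import numpy as np

rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 8))
Y = X @ rng.normal(size=(8, 5)) * 0.3 + rng.normal(size=(n, 5))
X -= X.mean(0)
Y -= Y.mean(0)

def whiten(A):
    """Symmetric whitening: decorrelate columns and scale them to unit variance."""
    C = A.T @ A / (A.shape[0] - 1)
    vals, vecs = np.linalg.eigh(C)
    return A @ vecs @ np.diag(vals ** -0.5) @ vecs.T

# MCA: coupled patterns with maximum covariance
U_mca, s_mca, Vt_mca = np.linalg.svd(X.T @ Y / (n - 1), full_matrices=False)

# CCA: same construction after whitening both datasets, giving maximum correlation
Xw, Yw = whiten(X), whiten(Y)
U_cca, s_cca, Vt_cca = np.linalg.svd(Xw.T @ Yw / (n - 1), full_matrices=False)

print("leading MCA covariance:", s_mca[0])
print("leading canonical correlation:", s_cca[0])
```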


2018 ◽  
Vol 27 (06) ◽  
pp. 1850088 ◽  
Author(s):  
Jing Hua ◽  
Hua Zhang ◽  
Jizhong Liu ◽  
Yilu Xu ◽  
Fumin Guo

Due to its capacity for processing signals with low energy consumption, compressive sensing (CS) has been widely used in wearable health monitoring systems for arrhythmia classification of electrocardiogram (ECG) signals. However, most existing works focus on compressive sensing reconstruction; in other words, the ECG signals must be reconstructed before use, so these methods have high computational complexity. In this paper, the authors propose a cardiac arrhythmia classification scheme that performs the classification task directly in the compressed domain, skipping the reconstruction stage. The proposed scheme first employs the Pan–Tompkins algorithm to preprocess the ECG signals, including denoising and QRS detection, and then compresses the ECG signals by CS to obtain compressive measurements. Features are extracted directly from these measurements using principal component analysis (PCA) and are used to classify the ECG signals into different types by the proposed semi-supervised learning algorithm based on support vector machines (SVM). Extensive simulations have been performed to validate the effectiveness of the proposed scheme. Experimental results show that the proposed scheme achieves an average accuracy of [Formula: see text] at a sensing rate of 0.7, compared to an accuracy of [Formula: see text] for noncompressive ECG data.
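A hedged sketch of classification directly in the compressed domain: a random Gaussian measurement matrix stands in for the CS acquisition, PCA extracts features from the measurements, and an SVM classifies them without reconstruction. The Pan–Tompkins preprocessing and the paper's semi-supervised SVM are not reproduced, and the beat segments and labels are synthetic.

```python
# Compressed-domain classification: random CS measurements -> PCA features -> SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
n_beats, beat_len, sensing_rate = 400, 256, 0.7
beats = rng.normal(size=(n_beats, beat_len))   # stand-in ECG heartbeat segments
labels = rng.integers(0, 4, size=n_beats)      # stand-in arrhythmia classes

m = int(sensing_rate * beat_len)
Phi = rng.normal(size=(m, beat_len)) / np.sqrt(m)   # CS measurement matrix
measurements = beats @ Phi.T                        # compressed measurements, no reconstruction

clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
clf.fit(measurements, labels)
print("training accuracy:", clf.score(measurements, labels))
```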


2013 ◽  
Vol 67 (4) ◽  
pp. 817-823 ◽  
Author(s):  
Li Jing ◽  
Li Fadong ◽  
Liu Qiang ◽  
Song Shuai ◽  
Zhao Guangshuai

For this study, 34 water samples were collected along the Wei River and its tributaries. Multivariate statistical analyses were employed to interpret the environmental data and to identify the natural and anthropogenic trace metal inputs to the surface waters of the river. Our results revealed that Zn, Se, B, Ba, Fe, Mn, Mo, Ni and V were all detected in the Wei River. Compared to drinking water guidelines, the primary trace metal pollution components (B, Ni, Zn and Mn) exceeded drinking water standard levels by 47.1, 50.0, 44.1 and 26.5%, respectively. Inter-element relationships and landscape features of trace metals, examined by hierarchical cluster analysis (HCA), identified a uniform source of trace metals for all sampling sites, excluding one site that exhibited anomalous concentrations. Based on the patterns of relative loadings of individual metals calculated by principal component analysis (PCA), the primary trace metal sources were associated with natural/geogenic contributions, agro-chemical processes and discharge from local industrial sources. These results demonstrate the impact of human activities on metal concentrations in the Wei River.
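An illustrative sketch of the HCA + PCA workflow on synthetic concentration data: hierarchical clustering groups sampling sites by their trace-metal signatures, and PCA loadings of individual metals hint at common sources. The actual Wei River measurements are not reproduced; the values below are placeholders.

```python
# Hierarchical cluster analysis of sites + PCA loadings of metals on synthetic data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

metals = ["Zn", "Se", "B", "Ba", "Fe", "Mn", "Mo", "Ni", "V"]
rng = np.random.default_rng(9)
conc = np.abs(rng.normal(loc=1.0, scale=0.3, size=(34, len(metals))))  # 34 sampling sites

Z = StandardScaler().fit_transform(conc)

# HCA: group sampling sites by their trace-metal signatures
clusters = fcluster(linkage(Z, method="ward"), t=2, criterion="maxclust")
print("site cluster sizes:", np.bincount(clusters)[1:])

# PCA: loadings of individual metals on the leading components suggest sources
pca = PCA(n_components=2).fit(Z)
for name, load in zip(metals, pca.components_.T):
    print(f"{name}: PC1 {load[0]:+.2f}, PC2 {load[1]:+.2f}")
```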

