TERMINATION CRITERION FOR PCA WITH ANN FOR DETECTION OF NS1 FROM ADULTERATED SALIVA

2016 ◽  
Vol 78 (6-8) ◽  
Author(s):  
N. H. Othman ◽  
Khuan Y. Lee ◽  
A. R. M. Radzol ◽  
Wahidah Mansor

Detection of Non-structural Protein 1 (NS1) in saliva has become appealing because it may lead to a non-invasive detection method for NS1-related diseases at the febrile phase, before complications develop. NS1 has been found to have a unique molecular fingerprint under the Surface Enhanced Raman Spectroscopy (SERS) technique. This work investigates the effect of the Principal Component Analysis (PCA) termination criterion on the classification performance of different Artificial Neural Network (ANN) learning algorithms. This will help in optimizing the automated classification of NS1-adulterated saliva, and hence the detection of NS1-related diseases. Raman spectra of saliva (n=64) and saliva mixed with NS1 (n=64), acquired using SERS, are obtained from the UiTM-NMRR 12868-NS1-DENV database. The large input dimension of the raw data [128 x 1801] is reduced according to the respective PCA termination criteria: Scree test [128 x 5], Cumulative Percent of Total Variance (CPV) [128 x 70] and Eigenvalue One Criterion (EOC) [128 x 115]. The reduced data are used as inputs to the ANN algorithms. The performance of these algorithms, in terms of [accuracy, sensitivity, specificity, and precision], is investigated for Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), Resilient Backpropagation (RPROP) and One Step Secant (OSS). The best performance [100%, 100%, 100%, 100%] is achieved by integrating the Scree test criterion with the SCG learning algorithm, the highest expected for the adulterated data.
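
As a rough illustration of how two of the termination criteria named above choose the number of retained components, the sketch below applies a cumulative-percent-of-variance threshold and the eigenvalue-one rule to a list of eigenvalues. The function names, the 90% threshold and the example eigenvalues are assumptions made for illustration only; they are not taken from the paper, and the Scree test is omitted because it relies on visual inspection of the eigenvalue plot.

```python
def components_by_cpv(eigenvalues, threshold=0.90):
    """Smallest k whose leading eigenvalues explain at least `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def components_by_eigenvalue_one(eigenvalues):
    """Number of components whose eigenvalue exceeds 1 (eigenvalue-one rule)."""
    return sum(1 for value in eigenvalues if value > 1.0)


# Hypothetical eigenvalues of a correlation matrix (illustrative values only).
example_eigenvalues = [4.0, 2.5, 1.2, 0.8, 0.3, 0.2]
k_cpv = components_by_cpv(example_eigenvalues, threshold=0.90)
k_eoc = components_by_eigenvalue_one(example_eigenvalues)
print(k_cpv, k_eoc)
```

With these made-up eigenvalues the two rules retain different numbers of components, which is the kind of difference the abstract reports between the reduced input sizes.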

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5519
Author(s):  
Kenneth E. Schackart ◽  
Jeong-Yeol Yoon

Since their inception, biosensors have frequently employed simple regression models to calculate analyte composition based on the biosensor’s signal magnitude. Traditionally, bioreceptors provide excellent sensitivity and specificity to the biosensor. Increasingly, however, bioreceptor-free biosensors have been developed for a wide range of applications. Without a bioreceptor, maintaining strong specificity and a low limit of detection has become the major challenge. Machine learning (ML) has been introduced to improve the performance of these biosensors, effectively replacing the bioreceptor with modeling to gain specificity. Here, we present how ML has been used to enhance the performance of these bioreceptor-free biosensors. In particular, we discuss how ML has been used for imaging, Enose and Etongue, and surface-enhanced Raman spectroscopy (SERS) biosensors. Notably, principal component analysis (PCA) combined with support vector machine (SVM) and various artificial neural network (ANN) algorithms has shown outstanding performance in a variety of tasks. We anticipate that ML will continue to improve the performance of bioreceptor-free biosensors, especially with the prospects of sharing trained models and cloud computing for mobile computation. To facilitate this, the biosensing community would benefit from increased contributions to open-access data repositories for biosensor data.


1999 ◽  
Vol 183 ◽  
pp. 154-154
Author(s):  
S.R. Folkes ◽  
O. Lahav ◽  
S.J. Maddox

We present a method for automated classification of galaxies with low signal-to-noise (S/N) spectra typical of redshift surveys. We develop spectral simulations based on the parameters for the 2dF Galaxy Redshift Survey and investigate the technique of Principal Component Analysis when applied to spectra of low S/N. It is found that the projection onto the first 8 Principal Components holds most of the real spectral information, with later projections only adding noise. Using these components as input, we train an Artificial Neural Network (ANN) to classify the noisy simulated spectra into morphological classes. We find that more than 90% of our sample of normal galaxies are correctly classified into one of five morphological classes for simulations at bJ=19.7.


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach to data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of the datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.


2021 ◽  
pp. 000370282110329
Author(s):  
Ling Wang ◽  
Mario O. Vendrell-Dones ◽  
Chiara Deriu ◽  
Sevde Doğruer ◽  
Peter de B. Harrington ◽  
...  

Recently there has been an upsurge in reports that seizures of illicit cocaine and heroin have been adulterated with fentanyl. Surface-enhanced Raman spectroscopy (SERS) provides a useful alternative to current screening procedures that permits detection of trace levels of fentanyl in mixtures. Samples are solubilized and allowed to interact with aggregated colloidal nanostars to produce a rapid and sensitive assay. In this study, we present the quantitative determination of fentanyl in heroin and cocaine using SERS with a point-and-shoot handheld Raman system. Our protocol is optimized to detect pure fentanyl down to 0.20 ± 0.06 ng/mL and can also distinguish pure cocaine and heroin at ng/mL levels. Multiplex analysis of mixtures is enabled by combining SERS detection with principal component analysis and super partial least squares discriminant analysis (SPLS-DA), which allows for the determination of fentanyl as low as 0.05% in simulated seized heroin and 0.10% in simulated seized cocaine samples.


2021 ◽  
Vol 13 (3) ◽  
pp. 526
Author(s):  
Shengliang Pu ◽  
Yuanfeng Wu ◽  
Xu Sun ◽  
Xiaotong Sun

The nascent field of graph representation learning has shown superiority for resolving graph data. Compared to conventional convolutional neural networks, graph-based deep learning has the advantages of illustrating class boundaries and modeling feature relationships. For hyperspectral image (HSI) classification, the priority problem might be how to convert hyperspectral data from regular grids into irregular domains. In this regard, we present a novel method that performs localized graph convolutional filtering on HSIs based on spectral graph theory. First, we conducted principal component analysis (PCA) preprocessing to create localized hyperspectral data cubes with unsupervised feature reduction. These feature cubes, combined with localized adjacency matrices, were fed into the popular graph convolution network in a standard supervised learning paradigm. Finally, we succeeded in analyzing diversified land covers by considering local graph structure with graph convolutional filtering. Experiments on real hyperspectral datasets demonstrated that the presented method offers promising classification performance compared with other popular competitors.


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract. Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as a loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model for feature discrimination. Soft Voting is applied to achieve the final classification of the ensemble model. The dataset is from the standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories. Results: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the classification performance was improved significantly by our ensemble model. In addition, metric learning was able to improve word embedding representation and the focal loss reduced the impact of data imbalance on model performance.
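
Two of the components named in this abstract, focal loss and soft voting, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the default `gamma` and `alpha` values and the example probabilities are hypothetical, and the per-sample focal loss is written in its commonly used form, -alpha * (1 - p_t)^gamma * log(p_t).

```python
import math


def focal_loss(prob_true_class, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class."""
    return -alpha * (1.0 - prob_true_class) ** gamma * math.log(prob_true_class)


def soft_vote(model_probabilities):
    """Average per-class probabilities across models.

    Returns (index of the winning class, list of averaged probabilities).
    """
    n_models = len(model_probabilities)
    n_classes = len(model_probabilities[0])
    averaged = [
        sum(probs[c] for probs in model_probabilities) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical class probabilities from three base models over three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
predicted_class, mean_probs = soft_vote(outputs)
print(predicted_class, mean_probs)
```

The `(1 - p_t)^gamma` factor shrinks the loss of samples that are already classified confidently, so the harder, often minority-class samples contribute relatively more; with `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy.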


Cancers ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1407
Author(s):  
Matyas Bukva ◽  
Gabriella Dobra ◽  
Juan Gomez-Perez ◽  
Krisztian Koos ◽  
Maria Harmati ◽  
...  

Investigating the molecular composition of small extracellular vesicles (sEVs) for tumor diagnostic purposes is becoming increasingly popular, especially for diseases for which diagnosis is challenging, such as central nervous system (CNS) malignancies. Thorough examination of the molecular content of sEVs by Raman spectroscopy is a promising but hitherto barely explored approach for these tumor types. We attempt to reveal the potential role of serum-derived sEVs in diagnosing CNS tumors through Raman spectroscopic analyses using a relevant number of clinical samples. A total of 138 serum samples were obtained from four patient groups (glioblastoma multiforme, non-small-cell lung cancer brain metastasis, meningioma and lumbar disc herniation as control). After isolation, characterization and Raman spectroscopic assessment of sEVs, the Principal Component Analysis–Support Vector Machine (PCA–SVM) algorithm was performed on the Raman spectra for pairwise classifications. Classification accuracy (CA), sensitivity, specificity and the Area Under the Curve (AUC) value derived from Receiver Operating Characteristic (ROC) analyses were used to evaluate the performance of classification. The groups compared were distinguishable with 82.9–92.5% CA, 80–95% sensitivity and 80–90% specificity. AUC scores in the range of 0.82–0.9 suggest excellent and outstanding classification performance. Our results support that Raman spectroscopic analysis of sEV-enriched isolates from serum is a promising method that could be further developed in order to be applicable in the diagnosis of CNS tumors.
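
For reference, the classification measures mentioned above (accuracy, sensitivity, specificity) follow directly from the counts of a binary confusion matrix, as in the sketch below. The function name and the example counts are hypothetical and are not data from the study; precision is included only for completeness.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures derived from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts for one pairwise comparison (illustrative only).
metrics = binary_metrics(tp=18, fp=2, tn=16, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```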


2021 ◽  
Vol 11 (13) ◽  
pp. 5895
Author(s):  
Kristina Serec ◽  
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of a rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in the series that addresses conformational transitions in DNA thin films utilizing FTIR spectroscopy, we exploit popular chemometric methods: the principal component analysis (PCA), support vector machine (SVM) learning algorithm, and principal component regression (PCR), in order to quantify and categorize DNA conformation in thin films of different hydrated states. By complementing FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films based on the vibrational signatures in the 1800–935 cm−1 range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and how to avoid discrepancies by careful sampling.


2019 ◽  
Vol 73 (5) ◽  
pp. 565-573 ◽  
Author(s):  
Yun Zhao ◽  
Mahamed Lamine Guindo ◽  
Xing Xu ◽  
Miao Sun ◽  
Jiyu Peng ◽  
...  

In this study, a method based on laser-induced breakdown spectroscopy (LIBS) was developed to detect soil contaminated with Pb. Different levels of Pb were added to soil samples in which tobacco was planted over a period of two to four weeks. Principal component analysis and deep learning with a deep belief network (DBN) were implemented to classify the LIBS data. The robustness of the method was verified through a comparison with the results of a support vector machine and partial least squares discriminant analysis. A confusion matrix of the different algorithms shows that the DBN achieved satisfactory classification performance on all samples of contaminated soil. In terms of classification, the proposed method performed better on samples contaminated for four weeks than on those contaminated for two weeks. The results show that LIBS can be used with deep learning for the detection of heavy metals in soil.


Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 18
Author(s):  
Michael Li ◽  
Santoso Wibowo ◽  
Wei Li ◽  
Lily D. Li

Extreme learning machine (ELM) is a popular randomization-based learning algorithm that provides a fast solution for many regression and classification problems. In this article, we present a method based on ELM for solving the spectral data analysis problem, which essentially is a class of inverse problems. It requires determining the structural parameters of a physical sample from the given spectroscopic curves. We propose that the unknown target inverse function be approximated by an ELM, with a linear neuron added to correct the localized effect caused by Gaussian basis functions. Unlike the conventional methods involving intensive numerical computations, under the new conceptual framework, the task of performing spectral data analysis becomes a learning task from data. As spectral data are typical high-dimensional data, the dimensionality reduction technique of principal component analysis (PCA) is applied to reduce the dimension of the dataset to ensure convergence. The proposed conceptual framework is illustrated using a set of simulated Rutherford backscattering spectra. The results show that the proposed method can achieve prediction inaccuracies of less than 1%, which outperforms the predictions from the multi-layer perceptron and numerical-based techniques. The presented method could be implemented as application software for real-time spectral data analysis by integrating it into a spectroscopic data collection system.
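
For orientation, the standard ELM formulation can be written as below. This is the generic textbook form, given here as an assumption about the baseline; the linear neuron added in the article is not modeled, and the symbols (N samples x_i, L hidden nodes with randomly fixed parameters a_j and b_j, activation g, targets T) are chosen for illustration.

```latex
% Hidden-layer output matrix: hidden parameters (a_j, b_j) are drawn at random and kept fixed
H_{ij} = g(a_j, b_j, x_i), \qquad i = 1, \dots, N, \quad j = 1, \dots, L

% Network output as a linear combination of hidden-node responses
f(x) = \sum_{j=1}^{L} \beta_j \, g(a_j, b_j, x)

% Output weights from a least-squares fit, using the Moore-Penrose pseudoinverse of H
\hat{\beta} = \arg\min_{\beta} \lVert H \beta - T \rVert = H^{\dagger} T
```

Because only the output weights are solved for, and in closed form, training avoids iterative gradient descent, which is the source of the speed claimed for ELM.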

