Classification of bacterial species from proteomic data using combinatorial approaches incorporating artificial neural networks, cluster analysis and principal components analysis

This thesis studied the application of multivariate techniques to large classification databases, with the aim of presenting them theoretically, comparing them, and drawing conclusions about their field of application, their handling, their capabilities and their limitations. Unsupervised techniques such as Principal Components Analysis/Factor Analysis (PCA/FA) and Cluster Analysis (CA) were used, as well as supervised techniques such as Discriminant Analysis (DA), Classification Trees (CT) and Artificial Neural Networks (ANN). Particular emphasis was placed on CT and ANN (three methods were studied for CT and three architectures for ANN). Their advantages, disadvantages and particularities were investigated, and the classification models of each technique were optimized. All techniques were compared with one another on the basis of their results (the rate of correct sample classification) on three databases concerning the determination of a) metals and metalloids in the three reservoirs used for the water supply of the capital (Yliki, Mornos and Marathon), b) metals, metalloids and inorganic elements in marine sediment samples from large fish farms in the country, and c) rare earth elements in olive oil samples from various regions. Although DA is a parametric technique with many restrictions on its application, it met the needs of the problems and always provided a first view of each problem (whether or not the groups can be separated linearly, judged from the canonical plot of the analysis, and an initial evaluation of the variables). The correct classification rates it achieved were often comparable with those of the more advanced techniques. CT, with three different methods and considerable flexibility (many parameters available for testing and optimization), achieved high classification rates with few or many variables (usually more than ANN), producing reproducible models with generalization ability.
ANN proved to be a particularly flexible technique, able to evaluate variables effectively and to be applied to both simple and more complex databases, approximating linear and non-linear functions. Robust and flexible models were built. Their drawback, however, was their tendency to overfit, which required careful handling to avoid. The available samples were therefore split into three sets: in addition to the usual training set, a validation set and a test set were used. In this way overfitting was detected immediately (so that model training could be stopped automatically), and the models were tested on new, "unknown" samples to check their generalization ability. The split into sets was made either randomly (as the current literature prescribes) or on the basis of preprocessing with DA (a method that has never been used before). In addition, an effort was made to apply structures that were as simple as possible, with few parameters (variables, weights) and few processing units (neurons).
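
The three-way split and automatic stopping described in the abstract can be illustrated with a minimal sketch. The function names, the split fractions and the `patience` parameter are assumptions made for illustration; the abstract does not state the proportions or the stopping rule actually used, and only the random split is shown (not the DA-based one).

```python
import random


def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Randomly partition sample indices into training, validation and test sets.

    The 60/20/20 default is an assumption, not a value from the thesis.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights would be kept under early stopping.

    Training halts once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch
```

The validation set drives the stopping decision, while the test set is held back entirely so that it can play the role of the "unknown" samples used to check generalization.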

Reduction of Input Variables in Artificial Neural Networks as from Principal Components Analysis Data in the Modeling of Dissolved Oxygen

Química Nova ◽

10.5935/0100-4042.20160024 ◽

2016 ◽

Author(s):

Saulo Rodrigues e Silva ◽

Fernando Schimidt

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Dissolved Oxygen ◽

Principal Components Analysis ◽

Principal Components ◽

Analysis Data ◽

Components Analysis ◽

Artificial Neural ◽

Input Variables

Artificial Neural Networks and Principal Components Analysis for Detection of Idiopathic Pulmonary Fibrosis in Microscopy Images

Engineering Applications of Neural Networks - Communications in Computer and Information Science ◽

10.1007/978-3-642-41013-0_30 ◽

2013 ◽

pp. 292-301

Author(s):

Spiros V. Georgakopoulos ◽

Sotiris K. Tasoulis ◽

Vassilis P. Plagianakos ◽

Ilias Maglogiannis

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Idiopathic Pulmonary Fibrosis ◽

Pulmonary Fibrosis ◽

Principal Components Analysis ◽

Principal Components ◽

Components Analysis ◽

Artificial Neural ◽

Microscopy Images

Identifying Apple Surface Defects Using Principal Components Analysis and Artificial Neural Networks

Transactions of the ASABE ◽

10.13031/2013.24078 ◽

2007 ◽

Vol 50 (6) ◽

pp. 2257-2265 ◽

Cited By ~ 8

Author(s):

B. S. Bennedsen ◽

D. L. Peterson ◽

A. Tabb

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Principal Components Analysis ◽

Principal Components ◽

Surface Defects ◽

Components Analysis ◽

Artificial Neural

Management of Wireless Local Area Networks by Artificial Neural Networks with Principal Components Analysis

2009 First Asian Conference on Intelligent Information and Database Systems ◽

10.1109/aciids.2009.56 ◽

2009 ◽

Author(s):

Ping-Feng Pai ◽

Ying-Chieh Chang ◽

Yu-Pin Hu

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Principal Components Analysis ◽

Principal Components ◽

Local Area Networks ◽

Local Area ◽

Wireless Local Area Networks ◽

Components Analysis ◽

Artificial Neural

Determination of compound aminopyrine phenacetin tablets by using artificial neural networks combined with principal components analysis

Analytical Biochemistry ◽

10.1016/j.ab.2005.10.041 ◽

2006 ◽

Vol 351 (2) ◽

pp. 174-180 ◽

Cited By ~ 17

Author(s):

Ying Dou ◽

Hong Mi ◽

Lingzhi Zhao ◽

Yuqiu Ren ◽

Yulin Ren

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Principal Components Analysis ◽

Principal Components ◽

Components Analysis ◽

Artificial Neural

Classification of Fourier Transform Infrared Microscopic Imaging Data of Human Breast Cells by Cluster Analysis and Artificial Neural Networks

Applied Spectroscopy ◽

10.1366/000370203321165151 ◽

2003 ◽

Vol 57 (1) ◽

pp. 14-22 ◽

Cited By ~ 43

Author(s):

Lin Zhang ◽

Gary W. Small ◽

Abigail S. Haka ◽

Linda H. Kidder ◽

E. Neil Lewis

Keyword(s):

Neural Networks ◽

Cluster Analysis ◽

Fourier Transform ◽

Artificial Neural Networks ◽

Fourier Transform Infrared ◽

Carcinoma Cells ◽

Human Breast ◽

Microscopic Imaging ◽

Artificial Neural

Cluster analysis and artificial neural networks (ANNs) are applied to the automated assessment of disease state in Fourier transform infrared microscopic imaging measurements of normal and carcinomatous immortalized human breast cell lines. K-means clustering is used to implement an automated algorithm for the assignment of pixels in the image to cell and non-cell categories. Cell pixels are subsequently classified into carcinoma and normal categories through the use of a feed-forward ANN computed with the Broyden–Fletcher–Goldfarb–Shanno training algorithm. Inputs to the ANN consist of principal component scores computed from Fourier filtered absorbance data. A grid search optimization procedure is used to identify the optimal network architecture and filter frequency response. Data from three images corresponding to normal cells, carcinoma cells, and a mixture of normal and carcinoma cells are used to build and test the classification methodology. A successful classifier is developed through this work, although differences in the spectral backgrounds between the three images are observed to complicate the classification problem. The robustness of the final classifier is improved through the use of a rejection threshold procedure to prevent classification of outlying pixels.
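
Two steps of the workflow in this abstract lend themselves to a short sketch: the K-means assignment of pixels to two groups, and the rejection of low-confidence classifications. The code below is an illustration under stated assumptions, not the authors' implementation. The function names, the `init` parameter and the 0.8 confidence threshold are hypothetical, and the abstract does not say how its rejection threshold is defined, so a simple confidence cut-off on the network output is assumed here. The PCA and BFGS-trained network stages are omitted.

```python
import math
import random


def kmeans(points, k=2, init=None, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    points = [tuple(p) for p in points]
    if init is None:
        # Assumption: start from k randomly chosen data points.
        init = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in init]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def classify_with_rejection(prob_carcinoma, threshold=0.8):
    """Label one pixel from a network output in [0, 1], or reject it.

    A pixel is rejected when the network is not confident enough in
    either class (hypothetical rule; the threshold value is an assumption).
    """
    confidence = max(prob_carcinoma, 1.0 - prob_carcinoma)
    if confidence < threshold:
        return "rejected"
    return "carcinoma" if prob_carcinoma >= 0.5 else "normal"
```

In a pipeline of this shape, `kmeans` with `k=2` would separate cell from non-cell pixels, and only the cell pixels would be passed on to the classifier and then to `classify_with_rejection`.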

Classification of Indian meteorological stations using cluster and fuzzy cluster analysis, and Kohonen artificial neural networks

Hydrology Research ◽

10.2166/nh.2007.013 ◽

2007 ◽

Vol 38 (3) ◽

pp. 303-314 ◽

Cited By ~ 13

Author(s):

K. Srinivasa Raju ◽

D. Nagesh Kumar

Keyword(s):

Neural Networks ◽

Cluster Analysis ◽

Artificial Neural Networks ◽

Optimal Number ◽

Fuzzy Cluster ◽

Fuzzy Cluster Analysis ◽

Homogeneous Groups ◽

Index Approach ◽

Artificial Neural

The present study deals with the application of cluster analysis, Fuzzy Cluster Analysis (FCA) and Kohonen Artificial Neural Networks (KANN) methods for classification of 159 meteorological stations in India into meteorologically homogeneous groups. Eight parameters, namely latitude, longitude, elevation, average temperature, humidity, wind speed, sunshine hours and solar radiation, are considered as the classification criteria for grouping. The optimal number of groups is determined as 14 based on the Davies–Bouldin index approach. It is observed that the FCA approach performed better than the other two methodologies for the present study.
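
The Davies–Bouldin index mentioned in this abstract scores a partition by comparing within-cluster scatter to between-cluster separation, with lower values indicating better-separated groups. The sketch below uses the standard definition with Euclidean distance; the function name and the toy coordinates are illustrative assumptions and are not taken from the study's station data.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition (lower is better).

    Requires at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(tuple(p))
    # Centroid and mean member-to-centroid distance for each cluster.
    centroids = {
        lab: tuple(sum(c) / len(members) for c in zip(*members))
        for lab, members in clusters.items()
    }
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Toy data: three tight groups along one axis (made-up coordinates).
stations = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
three_groups = [0, 0, 1, 1, 2, 2]
two_groups = [0, 0, 0, 0, 1, 1]
best = min((three_groups, two_groups), key=lambda lab: davies_bouldin(stations, lab))
```

Choosing the number of groups then amounts to clustering the data for each candidate count and keeping the partition with the smallest index, which on the toy data is the three-group labelling.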
