Enhanced Dimensionality Reduction Methods for Classifying Malaria Vector Dataset using Decision Tree

RNA-Seq data are used in biological applications and to support decisions in the classification of genes. Much recent work has focused on reducing the dimension of RNA-Seq data, and dimensionality reduction approaches have been proposed for extracting relevant information from a given dataset. In this study, a novel optimized dimensionality reduction algorithm is proposed that combines an optimized genetic algorithm with Principal Component Analysis and Independent Component Analysis (GA-O-PCA and GAO-ICA), which are used to identify an optimum subset and latent correlated features, respectively. A decision tree classifier is applied to the reduced mosquito Anopheles gambiae dataset to enhance the accuracy and scalability of the gene expression analysis. The proposed algorithm fetches relevant features from the high-dimensional input feature space, using feature ranking and earlier experience. The performance of the model is evaluated and validated using classification accuracy and compared with existing approaches in the literature. The experimental results are promising for feature selection and classification in gene expression data analysis and indicate that the approach is a capable addition to prevailing data mining techniques.

Optimized Hybrid Heuristic Based Dimensionality Reduction Methods for Malaria Vector Using KNN Classifier

10.21203/rs.3.rs-107396/v1 ◽

2020 ◽

Author(s):

Micheal Olaolu Arowolo ◽

Marion Olubunmi Adebiyi ◽

Ayodele Ariyo Adebiyi ◽

Oludayo Olugbara

Keyword(s):

Gene Expression ◽

Dimensionality Reduction ◽

Principal Component ◽

Feature Space ◽

Component Analysis ◽

Rna Seq ◽

Knn Classifier ◽

Data Dimensionality Reduction ◽

Reduction Methods ◽

Mosquito Anopheles Gambiae

RNA-Seq data are used in biological applications and to support decisions in the classification of genes. Much recent work has focused on reducing the dimension of RNA-Seq data, and dimensionality reduction approaches have been proposed for transforming these data. In this study, a novel optimized hybrid investigative approach is proposed. It combines an optimized genetic algorithm with Principal Component Analysis and Independent Component Analysis (GA-O-PCA and GAO-ICA), which are used to identify an optimum subset and latent correlated features, respectively. A KNN classifier is applied to the reduced mosquito Anopheles gambiae dataset to enhance the accuracy and scalability of the gene expression analysis. The proposed algorithm fetches relevant features from the high-dimensional input feature space, and a fast feature-ranking algorithm is used to select them. The performance of the model is evaluated and validated using classification accuracy and compared with existing approaches in the literature. The experimental results are promising for selecting relevant genes and classifying gene expression data, indicating that the approach is a capable addition to prevailing machine learning methods.
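
The abstract does not spell out how the genetic algorithm is coupled to the classifier, so the following is only a minimal sketch of one common arrangement: a genetic algorithm searches over binary feature masks, and the fitness of a mask is the validation accuracy of a k-nearest-neighbour classifier restricted to the selected features. The function names (`knn_accuracy`, `ga_select`), the synthetic data, and all hyperparameters are assumptions for illustration; this is not the authors' GA-O-PCA or GAO-ICA procedure.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=3):
    """Accuracy of a plain k-NN classifier (Euclidean distance, binary labels)."""
    # Pairwise squared distances between test and training samples.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]                       # shape (n_test, k), labels 0/1
    pred = (votes.mean(axis=1) > 0.5).astype(int)  # majority vote (k is odd)
    return float((pred == y_test).mean())

def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over boolean feature masks; fitness is k-NN accuracy."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]

    def fitness(mask):
        if not mask.any():          # an empty subset cannot classify anything
            return 0.0
        return knn_accuracy(X_train[:, mask], y_train, X_val[:, mask], y_val)

    pop = rng.random((pop_size, n_features)) < 0.5   # random initial masks
    history = []                                     # best fitness per generation
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        order = np.argsort(scores)[::-1]
        pop, scores = pop[order], scores[order]
        history.append(float(scores[0]))
        parents = pop[: pop_size // 2]               # truncation selection
        children = [pop[0].copy()]                   # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < mutation_rate
            children.append(child ^ flip)            # bit-flip mutation
        pop = np.array(children)
    scores = np.array([fitness(m) for m in pop])
    best = int(np.argmax(scores))
    history.append(float(scores[best]))
    return pop[best], history

# Synthetic stand-in for an expression matrix: only the first 3 of 30 features
# carry class information (an assumption made purely for this demo).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)
X = rng.normal(size=(120, 30))
X[:, :3] += 2.0 * y[:, None]
mask, history = ga_select(X[:80], y[:80], X[80:], y[80:])
print("selected features:", np.flatnonzero(mask))
print("best validation accuracy:", history[-1])
```

Because the best mask is carried over unchanged into each new generation, the best fitness recorded in `history` never decreases. In a real study the final subset would be scored on data not used for fitness evaluation, since validation accuracy that has been optimized by the search is optimistically biased.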

Optimized hybrid investigative based dimensionality reduction methods for malaria vector using KNN classifier

Journal Of Big Data ◽

10.1186/s40537-021-00415-z ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Micheal Olaolu Arowolo ◽

Marion Olubunmi Adebiyi ◽

Ayodele Ariyo Adebiyi ◽

Oludayo Olugbara

Keyword(s):

Gene Expression ◽

Dimensionality Reduction ◽

Principal Component ◽

Feature Space ◽

Component Analysis ◽

Rna Seq ◽

Knn Classifier ◽

Data Dimensionality Reduction ◽

Reduction Methods ◽

Mosquito Anopheles Gambiae

RNA-Seq data are used in biological applications and to support decisions in the classification of genes. Much recent work has focused on reducing the dimension of RNA-Seq data, and dimensionality reduction approaches have been proposed for transforming these data. In this study, a novel optimized hybrid investigative approach is proposed. It combines an optimized genetic algorithm with Principal Component Analysis and Independent Component Analysis (GA-O-PCA and GAO-ICA), which are used to identify an optimum subset and latent correlated features, respectively. A KNN classifier is applied to the reduced mosquito Anopheles gambiae dataset to enhance the accuracy and scalability of the gene expression analysis. The proposed algorithm fetches relevant features from the high-dimensional input feature space, and a fast feature-ranking algorithm is used to select them. The performance of the model is evaluated and validated using classification accuracy and compared with existing approaches in the literature. The experimental results are promising for selecting relevant genes and classifying gene expression data, indicating that the approach is capable of adding to prevailing machine learning methods.

The Connections between Principal Component Analysis and Dimensionality Reduction Methods of Manifolds

Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-642-25944-9_83 ◽

2012 ◽

pp. 638-643

Author(s):

Bo Li ◽

Jin Liu

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Reduction Methods

Tuning parameters of dimensionality reduction methods for single-cell RNA-seq analysis

10.1101/2020.04.27.064816 ◽

2020 ◽

Author(s):

Felix Raimundo ◽

Celine Vallot ◽

Jean Philippe Vert

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Biological Diversity ◽

Unmet Need ◽

Principal Component ◽

Parameter Tuning ◽

Differential Analysis ◽

Rna Seq ◽

Reduction Methods ◽

Complex Models

Background: Many computational methods have been developed recently to analyze single-cell RNA-seq (scRNA-seq) data. Several benchmark studies have compared these methods on their ability for dimensionality reduction, clustering or differential analysis, often relying on default parameters. Yet given the biological diversity of scRNA-seq datasets, parameter tuning might be essential for the optimal usage of methods, and determining how to tune parameters remains an unmet need. Results: Here, we propose a benchmark to assess the performance of five methods, systematically varying their tunable parameters, for dimension reduction of scRNA-seq data, a common first step to many downstream applications such as cell type identification or trajectory inference. We run a total of 1.5 million experiments to assess the influence of parameter changes on the performance of each method, and propose two strategies to automatically tune parameters for methods that need it. Conclusions: We find that principal component analysis (PCA)-based methods like scran and Seurat are competitive with default parameters but do not benefit much from parameter tuning, while more complex models like ZinbWave, DCA and scVI can reach better performance but after parameter tuning.

Kernel principal component analysis for multimedia retrieval

Global Journal of Information Technology Emerging Technologies ◽

10.18844/gjit.v6i1.384 ◽

2016 ◽

Vol 6 (1) ◽

Author(s):

Guang-Ho Cha

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Components ◽

Principal Component ◽

Feature Space ◽

Component Analysis ◽

Multimedia Retrieval ◽

Kernel Principal Component Analysis ◽

Kernel Pca ◽

Data Set

Principal component analysis (PCA) is an important tool in many areas including data reduction and interpretation, information retrieval, image processing, and so on. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. By the use of Gaussian kernels, the principal components were computed in the feature space of an image data set and they are used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to the retrieval quality as well as the retrieval precision in content-based image retrievals.

Keywords: Principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
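
The idea described in this abstract (map the input to a feature space through a kernel, then compute principal components there) can be sketched in a few lines. The following is a generic textbook formulation of kernel PCA with a Gaussian kernel, not the paper's implementation; the function names, the `gamma` value, and the two-ring toy data are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows of X and Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def kernel_pca(X, n_components=2, gamma=1.0):
    """Return the training data projected onto the leading kernel principal components."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ K @ H                                # kernel of mean-centered feature vectors
    eigvals, eigvecs = np.linalg.eigh(Kc)         # eigenvalues in ascending order
    lam = eigvals[::-1][:n_components]            # leading eigenvalues
    V = eigvecs[:, ::-1][:, :n_components]        # matching unit-norm eigenvectors
    # Projection of training point i on component j is sqrt(lam_j) * V[i, j].
    return V * np.sqrt(np.maximum(lam, 0.0)), lam

# Toy data: two concentric rings, which no linear projection can separate.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
Z, lam = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape, lam)
```

The feature map is never computed explicitly: only the n-by-n kernel matrix is needed, which is what makes the approach practical when the feature space is very high-dimensional. The cost is an eigendecomposition that scales with the number of samples rather than the number of input features.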

Principal Component Analysis for Dimensionality Reduction for Animal Classification based on LR

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8805.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1118-1123 ◽

Cited By ~ 1

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Computation Time ◽

Principal Component ◽

Feature Space ◽

Detailed Comparison ◽

Component Analysis ◽

Machine Learning Algorithms ◽

Data Generation ◽

Reduction Techniques

Data are now generated in huge volumes, and they need to be analyzed, visualized and used for prediction. Data scientists find it difficult to process such data at once because they are massive and often unstructured or raw. Feature extraction therefore plays a vital role in many applications of machine learning algorithms. Dimensionality reduction is the process of decreasing the dimensions of the feature space by keeping only the prime features. Dimensionality reduction techniques can remove redundancy and decrease computation time. This work gives a detailed comparison of existing dimension reduction techniques and also investigates the importance of Principal Component Analysis by applying it to animal classification. In the present work, the important features are extracted in the first phase, and logistic regression (LR) is then used to classify the animals.
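
The two-phase workflow described here (extract principal components, then classify with logistic regression) can be illustrated as follows. This is a minimal sketch on synthetic two-class data, not the paper's animal dataset or code; the helper names, the learning rate, the number of components, and the data generator are all assumptions.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project data onto the principal directions found on the training set."""
    return (X - mean) @ components.T

def fit_logistic(Z, y, lr=0.1, steps=500):
    """Binary logistic regression fitted by batch gradient descent."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])      # append a bias column
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))          # predicted P(y = 1)
        w -= lr * Zb.T @ (p - y) / len(y)          # gradient of the mean log-loss
    return w

def predict(Z, w):
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    return (Zb @ w > 0).astype(int)

# Synthetic data: 20 features, of which the first 5 separate the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20))
X[:, :5] += 3.0 * y[:, None]
perm = rng.permutation(200)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Phase 1: reduce 20 features to 2 components (fitted on training data only).
mean, comps = pca_fit(X_train, 2)
# Phase 2: logistic regression on the reduced representation.
w = fit_logistic(pca_transform(X_train, mean, comps), y_train)
accuracy = float((predict(pca_transform(X_test, mean, comps), w) == y_test).mean())
print("test accuracy:", accuracy)
```

The classifier here fits 3 parameters instead of 21, which is the computation-time saving the abstract refers to. PCA is unsupervised, so this only works well when the directions of largest variance also happen to separate the classes, as they do in this constructed example.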

Predicting RNA-seq data using genetic algorithm and ensemble classification algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v21.i2.pp1073-1081 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1073

Author(s):

Micheal Olaolu Arowolo ◽

Marion O. Adebiyi ◽

Ayodele A. Adebiyi ◽

Olatunji J. Okesola

Keyword(s):

Gene Expression ◽

Genetic Algorithm ◽

Relevant Information ◽

Biological Data ◽

Ensemble Classification ◽

Classification Algorithms ◽

Learning Approaches ◽

Rna Seq ◽

Dimensionality Reduction Technique ◽

Mosquito Anopheles Gambiae

Malaria parasites accept uncertain, inconsistent life span breeding through vectors of mosquitoes stratospheres. Thousands of different transcriptome parasites exist. A prevalent ribonucleic acid sequencing (RNA-seq) technique for gene expression has brought about enhanced identifications of genetical queries. Computation of RNA-seq gene expression data transcripts requires enhancements using analytical machine learning procedures. Numerous learning approaches have been adopted for analyzing and enhancing the performance of biological data and machines. In this study, a genetic algorithm dimensionality reduction technique is proposed to fetch relevant information from a huge dimensional RNA-seq dataset, and classification uses Ensemble classification algorithms. The experiment is performed using a mosquito Anopheles gambiae dataset with a classification accuracy of 81.7% and 88.3%.

A model for spectroscopic food sample analysis using data sonification

International Journal of Speech Technology ◽

10.1007/s10772-020-09794-9 ◽

2021 ◽

Author(s):

Hsein Kew

Keyword(s):

Dimensionality Reduction ◽

Classification Accuracy ◽

Reduction Method ◽

Real Life ◽

Principal Component ◽

Relevant Information ◽

Analysis Model ◽

Linear Discriminant ◽

Audio Output ◽

Dimensionality Reduction Method

In this paper, we propose a method to generate an audio output based on spectroscopy data in order to discriminate two classes of data, based on the features of our spectral dataset. To do this, we first perform spectral pre-processing, and then extract features, followed by machine learning, for dimensionality reduction. The features are then mapped to the parameters of a sound synthesiser, as part of the audio processing, so as to generate audio samples in order to compute statistical results and identify important descriptors for the classification of the dataset. To optimise the process, we compare Amplitude Modulation (AM) and Frequency Modulation (FM) synthesis, as applied to two real-life datasets, to evaluate the performance of sonification as a method for discriminating data. FM synthesis provides a higher subjective classification accuracy compared with AM synthesis. We then further compare the dimensionality reduction methods of Principal Component Analysis (PCA) and Linear Discriminant Analysis in order to optimise our sonification algorithm. The results using FM synthesis as the sound synthesiser and PCA as the dimensionality reduction method yield mean classification accuracies of 93.81% and 88.57% for the coffee dataset and the fruit puree dataset respectively, and indicate that this spectroscopic analysis model is able to provide relevant information on the spectral data, and most importantly, is able to discriminate accurately between the two spectra and thus provides a complementary tool to supplement current methods.
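
To make the step of mapping features to synthesiser parameters concrete, the sketch below generates an AM tone and an FM tone from two reduced feature values. The abstract does not say which feature drives which parameter, so the mapping used here (first feature to carrier frequency, second to modulation amount), the frequency ranges, the sample rate and the function names are all hypothetical.

```python
import numpy as np

SR = 8000  # sample rate in Hz (assumed for this sketch)

def scale(value, lo, hi):
    """Map a feature value normalised to [0, 1] onto the range [lo, hi]."""
    return lo + float(np.clip(value, 0.0, 1.0)) * (hi - lo)

def am_tone(carrier_hz, mod_hz, depth, duration=0.5):
    """Amplitude modulation: the modulator shapes the carrier's loudness."""
    t = np.arange(int(duration * SR)) / SR
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * mod_hz * t)
    # Divide by (1 + depth) so the signal stays within [-1, 1].
    return envelope * np.sin(2.0 * np.pi * carrier_hz * t) / (1.0 + depth)

def fm_tone(carrier_hz, mod_hz, index, duration=0.5):
    """Frequency modulation: the modulator shifts the carrier's phase."""
    t = np.arange(int(duration * SR)) / SR
    return np.sin(2.0 * np.pi * carrier_hz * t
                  + index * np.sin(2.0 * np.pi * mod_hz * t))

# Two reduced features for one sample, already normalised to [0, 1]
# (for example, scaled scores on the first two principal components).
features = [0.2, 0.9]
carrier = scale(features[0], 220.0, 880.0)   # hypothetical: feature 1 sets pitch
amount = scale(features[1], 0.0, 1.0)        # hypothetical: feature 2 sets modulation
am = am_tone(carrier, 5.0, depth=amount)
fm = fm_tone(carrier, 5.0, index=5.0 * amount)
print(len(am), len(fm), carrier)
```

With this kind of mapping, samples from different classes produce audibly different pitch and modulation, which is the property the listening-based classification in the abstract relies on.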

Gene expression profiling with principal component analysis depicts the biological continuum from essential thrombocythemia over polycythemia vera to myelofibrosis

Experimental Hematology ◽

10.1016/j.exphem.2012.05.011 ◽

2012 ◽

Vol 40 (9) ◽

pp. 771-780.e19 ◽

Cited By ~ 40

Author(s):

Vibe Skov ◽

Mads Thomassen ◽

Caroline H. Riley ◽

Morten K. Jensen ◽

Ole Weis Bjerrum ◽

...

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Gene Expression Profiling ◽

Polycythemia Vera ◽

Essential Thrombocythemia ◽

Expression Profiling ◽

Principal Component ◽

Component Analysis

Data Mining in Analysis of Biomechanical Signals

Solid State Phenomena ◽

10.4028/www.scientific.net/ssp.147-149.588 ◽

2009 ◽

Vol 147-149 ◽

pp. 588-593 ◽

Cited By ~ 3

Author(s):

Marcin Derlatka ◽

Jolanta Pauk

Keyword(s):

Data Mining ◽

Principal Component Analysis ◽

Cerebral Palsy ◽

Spina Bifida ◽

Decision Tree ◽

Principal Component ◽

Data Preprocessing ◽

Component Analysis ◽

Kernel Principal Component Analysis

This paper proposes a procedure for processing biomechanical data. It consists of selecting proper noiseless data, preprocessing the data by means of model identification and Kernel Principal Component Analysis, and then classifying them with a decision tree. The results of classification into groups (normal gait and two selected gait pathologies: Spina Bifida and Cerebral Palsy) were very good.
