Highly Accurate Gene Essentiality Prediction with w-nucleotide Z Curve Features and Feature Selection Technique in Saccharomyces Cerevisiae

Background: Many studies have been conducted on essentiality prediction in the Saccharomyces cerevisiae genome, but the accuracy is not as high as in bacterial or human genomes. The most frequently used features are protein-protein interaction (PPI) networks combined with other features, such as evolutionary conservation, expression level, and protein domain information. Sequence composition features are used least often. Objective: To improve the accuracy of essentiality prediction in the Saccharomyces cerevisiae genome, we propose a highly accurate gene essentiality prediction algorithm. Methods: The algorithm is based on a linear support vector machine (SVM) and uses sequence features only. The variables are derived from sequence data in the w-nucleotide Z curve format, without any other information. Results: After feature selection, the best area under the receiver operating characteristic curve (AUC) was 0.944 for 5-fold cross-validation. For 1- to 6-nucleotide Z curve variables, feature extraction can increase the AUC in all cases. Conclusion: Prediction based only on sequence composition is promising, particularly when a feature filtering method is used, and may be a good complement to algorithms based on other features.
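
As an illustration of the sequence-only features named in this abstract, the following is a minimal sketch of the phase-specific 1-nucleotide Z curve transform (the w = 1 case). The function name `z_curve_features` is hypothetical, and the sketch does not cover the higher-order (w = 2 to 6) variables, the feature selection step, or the linear SVM described by the authors.

```python
from collections import Counter

def z_curve_features(seq: str) -> list[float]:
    """Return 9 mononucleotide Z curve variables (x, y, z for each codon phase)."""
    seq = seq.upper()
    features = []
    for phase in range(3):  # codon positions 1, 2, 3
        counts = Counter(seq[phase::3])
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            # No valid bases at this codon position: emit neutral values.
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.append((a + g) - (c + t))  # x: purine vs pyrimidine
        features.append((a + c) - (g + t))  # y: amino vs keto
        features.append((a + t) - (g + c))  # z: weak vs strong hydrogen bonding
    return features

# Toy input (not real gene data): three repeats of the codon ATG.
print(z_curve_features("ATGATGATG"))
```

Each gene becomes a fixed-length numeric vector regardless of its length, which is what allows a linear classifier to be trained on sequence composition alone.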

Ensemble swarm behaviour based feature selection and support vector machine classifier for chronic kidney disease prediction

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.31.13438 ◽

2018 ◽

Vol 7 (2.31) ◽

pp. 190 ◽

Cited By ~ 1

Author(s):

S Belina V.J. Sara ◽

K Kalaiselvi

Keyword(s):

Chronic Kidney Disease ◽

Support Vector Machine ◽

Feature Selection ◽

Kidney Disease ◽

Information Gain ◽

Clonal Selection ◽

Prediction Algorithm ◽

Support Vector ◽

Svm Classifier ◽

Classification Algorithms

Kidney disease and kidney failure are among the most complicated and challenging issues in human health. Some diseases show no symptoms and are detected only in later stages, which results in dialysis. Advanced data mining technologies can offer ways to deal with this situation by uncovering important relations and associations in health-related data. The prediction accuracy of classification algorithms depends on appropriate Feature Selection (FS) algorithms, which decrease the number of features in a data collection. FS is the procedure of choosing the most relevant features and removing irrelevant ones. To identify Chronic Kidney Disease (CKD), a Hybrid Wrapper and Filter based FS (HWFFS) algorithm is proposed to reduce the dimension of the CKD dataset. The filter-based FS step uses three major functions: Information Gain (IG), Correlation Based Feature Selection (CFS), and Consistency Based Subset Evaluation (CS). The wrapper-based FS step uses the Enhanced Immune Clonal Selection (EICS) algorithm to choose the most important features from the CKD dataset. The results from these FS algorithms are combined by the new HWFFS algorithm using a classification threshold value. Finally, a Support Vector Machine (SVM) based prediction algorithm is proposed to predict CKD and is evaluated on the MATLAB platform. The results demonstrate that the SVM classifier using the HWFFS algorithm provides a higher prediction rate in the diagnosis of CKD than other classification algorithms.
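
Of the three filter criteria listed in this abstract, Information Gain is the simplest to show in isolation. The sketch below computes IG for one discrete feature against class labels; the function names and the toy values are illustrative assumptions, and the CFS, CS, EICS, and HWFFS combination steps are not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data (hypothetical, not taken from the CKD dataset).
labels = ["ckd", "ckd", "notckd", "notckd"]
informative = ["high", "high", "low", "low"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]           # carries no class information
print(information_gain(informative, labels), information_gain(uninformative, labels))
```

A filter step would rank all features by this score and keep those above a chosen threshold before any classifier is trained.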

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation of the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goal of this study is to determine the tendency of community responses to the move of the national capital, and how to perform sentiment analysis of public opinion on the move using Naive Bayes and Support Vector Machine with feature selection to obtain the highest accuracy. The sentiment analysis data were taken from Indonesian-language public opinion in Twitter tweets by crawling. The search terms used were #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data from Twitter, determining polarity, and preprocessing consisting of transform case, cleansing, tokenizing, filtering, and stemming. Feature selection is then used to increase accuracy, and the data are split into testing and training sets at predetermined ratios. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of sentiment was positive and 75.74% negative regarding the move to a new capital city. Using RapidMiner software, the best accuracy of Naive Bayes with feature selection was 88.24% at a ratio of 9:1, while the best accuracy of Support Vector Machine with feature selection was 78.77% at a ratio of 5:5.
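
The preprocessing stages named in this abstract can be sketched roughly as below. This is an assumption-laden illustration, not the RapidMiner process used by the authors: the stopword list is a tiny hypothetical sample, the cleansing rules are guesses at what "cleansing" covers, and stemming is omitted because it would need an Indonesian stemmer.

```python
import re

# Tiny illustrative Indonesian stopword sample (assumption, not the study's list).
STOPWORDS = {"yang", "dan", "di", "ke"}

def preprocess(tweet, stopwords=STOPWORDS):
    text = tweet.lower()                        # transform case
    text = re.sub(r"https?://\S+", " ", text)   # cleansing: remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # cleansing: remove mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # cleansing: remove digits and punctuation
    tokens = text.split()                       # tokenizing
    return [t for t in tokens if t not in stopwords]  # filtering (stopword removal)

# Made-up example tweet.
print(preprocess("Pindah ibu kota ke Kalimantan! #PindahIbuKota https://t.co/abc"))
```

The resulting token lists would then be vectorized and passed to feature selection and the two classifiers being compared.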

Identification of Chronic Hypersensitivity Pneumonitis Biomarkers with Machine Learning and Differential Co-expression Analysis

Current Gene Therapy ◽

10.2174/1566523220666201208093325 ◽

2020 ◽

Vol 20 ◽

Author(s):

Hongwei Zhang ◽

Steven Wang ◽

Tao Huang

Keyword(s):

Feature Selection ◽

Expression Analysis ◽

Hypersensitivity Pneumonitis ◽

Enrichment Analysis ◽

Functional Enrichment ◽

Great Promise ◽

Support Vector ◽

Svm Classifier ◽

Clinical Tool ◽

Chronic Hypersensitivity Pneumonitis

Aims: We aim to identify biomarkers for chronic hypersensitivity pneumonitis (CHP) and to facilitate precise gene therapy of CHP. Background: Chronic hypersensitivity pneumonitis (CHP) is an interstitial lung disease caused by hypersensitive reactions to inhaled antigens. Clinically, differentiating between CHP and other interstitial lung diseases, especially idiopathic pulmonary fibrosis (IPF), is challenging. Objective: In this study, we analyzed the publicly available gene expression profiles of 82 CHP patients, 103 IPF patients, and 103 control samples to identify CHP biomarkers. Method: The CHP biomarkers were selected with advanced feature selection methods: Monte Carlo Feature Selection (MCFS) and Incremental Feature Selection (IFS). A Support Vector Machine (SVM) classifier was built. Then, we analyzed these CHP biomarkers through functional enrichment analysis and differential co-expression analysis. Result: There were 674 identified CHP biomarkers. The co-expression network of these biomarkers in CHP included more negative regulations, and the network structure of CHP was quite different from the networks of IPF and control. Conclusion: The SVM classifier may serve as an important clinical tool to address the challenging task of differentiating between CHP and IPF. Many of the biomarker genes on the differential co-expression network showed great promise in revealing the underlying mechanisms of CHP.
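
Incremental Feature Selection, as commonly described, takes a ranked feature list (here it would come from MCFS), evaluates a classifier on the top-1, top-2, ... top-k features, and keeps the best-scoring prefix. The sketch below shows that loop under stated assumptions: `evaluate` is a hypothetical callable standing in for cross-validated SVM performance, and the gene names and scores are invented.

```python
from typing import Callable, Sequence

def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Return the best-scoring prefix of a ranked feature list and its score."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical scores keyed by subset size; a real run would train and validate a classifier.
toy_scores = {1: 0.70, 2: 0.85, 3: 0.85, 4: 0.80}
subset, score = incremental_feature_selection(
    ["g1", "g2", "g3", "g4"], lambda feats: toy_scores[len(feats)]
)
print(subset, score)
```

Keeping the smaller subset on ties is a design choice made here for parsimony; the abstract does not state how the authors broke ties.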

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: The proposed research work is on an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role for the current generation in extracting valid decision points about any aspect, such as movie ratings, educational institute ratings, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). The performance of the approach is evaluated with statistical measures, such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have achieved sentiment classifier accuracy on English text of up to 94%. In the proposed scheme, the average accuracy of the sentiment classifier on different datasets reached 99% by tuning various SVM parameters, such as the constant C and the kernel gamma value, in association with the PSO optimization technique. The proposed method used three publicly available datasets: airline sentiment data, weather data, and global warming data. The experimental results were obtained by training and testing with 10-fold cross-validation (FCV) and a confusion matrix for estimating sentiment classifier accuracy. Background: Sentiment Analysis (SA), or opinion mining, has become a fascinating research domain. The key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research aspect. User-Generated Content (UGC) from different sources has increased significantly with the rapid growth of the web environment. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about any specific entity. SA is a computational analysis to determine the actual opinion about an entity as expressed in text. SA is also described as the computation of emotional polarity expressed over social media as natural text in miscellaneous languages. Usually, a high-performing automatic sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work uses a Support Vector Machine as the classification technique and particle swarm optimization for feature selection. In this methodology, we tune various parameter combinations to obtain the desired results, with and without kernel techniques, for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research purposes. Results: The proposed method achieved 99.2% average accuracy in classifying sentiment on the different datasets, outperforming other machine learning techniques. The high accuracy attained in classifying sentiment or opinion in review text demonstrates its effectiveness over existing sentiment classifiers. Conclusion: Sentiment classifier accuracy, the objective of this research, was increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents was determined with a particle swarm optimization approach. The results were simulated on the three freely available datasets: airline sentiment data, weather sentiment data, and global warming data.
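
The evaluation measures listed in this abstract all derive from the confusion matrix. The sketch below computes them for a binary case; the function name and the toy labels are illustrative assumptions, and the PSO search over SVM parameters is not shown. Note that recall and sensitivity are the same quantity.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall (sensitivity), and specificity from predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }

# Toy labels (hypothetical, not from the airline, weather, or global warming datasets).
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "positive"]
print(binary_metrics(y_true, y_pred))
```

In a 10-fold setup, these values would be computed on each held-out fold and averaged.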

Ensemble based filter feature selection with harmonize particle swarm optimization and support vector machine for optimal cancer classification

Machine Learning with Applications ◽

10.1016/j.mlwa.2021.100054 ◽

2021 ◽

pp. 100054

Author(s):

Tengku Mazlin Tengku Ab Hamid ◽

Roselina Sallehuddin ◽

Zuriahati Mohd Yunos ◽

Aida Ali

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Particle Swarm ◽

Cancer Classification ◽

Support Vector ◽

Swarm Optimization

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality and irrelevant, noisy data; such datasets also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering for aggregating the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Performance metrics such as Accuracy, Recall, Precision, and F1-Score are used for evaluation. The experimental results show that the proposed method outperforms the other feature selection methods.
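
The abstract does not spell out how the Gaussian membership function orders the ranks, so the following is only one plausible reading, offered as a sketch. It assumes each filter assigns rank 1 to its best feature, maps every rank to a Gaussian membership centered on rank 1, and orders features by mean membership across filters. The `sigma` value, the function names, and the toy ranks are all assumptions.

```python
import math

def gaussian_membership(rank, center=1.0, sigma=2.0):
    """Gaussian membership value: 1.0 at the center, decaying with distance."""
    return math.exp(-((rank - center) ** 2) / (2 * sigma ** 2))

def aggregate_ranks(rank_lists, sigma=2.0):
    """Order features by mean Gaussian membership of their ranks across filters.

    rank_lists maps a filter name to a dict of feature -> rank (1 = best).
    """
    features = list(next(iter(rank_lists.values())))
    scores = {
        f: sum(gaussian_membership(r[f], sigma=sigma) for r in rank_lists.values())
        / len(rank_lists)
        for f in features
    }
    return sorted(features, key=lambda f: scores[f], reverse=True)

# Hypothetical ranks from the three filters named in the abstract.
ranks = {
    "relief": {"g1": 1, "g2": 2, "g3": 3},
    "mrmr": {"g1": 2, "g2": 1, "g3": 3},
    "fc": {"g1": 1, "g2": 3, "g3": 2},
}
print(aggregate_ranks(ranks))
```

The top of the aggregated ordering would then be handed to the wrapper phase as the reduced search space.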

A Comparison of Feature Selection and Forecasting Machine Learning Algorithms for Predicting Glycaemia in Type 1 Diabetes Mellitus

Applied Sciences ◽

10.3390/app11041742 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1742

Author(s):

Ignacio Rodríguez-Rodríguez ◽

José-Víctor Rodríguez ◽

Wai Lok Woo ◽

Bo Wei ◽

Domingo-Javier Pardo-Quiles

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Type 1 Diabetes ◽

Feature Selection ◽

Blood Glucose ◽

Type 1 Diabetes Mellitus ◽

Support Vector ◽

Chronic Hyperglycemia ◽

Predictive Algorithms

Type 1 diabetes mellitus (DM1) is a metabolic disease derived from falls in pancreatic insulin production resulting in chronic hyperglycemia. DM1 subjects usually have to undertake a number of assessments of blood glucose levels every day, employing capillary glucometers for the monitoring of blood glucose dynamics. In recent years, advances in technology have allowed for the creation of revolutionary biosensors and continuous glucose monitoring (CGM) techniques. This has enabled the monitoring of a subject’s blood glucose level in real time. On the other hand, few attempts have been made to apply machine learning techniques to predicting glycaemia levels, but dealing with a database containing such a high level of variables is problematic. In this sense, to the best of the authors’ knowledge, the issues of proper feature selection (FS)—the stage before applying predictive algorithms—have not been subject to in-depth discussion and comparison in past research when it comes to forecasting glycaemia. Therefore, in order to assess how a proper FS stage could improve the accuracy of the glycaemia forecasted, this work has developed six FS techniques alongside four predictive algorithms, applying them to a full dataset of biomedical features related to glycaemia. These were harvested through a wide-ranging passive monitoring process involving 25 patients with DM1 in practical real-life scenarios. From the obtained results, we affirm that Random Forest (RF) as both predictive algorithm and FS strategy offers the best average performance (Root Median Square Error, RMSE = 18.54 mg/dL) throughout the 12 considered predictive horizons (up to 60 min in steps of 5 min), showing Support Vector Machines (SVM) to have the best accuracy as a forecasting algorithm when considering, in turn, the average of the six FS techniques applied (RMSE = 20.58 mg/dL).

Application of Fuzzy Entropy to Improve Feature Selection for Defect Recognition Using Support Vector Machine in High Voltage Cable Joints

IEEE Transactions on Dielectrics and Electrical Insulation ◽

10.1109/tdei.2020.009055 ◽

2020 ◽

Vol 27 (6) ◽

pp. 2147-2155

Author(s):

Chien-Kuo Chang ◽

Bharath Kumar Boyanapalli ◽

Ruay-Nan Wu

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

High Voltage ◽

Fuzzy Entropy ◽

Support Vector ◽

Selection For ◽

Defect Recognition

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Complex & Intelligent Systems ◽

10.1007/s40747-020-00237-1 ◽

2021 ◽

Author(s):

Ritam Guha ◽

Manosij Ghosh ◽

Pawan Kumar Singh ◽

Ram Sarkar ◽

Mita Nasipuri

Keyword(s):

Feature Selection ◽

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Classification Model ◽

Support Vector ◽

Intermediate Step ◽

Hybrid Swarm ◽

Feature Vectors ◽

Indic Script

In any multi-script environment, handwritten script classification is an unavoidable prerequisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have addressed this complex pattern classification problem by proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied to three feature vectors introduced in the literature recently: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of 12 officially recognized Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The code used for implementing HSGFS can be found at the following GitHub link: https://github.com/Ritam-Guha/HSGFS.

Feature Selection and Parameter Optimization of Support Vector Machines Based on Modified Artificial Fish Swarm Algorithms

Mathematical Problems in Engineering ◽

10.1155/2015/604108 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 7

Author(s):

Kuan-Cheng Lin ◽

Sih-Yang Chen ◽

Jason C. Hung

Keyword(s):

Feature Selection ◽

Parameter Optimization ◽

Combinatorial Problem ◽

Combinatorial Problems ◽

Support Vector ◽

Local Optimum ◽

Artificial Fish Swarm Algorithm ◽

Swarm Algorithms ◽

Artificial Fish Swarm ◽

Information And Communication

Rapid advances in information and communication technology have made ubiquitous computing and the Internet of Things popular and practicable. These applications create enormous volumes of data, which are available for analysis and classification as an aid to decision-making. Among the classification methods used to deal with big data, feature selection has proven particularly effective. One common approach involves searching through a subset of the features that are the most relevant to the topic or represent the most accurate description of the dataset. Unfortunately, searching through this kind of subset is a combinatorial problem that can be very time-consuming. Metaheuristic algorithms are commonly used to facilitate the selection of features. The artificial fish swarm algorithm (AFSA) employs the intelligence underlying fish swarming behavior as a means to solve combinatorial optimization problems. AFSA has proven highly successful in a diversity of applications; however, there remain shortcomings, such as the likelihood of falling into a local optimum and a lack of multiplicity. This study proposes a modified AFSA (MAFSA) to improve feature selection and parameter optimization for support vector machine classifiers. Experimental results demonstrate the superiority of MAFSA in classification accuracy using subsets with fewer features for given UCI datasets, compared to the original AFSA.
