Forecasting mergers and acquisitions failure based on partial-sigmoid neural network and feature selection

Traditional forecasting methods in mergers and acquisitions (M&A) data have two limitations that significantly reduce forecasting accuracy: (1) the imbalance of data, that is, the failure cases of M&A are far fewer than the successful cases (82%/18% of our sample), and (2) both the bidder and the target of the merger have numerous descriptive features, making it difficult to choose which ones to forecast. This study proposes a neural network using partial-sigmoid (i.e., partial-sigmoid neural network [PSNN]) as the activation function of the output layer and compares three feature selection methods, namely, chi-square (chi2) test, information gain and gradient boosting decision tree (GBDT). Experimental results prove that our PSNN (improved up to 0.37 precision, 0.49 recall, 0.41 G-Mean and 0.23 F1-measure) and feature selection (improved 1.83%-13.16% accuracy) method can effectively improve the adverse effects of the defects of the above two merger data on forecasting. Scholars who studied the forecast of merger failure have overlooked three important features: assets of the previous year, market value and capital expenditure. The chi2 test feature selection method is the best among the three feature selection methods.

Download Full-text

Impact of feature selection on classification via clustering techniques in software defect prediction

Journal of Computer Science and Its Application ◽

10.4314/jcsia.v26i1.8 ◽

2020 ◽

Vol 26 (1) ◽

Cited By ~ 1

Author(s):

F.E. Usman-Hamza ◽

A.F. Atte ◽

A.O. Balogun ◽

H.A. Mojeed ◽

A.O. Bajeh ◽

...

Keyword(s):

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Predictive Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Selection Methods ◽

Clustering Techniques ◽

Software Defect ◽

The Impact

Software testing using software defect prediction aims to detect as many defects as possible in software before the software release. This plays an important role in ensuring quality and reliability. Software defect prediction can be modeled as a classification problem that classifies software modules into two classes: defective and non-defective; and classification algorithms are used for this process. This study investigated the impact of feature selection methods on classification via clustering techniques for software defect prediction. Three clustering techniques were selected; Farthest First Clusterer, K-Means and Make-Density Clusterer, and three feature selection methods: Chi-Square, Clustering Variation, and Information Gain were used on software defect datasets from NASA repository. The best software defect prediction model was farthest-first using information gain feature selection method with an accuracy of 78.69%, precision value of 0.804 and recall value of 0.788. The experimental results showed that the use of clustering techniques as a classifier gave a good predictive performance and feature selection methods further enhanced their performance. This indicates that classification via clustering techniques can give competitive results against standard classification methods with the advantage of not having to train any model using labeled dataset; as it can be used on the unlabeled datasets.Keywords: Classification, Clustering, Feature Selection, Software Defect PredictionVol. 26, No 1, June, 2019

Download Full-text

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem with text classification problems is the high dimensionality of the feature space. Only a small subset of these words are feature words which can be used in determining a document's class, while the rest adds noise and can make the results unreliable and significantly increase computational time. A common approach in dealing with this problem is feature selection where the number of words in the feature space are significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study including the new feature selection method, called the GU metric. The other feature selection methods evaluated in this study are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.

Download Full-text

Multi-Layer Perceptron Neural Network Model Development for Chili Pepper Disease Diagnosis Using Filter and Wrapper Feature Selection Methods

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.4383 ◽

2021 ◽

Vol 11 (5) ◽

pp. 7714-7719

Author(s):

S. Nuanmeesri ◽

W. Sriurai

Keyword(s):

Neural Network ◽

Feature Selection ◽

Information Gain ◽

Disease Diagnosis ◽

Chili Pepper ◽

Growth Stages ◽

Multi Layer Perceptron ◽

Selection Methods ◽

Diagnosis Model ◽

Wrapper Feature Selection

The goal of the current study is to develop a diagnosis model for chili pepper disease diagnosis by applying filter and wrapper feature selection methods as well as a Multi-Layer Perceptron Neural Network (MLPNN). The data used for developing the model include 1) types, 2) causative agents, 3) areas of infection, 4) growth stages of infection, 5) conditions, 6) symptoms, and 7) 14 types of chili pepper diseases. These datasets were applied to the 3 feature selection techniques, including information gain, gain ratio, and wrapper. After selecting the key features, the selected datasets were utilized to develop the diagnosis model towards the application of MLPNN. According to the model’s effectiveness evaluation results, estimated by 10-fold cross-validation, it can be seen that the diagnosis model developed by applying the wrapper method along with MLPNN provided the highest level of effectiveness, with an accuracy of 98.91%, precision of 98.92%, and recall of 98.89%. The findings showed that the developed model is applicable.

Download Full-text

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

The Scientific World JOURNAL ◽

10.1155/2014/625342 ◽

2014 ◽

Vol 2014 ◽

pp. 1-17 ◽

Cited By ~ 9

Author(s):

Jieming Yang ◽

Zhaoyang Qu ◽

Zhiying Liu

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Information Gain ◽

Feature Selection Method ◽

Support Vector ◽

Selection Methods ◽

Document Collections ◽

Imbalance Problem ◽

Important Approach ◽

Selection Algorithms

The filtering feature-selection algorithm is a kind of important approach to dimensionality reduction in the field of the text categorization. Most of filtering feature-selection algorithms evaluate the significance of a feature for category based on balanced dataset and do not consider the imbalance factor of dataset. In this paper, a new scheme was proposed, which can weaken the adverse effect caused by the imbalance factor in the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.

Download Full-text

Enhanced Classification Method for Phishing Emails Detection

Journal of Information Security and Cybercrimes Research ◽

10.26735/ygmy6142 ◽

2020 ◽

Vol 3 (1) ◽

pp. 58-63

Author(s):

Y. Mansour Mansour ◽

Majed A. Alenizi

Keyword(s):

Feature Selection ◽

Information Gain ◽

Hybrid Approach ◽

Feature Selection Method ◽

Search Space ◽

Selection Method ◽

Classification Model ◽

Selection Methods ◽

Accuracy Rate ◽

Communication Method

Emails are currently the main communication method worldwide as it proven in its efficiency. Phishing emails in the other hand is one of the major threats which results in significant losses, estimated at billions of dollars. Phishing emails is a more dynamic problem, a struggle between the phishers and defenders where the phishers have more flexibility in manipulating the emails features and evading the anti-phishing techniques. Many solutions have been proposed to mitigate the phishing emails impact on the targeted sectors, but none have achieved 100% detection and accuracy. As phishing techniques are evolving, the solutions need to be evolved and generalized in order to mitigate as much as possible. This article presents a new emergent classification model based on hybrid feature selection method that combines two common feature selection methods, Information Gain and Genetic Algorithm that keep only significant and high-quality features in the final classifier. The Proposed hybrid approach achieved 98.9% accuracy rate against phishing emails dataset comprising 8266 instances and results depict enhancement by almost 4%. Furthermore, the presented technique has contributed to reducing the search space by reducing the number of selected features.

Download Full-text

Sentiment Classification using Neural Network and Ensemble Model based on Genetic Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3677.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1885-1891

Keyword(s):

Neural Network ◽

Genetic Algorithm ◽

Feature Selection ◽

Web Sites ◽

Information Gain ◽

Back Propagation ◽

Feature Selection Method ◽

Propagation Model ◽

Feature Selection Technique ◽

Fast Development

The fast development of web sites and the number of product on these websites are available. The purpose of classification of sentiment is to efficiently identify opinion expressed in text. This paper compares three different optimized models including genetic optimized feature selection method, Genetic Algorithm (GA), ensemble approach that uses information gain and genetic algorithm as feature selection methods incorporated SVM model, Genetic Bagging (GB) and the next method uses optimized feature selection as feature selection technique incorporated back propagation model, Genetic Neural Network (GNN) models are compared. We are tested in sentiment analysis using sample multi-domain review datasets and movie review dataset.. These approaches are tested using various quality metrics and the results show that the Genetic Bagging (GB) technique outperforms in classifying the sentiment of the multi domain reviews and movie reviews. An empirical analysis is performed to compare the level of importance of the classifiers GB, GNN methods with McNemar’s statistical method.

Download Full-text

A Comparative Analysis of Filter-Based Feature Selection Methods for Software Fault Prediction

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research-vn.v2021.n1.969 ◽

2021 ◽

pp. 1-7

Author(s):

Thị Minh Phương Hà ◽

Thi My Hanh Le ◽

Thanh Binh Nguyen

Keyword(s):

Feature Selection ◽

Prediction Models ◽

Information Gain ◽

Feature Selection Method ◽

Computation Time ◽

Fault Prediction ◽

Software Systems ◽

Selection Methods ◽

Software Fault Prediction

The rapid growth of data has become a huge challenge for software systems. The quality of fault predictionmodel depends on the quality of software dataset. High-dimensional data is the major problem that affects the performance of the fault prediction models. In order to deal with dimensionality problem, feature selection is proposed by various researchers. Feature selection method provides an effective solution by eliminating irrelevant and redundant features, reducing computation time and improving the accuracy of the machine learning model. In this study, we focus on research and synthesis of the Filter-based feature selection with several search methods and algorithms. In addition, five filter-based feature selection methods are analyzed using five different classifiers over datasets obtained from National Aeronautics and Space Administration (NASA) repository. The experimental results show that Chi-Square and Information Gain methods had the best influence on the results of predictive models over other filter ranking methods.

Download Full-text

Improved ICHI square feature selection method for Arabic classifiers

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v9i3.pp157-170 ◽

2020 ◽

Vol 9 (3) ◽

pp. 157

Author(s):

Hadeel N. Alshaer ◽

Mohammed A. Otair ◽

Laith Abualigah

Keyword(s):

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Selection Methods ◽

Chi Square ◽

Feature Selection Problem ◽

Time To Build ◽

Arabic Text Classification ◽

Almost All ◽

Text And Data Mining

<span>Feature selection problem is one of the main important problems in the text and data mining domain. </span><span>This paper presents a comparative study of feature selection methods for Arabic text classification. Five of the feature selection methods were selected: ICHI square, CHI square, Information Gain, Mutual Information and Wrapper. It was tested with five classification algorithms: Bayes Net, Naive Bayes, Random Forest, Decision Tree and Artificial Neural Networks. In addition, Data Collection was used in Arabic consisting of 9055 documents, which were compared by four criteria: Precision, Recall, F-measure and Time to build model. The results showed that the improved ICHI feature selection got almost all the best results in comparison with other methods.</span>

Download Full-text

Feature Selection by Applying Parallel Collaborative Evolutionary GA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.2074 ◽

2012 ◽

Vol 263-266 ◽

pp. 2074-2081

Author(s):

Zhi Cheng Qu ◽

Qin Yang ◽

Bin Jiang

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Data Sets ◽

Selection Methods ◽

Text Data ◽

Evolutionary Genetic ◽

Parallel Strategy ◽

Better Than

Feature selection is one of the important topics in text classification. However, most of existing feature selection methods are serial and inefficient to be applied to massive text data sets. In this paper, a feature selection method based on parallel collaborative evolutionary genetic algorithm is presented. The presented method uses genetic algorithm to select feature subsets and takes advantage of parallel collaborative evolution to enhance time efficiency, so it can quickly acquire the feature subsets which are more representative. The experimental results show that: For macro-average and micro-average , the presented method is better than three classical methods: Information Gain、x2 Statistics、 Mutual Information. For the consumed time, the presented method with a CPU is inferior to the above mentioned three methods, but the presented method is superior after using the parallel strategy.

Download Full-text

A New Feature Selection Method for Sentiment Analysis in Short Text

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0171 ◽

2018 ◽

Vol 29 (1) ◽

pp. 1122-1134

Author(s):

H. M. Keerthi Kumar ◽

B. S. Harish

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Classification Accuracy ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Selection Methods ◽

K Nearest Neighbors ◽

Short Text

Abstract In recent internet era, micro-blogging sites produce enormous amount of short textual information, which appears in the form of opinions or sentiments of users. Sentiment analysis is a challenging task in short text, due to use of formal language, misspellings, and shortened forms of words, which leads to high dimensionality and sparsity. In order to deal with these challenges, this paper proposes a novel, simple, and yet effective feature selection method, to select frequently distributed features related to each class. In this paper, the feature selection method is based on class-wise information, to identify the relevant feature related to each class. We evaluate the proposed feature selection method by comparing with existing feature selection methods like chi-square ( χ2), entropy, information gain, and mutual information. The performances are evaluated using classification accuracy obtained from support vector machine, K nearest neighbors, and random forest classifiers on two publically available datasets viz., Stanford Twitter dataset and Ravikiran Janardhana dataset. In order to demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experimentation by selecting different feature sets. The proposed feature selection method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset. Similarly, the proposed method performs competently equally in terms of classification accuracy compared to other feature selection methods in most of the feature subsets on Ravikiran Janardhana dataset.

Download Full-text