information gain ratio Latest Research Papers

Similitude Based Segment Graph Construction and Segment Ranking for Automatic Summarization of Text Document

With the growth in data and documents on the web, text summarization has become a significant field in today’s digital era. Automatic text summarization gives the user a quick summary of the information presented in text documents. This paper presents automated single-document summarization by constructing similitude graphs from extracted text segments. After the text segments are extracted, feature values are computed for every segment by comparing it with the title and with the entire document, and by computing segment significance using the information gain ratio. Based on the computed features, the similarity between segments is evaluated to construct a graph in which the vertices are the segments and the edges specify the similarity between them. The segments are ranked for inclusion in the extractive summary by computing the graph score and the sentence segment score. The experimental analysis uses ROUGE metrics. The proposed model was compared with various existing models on 4 different datasets, where it placed in the top 2 positions by average rank computed over metrics such as precision, recall, and F-score.

Highlights: The paper presents automated single-document summarization by constructing similitude graphs from extracted text segments. It uses the information gain ratio, graph construction, the graph score, and the sentence segment score. Results are analyzed using ROUGE metrics on 4 popular datasets in the document summarization domain. The model placed in the top 2 positions by average rank computed over metrics such as precision, recall, and F-score.
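The graph construction described in the abstract can be illustrated with a rough sketch. This is not the paper's method: it assumes plain bag-of-words cosine similarity as the edge weight and takes a segment's "graph score" to be the sum of its edge weights, both of which are stand-ins chosen for illustration. The function names `cosine_similarity` and `rank_segments` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(segments):
    """Return (score, segment) pairs, highest score first.

    Vertices are segments; the edge weight between two segments is their
    cosine similarity. The score used here (an assumption, not the paper's
    formula) is the total weight of the edges touching a vertex.
    """
    vectors = [Counter(s.lower().split()) for s in segments]
    scored = []
    for i, vi in enumerate(vectors):
        score = sum(
            cosine_similarity(vi, vj) for j, vj in enumerate(vectors) if j != i
        )
        scored.append((score, segments[i]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    demo = [
        "graph ranking of text segments",
        "text segments form a graph",
        "bananas are yellow",
    ]
    for score, segment in rank_segments(demo):
        print(f"{score:.2f}  {segment}")
```

A segment that shares no words with the others gets a score of zero and falls to the bottom of the ranking, which is the behaviour an extractive summarizer wants from an off-topic segment.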

Association Rules Mining Algorithm Based on Information Gain Ratio Attribute Reduction

Business Intelligence and Information Technology - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-92632-8_18 ◽

2021 ◽

pp. 181-189

Author(s):

Tongtong Han ◽

Wenjing Wang ◽

Min Guo ◽

Shiyong Ning

Keyword(s):

Association Rules ◽

Information Gain ◽

Attribute Reduction ◽

Association Rules Mining ◽

Gain Ratio ◽

Mining Algorithm ◽

Information Gain Ratio

The Improvement of Attribute Reduction Algorithm Based on Information Gain Ratio in Rough Set Theory

Business Intelligence and Information Technology - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-92632-8_15 ◽

2021 ◽

pp. 152-159

Author(s):

Wenjing Wang ◽

Min Guo ◽

Tongtong Han ◽

Shiyong Ning

Keyword(s):

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Information Gain ◽

Attribute Reduction ◽

Reduction Algorithm ◽

Gain Ratio ◽

Information Gain Ratio

Gain-Loss Evaluation-Based Generic Selection for Steganalysis Feature

Symmetry ◽

10.3390/sym13101775 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1775

Author(s):

Ruixia Jin ◽

Yihao Wang ◽

Yuanyuan Ma ◽

Tao Li ◽

Xintao Duan

Keyword(s):

Information Gain ◽

Detection Accuracy ◽

Distortion Factor ◽

Temporal Complexity ◽

Spatio Temporal ◽

The Difference ◽

Feature Dimension ◽

Gain Loss ◽

Information Gain Ratio ◽

Asymmetric Distortion

Feature components that contribute little to a high-dimensional image steganalysis feature can increase the spatio-temporal complexity of detecting stego images and can even reduce detection accuracy. To maintain or even improve detection accuracy while effectively reducing the dimension of the DCTR steganalysis feature, this paper proposes a new selection approach for the DCTR feature. First, the asymmetric distortion factor and the information gain ratio of each feature component are improved to measure the difference between the symmetric cover and stego features, which provides the theoretical basis for selecting the feature components that contribute most to detecting stego images. Additionally, the feature components are arranged in descending order according to the two measurement criteria, which provides the basis for deleting components. On this basis, feature components whose rankings differ greatly between the two criteria are removed. Ultimately, the preserved feature components are used as the final selected feature for training and detection. Comparison experiments with existing classical approaches indicate that this approach can effectively reduce the feature dimension while maintaining or even improving detection accuracy. At the same time, it can reduce the spatio-temporal complexity of detecting stego images.

Telemarketing Guidance in Selling Banking Services: A Data Mining Approach

Indonesian Journal of Business Analytics ◽

10.54259/ijba.v1i1.8 ◽

2021 ◽

Vol 1 (1) ◽

pp. 1-16

Author(s):

Kattareeya Prompreing ◽

Theera Prompreing

Keyword(s):

Data Mining ◽

Information Gain ◽

Data Mining Technique ◽

Attribute Importance ◽

Managerial Implication ◽

Banking Services ◽

Data Mining Approach ◽

Information Gain Ratio ◽

Potential Customers ◽

Fold Cross Validation

In telemarketing, selecting the most promising customers is important because it can reduce processing time and operational cost. The ability to identify the customers most likely to buy is therefore urgently needed. In this study, we propose a clear sequence for conducting telemarketing, based on previous telemarketing data and data mining techniques. We weight the importance of 16 customer characteristics using 45,211 observations from a Portuguese bank. Applying the Random Forest algorithm with Information Gain Ratio as the criterion and 10-fold cross-validation, the model weights the importance of the attributes and achieves 90.01% accuracy in predicting telemarketing success. Furthermore, the ranking of attribute importance is designed as a guidance map for selecting potential target customers, which is the managerial implication.
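For readers unfamiliar with the criterion named above, the following is a minimal sketch of the information gain ratio for a categorical attribute: the information gain divided by the split information (the entropy of the attribute's own values). It shows only the criterion, not the study's Random Forest pipeline. The attribute names and the four-row toy table are invented for illustration and are not taken from the bank dataset.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def gain_ratio(values, labels):
    """Information gain of a categorical attribute divided by its split information."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    # Expected entropy of the class label after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not the study's observations.
    outcome = ["yes", "yes", "no", "no"]
    attributes = {
        "contact": ["cell", "cell", "phone", "phone"],
        "housing": ["yes", "no", "yes", "no"],
    }
    ranking = sorted(
        attributes, key=lambda name: gain_ratio(attributes[name], outcome), reverse=True
    )
    print(ranking)
```

In this toy table `contact` separates the outcomes perfectly and `housing` carries no information, so `contact` ranks first. Dividing by the split information is what penalizes attributes with many distinct values, which plain information gain tends to favour.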

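Intrusion Detection System Berbasis Seleksi Fitur Dengan Kombinasi Filter Information Gain Ratio Dan Correlation [Feature-Selection-Based Intrusion Detection System Using a Combination of Information Gain Ratio and Correlation Filters]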

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.0813154 ◽

2021 ◽

Vol 8 (3) ◽

pp. 457

Author(s):

Nitami Lestari Putri ◽

Radityo Adi Nugroho ◽

Rudy Herteno

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Information Gain ◽

Detection System ◽

Feature Ranking ◽

K Nearest Neighbor ◽

Gain Ratio ◽

Information Gain Ratio

An intrusion detection system (IDS) is a system developed to monitor and filter network activity by identifying attacks. Because the amount of data an IDS must examine is very large, and because many extraneous features make it difficult to detect suspicious behavior patterns, the IDS needs to reduce the data to be processed by reducing the number of features, which can be done through feature selection. This study combines two feature-ranking methods, Information Gain Ratio and Correlation, and classifies the data using the K-Nearest Neighbor algorithm. The ranking results from the two methods are divided into two groups: the median value is computed for the first group, and the second group is removed. K-Nearest Neighbor classification is then performed using 10-fold cross-validation and tested with k=5. The proposed model achieves a highest accuracy of 99.61%, whereas the highest accuracy without feature selection is 99.59%.
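The evaluation step mentioned in the abstract (K-Nearest Neighbor with k=5 under 10-fold cross-validation) can be sketched in a few lines. This is an illustrative sketch only: it omits the feature-selection stage, assumes Euclidean distance and unshuffled interleaved folds, and runs on synthetic points rather than any intrusion dataset. The helper names `knn_predict` and `cross_val_accuracy` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points closest to the query (Euclidean)."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, folds=10, k=5):
    """Mean accuracy over `folds` interleaved folds.

    Sample i goes to fold i % folds; no shuffling, so the result is
    deterministic. Assumes len(xs) >= folds so that no fold is empty.
    """
    accuracies = []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


if __name__ == "__main__":
    # Synthetic, well-separated clusters standing in for two traffic classes.
    xs = [(i * 0.01, 0.0) for i in range(20)] + [(10.0 + i * 0.01, 0.0) for i in range(20)]
    ys = ["normal"] * 20 + ["attack"] * 20
    print(cross_val_accuracy(xs, ys))
```

Running the same function once on all features and once on the selected subset gives the kind of with/without comparison the abstract reports.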

Random Fuzzy Granular Decision Tree

Mathematical Problems in Engineering ◽

10.1155/2021/5578682 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Wei Li ◽

Xiaoyu Ma ◽

Yumin Chen ◽

Bin Dai ◽

Runjing Chen ◽

...

Keyword(s):

Standard Deviation ◽

Decision Tree ◽

High Efficiency ◽

Information Gain ◽

Optimal Solution ◽

Numerical Data ◽

Classification Problem ◽

Classification Problems ◽

Local Optimal Solution ◽

Information Gain Ratio

In this study, the classification problem is solved from the viewpoint of granular computing: the problem is equivalently transformed into the fuzzy granular space and solved there. Most classification algorithms can handle only numerical data; the random fuzzy granular decision tree (RFGDT) can handle not only numerical data but also nonnumerical data such as information granules. The work proceeds in four steps. First, an adaptive global random clustering (AGRC) algorithm is proposed, which adaptively finds the optimal cluster centers, maximizes the ratio of interclass standard deviation to intraclass standard deviation, and avoids falling into a local optimal solution. Second, on the basis of AGRC, a parallel model is designed for fuzzy granulation of data to construct the granular space, which greatly improves efficiency compared with serial granulation. Third, in the fuzzy granular space, RFGDT is designed to classify the fuzzy granules; it selects important features as tree nodes based on the information gain ratio and avoids overfitting through the proposed pruning algorithm. Finally, datasets from the UC Irvine Machine Learning Repository are used for verification. Theory and experimental results show that RFGDT has high efficiency and accuracy and is robust in solving classification problems.

Landslide susceptibility modelling based on decision tree classification model of machine learning: A case study of Kullu-Rohtang Pass transport corridor (Himachal Pradesh), India

10.21203/rs.3.rs-313992/v1 ◽

2021 ◽

Author(s):

Nirbhav Sharma ◽

Ram Babu Singh ◽

Anand Malik ◽

Maheshwar Sharma

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Landslide Susceptibility ◽

Information Gain ◽

Himachal Pradesh ◽

Classification Model ◽

Decision Tree Classification ◽

Landslide Hazards ◽

Susceptibility Model ◽

Information Gain Ratio

Landslide hazards cause substantial destruction and losses in mountainous regions. To lessen the damage in these vulnerable areas, the key challenge is to predict landslide events with accuracy and precision. The principal objective of this study is to assess landslide susceptibility along the transport corridor from Kullu to Rohtang Pass in Himachal Pradesh, India. To achieve this objective, a detailed landslide inventory was prepared from imagery data and frequent field visits. A total of 197 landslides were considered, including 153 rock slides and 44 debris slides. Nine landslide factors were prepared initially, and their relationships with each other and with the type of landslide were analysed. The information gain ratio measure was then used to identify the triggering factors with the best scores and to eliminate the unimportant factors. The train_test_split method was used to split the dataset into training and testing groups. A decision tree classification model was applied to build the landslide susceptibility model (LSM). Performance was evaluated using a classification report and the receiver operating characteristic (ROC) curve. The results show that the decision tree classification model performed well and achieved good accuracy in forecasting landslide susceptibility in the study area.
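As a brief illustration of the ROC-based evaluation mentioned in the abstract, the area under the ROC curve can be computed directly from predicted scores: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below assumes binary labels coded 1 (landslide) and 0 (no landslide). The function name `roc_auc` and the sample scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical held-out labels and model scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for an inventory of a few hundred landslides; a sort-based computation would be preferable for larger test sets.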

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2021.118098 ◽

2021 ◽

Vol 24 (5) ◽

pp. 495

Author(s):

Jie Zhang ◽

Junhong Feng ◽

Xiani Yang ◽

Jianming Liu

Keyword(s):

Single Cell ◽

Gene Selection ◽

Information Gain ◽

Fruit Fly ◽

Rna Seq ◽

Gain Ratio ◽

Optimisation Algorithm ◽

Information Gain Ratio ◽

Combining Information

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2021.10041500 ◽

2021 ◽

Vol 24 (5) ◽

pp. 495

Author(s):

Jie Zhang ◽

Junhong Feng ◽

Xiani Yang ◽

Jianming Liu

Keyword(s):

Single Cell ◽

Gene Selection ◽

Information Gain ◽

Fruit Fly ◽

Rna Seq ◽

Gain Ratio ◽

Optimisation Algorithm ◽

Information Gain Ratio ◽

Combining Information

information gain ratio
Recently Published Documents

Similitude Based Segment Graph Construction and Segment Ranking for Automatic Summarization of Text Document

Association Rules Mining Algorithm Based on Information Gain Ratio Attribute Reduction

The Improvement of Attribute Reduction Algorithm Based on Information Gain Ratio in Rough Set Theory

Gain-Loss Evaluation-Based Generic Selection for Steganalysis Feature

Telemarketing Guidance in Selling Banking Services: A Data Mining Approach

Intrusion Detection System Berbasis Seleksi Fitur Dengan Kombinasi Filter Information Gain Ratio Dan Correlation

Random Fuzzy Granular Decision Tree

Landslide susceptibility modelling based on decision tree classification model of machine learning: A case study of Kullu-Rohtang Pass transport corridor (Himachal Pradesh), India

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data
