information gain ratio
Recently Published Documents


Total documents: 30 (five years: 19)
H-index: 3 (five years: 1)

2022, Vol 19 (1), pp. 1719
Author(s): Saravanan Arumugam, Sathya Bama Subramani

With the growing amount of data and documents on the web, text summarization has become an essential field in today's digital era. Automatic text summarization gives the user a quick summary of the information presented in text documents. This paper presents an automated single-document summarization method that constructs similitude graphs from the extracted text segments. After the text segments are extracted, feature values are computed for each segment by comparing it with the title and with the entire document, and segment significance is computed using the information gain ratio. Based on these features, the similarity between segments is evaluated to construct a graph whose vertices are the segments and whose edges represent the similarity between them. Segments are ranked for inclusion in the extractive summary using a graph score and a sentence segment score. The model was evaluated with ROUGE metrics. Compared with various existing models on four datasets, the proposed model placed in the top two positions by average rank computed over precision, recall, and F-score.

HIGHLIGHTS
- Presents automated single-document summarization by constructing similitude graphs from the extracted text segments
- Uses the information gain ratio, graph construction, and graph-score and sentence-segment-score computation
- Evaluates results with ROUGE metrics on four popular datasets in the document summarization domain
- Places in the top two positions by average rank over precision, recall, and F-score
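
The graph-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it omits the title/document features and the information gain ratio term, and it assumes cosine similarity over bag-of-words vectors for the edge weights, an arbitrary edge threshold, and the weighted degree of each vertex as its graph score. The names `summarize`, `cosine`, and `tokenize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(segments, n=2, threshold=0.1):
    """Return the n segments with the highest weighted degree, in document order.

    Assumption: vertices are segments; an undirected edge joins two segments
    whose cosine similarity is at least `threshold`, weighted by that similarity.
    """
    bags = [Counter(tokenize(s)) for s in segments]
    scores = [0.0] * len(segments)
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            w = cosine(bags[i], bags[j])
            if w >= threshold:
                # Each edge contributes its weight to both endpoints' graph score.
                scores[i] += w
                scores[j] += w
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(range(len(segments)), key=lambda i: (-scores[i], i))[:n]
    return [segments[i] for i in sorted(top)]
```

A segment that shares no vocabulary with the rest of the document gets a score of zero and is the last to be selected, which is the intuition behind scoring segments by their connectivity in the similarity graph.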


Symmetry, 2021, Vol 13 (10), pp. 1775
Author(s): Ruixia Jin, Yihao Wang, Yuanyuan Ma, Tao Li, Xintao Duan

Feature components that contribute little to a high-dimensional image steganalysis feature increase the spatio-temporal complexity of detecting stego images and can even reduce detection accuracy. To reduce the dimension of the DCTR steganalysis feature while maintaining or even improving detection accuracy, this paper proposes a new selection approach for the DCTR feature. First, the asymmetric distortion factor and the information gain ratio of each feature component are improved to measure the difference between the symmetric cover and stego features; this provides a theoretical basis for selecting the feature components that contribute most to detecting stego images. Next, the feature components are sorted in descending order under each of the two criteria, which provides the basis for deleting components. Based on these rankings, feature components with large rank values (that is, low rankings) under the two criteria are removed. Finally, the retained feature components form the selected feature used for training and detection. Comparison experiments with existing classical approaches indicate that this approach effectively reduces the feature dimension while maintaining or even improving detection accuracy, and that it also reduces the spatio-temporal complexity of detecting stego images.
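
Scoring continuous feature components by information gain ratio requires discretizing them first. The sketch below assumes equal-width binning with a hypothetical bin count, binary labels (0 = cover, 1 = stego), and only the gain-ratio criterion; the paper's improved asymmetric distortion factor, its modified gain ratio, and the DCTR extraction itself are not reproduced. `rank_components` and its `keep` parameter are illustrative names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, n_bins=4):
    """Map continuous values to bin indices 0..n_bins-1 (assumed discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    # Clamp so the maximum value lands in the last bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gain_ratio(values, labels, n_bins=4):
    """Information gain of the binned feature divided by its split information."""
    bins = equal_width_bins(values, n_bins)
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)  # entropy of the partition induced by the bins
    return info_gain / split_info if split_info > 0 else 0.0


def rank_components(features, labels, keep, n_bins=4):
    """features: one list of values per component; returns the top `keep` indices."""
    scores = [gain_ratio(col, labels, n_bins) for col in features]
    order = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return order[:keep]
```

A component that separates cover from stego samples cleanly scores close to 1, while a component whose values are mixed across both classes scores near 0 and would be among the first candidates for removal.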


2021, Vol 1 (1), pp. 1-16
Author(s): Kattareeya Prompreing, Theera Prompreing

In telemarketing, selecting the customers with the highest potential is important because it reduces processing time and operational costs. The ability to select the customers most likely to buy is therefore urgently needed. In this study, we propose a clear sequence for conducting telemarketing activities, derived by applying data mining techniques to previous telemarketing data. We weight the importance of 16 customer characteristics using 45,211 observations from a Portuguese bank. Using the Random Forest algorithm with the Information Gain Ratio as the criterion and 10-fold cross-validation, the model weights the importance of the attributes and achieves 90.01% accuracy in predicting telemarketing success. Furthermore, as a managerial implication, the ranking of attribute importance is designed to serve as a guidance map for selecting target customers.
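
The abstract does not spell out how the Information Gain Ratio criterion is computed. For reference, the standard definition from Quinlan's C4.5 is given below, assuming the study follows it: S is the set of training examples, C the set of classes with proportions p_c in S, A an attribute, and S_v the subset of S in which A takes the value v.

```latex
\begin{aligned}
H(S) &= -\sum_{c \in C} p_c \log_2 p_c \\
\mathrm{IG}(S, A) &= H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{IG}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{aligned}
```

Dividing by SplitInfo penalizes attributes with many distinct values, which plain information gain tends to favor.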


2021, Vol 8 (3), pp. 457
Author(s): Nitami Lestari Putri, Radityo Adi Nugroho, Rudy Herteno

An Intrusion Detection System (IDS) is a system developed to monitor and filter network activity by identifying attacks. Because the volume of data an IDS must examine is very large, and many extraneous features make it difficult to detect suspicious behavior patterns during analysis, an IDS needs to reduce the amount of data to be processed by reducing the number of features, which can be done through feature selection. This study combines two feature-ranking methods, Information Gain Ratio and Correlation, and performs classification with the K-Nearest Neighbor algorithm. The feature rankings from both methods are divided into two groups: the median value is computed for the first group, and the second group is removed. K-Nearest Neighbor classification is then performed using 10-fold cross-validation with k = 5. The proposed model achieves a highest accuracy of 99.61%, compared with a highest accuracy of 99.59% without feature selection.
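
The combination of two feature rankings followed by k-nearest-neighbor classification can be sketched as below. The exact grouping and median rule in the abstract is not fully specified, so this sketch assumes a hypothetical rule: average each feature's rank position under the two scores and keep the better-ranked half. The classifier is a plain majority-vote k-NN with Euclidean distance and k = 5; all function names are illustrative, and cross-validation is omitted.

```python
import math
from collections import Counter


def ranks(scores):
    """Map feature index -> rank position (0 = best) under descending scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return {j: pos for pos, j in enumerate(order)}


def combine_rankings(gain_ratio_scores, correlation_scores):
    """Average the two rank positions and keep the better-ranked half (assumed rule)."""
    r1, r2 = ranks(gain_ratio_scores), ranks(correlation_scores)
    n = len(gain_ratio_scores)
    combined = sorted(range(n), key=lambda j: ((r1[j] + r2[j]) / 2, j))
    return sorted(combined[: (n + 1) // 2])


def select_columns(rows, cols):
    """Project each row onto the selected feature indices."""
    return [[row[c] for c in cols] for row in rows]


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training rows by Euclidean distance."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

In use, `combine_rankings` would be fed one score per feature from each method, `select_columns` would drop the discarded features from the training and test rows, and `knn_predict` would classify each test row on the reduced feature set.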


2021, Vol 2021, pp. 1-17
Author(s): Wei Li, Xiaoyu Ma, Yumin Chen, Bin Dai, Runjing Chen, ...

In this study, the classification problem is solved from the perspective of granular computing; that is, it is transformed into an equivalent problem in fuzzy granular space. Most classification algorithms can only handle numerical data, whereas the random fuzzy granular decision tree (RFGDT) can handle not only numerical data but also nonnumerical data such as information granules. The approach proceeds in four steps. First, an adaptive global random clustering (AGRC) algorithm is proposed, which adaptively finds the optimal cluster centers, maximizes the ratio of interclass standard deviation to intraclass standard deviation, and avoids falling into local optima. Second, building on AGRC, a parallel model is designed for fuzzy granulation of data to construct the granular space, which greatly improves efficiency compared with serial granulation. Third, in the fuzzy granular space, RFGDT is designed to classify the fuzzy granules; it selects important features as tree nodes based on the information gain ratio and avoids overfitting through the proposed pruning algorithm. Finally, data from the UC Irvine Machine Learning Repository are used for verification. Theoretical analysis and experimental results show that RFGDT is efficient, accurate, and robust in solving classification problems.
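
Selecting tree nodes by information gain ratio, as RFGDT does in its third step, can be illustrated on ordinary categorical data with a small C4.5-style tree builder. This is a sketch under stated assumptions, not RFGDT: there are no fuzzy granules, no randomization, and no pruning, and rows are assumed to be dictionaries of categorical attribute values. `build_tree` and `predict` are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information."""
    n = len(rows)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    conditional = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return (entropy(labels) - conditional) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a leaf label, or an (attribute, {value: subtree}) internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # The node attribute is the one with the highest gain ratio.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        remaining = [a for a in attrs if a != best]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)


def predict(tree, row, default=None):
    """Walk the tree; return `default` for attribute values unseen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree
```

On a toy dataset where the label depends only on one attribute, that attribute has gain ratio 1 and becomes the root, while an irrelevant attribute scores 0 and never appears in the tree.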


2021
Author(s): Nirbhav Sharma, Ram Babu Singh, Anand Malik, Maheshwar Sharma

Landslide hazards cause substantial destruction and losses in mountainous regions. To lessen the damage in these vulnerable areas, the key challenge is to predict landslide events accurately and precisely. The principal objective of this study is to assess landslide susceptibility along the transport corridor from Kullu to Rohtang Pass in Himachal Pradesh, India. To this end, a detailed landslide inventory was prepared from imagery data and frequent field visits. A total of 197 landslides were considered, including 153 rock slides and 44 debris slides. Nine landslide factors were initially prepared, and their relationships with each other and with landslide type were analysed. The information gain ratio was then used to identify the triggering factors with the best scores and to eliminate unimportant factors. The train_test_split method was used to divide the dataset into training and testing sets. A machine-learning decision tree classification model was applied for landslide susceptibility modelling (LSM). Performance was evaluated using a classification report and the receiver operating characteristic (ROC) curve. The results show that the decision tree classification model performed well and forecast landslide susceptibility in the study area with good accuracy.
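
The evaluation step mentioned here (a classification report plus the ROC curve) can be sketched without any library. The function below computes the area under the ROC curve as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as one half; for binary labels this should agree with the usual ROC AUC. The labels (1 = landslide, 0 = no landslide) and the scores are assumed inputs, for example predicted probabilities from a decision tree; the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Per-class precision, recall and F1, as reported in a classification report."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The pairwise form is quadratic in the number of samples, which is fine for an inventory of a few hundred landslides; larger datasets would normally use a sort-based computation instead.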

