Towards Optimal Information Gain for Judicious Positioning of Sensors in Geophysical Tests

Geo-Risk 2017 ◽  
2017 ◽  
Author(s):  
Siddharth S. Parida ◽  
Kallol Sett ◽  
Puneet Singla
2020 ◽  
Vol 17 (1) ◽  
pp. 319-328
Author(s):  
Ade Muchlis Maulana Anwar ◽  
Prihastuti Harsani ◽  
Aries Maesya

Population Data is individual data or aggregate data that is structured as a result of Population Registration and Civil Registration activities. Birth Certificate is a Civil Registration Deed as a result of recording the birth event of a baby whose birth is reported to be registered on the Family Card and given a Population Identification Number (NIK) as a basis for obtaining other community services. From the total number of integrated birth certificate reporting for the 2018 Population Administration Information System (SIAK) totaling 570,637 there were 503,946 reported late and only 66,691 were reported publicly. Clustering is a method used to classify data that is similar to others in one group or similar data to other groups. K-Nearest Neighbor is a method for classifying objects based on learning data that is the closest distance to the test data. k-means is a method used to divide a number of objects into groups based on existing categories by looking at the midpoint. In data mining preprocesses, data is cleaned by filling in the blank data with the most dominating data, and selecting attributes using the information gain method. Based on the k-nearest neighbor method to predict delays in reporting and the k-means method to classify priority areas of service with 10,000 birth certificate data on birth certificates in 2019 that have good enough performance to produce predictions with an accuracy of 74.00% and with K = 2 on k-means produces a index davies bouldin of 1,179.


2021 ◽  
Vol 15 (8) ◽  
pp. 912-926
Author(s):  
Ge Zhang ◽  
Pan Yu ◽  
Jianlin Wang ◽  
Chaokun Yan

Background: There have been rapid developments in various bioinformatics technologies, which have led to the accumulation of a large amount of biomedical data. However, these datasets usually involve thousands of features and include much irrelevant or redundant information, which leads to confusion during diagnosis. Feature selection is a solution that consists of finding the optimal subset, which is known to be an NP problem because of the large search space. Objective: For the issue, this paper proposes a hybrid feature selection method based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach, which called IGICRO. Methods: IG is adopted to obtain some important features. The neighborhood search mechanism is combined with ICRO to increase the diversity of the population and improve the capacity of local search. Results: Experimental results of eight public available data sets demonstrate that our proposed approach outperforms original CRO and other state-of-the-art approaches.


1991 ◽  
Vol 56 (3) ◽  
pp. 505-559 ◽  
Author(s):  
Karel Eckschlager

In this review, analysis is treated as a process of gaining information on chemical composition, taking place in a stochastic system. A model of this system is outlined, and a survey of measures and methods of information theory is presented to an extent as useful for qualitative or identification, quantitative and trace analysis and multicomponent analysis. It is differentiated between information content of an analytical signal and information gain, or amount of information, obtained by the analysis, and their interrelation is demonstrated. Some notions of analytical chemistry are quantified from the information theory and system theory point of view; it is also demonstrated that the use of fuzzy set theory can be suitable. The review sums up the principal results of the series of 25 papers which have been published in this journal since 1971.


2018 ◽  
Vol 7 (1) ◽  
pp. 57-72
Author(s):  
H.P. Vinutha ◽  
Poornima Basavaraju

Day by day network security is becoming more challenging task. Intrusion detection systems (IDSs) are one of the methods used to monitor the network activities. Data mining algorithms play a major role in the field of IDS. NSL-KDD'99 dataset is used to study the network traffic pattern which helps us to identify possible attacks takes place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. In proposed methodology, it is necessary to reduce the false positive rate and improve the detection rate by reducing the dimensionality of the dataset, use of all 41 attributes in detection technology is not good practices. Four different feature selection methods like Chi-Square, SU, Gain Ratio and Information Gain feature are used to evaluate the attributes and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques like Boosting, Bagging, Stacking and Voting are used to observe the detection rate separately with three base algorithms called Decision stump, J48 and Random forest.


Sign in / Sign up

Export Citation Format

Share Document