information gain Latest Research Papers

Student Profile Modeling Using Boosting Algorithms

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.20220901.oa4 ◽

2022 ◽

Vol 17 (5) ◽

pp. 1-13

Author(s):

Touria Hamim ◽

Faouzia Benabbou ◽

Nawal Sael

Keyword(s):

Student Performance ◽

Information Gain ◽

Recursive Feature Elimination ◽

Gradient Boosting ◽

Fisher Score ◽

Student Profile ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Boosting Algorithms ◽

Classification Prediction

The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment, and dropout prediction, rely on the profile for decision support. Machine learning plays an important role in this context, and several studies have been carried out for classification, prediction, or clustering purposes. In this paper, the authors present a comparative study of different boosting algorithms that have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain, combined with Recursive Feature Elimination, to enhance the preprocessing task and the models’ performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance when using Information Gain with Recursive Feature Elimination, compared to the other boosting algorithms.

Detecting Arabic Spam Reviews in Social Networks Based on Classification Algorithms

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3476115 ◽

2022 ◽

Vol 21 (1) ◽

pp. 1-13

Author(s):

Hassan Najadat ◽

Mohammad A. Alzubaidi ◽

Islam Qarqaz

Keyword(s):

Social Media ◽

Information Gain ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

Detection Accuracy ◽

Original Text ◽

Decision Tree Classifier ◽

Filter Methods ◽

Tree Classifier ◽

Business Entities

Reviews or comments that users leave on social media have great importance for companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by those who post spam on social media in the form of reviews and comments. Designing methodologies to automatically detect and block social media spam is complicated by the fact that spammers continuously develop new ways to leave their spam comments. Researchers have proposed several methods to detect English spam reviews. However, few studies have been conducted to detect Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords or Features are subsets of words from the original text that are labelled as important. A term's weight, Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) have been used to extract keywords from Arabic text. The method proposed in this article detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four different machine learning algorithms are used in the detection process, including C4.5, kNN, SVM, and Naïve Bayes classifiers. The results show that the Decision Tree classifier outperforms the other classification algorithms, with a detection accuracy of 92.63%.
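As an illustration of the TF-IDF weighting mentioned in the abstract above, here is a minimal sketch in Python. It is not the authors' implementation: the weighting variant (term count divided by document length, times the natural log of N over document frequency), the function name tf_idf, and the toy English tokens are all assumptions made for the example.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    Other weighting schemes exist; the paper may use a different one.
    """
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized comments (not taken from the paper's dataset).
comments = [
    ["free", "offer", "click", "link"],
    ["great", "product", "fast", "delivery"],
    ["click", "link", "free", "prize"],
]
weights = tf_idf(comments)
# Highest-weighted term in each comment, a crude stand-in for keyword extraction.
top_terms = [max(w, key=w.get) for w in weights]
```

Terms that appear in only one comment receive a higher weight than terms shared across comments, which is the property that makes the score usable for picking candidate keywords.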

Predicting Students’ Performance In Basic Algorithms Programming In an E-Learning Environment Using Decision Tree Approach

Syntax Literate ; Jurnal Ilmiah Indonesia ◽

10.36418/syntax-literate.v7i1.5733 ◽

2022 ◽

Vol 7 (1) ◽

pp. 498

Author(s):

Jonas De Deus Guterres ◽

Kusuma Ayu Laksitowening ◽

Febryanti Sthevanie

Keyword(s):

Learning Environment ◽

Decision Tree ◽

Student Performance ◽

Information Gain ◽

Quality Data ◽

Basic Algorithm ◽

Id3 Algorithm ◽

E Learning ◽

Advanced Analysis ◽

Quality In Higher Education

Predicting student performance plays an important role in every institution, helping to protect students from failure and to raise quality in higher education. Algorithm and Programming is a fundamental course for students beginning their studies in Informatics. Hence, the scope of this research is to identify the critical attributes that influence student performance in the e-learning environment on the Moodle LMS (Learning Management System) platform, and to evaluate the prediction accuracy. Data mining helps preprocess a dataset from raw data into quality data for advanced analysis. The dataset consists of student academic performance data, such as grades for quizzes, mid exams, final exams, and final projects. Moreover, data from the LMS, such as punctuality in submitting quizzes, assignments, and final projects, are also considered when constructing the decision tree. The Basic Algorithm and Programming course is split into two subjects, taught in the first and second semesters; the research therefore predicts student performance in the Basic Algorithm and Programming course in the second semester based on the Introduction to Programming course in the first semester. Decision tree techniques are applied using information gain in the ID3 algorithm to identify the most important feature: the PP index has the highest information gain, with a value of 0.44. A comparison of the ID3 and J48 algorithms shows that ID3 has the higher modeling accuracy, 84.80% compared to 82.34% for J48.
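The information gain criterion that ID3 uses to rank attributes can be sketched as follows. This is a generic textbook formulation (entropy of the labels minus the weighted entropy after splitting on a feature), not the authors' code; the feature and outcome values below are hypothetical toy data and are unrelated to the PP index reported in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: on-time quiz submission versus course outcome.
on_time = ["yes", "yes", "no", "no", "yes", "no"]
outcome = ["pass", "pass", "fail", "fail", "pass", "pass"]
gain = information_gain(on_time, outcome)
```

ID3 computes this quantity for every candidate attribute and splits on the one with the largest value, then repeats the process within each resulting subset.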

Estimated path information gain-based robot exploration under perceptual uncertainty

Robotica ◽

10.1017/s0263574721001946 ◽

2022 ◽

pp. 1-17

Author(s):

Jie Liu ◽

Chaoqun Wang ◽

Wenzheng Chi ◽

Guodong Chen ◽

Lining Sun

Keyword(s):

Decision Making ◽

Topological Structure ◽

Information Gain ◽

State Of The Art ◽

Selection Process ◽

Random Trees ◽

Experimental Results ◽

Restricted Area ◽

Structure Information ◽

Path Information

At present, frontier-based exploration is one of the mainstream methods in autonomous robot exploration. Among the frontier-based algorithms, the method of searching frontiers based on rapidly exploring random trees consumes fewer computing resources with higher efficiency and performs well in full-perceptual scenarios. However, in partially perceptual cases, namely when the environmental structure is beyond the perception range of the robot's sensors, the robot often lingers in a restricted area, and the exploration efficiency is reduced. In this article, we propose a decision-making method for robot exploration by integrating the estimated path information gain and the frontier information. The proposed method includes the topological structure information of the environment on the path to the candidate frontier in the frontier selection process, guiding the robot to select a frontier with rich environmental information to reduce perceptual uncertainty. Experiments are carried out in different environments with the state-of-the-art RRT-exploration method as a reference. Experimental results show that the proposed strategy clearly improves the efficiency of robot exploration.

Integrative Profiling of Amyotrophic Lateral Sclerosis Lymphoblasts Identifies Unique Metabolic and Mitochondrial Disease Fingerprints

10.21203/rs.3.rs-1196454/v1 ◽

2022 ◽

Author(s):

Teresa Cunha-Oliveira ◽

Marcelo Carvalho ◽

Vilma Sardão ◽

Elisabete Ferreiro ◽

Débora Mena ◽

...

Keyword(s):

Amyotrophic Lateral Sclerosis ◽

Information Gain ◽

Principal Component ◽

Multidimensional Analysis ◽

Metabolic Phenotype ◽

Rapid Progression ◽

Peripheral Tissues ◽

Atp Production ◽

Als Patients ◽

Lateral Sclerosis

Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease with a rapid progression and no effective treatment. Metabolic and mitochondrial alterations in peripheral tissues of ALS patients may present diagnostic and therapeutic interest. We aimed to identify mitochondrial fingerprints in lymphoblasts from ALS patients harboring SOD1 mutations (mutSOD1) or with unidentified mutations (undSOD1), compared with age/sex-matched controls. Three groups of lymphoblasts, from mutSOD1 or undSOD1 ALS patients and age/sex-matched controls, were obtained from Coriell Biobank and divided into 3 age/sex-matched cohorts. Mitochondria-associated metabolic pathways were analyzed using Seahorse MitoStress and ATP Rate assays, complemented with metabolic phenotype microarrays, metabolite levels, gene expression, and protein expression and activity. Pooled (all cohorts) and paired (intra-cohort) analyses were performed using bioinformatic tools, and the features with higher information gain values were selected and used for principal component analysis and Naïve Bayes classification. Pooled analysis revealed that undSOD1 patients had a statistically higher glycolytic ATP production rate and lower Tfam protein content compared to controls, which were also the experimental features highlighted by multidimensional analysis. Metabolic phenotypic profiles in lymphoblasts from ALS patients with mutSOD1 and undSOD1 revealed unique, age-dependent substrate oxidation profiles. For most parameters, different patterns of variation were found between cohorts, which may be due to age or sex. In the present work, we investigated several metabolic and mitochondrial hallmarks in lymphoblasts from each donor and, although a high heterogeneity of results was found, we identified specific metabolic and mitochondrial fingerprints that may have diagnostic and therapeutic interest.

Effect of information gain on document classification using k-nearest neighbor

Register Jurnal Ilmiah Teknologi Sistem Informasi ◽

10.26594/register.v8i1.2397 ◽

2022 ◽

Vol 8 (1) ◽

pp. 50

Author(s):

Rifki Indra Perwira ◽

Bambang Yuwono ◽

Risya Ines Putri Siswoyo ◽

Febri Liantoni ◽

Hidayatulah Himawan

Keyword(s):

Feature Selection ◽

Test Data ◽

Nearest Neighbor ◽

Intelligent System ◽

Information Gain ◽

Training Data ◽

State Universities ◽

Features Selection ◽

K Nearest Neighbor ◽

Support Students

State universities have libraries as a facility to support students’ education and science, containing various books, journals, and final assignments. An intelligent document classification system is needed to make things easier for library visitors in higher education, as a form of service to students. The documents in the library are generally the result of research. Complaints about imbalanced text data, categories based on irrelevant document titles, and ambiguous words encountered when searching for documents are the main reasons a classification system is needed. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training data, the best result with information gain feature selection, using 80% training data and 20% test data, is an accuracy of 87.5% at k=5. The highest accuracy, 92.9%, is achieved without information gain feature selection, with 90% training data and 10% test data and k=5, 7, and 9. This paper concludes that the system is more accurate without information gain feature selection than with it, because every word in the document title is considered to play an essential role in forming the classification.
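The combination of k-NN and cosine similarity described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' system: the bag-of-words representation, the helper names (bag, cosine_similarity, knn_predict), the titles, and the category labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, training, k=5):
    """Label a query by majority vote among its k most similar training items."""
    ranked = sorted(
        training,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def bag(title):
    """Hypothetical preprocessing: lowercase, split on whitespace, count terms."""
    return Counter(title.lower().split())


# Hypothetical document titles and study-interest labels.
training = [
    (bag("neural network image classification"), "AI"),
    (bag("deep learning text classification"), "AI"),
    (bag("database query optimization"), "Databases"),
    (bag("relational database index design"), "Databases"),
    (bag("network routing protocol design"), "Networks"),
]
prediction = knn_predict(bag("text classification with neural network"), training, k=3)
```

An information gain filter would sit in front of this step, keeping only the highest-scoring terms in each bag before similarities are computed.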

Nearly maximal information gain due to time integration in central dogma reactions

10.1101/2022.01.02.474710 ◽

2022 ◽

Author(s):

Swarnavo Sarkar ◽

Jayan Rammohan

Keyword(s):

Rate Constants ◽

Cellular Response ◽

Channel Model ◽

Information Gain ◽

Time Integration ◽

Information Channel ◽

Information Theoretic ◽

Central Dogma ◽

New Information ◽

In Series

Living cells process information about their environment through the central dogma processes of transcription and translation, which drive the cellular response to stimuli. Here, we study the transfer of information from environmental input to the transcript and protein expression levels. Evaluation of both experimental and analogous simulation data reveals that transcription and translation are not two simple information channels connected in series. Instead, we show that the central dogma reactions often create a time-integrating information channel, where the translation channel receives and integrates multiple outputs from the transcription channel. This information channel model of the central dogma provides new information-theoretic selection criteria for the central dogma rate constants. Using the data for four well-studied species we show that their central dogma rate constants achieve information gain due to time integration while also keeping the loss due to stochasticity in translation relatively low (< 0.5 bits).
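For readers unfamiliar with measuring an information channel in bits, the sketch below computes the mutual information between a discrete input and output from a joint probability table. It is a generic information-theoretic illustration, not the time-integration channel model of the paper; the binary channel and its 10% flip probability are hypothetical.

```python
from math import log2


def mutual_information(joint):
    """Mutual information I(X;Y), in bits, from a joint probability table joint[x][y]."""
    p_x = [sum(row) for row in joint]          # marginal over inputs
    p_y = [sum(col) for col in zip(*joint)]    # marginal over outputs
    return sum(
        p * log2(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


# Hypothetical binary channel: equiprobable input, output flipped 10% of the time.
noisy = [[0.45, 0.05], [0.05, 0.45]]
# The same channel without noise transmits one full bit per use.
noiseless = [[0.5, 0.0], [0.0, 0.5]]

info_noisy = mutual_information(noisy)
info_noiseless = mutual_information(noiseless)
loss_to_noise = info_noiseless - info_noisy
```

The difference between the noiseless and noisy values is the kind of quantity meant when a loss due to stochasticity is reported in bits.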

Prediction of Skin Diseases Using Machine Learning

10.4018/978-1-7998-7888-9.ch008 ◽

2022 ◽

pp. 154-178

Author(s):

Siddhartha Kumar Arjaria ◽

Vikas Raj ◽

Sunil Kumar ◽

Priyanshu Shrivastava ◽

Monu Kumar ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Skin Disease ◽

Skin Diseases ◽

Information Gain ◽

Machine Learning Algorithms ◽

Ensemble Method ◽

Chi Square ◽

Data Mining Techniques ◽

Disease Rates

Skin disease rates have been increasing over the past few decades. This has led to both fatal and non-fatal disabilities all around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease is proposed in which six different data mining techniques are used to develop an ensemble method that integrates all six techniques into a single one. The ensemble method applied to the dermatology dataset gives an improved result, with 94% accuracy, in comparison to the other classifier algorithms and hence is more effective in this area.
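The abstract does not say how the six classifiers are combined. One common way to integrate several models is plurality voting, sketched below purely as an assumption; the function name majority_vote, the three stand-in models, and the class labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by plurality vote.

    Ties are resolved in favor of the label seen first among the models' votes.
    """
    combined = []
    for sample_votes in zip(*predictions_per_model):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three base classifiers on four samples.
knn_preds = ["class_a", "class_b", "class_b", "class_c"]
forest_preds = ["class_a", "class_b", "class_c", "class_c"]
svm_preds = ["class_b", "class_b", "class_c", "class_a"]

ensemble = majority_vote([knn_preds, forest_preds, svm_preds])
```

Voting tends to help when the base classifiers make different mistakes, since an error by one model can be outvoted by the others.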

Monitoring and Mapping of Crop Fields with UAV Swarms Based on Information Gain

Distributed Autonomous Robotic Systems - Springer Proceedings in Advanced Robotics ◽

10.1007/978-3-030-92790-5_24 ◽

2022 ◽

pp. 306-319

Author(s):

Carlos Carbone ◽

Dario Albani ◽

Federico Magistri ◽

Dimitri Ognibene ◽

Cyrill Stachniss ◽

...

Keyword(s):

Information Gain ◽

Crop Fields

Artificial Intelligence and Deep Learning based Information Retrieval Framework for Assessing Students Performance

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2022010101 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

Neural Network ◽

Deep Learning ◽

Performance Indicators ◽

Information Gain ◽

Educational Institution ◽

Academic Quality ◽

Self Assessment ◽

Continuous Quality ◽

Assessment Approach

Improving the quality of education is a challenging activity in every educational institution. In this research paper, a model is proposed that represents the challenges of managing the trade-off between maintaining the philosophy of continuous quality improvement and strict control in Higher Education Institutions (HEIs). Several standard criteria, performance parameters, and Key Performance Indicators are studied and suggested for a quality self-assessment approach. After the data are collected, the significant features are selected for analysis using a dedicated gain, which is designed by integrating the information gain and dedicated weight constants. After that, deep learning methodologies such as regression analysis, the artificial neural network, and the MATLAB model are used to evaluate the academic quality of institutions. Finally, based on the prediction made using a deep neural network, areas of development are recommended to the administrators of the institutions using the probabilistic model.
