information gain
Recently Published Documents


TOTAL DOCUMENTS

1213
(FIVE YEARS 546)

H-INDEX

36
(FIVE YEARS 7)

Author(s):  
Touria Hamim ◽  
Faouzia Benabbou ◽  
Nawal Sael

The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment, and dropout prediction, rely on the student profile for decision support. Machine learning plays an important role in this context, and several studies have applied it for classification, prediction, or clustering. In this paper, the authors present a comparative study of different boosting algorithms that have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain, combined with Recursive Feature Elimination (RFE), to enhance preprocessing and model performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance, compared with the other boosting algorithms, when using Information Gain with Recursive Feature Elimination.
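As a rough illustration of one of the filter methods named above, the sketch below computes the textbook Fisher score of each feature: the between-class scatter of the feature's class means divided by its within-class variance. It is a minimal pure-Python sketch, not the authors' implementation; the function name `fisher_scores` and the toy data are hypothetical, and how the paper chains these scores with RFE and LightGBM is not shown.

```python
from collections import defaultdict

def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k n_k * var_kj
    where k ranges over classes and var_kj is the population variance.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between, within = 0.0, 0.0
        for rows in by_class.values():
            values = [r[j] for r in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        # A feature that is constant inside every class separates them perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores

X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 4.0]]  # toy features
y = ["low", "low", "high", "high"]                     # toy performance classes
scores = fisher_scores(X, y)
print(scores)  # -> [49.0, 0.0]
```

Features would then be ranked by descending score, for example with `sorted(range(len(scores)), key=scores.__getitem__, reverse=True)`, before a wrapper method such as RFE works on the reduced set.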


Author(s):  
Hassan Najadat ◽  
Mohammad A. Alzubaidi ◽  
Islam Qarqaz

Reviews and comments that users leave on social media are of great importance to companies and business entities; for example, new product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by spammers who post spam in the form of reviews and comments. Designing methods to automatically detect and block social media spam is further complicated by the fact that spammers continuously develop new ways to post their spam. Researchers have proposed several methods to detect English spam reviews, but few studies have addressed Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords, or features, are subsets of words from the original text that are labelled as important. Term weights, a Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) are used to extract keywords from Arabic text. The proposed method detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four machine learning algorithms are used for detection: C4.5, kNN, SVM, and Naïve Bayes classifiers. The results show that the decision tree (C4.5) classifier outperforms the other classification algorithms, with a detection accuracy of 92.63%.
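The TF-IDF weighting mentioned above can be sketched in a few lines. This minimal pure-Python version assumes the comments are already tokenized (real Arabic text would need normalization and its own tokenizer, which are out of scope here) and uses the common variant tf = count / length and idf = ln(N / df); the paper may use a different variant. The function names and the Latin-script toy tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def top_keywords(weights, k):
    """Return the k highest-weighted terms of one document (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

docs = [["win", "free", "prize", "free"],
        ["great", "product", "thanks"],
        ["great", "free", "delivery"]]
weights = tf_idf(docs)
print(top_keywords(weights[0], 2))  # -> ['prize', 'win']
```

A term that appears in every comment gets an IDF of zero, which is why frequent but uninformative words drop out of the keyword list before a filter method such as information gain is applied.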


2022 ◽  
Vol 7 (1) ◽  
pp. 498
Author(s):  
Jonas De Deus Guterres ◽  
Kusuma Ayu Laksitowening ◽  
Febryanti Sthevanie

Predicting student performance plays an important role in every institution, helping to protect students from failure and to improve the quality of higher education. Algorithm and Programming is a fundamental course for students beginning their studies in Informatics. Hence, the scope of this research is to identify the critical attributes that influence student performance in the e-learning environment of the Moodle LMS (Learning Management System) platform, and to evaluate the accuracy of the resulting predictions. Data mining supports preprocessing, turning raw data into quality data for further analysis. The dataset consists of student academic performance records, such as grades for quizzes, mid exams, final exams, and final projects. LMS activity data, such as punctuality in submitting quizzes, assignments, and final projects, is also used to construct the decision tree. Because the Basic Algorithm and Programming course is split into two subjects taught in the first and second semesters, the research predicts student performance in the second-semester Basic Algorithm and Programming course from the first-semester Introduction to Programming course. Decision tree techniques are applied using information gain in the ID3 algorithm to identify the most important feature: the PP index, which has the highest information gain, with a value of 0.44. In terms of accuracy, ID3 achieves the highest modeling accuracy, 84.80%, compared with 82.34% for J48.
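ID3 chooses its splitting attribute by information gain, IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v), with entropy H measured in bits. The minimal pure-Python sketch below computes this for categorical attributes; the attribute meanings and toy records are hypothetical and are not the paper's dataset, so the 0.44 value reported above is not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG of splitting (rows, labels) on the categorical attribute at index attr."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """ID3's choice: the attribute index with the highest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))

# Hypothetical records: (submitted_quiz_on_time, mid_exam_band) -> outcome
rows = [("yes", "high"), ("yes", "low"), ("no", "high"), ("no", "low")]
labels = ["pass", "pass", "fail", "fail"]
print(information_gain(rows, labels, 0))  # -> 1.0
print(information_gain(rows, labels, 1))  # -> 0.0
print(best_attribute(rows, labels))       # -> 0
```

A full ID3 tree applies `best_attribute` recursively to each subset until the labels are pure; J48 (C4.5) instead ranks attributes by gain ratio, which penalizes attributes with many distinct values.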


Robotica ◽  
2022 ◽  
pp. 1-17
Author(s):  
Jie Liu ◽  
Chaoqun Wang ◽  
Wenzheng Chi ◽  
Guodong Chen ◽  
Lining Sun

Abstract Frontier-based exploration is currently one of the mainstream methods for autonomous robot exploration. Among frontier-based algorithms, searching for frontiers with rapidly exploring random trees consumes fewer computing resources, is more efficient, and performs well in fully perceptual scenarios. However, in partially perceptual cases, namely when the environmental structure is beyond the perception range of the robot's sensors, the robot often lingers in a restricted area and exploration efficiency drops. In this article, we propose a decision-making method for robot exploration that integrates the estimated path information gain with the frontier information. During frontier selection, the proposed method takes into account the topological structure of the environment along the path to each candidate frontier, guiding the robot toward frontiers with rich environmental information so as to reduce perceptual uncertainty. Experiments are carried out in different environments, with the state-of-the-art RRT-exploration method as a reference. Experimental results show that the proposed strategy clearly improves the efficiency of robot exploration.
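To make the idea concrete, the sketch below scores candidate frontiers with a weighted utility that adds an estimated path information gain to the frontier's own information gain and subtracts a travel cost. This is a hypothetical form chosen for illustration: the weights, the straight-line distance used as cost, and the `Frontier` fields are assumptions, not the authors' formulation or the RRT-exploration package's API.

```python
import math
from dataclasses import dataclass

@dataclass
class Frontier:
    x: float
    y: float
    frontier_gain: float  # assumed: information expected at the frontier itself
    path_gain: float      # assumed: information estimated along the path to it

def select_frontier(robot_xy, frontiers, w_frontier=1.0, w_path=0.5, w_cost=0.2):
    """Return the frontier with the highest (hypothetical) weighted utility."""
    def utility(f):
        cost = math.hypot(f.x - robot_xy[0], f.y - robot_xy[1])  # straight-line proxy
        return w_frontier * f.frontier_gain + w_path * f.path_gain - w_cost * cost
    return max(frontiers, key=utility)

a = Frontier(x=1.0, y=0.0, frontier_gain=2.0, path_gain=0.0)
b = Frontier(x=3.0, y=0.0, frontier_gain=2.0, path_gain=4.0)
print(select_frontier((0.0, 0.0), [a, b]) is b)              # -> True
print(select_frontier((0.0, 0.0), [a, b], w_path=0.0) is a)  # -> True
```

With the path term switched off, the nearer frontier wins; with it, the farther frontier whose route passes through more unexplored structure is preferred, which is the kind of trade-off the abstract describes.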


2022 ◽  
Author(s):  
Teresa Cunha-Oliveira ◽  
Marcelo Carvalho ◽  
Vilma Sardão ◽  
Elisabete Ferreiro ◽  
Débora Mena ◽  
...  

Abstract Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease with rapid progression and no effective treatment. Metabolic and mitochondrial alterations in peripheral tissues of ALS patients may be of diagnostic and therapeutic interest. We aimed to identify mitochondrial fingerprints in lymphoblasts from ALS patients harboring SOD1 mutations (mutSOD1) or with unidentified mutations (undSOD1), compared with age/sex-matched controls. Three groups of lymphoblasts, from mutSOD1 or undSOD1 ALS patients and age/sex-matched controls, were obtained from the Coriell Biobank and divided into 3 age/sex-matched cohorts. Mitochondria-associated metabolic pathways were analyzed using Seahorse MitoStress and ATP Rate assays, complemented with metabolic phenotype microarrays, metabolite levels, gene expression, and protein expression and activity. Pooled (all cohorts) and paired (intra-cohort) analyses were performed using bioinformatic tools, and the features with the highest information gain values were selected and used for principal component analysis and Naïve Bayes classification. Pooled analysis revealed that undSOD1 patients had a statistically significantly higher glycolytic ATP production rate and lower Tfam protein content compared with controls; these were also the experimental features highlighted by the multidimensional analysis. Metabolic phenotypic profiles of lymphoblasts from ALS patients with mutSOD1 and undSOD1 revealed distinct, age-dependent substrate oxidation profiles. For most parameters, different patterns of variation were found between cohorts, which may be due to age or sex. In the present work, we investigated several metabolic and mitochondrial hallmarks in lymphoblasts from each donor and, although the results were highly heterogeneous, we identified specific metabolic and mitochondrial fingerprints that may be of diagnostic and therapeutic interest.
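The abstract does not specify which Naïve Bayes variant was used. For continuous assay readouts, a Gaussian Naïve Bayes classifier is a common choice; the minimal pure-Python sketch below assumes that variant, two already-selected features, and purely hypothetical toy values in arbitrary units (they are not data from the study).

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1],   # toy group_a samples
     [2.0, 3.0], [2.2, 2.8], [1.9, 3.1]]   # toy group_b samples
y = ["group_a"] * 3 + ["group_b"] * 3
model = fit_gaussian_nb(X, y)
print(predict(model, [1.0, 5.0]))  # -> group_a
print(predict(model, [2.1, 2.9]))  # -> group_b
```

Working in log space keeps the product of per-feature likelihoods numerically stable when many features are selected.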


2022 ◽  
Vol 8 (1) ◽  
pp. 50
Author(s):  
Rifki Indra Perwira ◽  
Bambang Yuwono ◽  
Risya Ines Putri Siswoyo ◽  
Febri Liantoni ◽  
Hidayatulah Himawan

State universities have libraries that support students' education and research, containing books, journals, and final assignments. An intelligent document classification system is needed to help library visitors in higher education, as a service to students. The documents in the library are generally research outputs. Complaints about imbalanced text data and categories, categorization based on irrelevant document titles, and words with ambiguous meanings when searching for documents are the main reasons a classification system is needed. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle imbalanced data and cosine similarity to measure the distance between test and training data. In tests on a dataset of 276 items, the best result with information gain feature selection was an accuracy of 87.5%, using 80% training data, 20% test data, and k = 5. The highest accuracy, 92.9%, was achieved without information gain feature selection, using 90% training data, 10% test data, and k = 5, 7, and 9. The paper concludes that the system is more accurate without information gain feature selection, because every word in a document title plays an essential role in the classification.
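The k-NN step with cosine similarity described above can be sketched as follows. This minimal pure-Python version assumes whitespace tokenization of titles and raw term counts (no information gain filtering), and the training titles and category names are hypothetical; it is not the authors' system.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(title, training, k=5):
    """training: list of (title, category). Majority vote among the k most similar titles."""
    query = Counter(title.lower().split())
    ranked = sorted(training,
                    key=lambda item: cosine(query, Counter(item[0].lower().split())),
                    reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]

training = [
    ("neural network for image classification", "AI"),
    ("deep neural network training", "AI"),
    ("image recognition with neural network", "AI"),
    ("database query optimization", "DB"),
    ("relational database indexing", "DB"),
    ("network security intrusion detection", "NET"),
]
print(knn_classify("neural network image segmentation", training, k=5))  # -> AI
```

Because cosine similarity is a similarity rather than a distance, neighbours are taken in descending order of score; 1 - similarity can serve as the corresponding distance.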


2022 ◽  
Author(s):  
Swarnavo Sarkar ◽  
Jayan Rammohan

Living cells process information about their environment through the central dogma processes of transcription and translation, which drive the cellular response to stimuli. Here, we study the transfer of information from environmental input to the transcript and protein expression levels. Evaluation of both experimental and analogous simulation data reveals that transcription and translation are not two simple information channels connected in series. Instead, we show that the central dogma reactions often create a time-integrating information channel, where the translation channel receives and integrates multiple outputs from the transcription channel. This information channel model of the central dogma provides new information-theoretic selection criteria for the central dogma rate constants. Using data for four well-studied species, we show that their central dogma rate constants achieve information gain due to time integration while also keeping the loss due to stochasticity in translation relatively low (< 0.5 bits).
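The information carried by such a channel is usually quantified as the mutual information between input and output, I(X;Y) = sum over x, y of p(x,y) * log2[p(x,y) / (p(x) p(y))], in bits. The sketch below evaluates it for a discrete joint distribution; the tables are toy examples, and the abstract does not say which estimator the authors applied to their experimental and simulated data.

```python
import math

def mutual_information_bits(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y] that sums to 1."""
    px = [sum(row) for row in joint]          # marginal of the input
    py = [sum(col) for col in zip(*joint)]    # marginal of the output
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

perfect = [[0.5, 0.0], [0.0, 0.5]]          # output copies a binary input
independent = [[0.25, 0.25], [0.25, 0.25]]  # output ignores the input
noisy = [[0.4, 0.1], [0.1, 0.4]]            # binary symmetric channel, 20% flips
print(mutual_information_bits(perfect))            # -> 1.0
print(mutual_information_bits(independent))        # -> 0.0
print(round(mutual_information_bits(noisy), 3))    # -> 0.278
```

The noisy case equals 1 - H(0.2), the capacity of a binary symmetric channel with a 20% flip rate; losses of this kind are what the abstract bounds below 0.5 bits for translation.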


2022 ◽  
pp. 154-178
Author(s):  
Siddhartha Kumar Arjaria ◽  
Vikas Raj ◽  
Sunil Kumar ◽  
Priyanshu Shrivastava ◽  
Monu Kumar ◽  
...  

Skin disease rates have been increasing over the past few decades, leading to both fatal and non-fatal disabilities around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of a cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for predicting skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. The work comprises an introduction, a literature review, and the proposed methodology. The paper proposes a new method of analyzing skin disease in which the six data mining techniques are integrated into a single ensemble method. Applied to the dermatology dataset, the ensemble method achieves an improved accuracy of 94% compared with the other classifier algorithms and is hence more effective in this area.
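The abstract does not say how the six classifiers are merged into one ensemble. A common, simple choice is hard majority voting, sketched below under that assumption: each classifier contributes one label per sample and the most frequent label wins, with ties broken by classifier order (an arbitrary rule chosen here). The predictions are hypothetical toy values, not results from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per classifier, aligned by sample.

    Returns the most common label per sample; ties go to the tied label
    predicted by the earliest classifier in the list (assumed rule).
    """
    combined = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        best = max(counts.values())
        combined.append(next(label for label in sample_preds if counts[label] == best))
    return combined

# Hypothetical predictions from six classifiers for three patients
predictions = [
    ["psoriasis", "pityriasis rosea", "lichen planus"],         # KNN
    ["psoriasis", "pityriasis rosea", "pityriasis rosea"],      # random forest
    ["pityriasis rosea", "pityriasis rosea", "lichen planus"],  # neural network
    ["psoriasis", "psoriasis", "lichen planus"],                # naive Bayes
    ["psoriasis", "pityriasis rosea", "pityriasis rosea"],      # logistic regression
    ["pityriasis rosea", "pityriasis rosea", "lichen planus"],  # SVM
]
print(majority_vote(predictions))
# -> ['psoriasis', 'pityriasis rosea', 'lichen planus']
```

Soft voting, which averages predicted class probabilities instead of counting labels, is the usual alternative when every base classifier can output probabilities.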


Author(s):  
Carlos Carbone ◽  
Dario Albani ◽  
Federico Magistri ◽  
Dimitri Ognibene ◽  
Cyrill Stachniss ◽  
...  

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Improving the quality of education is a challenging activity in every educational institution. This research paper proposes a model that represents the challenges of managing the trade-off between maintaining a philosophy of continuous quality improvement and exercising strict control in Higher Education Institutions (HEIs). Several standard criteria, performance parameters, and Key Performance Indicators are studied and suggested for a quality self-assessment approach. After the data are collected, the significant features are selected using a dedicated gain, designed by integrating information gain with dedicated weight constants. Methods such as regression analysis, an artificial neural network, and a MATLAB model are then used to evaluate the academic quality of institutions. Finally, based on predictions made by a deep neural network, a probabilistic model is used to recommend areas for development to the institutions' administrators.

