mining algorithms
Recently Published Documents


TOTAL DOCUMENTS

1230
(FIVE YEARS 419)

H-INDEX

30
(FIVE YEARS 8)

Author(s):  
Özerk Yavuz

Epidemic diseases can be extremely dangerous because of their hazardous effects. They may harm economies, businesses, the environment, people, and the workforce. In this paper, some of the factors associated with the COVID-19 pandemic are examined using data mining methods. The analysis yielded a set of rules and insights, and the performance of the data mining algorithms was evaluated. According to the results, the JRip algorithm had the highest correct classification rate and the lowest root mean squared error (RMSE). On both measures, JRip can be considered an effective method for understanding the factors related to coronavirus deaths.


Author(s):  
Feng Xiong ◽  
Hongzhi Wang

Data mining remains an enduringly attractive research subject. Knowledge graphs have grown rapidly in recent years and show strong potential, and acyclic knowledge graphs in particular have been observed to enhance usability. Although the development of knowledge graphs offers ample scope for applying data mining, related research is still insufficient. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel simple path traversal pattern mining framework to improve the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked list structure indexed by sequence length, with convenient operations. The correctness of the algorithm is proven. Experiments show that our algorithm achieves high coverage with a small output compared to existing frequent sequence mining algorithms.


2022 ◽  
Vol 12 (1) ◽  
pp. 522
Author(s):  
Na Zhao ◽  
Qian Liu ◽  
Ming Jing ◽  
Jie Li ◽  
Zhidan Zhao ◽  
...  

In research on complex networks, mining relatively important nodes is a challenging and practical task. However, little research has addressed it, and existing relatively important node mining algorithms cannot achieve both precision and applicability. To address the scarcity of such algorithms and the limitations of existing ones, this paper proposes a relatively important node mining method based on distance distribution and multi-index fusion (DDMF). First, the distance distribution of each node is generated from the shortest paths between nodes in the network; then, the cosine similarity, Euclidean distance, and relative entropy are fused, and the entropy weight method is used to calculate the weights of the different indexes; finally, the relatively important nodes are mined by calculating the relative importance score of each node in the network. Verification and analysis on real network datasets from different fields show that the DDMF method outperforms other relatively important node mining algorithms in precision, recall, and AUC value.
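The building blocks named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' DDMF implementation: the function names, the `max_dist` cutoff, the smoothing constant `eps`, and the exact form of the distance distribution are all assumptions, and the final fusion into a relative importance score is left out because the abstract does not specify it.

```python
import math
from collections import deque


def distance_distribution(adj, source, max_dist):
    """Fraction of reachable nodes at shortest-path distance 1..max_dist.

    Assumed definition; `adj` maps each node to a list of neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives unweighted shortest paths
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * max_dist
    for d in dist.values():
        if 1 <= d <= max_dist:
            counts[d - 1] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * max_dist


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); eps avoids division by zero."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def entropy_weights(matrix):
    """Entropy weight method: rows are nodes, columns are non-negative indexes.

    Columns whose values vary more across nodes receive larger weights.
    """
    n, m = len(matrix), len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0 or n <= 1:
            divergences.append(0.0)
            continue
        probs = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / m] * m  # no column is informative: fall back to equal weights
    return [d / total_div for d in divergences]
```

One plausible use is to compare every candidate node's distance distribution with that of a known important node using the three measures, then weight the resulting index columns with `entropy_weights`.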


Learning data analytics improves learning in higher education by using educational data to extract useful patterns and support better decisions. Identifying potential at-risk students may help instructors and academic advisors improve students' performance and the achievement of learning outcomes. The aim of this study is to predict, at an early stage, a student's failure in a particular course using standards-based grading. Several machine learning techniques were implemented to predict student failure: Support Vector Machine, Multilayer Perceptron, Naïve Bayes, and decision tree. The results for each technique show that machine learning algorithms can predict student failure accurately after the third week and before the course dropout week. This study provides strong insight into student performance in all courses. It also enables faculty members to help at-risk students by focusing on them and providing the support needed to improve their performance and avoid failure.


2022 ◽  
pp. 163-187
Author(s):  
Gökçe Karahan Adalı

This study aims to measure the effect of preventive policies on the public during the COVID-19 pandemic, as well as the public's trust in the government. The study examines the determinants of public trust in governments and the associations between the preventive measures. It also aims to identify the protective measures that governments prefer to implement together, using association rule mining. By this means, double and triple action packages are presented. The study finds that basic characteristics such as education, health, and age are among the most fundamental determinants of trust in governments during the pandemic. Trust in government, and the opinion that the measures taken are sufficient, decreased as education level increased. For age, the trend is the opposite. Women were observed to follow the preventive policies more strictly than men. Public trust in governments was also observed to be directly proportional to the development levels of countries.
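The idea of finding measures that are implemented together can be illustrated with the standard support and confidence definitions from association rule mining. The sketch below is an assumption-laden illustration rather than the study's procedure: the measure names, the thresholds, and the restriction to pairs ("double" packages) are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Share of transactions (sets of measures) that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            # confidence(x -> y) = support(x and y) / support(x)
            confidence = pair_support / support(transactions, {x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


# Hypothetical data: each set lists the measures one country applied.
countries = [
    {"masks", "distancing"},
    {"masks", "distancing", "school_closure"},
    {"masks"},
    {"distancing", "school_closure"},
]
rules = pair_rules(countries, min_support=0.5, min_confidence=0.6)
```

Triple packages would follow the same pattern with `combinations(items, 3)`; in practice an Apriori-style search prunes candidates whose subsets are already infrequent.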


Author(s):  
Владимир Арнольдович Биллиг ◽  
Николай Васильевич Звягинцев

A significant amount of experimental data recording the course of chemical reactions has now been accumulated. Analyzing these data with a suite of data mining algorithms yields important practical information for finding effective reaction conditions, under which the maximum amount of the target product is obtained at minimal cost. Using a database containing data on the carbonylation of various olefins as an example, this paper shows how the software package we developed makes it possible to extract useful knowledge that helps increase the efficiency of chemical reactions.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Juan Yang

Cross-language communication places higher demands on information mining in English translation courses. To address the problem that frequent-pattern mining in current digital mining algorithms produces a large number of patterns and rules and takes a long time to execute, this paper proposes a digital mining algorithm for English translation course information based on digital twin technology. Based on the results of word segmentation and tagging, the feature words of the English translation text are extracted, and a cross-language mapping of the text is established using digital twin technology. The estimated probability of the text translation is maximized through the correspondence relationship. The text information is transformed into text vectors, the semantic similarity of the texts is calculated, and the degree of translation matching is judged. Based on this data dimension, frequent sequences are constructed by transforming suffix sequences into prefix sequences, and the digital mining algorithm is designed. The results of an example analysis show that the execution time of the digital mining algorithm based on digital twin technology is significantly shorter than that of algorithms based on Apriori and MapReduce, and that its mining accuracy exceeds 80%, indicating good performance in processing massive data.


Author(s):  
Ansar Abbas ◽  
Muhammad Aman Ullah ◽  
Abdul Waheed

This study is conducted to predict the body weight (BW) of Thalli sheep of southern Punjab from different body measurements. Several body measurements are used as predictors: withers height, body length (BL), head length, head width, ear length, ear width, neck length, neck width, heart girth, rump length, rump width, tail length, barrel depth, and sacral pelvic width. Data mining algorithms, namely Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CART), and Artificial Neural Network (ANN), are used to predict the BW of a total of 85 female Thalli sheep. The data set is partitioned into training (80%) and test (20%) sets before the algorithms are applied. The minimum numbers of parent (4) and child (2) nodes are set to ensure predictive ability. The R² (%) values, with RMSE in parentheses, for the CHAID, Exhaustive CHAID, ANN, and CART algorithms are 67.38 (1.003), 64.37 (1.049), 61.45 (1.093), and 59.02 (1.125), respectively. The most significant predictor of BW in Thalli sheep is BL. The heaviest average BW, 9.596 kg, is obtained from the subgroup with BL > 25.000 inches. Based on several goodness-of-fit criteria, we conclude that the CHAID algorithm performs better in predicting the BW of Thalli sheep and produces a more visually suitable decision tree diagram. The CHAID results may also help identify body measurements positively associated with BW, supporting better selection strategies based on indirect selection criteria.
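The evaluation setup described above (an 80/20 split, then R² and RMSE on the predictions) can be sketched with the textbook definitions. This is an illustration under assumptions, not the authors' code: the function names and the random seed are hypothetical, and the abstract does not say whether its R² is adjusted or computed on the training or test set.

```python
import math
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

With 85 animals, an 80/20 split gives 68 training and 17 test records. A model with a higher R² and a lower RMSE fits better on both criteria, which is the comparison the abstract reports in favour of CHAID.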

