Performance Implications of Knowledge Discovery Techniques in Databases

Advances in Database Research - Advanced Topics in Database Research, Volume 2
DOI: 10.4018/978-1-59140-063-9.ch009
2011, pp. 191-212

Author(s): Balaji Rajagopalan; Ravindra Krovi

Keyword(s): Machine Learning; Data Mining; Knowledge Discovery; Machine Learning Algorithms; Classification Problems; Business Decision; Effective Implementation; Simulation Based; Careful Assessment; Mining Tools

This chapter introduces knowledge discovery techniques as a means of identifying critical trends and patterns for business decision support. It suggests that effective implementation of these techniques requires a careful assessment of the various data mining tools and algorithms available. Both statistical and machine-learning-based algorithms have been widely applied to discover knowledge from data. In this chapter we describe some of these algorithms and investigate their relative performance on classification problems. Simulation-based results support the proposition that machine-learning algorithms outperform their statistical counterparts, albeit only under certain conditions. Further, the authors hope that the discussion of performance-related issues will foster a better understanding of the application and appropriateness of knowledge discovery techniques.

Benchmarking Data Mining Algorithms

Data Warehousing and Web Engineering
DOI: 10.4018/978-1-931777-02-5.ch003
2011, pp. 77-99

Author(s): Balaji Rajagopalan; Ravi Krovi

Keyword(s): Machine Learning; Data Mining; Learning Algorithms; Machine Learning Algorithms; Successful Implementation; Basic Premise; Data Mining Algorithms; External Data; Mining Algorithms; Careful Assessment

Data mining is the process of sifting through the mass of organizational (internal and external) data to identify patterns critical for decision support. Successful implementation of the data mining effort requires a careful assessment of the various tools and algorithms available. The basic premise of this study is that machine-learning algorithms, which are assumption-free, should outperform their traditional counterparts when mining business databases. The objective of this study is to test this proposition by investigating the performance of the algorithms in several scenarios. The scenarios are based on simulations designed to reflect the extent to which typical statistical assumptions are violated in the business domain. The results of the computational experiments support the proposition that machine-learning algorithms generally outperform their statistical counterparts, though only under certain conditions. These results can be used as prescriptive guidelines for the applicability of data mining techniques.
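
The premise that assumption-free learners gain ground when statistical assumptions are violated can be illustrated with a small simulation. This is a sketch, not the study's experimental design: it assumes a circular (nonlinear) class boundary, uses a nearest-centroid rule as a stand-in for a linear statistical classifier (it is what linear discriminant analysis reduces to with equal priors and equal spherical covariances), and uses k-nearest neighbors as the assumption-free learner. All function names, the radius, and the sample sizes are illustrative choices.

```python
import math
import random

def make_circle_data(n, radius=0.7, seed=0):
    """Points uniform in [-1, 1]^2; label 1 inside a circle, a boundary no line can fit."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x, y), 1 if x * x + y * y < radius * radius else 0))
    return data

def nearest_centroid(train):
    """Linear 'statistical' stand-in: assign each point to the closest class mean."""
    sums = {}
    for (x, y), label in train:
        sx, sy, c = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, c + 1)
    centroids = {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}
    def predict(p):
        return min(centroids, key=lambda lab: math.dist(p, centroids[lab]))
    return predict

def knn(train, k=5):
    """Assumption-free learner: majority vote of the k closest training points (binary labels)."""
    def predict(p):
        nearest = sorted(train, key=lambda item: math.dist(p, item[0]))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if votes * 2 > k else 0
    return predict

def accuracy(predict, test):
    return sum(predict(p) == label for p, label in test) / len(test)

train = make_circle_data(300, seed=1)
test = make_circle_data(200, seed=2)
centroid_acc = accuracy(nearest_centroid(train), test)
knn_acc = accuracy(knn(train), test)
print(f"nearest centroid: {centroid_acc:.2f}  k-NN: {knn_acc:.2f}")
```

With Gaussian classes and a linear boundary, the data match the linear rule's assumptions and the gap can narrow or reverse, which is in the spirit of the "only under certain conditions" caveat.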

Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile

Entropy
DOI: 10.3390/e23040485
2021, Vol 23 (4), pp. 485
Cited by ~1

Author(s): Carlos A. Palacios; José A. Reyes-Suárez; Lorena A. Bearzotti; Víctor Leiva; Carolina Marchant

Keyword(s): Higher Education; Machine Learning; Data Mining; Random Forest; Student Retention; Knowledge Discovery; Learning Algorithms; Machine Learning Algorithms; Support Vector

Data mining is employed to extract useful information and detect patterns in often large data sets; it is closely related to knowledge discovery in databases and to data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information for predicting student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. We then use this information to support the process of knowledge discovery. We predict student retention at each of three levels, during the first, second, and third years of study, obtaining models with an accuracy that exceeds 80% in all scenarios. These models allow us to adequately predict the level at which dropout occurs. The machine learning algorithms used in this work include decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which random forest performs best. We find that the secondary education score and the community poverty index are important predictive variables that have not been previously reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with conditions similar to the Chilean case, where dropout rates affect the efficiency of such institutions. Being able to predict dropout from students' data enables these institutions to take preventive measures and avoid dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
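
The abstract reports that balancing the majority and minority classes helped, without saying how. One common option is random oversampling of the minority class; the sketch below assumes that technique, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement; k=0 for the majority class adds nothing.
        extra = rng.choices(members, k=target - len(members))
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical toy data: 8 retained students (label 0) and 2 dropouts (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # Counter({0: 8, 1: 8})
```

Balancing should be applied only to the training portion of each split; oversampling before splitting lets duplicated rows appear in both training and evaluation data and inflates the measured accuracy.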

A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering
DOI: 10.23956/ijarcsse/v7i7/0177
2017, Vol 7 (7), pp. 172

Author(s): Anantvir Singh Romana

Keyword(s): Machine Learning; Subsequent Treatment; Machine Learning Algorithms; Machine Learning Techniques; Support Vector; Disease Prediction; Classification Problems; Learning Techniques; Neural Network Classifiers; Diagnostic Detection

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate predictive performance. Different techniques may achieve different accuracies, so it is imperative to use the method best suited to the problem. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.
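
Of the four classifiers compared, naive Bayes is the simplest to show end to end. The sketch below is a categorical naive Bayes with Laplace (add-alpha) smoothing; it is not the study's implementation, and the feature values and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Categorical naive Bayes: P(class) times the product of P(feature value | class),
    each estimated from counts with add-alpha smoothing."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value]: how often feature i takes `value` within `label`.
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    n = len(labels)

    def predict(row):
        best, best_score = None, -math.inf
        for label, count in class_counts.items():
            score = math.log(count / n)  # log prior
            for i, v in enumerate(row):
                num = feature_counts[label][i][v] + alpha
                den = count + alpha * len(values[i])
                score += math.log(num / den)  # log likelihood of this feature value
            if score > best_score:
                best, best_score = label, score
        return best
    return predict

# Hypothetical toy records: (glucose level, family history) -> diagnosis.
rows = [("high", "yes"), ("high", "no"), ("normal", "no"),
        ("normal", "no"), ("high", "yes"), ("normal", "yes")]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
classify = train_naive_bayes(rows, labels)
print(classify(("high", "yes")))   # pos
print(classify(("normal", "no")))  # neg
```

Working in log space avoids numeric underflow when many features are multiplied together.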

Pipelining machine learning algorithms for knowledge discovery

DOI: 10.1117/12.380565
2000

Author(s): Allan L. Egbert, Jr.; Robert C. Lacher

Keyword(s): Machine Learning; Knowledge Discovery; Learning Algorithms; Machine Learning Algorithms

Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue
DOI: 10.35940/ijitee.e2643.039520
2020, Vol 9 (5), pp. 956-959

Keyword(s): Machine Learning; Data Mining; Performance Management; Student Performance; Learning Algorithms; Educational Data Mining; Machine Learning Algorithms; Machine Learning Techniques; Social Characteristics; Student Failure

Student performance management is one of the key pillars of higher education institutions, since it directly affects students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data to identify students who are more likely to fail university examinations, so that the needed interventions can be provided to improve student performance. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors that are demographic and social characteristics of the students. It compares five popular machine learning algorithms (REPTree, JRip, Random Forest, Random Tree, and Naive Bayes) based on overall classifier accuracy as well as class-specific indicators, i.e., precision, recall, and F-measure. The results showed that the REPTree algorithm outperformed the other machine learning algorithms in classifying students who are more likely to fail the examinations.
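
The evaluation protocol above combines k-fold cross-validation with class-specific precision, recall, and F-measure. A minimal sketch of both pieces follows; the helper names are hypothetical, the folds are contiguous and unshuffled for clarity (tools such as Weka shuffle and stratify by default), and the "fail"/"pass" predictions are made-up toy values.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.
    Each fold serves once as the test set while the other k-1 folds train the model."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def precision_recall_f1(y_true, y_pred, positive):
    """Class-specific metrics for the `positive` class (e.g., students who fail)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print([len(f) for f in k_fold_indices(25, 10)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]

y_true = ["fail", "pass", "fail", "fail", "pass", "pass"]
y_pred = ["fail", "fail", "pass", "fail", "pass", "pass"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="fail")
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

The F-measure is the harmonic mean of precision and recall, so it stays low unless both are reasonably high, which matters when the "fail" class is a minority.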

Dr. Phish: Phishing Website Detector

E3S Web of Conferences
DOI: 10.1051/e3sconf/202129701032
2021, Vol 297, pp. 01032

Author(s): Harish Kumar; Anshal Prasad; Ninad Rane; Nilay Tamane; Anjali Yeole

Keyword(s): Machine Learning; Data Mining; Machine Learning Algorithms; Machine Learning Techniques; Cyber Crime; Data Mining Algorithms; Learning Techniques; Mining Algorithms; Host Properties; New Strategies

Phishing is a common attack that tricks credulous people into disclosing their personal information. It is a type of cybercrime in which fake sites lure victims into giving away sensitive data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs with machine learning techniques. The experiments discussed here detect phishing websites based on lexical features, host properties, and page importance properties. We consider various data mining algorithms for evaluating these features in order to better understand the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify phishing URLs by analyzing their lexical and host-based features. A particular challenge in this domain is that criminals constantly devise new strategies to counter our defensive measures. To succeed in this contest, we need machine learning algorithms that continually adapt to new examples and features of phishing URLs.
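
Lexical features are the part of this approach that can be computed from the URL string alone, with no network lookups. The sketch below extracts a few commonly used ones with the standard library; the particular feature set and the example URLs are assumptions for illustration, not the paper's feature list, and host-based features (WHOIS, DNS) are left out.

```python
import ipaddress
from urllib.parse import urlparse

def lexical_features(url):
    """Compute simple lexical URL features that a classifier could consume."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raw IP hosts are a classic phishing signal
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),    # many subdomains can mimic a brand
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,             # '@' can hide the real destination
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }

# Hypothetical URLs: one suspicious-looking, one ordinary.
suspicious = lexical_features("http://192.168.0.1/secure-login@bank")
ordinary = lexical_features("https://www.example.com/index.html")
print(suspicious["host_is_ip"], suspicious["has_at_symbol"])  # True True
print(ordinary["host_length"], ordinary["num_dots_in_host"])  # 15 2
```

Each dictionary becomes one row of a feature matrix; because attackers adapt, such features are typically re-evaluated and the model retrained as new phishing examples arrive.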

Knowledge discovery in Chinese herbal medicine: a machine learning perspective

MATEC Web of Conferences
DOI: 10.1051/matecconf/202133606024
2021, Vol 336, pp. 06024

Author(s): Nan Liang; Qing Liang; Fenglei Ji

Keyword(s): Machine Learning; Herbal Medicine; Knowledge Discovery; Chinese Herbal Medicine; Learning Algorithms; Machine Learning Algorithms; Current Status; Network Pharmacology; Valuable Insight; Chinese Herbal

Traditional Chinese Medicine (TCM) has attracted increasing attention due to its remarkable effects in treating diseases, and Chinese herbal medicine (CHM), which is rich in natural active ingredients, is an important part of TCM. Researchers are trying multiple analytical methods to extract more valuable information about CHM and to reveal the principles of TCM, and machine learning plays an important role in these studies. Knowledge discovery in CHM using machine learning mainly covers quality control of CHM, network pharmacology in CHM, and medical prescriptions composed of CHM, with the aims of understanding TCM better, providing more efficient methods for producing CHM, and finding novel treatments for diseases that are currently incurable. In this paper, we summarize the basic ideas of frequently used classification and clustering machine learning algorithms, introduce pre-processing algorithms commonly used to simplify and accelerate the machine learning procedure, present the current status of machine learning applications in knowledge discovery of CHM, and discuss challenges and future trends of machine learning's application in CHM. We believe the paper provides valuable insight for newcomers trying to apply machine learning to the study of CHM and to catch up on the recent status of related research.
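
The review pairs pre-processing with clustering. As a generic illustration of why the two go together, the sketch below min-max scales features measured on very different scales and then runs k-means (Lloyd's algorithm). The helper names and the "sample measurements" are hypothetical, and the naive initialization from the first k points is a simplification (k-means++ is the usual choice in practice).

```python
import math

def min_max_scale(rows):
    """Rescale each column to [0, 1] so no feature dominates distances by its units."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]

def k_means(points, k, iterations=20):
    """Lloyd's algorithm: alternately assign points to the nearest centroid
    and move each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    assignments = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, assignments

# Hypothetical samples: two constituent measurements on very different scales.
samples = [[1.0, 100], [1.2, 110], [0.9, 95], [5.0, 400], [5.2, 420], [4.8, 390]]
centroids, assignments = k_means(min_max_scale(samples), k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Without scaling, the second column (hundreds) would dominate every Euclidean distance and the first column would barely influence the clusters.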

Lead-based virtual screening and prediction of EGFR inhibitors using PubChem’s database with data mining and machine learning algorithms

DOI: 10.1021/scimeetings.0c03836
2020
Cited by ~1

Author(s): Kedan He

Keyword(s): Machine Learning; Data Mining; Virtual Screening; Learning Algorithms; Machine Learning Algorithms; EGFR Inhibitors

A Tutorial on Hierarchical Classification with Applications in Bioinformatics

Intelligent Information Technologies
DOI: 10.4018/978-1-59904-941-0.ch006
2011, pp. 114-140

Author(s): Alex Freitas; André C.P.L.F. de Carvalho

Keyword(s): Machine Learning; Data Mining; Protein Function; Hierarchical Classification; Classification Problems; Classification Techniques; Hierarchical Relationship

In machine learning and data mining, most work on classification problems deals with flat classification, where each instance is assigned to one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems in which the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied in bioinformatics (particularly to the prediction of protein function), where hierarchical classification problems are common.
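
One widely used hierarchical strategy is top-down prediction with a local classifier per parent node: start at the root and let each node's classifier choose among its children until a leaf is reached. The sketch below assumes that strategy; the class tree, the feature names, and the rule-based "classifiers" are hypothetical placeholders for trained models.

```python
# Hypothetical class hierarchy: parent -> children ("root" is a virtual top node).
HIERARCHY = {
    "root": ["enzyme", "transporter"],
    "enzyme": ["hydrolase", "transferase"],
    "transporter": [],
    "hydrolase": [],
    "transferase": [],
}

# Hypothetical local classifiers, one per parent node; each returns a child class.
LOCAL = {
    "root": lambda x: "enzyme" if x["catalytic_site"] else "transporter",
    "enzyme": lambda x: "hydrolase" if x["binds_water"] else "transferase",
}

def top_down_predict(x, hierarchy, local_classifiers):
    """Descend from the root, letting each parent node's classifier pick one child.
    The returned path respects the hierarchy by construction: every predicted
    class is a child of the class predicted just before it."""
    path, node = [], "root"
    while hierarchy.get(node):  # stop at a leaf (no children)
        node = local_classifiers[node](x)
        path.append(node)
    return path

print(top_down_predict({"catalytic_site": True, "binds_water": False}, HIERARCHY, LOCAL))
# ['enzyme', 'transferase']
print(top_down_predict({"catalytic_site": False, "binds_water": True}, HIERARCHY, LOCAL))
# ['transporter']
```

The known drawback of this design is error propagation: a wrong choice near the root cannot be corrected at deeper levels.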

Machine Learning for Business Analytics

Advances in Data Mining and Database Management - Challenges and Applications of Data Analytics in Social Perspectives
DOI: 10.4018/978-1-7998-2566-1.ch013
2021, pp. 232-256

Author(s): Kağan Okatan

Keyword(s): Machine Learning; Data Mining; Social Media; Big Data; Machine Learning Algorithms; Decision Makers; Business Analytics; Business Intelligence Systems; Long Time; Rules Of The Game

All these types of analytics have long answered business questions as the principal methods of investigating data warehouses. Data mining and business intelligence systems in particular help decision makers reach the information they want. Many existing systems are trying to keep up with a phenomenon that has changed the rules of the game in recent years: the undeniable attraction of 'big data'. In particular, evaluating the big data generated by social media is among the most current issues in business analytics, and it demonstrates the importance of integrating machine learning into business analytics. This section introduces the prominent machine learning algorithms that are increasingly used for business analytics and emphasizes their application areas.
