scholarly journals Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques 2nd edition

2006 ◽  
Vol 5 (1) ◽  
Author(s):  
Francisco Azuaje
2009 ◽  
Vol 79 (2) ◽  
pp. 213-225 ◽  
Author(s):  
A. V. KELAREV ◽  
J. L. YEARWOOD ◽  
P. W. VAMPLEW

Abstract Drensky and Lakatos (Lecture Notes in Computer Science, 357 (Springer, Berlin, 1989), pp. 181–188) have established a convenient property of certain ideals in polynomial quotient rings, which can now be used to determine error-correcting capabilities of combined multiple classifiers following a standard approach explained in the well-known monograph by Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques (Elsevier, Amsterdam, 2005)). We strengthen and generalise the result of Drensky and Lakatos by demonstrating that the corresponding nice property remains valid in a much larger variety of constructions and applies to more general types of ideals. Examples show that our theorems do not extend to larger classes of ring constructions and cannot be simplified or generalised.


Author(s):  
Pawan Kumar Chaurasia

This chapter critically reviews machine learning (ML) and deep learning tools and techniques applied to heart disease, covering disease complexity, prediction, and diagnosis. Only selected papers are included in the study to extract useful information, which stimulated a new hypothesis to guide further investigation of heart disease patients.


2020 ◽  
Vol 10 (19) ◽  
pp. 6683
Author(s):  
Andrea Murari ◽  
Emmanuele Peluso ◽  
Michele Lungaroni ◽  
Riccardo Rossi ◽  
Michela Gelfusa ◽  
...  

The inadequacies of basic physics models for disruption prediction have induced the community to increasingly rely on data mining tools. In the last decade, it has been shown how machine learning predictors can achieve a much better performance than those obtained with manually identified thresholds or empirical descriptions of the plasma stability limits. The main criticisms of these techniques focus therefore on two different but interrelated issues: poor “physics fidelity” and limited interpretability. Insufficient “physics fidelity” refers to the fact that the mathematical models of most data mining tools do not reflect the physics of the underlying phenomena. Moreover, they implement a black box approach to learning, which results in very poor interpretability of their outputs. To overcome or at least mitigate these limitations, a general methodology has been devised and tested, with the objective of combining the predictive capability of machine learning tools with the expression of the operational boundary in terms of traditional equations more suited to understanding the underlying physics. The proposed approach relies on the application of machine learning classifiers (such as Support Vector Machines or Classification Trees) and Symbolic Regression via Genetic Programming directly to experimental databases. The results are very encouraging. The obtained equations of the boundary between the safe and disruptive regions of the operational space present almost the same performance as the machine learning classifiers, based on completely independent learning techniques. Moreover, these models possess significantly better predictive power than traditional representations, such as the Hugill or the beta limit. More importantly, they are realistic and intuitive mathematical formulas, which are well suited to supporting theoretical understanding and to benchmarking empirical models. They can also be deployed easily and efficiently in real-time feedback systems.


Data Mining is now widely used to extract information from data and, in turn, to acquire knowledge for decision making. It analyzes patterns that are used to extract information and knowledge for making decisions. Many open source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for Data Mining and predictive analysis. This paper discusses the different tools available for Data Mining and Machine Learning, followed by a description of each tool and its pros and cons. The article details the algorithms these tools provide for classification, regression, characterization, discretization, clustering, visualization, and feature selection. It is intended to support efficient decision making and to suggest which tool suits a given requirement.


Author(s):  
Soodeh Hosseini ◽  
Saman Rafiee Sardo

Abstract With the growth of data mining and machine learning approaches in recent years, many efforts have been made to generalize these sciences so that researchers from any field can easily use them. One of the most important of these efforts is the development of data mining tools that try to hide the complexities from researchers so that they can achieve a professional output at any level of knowledge. This paper reviews and compares data mining and machine learning tools, including WEKA, KNIME, Keel, Orange, Azure, IBM SPSS Modeler, R and Scikit-Learn, to show what approach each of these tools has taken to the complexities and problems of different scenarios of generalizing data mining and machine learning. In addition, for a more detailed review, this paper examines the challenge of network intrusion detection in two tools: KNIME, with a graphical interface, and Scikit-Learn, with a coding environment.


2022 ◽  
Vol 21 (4) ◽  
pp. 346-363
Author(s):  
Hubert Anysz

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is most noticeable with large datasets, where the information sought or new relationships are extracted from information noise. As these tools develop, datasets with far fewer records, usually associated with specific phenomena, are also being explored. This specificity usually makes it impossible to increase the number of cases, which would otherwise facilitate the search for dependences in the phenomena under study. The paper discusses the features of applying the selected tools to a small set of data. It presents methods of data preparation and methods for calculating the performance of the tools, taking into account the specifics of databases with a small number of records. Techniques selected by the author are proposed that helped to break deadlocks in the calculations, i.e., situations in which the results were much worse than expected. The need to apply methods that improve the accuracy of forecasts and of classification was caused by the small amount of analysed data. This paper is not a review of popular methods of machine learning and data mining; nevertheless, the collected and presented material will help the reader to shorten the path to obtaining satisfactory results when using the described computational methods.


2015 ◽  
Vol 34 (6-7) ◽  
pp. 367-379 ◽  
Author(s):  
Abraham Yosipof ◽  
Oren E. Nahum ◽  
Assaf Y. Anderson ◽  
Hannah-Noa Barad ◽  
Arie Zaban ◽  
...  

2017 ◽  
Vol 7 (1) ◽  
pp. 41-45 ◽  
Author(s):  
Rita Reis ◽  
Hugo Peixoto ◽  
José Machado ◽  
António Abelha

Abstract Healthcare is one of the world’s fastest growing industries, with large volumes of data collected on a daily basis. It is generally perceived as being ‘information rich’ yet ‘knowledge poor’. Hidden relationships and valuable knowledge can be discovered in the collected data by applying data mining techniques. These techniques are increasingly being implemented in healthcare organizations to respond to the needs of doctors in their daily decision-making activities. To help decision-makers make the best decision, it is fundamental to develop a solution able to predict events before they occur. The aim of this project was to predict whether a patient would need to be followed by a nutrition specialist, by combining a nutritional dataset with data mining classification techniques, using WEKA machine learning tools. The results were very promising, with accuracy of around 91%, specificity of around 97% and precision of about 95%.

