Online Discretization of Continuous-Valued Attributes in Rule Induction

Author(s):  
D. T. Pham ◽  
A. A. Afify

Machine learning algorithms designed for engineering applications must be able to handle numerical attributes, particularly attributes with real (or continuous) values. Many algorithms deal with continuous-valued attributes by discretizing them before starting the learning process. This paper describes a new approach for discretization of continuous-valued attributes during the learning process. Incorporating discretization within the learning process has the advantage of taking into account the bias inherent in the learning system as well as the interactions between the different attributes. Experiments have demonstrated that the proposed method, when used in conjunction with the SRI rule induction algorithm developed by the authors, improves the accuracy of the induced model.
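The abstract above does not spell out how cut points are chosen, so the following is only a minimal sketch of one common technique, entropy-based threshold selection, and not the authors' SRI procedure. The function names `entropy` and `best_threshold` are hypothetical. In an online setting, a search like this would presumably be run on the examples still covered by the rule being built, rather than once on the full training set before learning starts.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split.

    Illustrative assumption: candidate thresholds are the midpoints
    between consecutive distinct sorted attribute values.
    Returns (None, 0.0) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = ((low + high) / 2.0, gain)
    return best
```

For example, calling `best_threshold` on the values 1, 2, 3, 10, 11, 12 with the first three labeled one class and the last three another places the cut midway between 3 and 10.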

2019 ◽  
Author(s):  
Georgy Kopanitsa ◽  
Aleksei Dudchenko ◽  
Matthias Ganzinger

BACKGROUND Previous decades have shown that machine learning (ML) has a wide variety of possible applications in medicine and can be very helpful. Nevertheless, cardiovascular diseases cause about a third of all global deaths. Does ML work in the cardiology domain, and what is the current progress in that regard? OBJECTIVE The review aims at (1) identifying studies where machine learning algorithms were applied in the cardiology domain; (2) providing an overview, based on the identified literature, of the state of the art of ML algorithms applied in cardiology. METHODS To organize this review, we employed the PRISMA statement. PRISMA is a set of items for reporting in systematic reviews and meta-analyses; it focuses on the reporting of reviews evaluating randomized trials but can also be used as a basis for reporting other systematic reviews. For the review, we adopted the PRISMA statement and identified the following items: review questions, information sources, search strategy, and selection criteria. RESULTS In total, 27 scientific articles or conference papers written in English and reporting the implementation of an ML method or algorithm in the cardiology domain were included in this review. We examined four aspects: aims of the ML systems, methods, datasets, and evaluation metrics. CONCLUSIONS We expect that this systematic review will be helpful for researchers developing machine learning systems for the medical domain, and in particular for cardiology.


2022 ◽  
Vol 301 ◽  
pp. 113868
Author(s):  
Xuan Cuong Nguyen ◽  
Thi Thanh Huyen Nguyen ◽  
Quyet V. Le ◽  
Phuoc Cuong Le ◽  
Arun Lal Srivastav ◽  
...  

Author(s):  
Divya Chaudhary ◽  
Er. Richa Vasuja

In today's scenario, data is being generated by every one of us, so it becomes vital to handle this data. To do so, new technologies are being developed, such as machine learning and data mining. This paper presents a study of machine learning (ML). Machine learning algorithms repeatedly produce precise approximations. A machine learning system effectively “learns” how to predict from a training set of completed jobs. The main purpose of the review is to give a rough overview of the most commonly used algorithms in machine learning.


Machine learning is a branch of artificial intelligence that provides algorithms that can learn from data and improve with experience, without human intervention. Nowadays, many machine learning algorithms play a vital role in data analytics. Such algorithms can be applied to the recent COVID-19 pandemic across the globe. Machine learning algorithms are classified into three groups based on the type of learning process: supervised learning, unsupervised learning, and reinforcement learning. Considering the medical observations on COVID-19 across the globe, the analysis is carried out under the supervised learning process, since learning is carried out with knowledge of both the input data and the expected output data. The data set is acquired from a reliable source, processed, and fed into the classification algorithms. The data is labeled and is classified based on those labels. In the proposed work, three different algorithms are applied to the COVID-19 dataset and compared for their efficiency, and an algorithm selection decision is made.


2020 ◽  
Vol 8 (6) ◽  
pp. 1964-1968

Drug reviews are commonly used in the pharmaceutical industry to improve the medications given to patients. Generally, a drug review contains the drug name, usage, ratings, and comments by patients. However, these reviews are not clean, and there is a need to improve their cleanness so that they can benefit both pharmacists and patients. To do this, we propose a new approach that includes several steps. First, we add extra parameters to the review data by applying VADER sentiment analysis to clean the review data. Then, we apply different machine learning algorithms, namely linear SVC, logistic regression, SVM, random forest, and Naive Bayes, to the drug review datasets. However, we found that the accuracy of these algorithms on these datasets is limited. To improve this, we apply stratified K-fold cross-validation in combination with logistic regression. With this approach, the accuracy increases to 96%.
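The stratified K-fold step mentioned above is a way of splitting data for evaluation so that every fold keeps roughly the same class proportions as the full dataset. The following is a minimal, dependency-free sketch of that splitting idea only; the function name `stratified_k_fold` is hypothetical, the round-robin assignment is an illustrative assumption, and nothing here reproduces the authors' pipeline or their reported accuracy.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs for k stratified folds.

    Indices of each class are dealt round-robin across the folds, so
    each fold receives a near-equal share of every class.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group example indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices across the k folds in turn.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            index
            for j, fold in enumerate(folds)
            if j != i
            for index in fold
        )
        yield train, test
```

A classifier such as logistic regression would then be fitted on each `train` split and scored on the matching `test` split, with the per-fold scores averaged. In practice the data is usually shuffled within each class first; that step is omitted here to keep the sketch deterministic.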


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Othmane Touri ◽  
Rida Ahroum ◽  
Boujemâa Achchab

Purpose The displaced commercial risk (DCR) is one of the specific risks in Islamic finance and has created a serious debate among practitioners and researchers about its management. The purpose of this paper is to assess a new approach to managing this risk using machine learning algorithms. Design/methodology/approach To achieve this purpose, the authors apply several machine learning algorithms to a set of financial data related to banks from different regions and consider the deposit variation intensity as an indicator. Findings Results show acceptable prediction accuracy. The model could be used to optimize the prudential reserves for banks and the incomes distributed to depositors. Research limitations/implications However, the model uses several variables as proxies since data are not available for some specific indicators, such as the profit equalization reserves and the investment risk reserves. Originality/value Previous studies have analyzed the origin and impact of DCR. To the best of the authors’ knowledge, none of them has provided an ex ante management tool for this risk. Furthermore, the authors suggest the use of a new approach based on machine learning algorithms.


2020 ◽  
Author(s):  
David Goretzko ◽  
Markus Bühner

Determining the number of factors is one of the most crucial decisions a researcher has to face when conducting an exploratory factor analysis. As no common factor retention criterion can be seen as generally superior, a new approach is proposed - combining extensive data simulation with state-of-the-art machine learning algorithms. First, data was simulated under a broad range of realistic conditions and three algorithms were trained using specially designed features based on the correlation matrices of the simulated data sets. Subsequently, the new approach was compared to four common factor retention criteria with regard to its accuracy in determining the correct number of factors in a large-scale simulation experiment. Sample size, variables per factor, correlations between factors, primary and cross-loadings as well as the correct number of factors were varied to gain comprehensive knowledge of the efficiency of our new method. A gradient boosting model outperformed all other criteria, so in a second step, we improved this model by tuning several hyperparameters of the algorithm and using common retention criteria as additional features. This model reached an out-of-sample accuracy of 99.3% (the pre-trained model can be obtained from https://osf.io/mvrau/). A great advantage of this approach is the possibility to continuously extend the data basis (e.g. using ordinal data) as well as the set of features to improve the predictive performance and to increase generalizability.

