COVID-19 World Vaccination Progress Using Machine Learning Classification Algorithms

2021 ◽  
Vol 1 (2) ◽  
pp. 100-105
Author(s):  
Nasiba M. Abdulkareem ◽  
Adnan Mohsin Abdulazeez ◽  
Diyar Qader Zeebaree ◽  
Dathar A. Hasan

In December 2019, SARS-CoV-2, the virus that causes coronavirus disease (COVID-19), began spreading to all countries, infecting thousands of people and causing deaths. COVID-19 induces mild sickness in most cases, although it may make some people very ill. Therefore, vaccines are in various phases of clinical development, and some have been approved for national use. The current state reveals a critical need for a quick and timely contribution to COVID-19 vaccine development. Non-clinical methods such as data mining and machine learning techniques may help achieve this. This study focuses on COVID-19 world vaccination progress using machine learning classification algorithms. The findings of the paper show which algorithm is better for a given dataset. Weka is used to run tests on real-world data, and four classification algorithms (Decision Tree, K-nearest neighbors, Random Tree, and Naive Bayes) are used to analyze the data and draw conclusions. The comparison is based on accuracy and execution time, and it was found that the Decision Tree outperforms the other algorithms in terms of both time and accuracy.
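To make the comparison concrete, the following is a minimal Python sketch of the same four-way classifier comparison. It is not the authors' Weka experiment: it assumes a synthetic stand-in dataset, and scikit-learn's ExtraTreeClassifier is used as a rough analogue of Weka's Random Tree. Accuracy and fit/predict time are reported per model, mirroring the accuracy-versus-runtime comparison described above.

```python
# Illustrative sketch only: the paper runs this comparison in Weka on a
# vaccination dataset; a synthetic dataset and scikit-learn analogues are assumed.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Tree": ExtraTreeClassifier(random_state=0),   # stand-in for Weka's Random Tree
    "Naive Bayes": GaussianNB(),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    elapsed = time.perf_counter() - start
    print(f"{name}: accuracy={acc:.3f}, time={elapsed:.3f}s")
```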

2019 ◽  
Vol 119 (3) ◽  
pp. 676-696 ◽  
Author(s):  
Zhongyi Hu ◽  
Raymond Chiong ◽  
Ilung Pranata ◽  
Yukun Bao ◽  
Yuqing Lin

Purpose Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones). Design/methodology/approach The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling. Findings By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.
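A simplified sketch of the integrated resampling idea is given below. It is not the authors' method: imbalanced-learn's SMOTE handles the oversampling step, while a plain RandomUnderSampler stands in for the PSO-guided undersampling, and the imbalanced web-domain data is replaced by a synthetic set.

```python
# Simplified sketch of the oversample-then-undersample idea, not the authors' code.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Hypothetical imbalanced data: roughly 5% "malicious" vs 95% "benign" domains.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))

# SMOTE oversamples the minority class to half the majority size ...
X_over, y_over = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
# ... and an undersampler (stand-in for the PSO-guided step) balances the rest.
X_res, y_res = RandomUnderSampler(sampling_strategy=1.0, random_state=0).fit_resample(X_over, y_over)
print("after:", Counter(y_res))
```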


Advancement in medical science has always been one of the most vital aspects of the human race. With progress in technology, modern techniques and equipment are routinely applied for treatment purposes. Nowadays, machine learning techniques are widely used in medical science to ensure accuracy. In this work, we construct computational models for accurate liver disease prediction. We used several efficient classification algorithms: Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) for predicting liver diseases. Our work provides a hybrid model construction and a comparative analysis for improving prediction performance. At first, the classification algorithms are applied to the original liver patient dataset collected from the UCI repository. Then we analyzed and tuned the features to improve the performance of our predictor and made a comparative analysis among the classifiers. We found that the KNN algorithm outperformed all other techniques when combined with feature selection.
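A minimal sketch of this comparison pipeline follows, assuming a synthetic stand-in for the UCI liver patient dataset: each of the five classifiers is wrapped with scaling and univariate feature selection (SelectKBest), echoing the feature-selection step described above.

```python
# Minimal sketch (not the authors' code): feature selection + the compared classifiers.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for the liver patient data (ILPD has ~583 records, 10 features).
X, y = make_classification(n_samples=583, n_features=10, random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Perceptron": Perceptron(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=6), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```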


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 499 ◽  
Author(s):  
Iqbal H. Sarker ◽  
Yoosef B. Abushark ◽  
Asif Irshad Khan

This paper mainly formulates the problem of predicting context-aware smartphone apps usage based on machine learning techniques. In the real world, people use various kinds of smartphone apps differently in different contexts, which include both the user-centric context and the device-centric context. In the area of artificial intelligence and machine learning, the decision tree model is one of the most popular approaches for predicting context-aware smartphone usage. However, real-life smartphone apps usage data may contain higher dimensions of contexts, which may cause several issues, such as increased model complexity and over-fitting, and consequently decrease the prediction accuracy of the context-aware model. In order to address these issues, in this paper, we present an effective principal component analysis (PCA) based context-aware smartphone apps prediction model, “ContextPCA”, using the decision tree machine learning classification technique. PCA is an unsupervised machine learning technique that can be used to separate symmetric and asymmetric components, and it has been adopted in our “ContextPCA” model in order to reduce the context dimensions of the original data set. The experimental results on smartphone apps usage datasets show that the “ContextPCA” model effectively predicts context-aware smartphone apps in terms of precision, recall, f-score and ROC values in various test cases.
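A rough sketch of the ContextPCA idea is shown below, under the assumption of synthetic stand-in context data: PCA reduces the context dimensions before a decision tree makes the app prediction, and macro-averaged precision, recall and f-score are reported, as in the evaluation described above.

```python
# Rough sketch of the ContextPCA pipeline on hypothetical high-dimensional context data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate

# Stand-in for contextual features (time, location, activity, device state, ...).
X, y = make_classification(n_samples=3000, n_features=40, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

context_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),              # reduce the context dimensions
    DecisionTreeClassifier(random_state=0),
)

scores = cross_validate(context_pca, X, y, cv=5,
                        scoring=["precision_macro", "recall_macro", "f1_macro"])
for metric in ("precision_macro", "recall_macro", "f1_macro"):
    print(metric, scores["test_" + metric].mean().round(3))
```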


2021 ◽  
Vol 36 (1) ◽  
pp. 609-615
Author(s):  
Mandhapati Rajesh ◽  
Dr.K. Malathi

Aim: To predict heart disease from the medical parameters of cardiac patients with a good accuracy rate using machine learning methods such as an innovative Decision Tree (DT) algorithm. Materials and Methods: Supervised machine learning techniques with the innovative Decision Tree (N = 20) and K-Nearest Neighbour (KNN) (N = 20) are performed with five different datasets at each time to record five samples. Results: The Decision Tree is used to predict heart disease with the help of various medical conditions; the accuracy achieved for DT is 98% and for KNN is 72.2%. The difference between the two algorithms, Decision Tree and KNN, is statistically insignificant by the independent-samples T-test (p = .737, significance threshold p < 0.005, 95% confidence level). Conclusion: Prediction and classification of heart disease appear to be better with DT than with KNN.
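The statistical comparison can be reproduced in outline as follows; the accuracy samples below are made up for illustration and are not the study's recorded values.

```python
# Illustrative sketch of an independent-samples T-test over per-run accuracies.
from scipy import stats

dt_accuracy  = [0.98, 0.97, 0.99, 0.98, 0.96]   # hypothetical DT accuracy samples
knn_accuracy = [0.72, 0.70, 0.74, 0.73, 0.71]   # hypothetical KNN accuracy samples

t_stat, p_value = stats.ttest_ind(dt_accuracy, knn_accuracy)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# Compare p against the chosen significance threshold to decide whether the
# observed accuracy difference is statistically significant.
```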


Diabetes Mellitus (DM) is considered one of the chronic diseases of humankind, causing an increase in blood sugar. Many complications are reported if DM remains untreated and unidentified. Identifying the disease involves considerable physical and mental effort and stress, requiring a visit to a doctor and blood and urine tests at a diagnostic centre, which consumes more time. These difficulties can be overcome using machine learning. The idea of the model is to predict the occurrence of diabetes with high accuracy. Therefore, two machine learning classification algorithms, namely Fine Decision Tree and Support Vector Machine, are used in this experiment to detect diabetes at an early stage.
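A hedged sketch of the experiment follows, using a synthetic stand-in dataset: a deep ("fine") decision tree with many leaves approximates the Fine Decision Tree, and it is compared against an RBF-kernel SVM.

```python
# Hedged sketch only: fine decision tree vs. SVM on stand-in diabetes-like data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=768, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

fine_tree = DecisionTreeClassifier(max_leaf_nodes=100, random_state=1)  # many small leaves
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

for name, model in [("Fine Decision Tree", fine_tree), ("SVM", svm)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```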


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
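A minimal sketch of the general idea (not the authors' exact procedure) is given below: a second model is trained on held-out absolute prediction errors and then used to estimate the reliability of each new prediction.

```python
# Reliability estimation reframed as a numeric prediction problem:
# a second model learns to predict the absolute error of the outcome model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=3000, n_features=15, noise=10.0, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_rel, X_new, y_rel, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

outcome_model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Absolute prediction errors on held-out data become the reliability model's target.
abs_err = np.abs(y_rel - outcome_model.predict(X_rel))
reliability_model = GradientBoostingRegressor(random_state=0).fit(X_rel, abs_err)

# For new instances: the prediction plus an estimate of how far off it may be.
pred = outcome_model.predict(X_new[:5])
estimated_abs_error = reliability_model.predict(X_new[:5])
print(np.round(pred, 1), np.round(estimated_abs_error, 1))
```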


Hydrology ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 183
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Jan Feyen ◽  
Rolando Célleri

Worldwide, machine learning (ML) is increasingly being used for developing flood early warning systems (FEWSs). However, previous studies have not focused on establishing a methodology for determining the most efficient ML technique. We assessed FEWSs with three river states, No-alert, Pre-alert and Alert for flooding, for lead times between 1 and 12 h using the most common ML techniques: multi-layer perceptron (MLP), logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), and random forest (RF). The Tomebamba catchment in the tropical Andes of Ecuador was selected as a case study. For all lead times, MLP models achieve the highest performance, followed by LR, with f1-macro (log-loss) scores of 0.82 (0.09) and 0.46 (0.20) for the 1 h and 12 h cases, respectively. The ranking was highly variable for the remaining ML techniques. According to the g-mean, LR models forecast correctly and show more stability across all states, while the MLP models perform better in the Pre-alert and Alert states. The proposed methodology for selecting the optimal ML technique for a FEWS can be extrapolated to other case studies. Future efforts are recommended to enhance the input data representation and develop communication applications to raise society's awareness of floods.
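A compact sketch of the model comparison follows, assuming synthetic stand-in data rather than the Tomebamba precipitation and runoff features: each classifier is evaluated with f1-macro and log-loss, the two scores reported above.

```python
# Sketch only: three-class (No-alert / Pre-alert / Alert) comparison on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=4000, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
}

for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_validate(pipe, X, y, cv=5, scoring=["f1_macro", "neg_log_loss"])
    print(f"{name}: f1-macro={scores['test_f1_macro'].mean():.2f}, "
          f"log-loss={-scores['test_neg_log_loss'].mean():.2f}")
```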


Software engineering is an important area that deals with the development and maintenance of software. After developing software, it is always important to track its performance. One has to always check whether the software functions according to customer requirements. To ensure this, faulty and non-faulty modules must be identified. For this purpose, one can make use of a model for binary class classification of faults. Different techniques' outputs differ from one another with respect to the fault dataset used, complexity, the classification algorithm implemented, etc. Various machine learning techniques can be used for this purpose. This paper deals with four well-established classification algorithms: decision tree, random forest, naive Bayes and logistic regression (tree-based and Bayesian-based techniques). The motive behind developing such a project is to identify the faulty modules within software before the actual software testing takes place. As a result, the time consumed by testers, or their workload, can be reduced to an extent. This work is useful to those working in the software industry and also to those carrying out research in software engineering where the software development lifecycle is discussed.
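A small illustrative sketch of the binary fault-prediction setup, assuming a synthetic module-metrics matrix in place of a real fault dataset; logistic regression is shown here, but any of the four algorithms above could be dropped in.

```python
# Hedged sketch: binary classification of faulty vs. non-faulty modules.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# 1 = faulty module, 0 = non-faulty; faults are typically the minority class.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["non-faulty", "faulty"]))
```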


2021 ◽  
Author(s):  
Cao Truong Tran

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully because inadequate treatment of missing values will cause large classification errors. Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying classifiers to classify unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data which can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially in the application phase of classification. Another approach to classification with incomplete data is to build a classifier that can work directly with missing values. This approach does not require time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A recent approach to classification with incomplete data which also avoids estimating missing values is to build a set of classifiers which is then used to select applicable classifiers for classifying unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values.

The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and constructing classifiers.

The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data.

The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve the classification accuracy, but also reduce the complexity of classifiers able to work directly with incomplete data.

The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming: genetic programming with a set of interval functions. The method improves the classification accuracy and reduces the complexity of classifiers.

The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data.

The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms able to work directly with incomplete data.

In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
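A small sketch of the imputation-then-classify baseline that the thesis builds on, assuming missingness injected into a standard dataset: missing values are filled by a KNN imputer before a decision tree is trained. The thesis itself goes further, combining imputation with clustering, feature selection and genetic programming.

```python
# Baseline sketch: impute missing values, then classify (not the thesis's methods).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.impute import KNNImputer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Artificially remove 20% of the values to simulate an incomplete dataset.
rng = np.random.default_rng(0)
mask = rng.random(X.shape) < 0.2
X_incomplete = X.copy()
X_incomplete[mask] = np.nan

pipe = make_pipeline(KNNImputer(n_neighbors=5), DecisionTreeClassifier(random_state=0))
print("accuracy:", cross_val_score(pipe, X_incomplete, y, cv=5).mean().round(3))
```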

