Predicting Coronavirus Pandemic in Real-Time Using Machine Learning and Big Data Streaming System

Twitter is a virtual social network where people share posts and opinions about current events, such as the coronavirus pandemic. It is considered the most significant streaming data source for machine learning research in terms of analysis, prediction, knowledge extraction, and opinion mining. Sentiment analysis is a text analysis method that has gained further significance with the emergence of social networks. Therefore, this paper introduces a real-time system for sentiment prediction on Twitter streaming data for tweets about the coronavirus pandemic. The proposed system aims to find the optimal machine learning model for coronavirus sentiment prediction and then use it in real time. The system has two components: an offline sentiment analysis component and an online prediction pipeline. For the offline component, a dataset of historical tweets was collected between 23/01/2020 and 01/06/2020 and filtered by the #COVID-19 and #Coronavirus hashtags. Two textual feature extraction methods, n-gram and TF-IDF, were used to extract the essential features of the dataset. Then, five standard machine learning algorithms were trained and compared to select the best model for the online prediction component: decision tree, logistic regression, k-nearest neighbors, random forest (RF), and support vector machine. The online prediction pipeline was developed using the Twitter Streaming API, Apache Kafka, and Apache Spark. The experimental results indicate that the RF model using the unigram feature extraction method achieved the best performance, and it is used for sentiment prediction on Twitter streaming data about the coronavirus.
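
The two feature extraction methods named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the made-up tweets, and the particular TF-IDF variant (term frequency times log(N / document frequency), without smoothing) are assumptions chosen for brevity.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one TF-IDF weight dictionary per document over n-gram terms.

    tf = term count / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    This is one common textbook variant; libraries often add smoothing.
    """
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for very short texts
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tweets, used only to exercise the functions.
tweets = ["stay safe stay home", "covid cases rise", "stay strong"]
unigram_weights = tfidf(tweets, n=1)
bigram_weights = tfidf(tweets, n=2)
```

Each dictionary maps an n-gram to its weight in one tweet; terms that occur in every document receive a weight of zero under this variant, which is why smoothed formulas are often preferred in practice.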

Breast Cancer Identification from Patients’ Tweet Streaming Using Machine Learning Solution on Spark

Complexity ◽

10.1155/2021/6653508 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Nahla F. Omran ◽

Sara F. Abd-el Ghany ◽

Hager Saleh ◽

Ayman Nabil

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Real Time ◽

Random Forest Classifier ◽

Streaming Data ◽

Support Vector ◽

Real Time System ◽

Selection Algorithms

Twitter integrates with streaming data technologies and machine learning to add new value to healthcare. This paper presents a real-time system to predict breast cancer based on patients' health data streamed from Twitter. The proposed system consists of two major components: an offline model-building component and an online prediction pipeline. For the first component, the correlation between features was computed to reduce the number of features from the Breast Cancer Wisconsin Diagnostic dataset. Two feature selection algorithms, recursive feature elimination and univariate feature selection, were then applied to the remaining features to select the essential ones. Four classifiers (decision tree, logistic regression, support vector machine, and random forest) were trained on the features obtained after correlation analysis and feature selection. Hyperparameter tuning and cross-validation were also applied to optimize the models and enhance accuracy. Apache Spark, Apache Kafka, and the Twitter Streaming API are used to develop the second component. The model with the highest accuracy from the first component predicts breast cancer in real time from the tweet stream. The results showed that the best model is the random forest classifier, which achieved the highest accuracy.
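
The correlation-based reduction step described in this abstract could look roughly like the sketch below. The abstract does not give the threshold or the exact procedure, so the 0.9 cut-off, the greedy keep-first strategy, and the toy values are assumptions; the column names only echo the style of the Wisconsin dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # assumes neither column is constant


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns with made-up values (hypothetical, for illustration only).
features = {
    "radius_mean": [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter_mean": [63.0, 76.0, 88.0, 101.0, 113.0],  # almost proportional to radius
    "smoothness_mean": [0.10, 0.08, 0.12, 0.09, 0.11],
}
selected = drop_correlated(features)
```

Here the perimeter column is dropped because it is almost perfectly correlated with the radius column, while the weakly correlated smoothness column is kept. Recursive feature elimination or univariate selection would then run on the surviving columns.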

Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public's opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Accurate sentiment analysis solutions exist for English texts and for financial texts, whereas work on the more complex Lithuanian language is mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results show that the highest accuracy is obtained on a non-balanced dataset with the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
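
A minimal sketch of grid search over a multinomial Naive Bayes classifier follows. The abstract does not say which hyperparameters were searched, so the smoothing parameter alpha, its candidate values, and the English toy headlines (the study used Lithuanian news) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Fit multinomial Naive Bayes with additive smoothing parameter alpha."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the class with the highest log posterior for one document."""
    word_counts, class_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for word in doc.split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log(
                    (word_counts[label][word] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def grid_search(train, valid, alphas):
    """Evaluate every alpha on the validation set; return (best_alpha, accuracy)."""
    results = []
    for alpha in alphas:
        model = train_nb(*train, alpha)
        hits = sum(predict_nb(model, doc) == label for doc, label in zip(*valid))
        results.append((hits / len(valid[0]), alpha))
    best_accuracy, best_alpha = max(results, key=lambda r: r[0])
    return best_alpha, best_accuracy


# Made-up English headlines standing in for the Lithuanian news data.
train = (
    ["profit grows strongly", "record profit reported",
     "losses deepen sharply", "company reports losses"],
    ["pos", "pos", "neg", "neg"],
)
valid = (["profit reported", "losses grow"], ["pos", "neg"])
best_alpha, best_accuracy = grid_search(train, valid, alphas=(0.1, 0.5, 1.0))
```

In a fuller experiment the validation accuracy would normally come from cross-validation rather than from a single held-out split.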

Performance Analysis of Machine Learning Algorithms and Feature Extraction Methods for Sentiment Analysis

10.1109/icses52305.2021.9633882 ◽

2021 ◽

Author(s):

Anshumaan Chauhan ◽

Ayushi Agarwal ◽

Razia Sulthana

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Analysis ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback toward an entity, which can be an item, service, movie, person or event. The opinions are mostly expressed as remarks or reviews. With social networks, forums and websites, these reviews have become a significant factor in a client's decision whether or not to buy something. Today, vast scalable computing environments provide sophisticated ways of carrying out various data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such example is text classification, a compelling method for predicting clients' sentiment. In this paper, we focus our sentiment analysis work on a movie review database. We examine the sentiment expressions to rank the polarity of the movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review its correct rating. This paper incorporates sentiment analysis using feature-based opinion mining and managed machine learning. The principal focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem space. In our study, we used six distinct machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K nearest neighbors) and SoftMax Regression.

A Proposal of Implementation of Sitting Posture Monitoring System for Wheelchair Utilizing Machine Learning Methods

Sensors ◽

10.3390/s21196349 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6349

Author(s):

Jawad Ahmad ◽

Johan Sidén ◽

Henrik Andersson

Keyword(s):

Machine Learning ◽

Pressure Distribution ◽

Real Time ◽

Monitoring System ◽

Pressure Ulcers ◽

Machine Learning Algorithms ◽

Raspberry Pi ◽

Support Vector ◽

Processing Unit ◽

Posture Recognition

This paper presents a posture recognition system aimed at detecting the sitting postures of a wheelchair user. The main goals of the proposed system are to identify and report irregular and improper posture in order to prevent sitting-related health issues such as pressure ulcers, with the potential that it could also be used for individuals without mobility issues. In the proposed monitoring system, an array of 16 screen-printed pressure sensor units was employed to obtain pressure data, which are sampled and processed in real time using read-out electronics. Posture recognition was performed for four sitting positions (right-, left-, forward- and backward-leaning) based on k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF), decision tree (DT) and LightGBM machine learning algorithms. As a result, a posture classification accuracy of up to 99.03 percent can be achieved. Experimental studies illustrate that the system can provide real-time pressure distribution values in the form of a pressure map on a standard PC and also on a Raspberry Pi system equipped with a touchscreen monitor. The stored pressure distribution data can later be shared with healthcare professionals so that abnormalities in sitting patterns can be identified by employing a post-processing unit. The proposed system could be used for risk assessments related to pressure ulcers. It may also serve as a benchmark by recording and identifying individuals' sitting patterns, and it could be realized as a lightweight portable health monitoring device.
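
As an illustration of one of the listed classifiers, the following sketch applies k-nearest neighbors to pressure vectors. It is not the authors' pipeline: for brevity each sample has four made-up values (left, right, front, back load) instead of the 16 sensor readings described in the abstract, and the choice of k = 3 is an assumption.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical normalized pressure readings: [left, right, front, back].
train_x = [
    [0.9, 0.1, 0.5, 0.5], [0.8, 0.2, 0.5, 0.4],
    [0.1, 0.9, 0.5, 0.5], [0.2, 0.8, 0.4, 0.5],
    [0.5, 0.5, 0.9, 0.1], [0.5, 0.4, 0.8, 0.2],
    [0.5, 0.5, 0.1, 0.9], [0.4, 0.5, 0.2, 0.8],
]
train_y = ["left", "left", "right", "right",
           "forward", "forward", "backward", "backward"]

posture = knn_predict(train_x, train_y, [0.85, 0.15, 0.5, 0.5])
```

With real data, the query would be the latest frame from the sensor array, classified each time a new sample arrives.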

Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool

PeerJ ◽

10.7717/peerj.10083 ◽

2020 ◽

Vol 8 ◽

pp. e10083 ◽

Cited By ~ 1

Author(s):

Ashis Kumar Das ◽

Shiba Mishra ◽

Saji Saraswathy Gopalan

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Open Source ◽

Mortality Risk ◽

Machine Learning Algorithms ◽

Brier Score ◽

Gradient Boosting ◽

Support Vector ◽

Prediction Tool ◽

Online Prediction

Background: The recent CoVID-19 pandemic has emerged as a threat to global health security. There are very few prognostic models for CoVID-19 that use machine learning. Objectives: To predict mortality among confirmed CoVID-19 patients in South Korea using machine learning and to deploy the best-performing algorithm as an open-source online prediction tool for decision-making. Materials and Methods: Mortality for confirmed CoVID-19 patients (n = 3,524) between January 20, 2020 and May 30, 2020 was predicted using five machine learning algorithms (logistic regression, support vector machine, K nearest neighbor, random forest and gradient boosting). The performance of the algorithms was compared, and the best-performing algorithm was deployed as an online prediction tool. Results: The logistic regression algorithm was the best performer in terms of discrimination (area under ROC curve = 0.830) and calibration (Matthews Correlation Coefficient = 0.433; Brier Score = 0.036). The best-performing algorithm (logistic regression) was deployed as the online CoVID-19 Community Mortality Risk Prediction tool named CoCoMoRP (https://ashis-das.shinyapps.io/CoCoMoRP/). Conclusions: We describe the development and deployment of an open-source machine learning tool to predict mortality risk among confirmed CoVID-19 patients using publicly available surveillance data. This tool can be utilized by potential stakeholders such as health providers and policymakers to triage patients at the community level in addition to other approaches.
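
The three metrics reported in the Results can be computed as in the sketch below. The labels and predicted probabilities are made up, and the 0.5 decision threshold used for the Matthews correlation coefficient is an assumption; none of these values come from the study.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = died) and predicted mortality risks.
y_true = [0, 0, 0, 1, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.15, 0.90]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

auc = roc_auc(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
brier = brier_score(y_true, y_prob)
```

A lower Brier score and a higher AUC and MCC indicate a better model; the pairwise AUC shown here is quadratic in the sample size, which is acceptable for a small illustration.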

Invisible experience to real-time assessment in elite tennis athlete training: Sport-specific movement classification based on wearable MEMS sensor data

Proceedings of the Institution of Mechanical Engineers Part P Journal of Sports Engineering and Technology ◽

10.1177/17543371211050312 ◽

2021 ◽

pp. 175433712110503

Author(s):

Mingyue Wu ◽

Ran Wang ◽

Yang Hu ◽

Mengjiao Fan ◽

Yufan Wang ◽

...

Keyword(s):

Machine Learning ◽

Real Time ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Test Accuracy ◽

Z Score ◽

Mems Sensor ◽

Score Normalization

This study examined the reliability of a tennis stroke classification and assessment platform consisting of a single low-cost MEMS sensor in a wrist-worn wearable device, a smartphone, and a computer. The collected data were transmitted via Bluetooth and analyzed by machine learning algorithms. Twelve right-handed male elite tennis athletes participated in the study, and each athlete performed 150 strokes. The results from three machine learning algorithms regarding their recognition and classification of the real-time data stream were compared. Stroke recognition and classification went through pre-processing, segmentation, feature extraction, and classification with Support Vector Machine (SVM) variants (SVM without normalization, SVM with Min–Max normalization, and SVM with Z-score normalization), K-nearest neighbor (K-NN), and Naive Bayes (NB) machine learning algorithms. During training, 10-fold cross-validation was used to avoid overfitting, and suitable parameters were found for the SVM classifiers. The best classifier was obtained with C = 1 and the RBF kernel function. The algorithms' classification of unique stroke types yielded highly reliable clusters within each stroke type, with the highest test accuracy of 99% achieved by SVM with Min–Max normalization and 98.4% achieved by SVM with Z-score normalization.
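
The two normalization schemes and the 10-fold split mentioned in this abstract can be sketched as follows. The sample values are made up, the z-score uses the population standard deviation, and the folds are contiguous and unshuffled; these are assumptions rather than details taken from the study.

```python
import math


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]  # assumes hi > lo


def z_score(column):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]  # assumes std > 0


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up sensor feature values, for illustration only.
acceleration = [2.0, 4.0, 6.0, 8.0]
scaled = min_max(acceleration)
standardized = z_score(acceleration)

# 150 strokes per athlete, as in the abstract, split into 10 folds.
folds = list(k_fold_indices(150, k=10))
```

In practice the normalization statistics are usually computed on the training folds only and then applied to the held-out fold, so that no information from the test data leaks into training.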

Application of Bayesian Learning Mechanism in Power System Transient Stability Assessment

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.108-111.765 ◽

2010 ◽

Vol 108-111 ◽

pp. 765-770

Author(s):

Lin Niu ◽

Jian Guo Zhao ◽

Ke Jun Li ◽

Zhen Yu Zhou

Keyword(s):

Machine Learning ◽

Power System ◽

Real Time ◽

Bayesian Learning ◽

Transient Stability ◽

Decision Function ◽

Machine Learning Algorithms ◽

Support Vector ◽

Svm Classifier ◽

Stability Assessment

One of the most challenging problems in the real-time operation of power systems is the prediction of transient stability. Fast and accurate techniques are imperative to achieve on-line transient stability assessment (TSA). This problem has been approached with various machine learning algorithms; however, they produce a class decision estimate rather than a probabilistic confidence of the class distribution. To counter this shortcoming of common machine learning methods, a novel machine learning technique, the 'relevance vector machine' (RVM), is presented for TSA in this paper. RVM is based on a probabilistic Bayesian learning framework, and one of its features is that it can yield a decision function that depends on only a very small number of so-called relevance vectors. The proposed method is tested on the New England power system and compared with a state-of-the-art 'support vector machine' (SVM) classifier. The classification performance is evaluated using the false discriminate rate (FDR). It is demonstrated that the RVM classifier can yield a decision function that is much sparser than that of the SVM classifier while providing higher classification accuracy. Consequently, the RVM classifier greatly reduces the computational complexity, making it more suitable for real-time implementation.
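
The contrast between a class decision and a probabilistic output can be illustrated with the prediction step of an RVM-style classifier, which passes a sparse kernel expansion through a logistic sigmoid. The sketch covers prediction only, not the Bayesian training that selects the relevance vectors; the RBF kernel, the two relevance vectors, their weights, and the feature values are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def rvm_predict_proba(x, relevance_vectors, weights, bias=0.0):
    """Return P(class = 1 | x) as a sigmoid of the sparse kernel expansion."""
    score = bias + sum(
        w * rbf_kernel(x, rv) for w, rv in zip(weights, relevance_vectors)
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical trained model: only two relevance vectors carry nonzero weight.
relevance_vectors = [[0.0, 0.0], [2.0, 2.0]]
weights = [3.0, -3.0]

p_near_first = rvm_predict_proba([0.1, 0.1], relevance_vectors, weights)
p_midpoint = rvm_predict_proba([1.0, 1.0], relevance_vectors, weights)
label_near_first = int(p_near_first >= 0.5)  # hard decision, if one is needed
```

A point close to the first relevance vector receives a probability near 1, while a point equidistant from both receives 0.5, which signals an uncertain case that a bare class label would hide. Because only the relevance vectors enter the sum, prediction cost grows with their number rather than with the size of the training set.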

Aspect Based Sentimental Analysis of Hotel Reviews: A Comparative Study

Sukkur IBA Journal of Computing and Mathematical Sciences ◽

10.30537/sjcms.v4i1.567 ◽

2020 ◽

Vol 4 (1) ◽

pp. 11-20

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Sentiment Analysis ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Practical Significance ◽

Support Vector ◽

Use Of The Internet ◽

And Task

The increasing use of the internet enables users to share their opinions about what they like and dislike regarding products and services. For efficient decision making, there is a need to analyze these reviews. Sentiment analysis, or opinion mining, is commonly used to detect the polarity (positive or negative) of reviews, but it does not show the aspect or orientation of the text. In this study, state-of-the-art approaches based on supervised machine learning are employed to perform three tasks on the dataset provided by SemEval. Tasks A and B are related to predicting the aspect of the restaurant reviews, whereas Task C predicts their polarity. Additionally, this study compares the performance of two feature engineering techniques and five machine learning algorithms on a publicly available dataset, SemEval-2015 Task 12. The experimental results showed that word2vec features combined with the support vector machine algorithm performed best, giving overall accuracies of 76%, 72% and 79% for Task A, Task B, and Task C, respectively. Our comparative study holds practical significance and can be used as a baseline study in the domain of aspect-based sentiment analysis.

Comparative Analysis of Machine Learning Algorithms with and without Feature Extraction

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst061243 ◽

2020 ◽

Vol 6 (12) ◽

pp. 235-239

Author(s):

Vatsal Gupta ◽

Saurabh Gautam

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Image Recognition ◽

Input Image ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Machine Learning Classification ◽

Security Services ◽

Computational Resources

Image recognition is one of the core disciplines in computer vision and one of the most widely researched topics of the last few decades. Many advances in image recognition in the past decade have made it one of the most efficient and powerful disciplines, with applications in every sector, including finance, healthcare, security services, agriculture and many more. Feature extraction is an integral part of image recognition. It helps train the model more efficiently and with higher accuracy by removing unwanted or unnecessary features, thus reducing the dimensionality of the input image. This also reduces the computational resources required to train the algorithm, making it affordable for people with low-end setups. Here we compare the accuracies and training times of different machine learning classification algorithms with and without feature extraction. A convolutional neural network was used to extract features. The model was trained and tested on data from 12 classes containing a total of 2,175 images. For comparison, we chose the logistic regression, K-nearest neighbors, random forest, and support vector machine classifiers.
