Machine Learning Algorithm's Measurement and Analytical Visualization of User's Reviews for Google Play Store

Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi

The fact is quite transparent that almost everybody around the world uses Android apps: half of the population of this planet is engaged with messaging, social media, gaming, and browsers. This online marketplace provides free and paid access to users. On the Google Play Store, users are encouraged to download countless applications belonging to predefined categories. In this research paper, we have scraped thousands of user reviews and app ratings. We scraped the reviews of 148 apps from 14 categories, collecting 506,259 reviews from the Google Play Store, and subsequently checked the semantics of the reviews to determine whether each review is positive, negative, or neutral. We evaluated the results using different machine learning algorithms, namely Naïve Bayes, Random Forest, and Logistic Regression. We computed Term Frequency (TF) and Inverse Document Frequency (IDF) features, measured different parameters such as accuracy, precision, recall, and F1 score, and compared the statistical results of these algorithms, visualizing them in the form of bar charts. In this paper, the analysis of each algorithm is performed one by one, and the results are compared. Eventually, we found that Logistic Regression is the best algorithm for review analysis of the Google Play Store, achieving the best precision, accuracy, recall, and F1 score both after preprocessing and on the collected data for this dataset.
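The TF-IDF feature extraction and three-way classifier comparison described above can be illustrated with a short scikit-learn sketch. This is a minimal illustration only: the CSV file name and the "review"/"sentiment" column names are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of the TF-IDF + classifier comparison described above.
# The file name and column names ("review", "sentiment") are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

df = pd.read_csv("play_store_reviews.csv")          # review text plus sentiment label
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(stop_words="english")   # TF-IDF weighting of the review text
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)
    pred = model.predict(X_test_tfidf)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="weighted")
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```

The per-model metrics printed here are the kind of values that can then be plotted as bar charts, mirroring the visualization described in the abstract.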

Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 202
Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi ◽  
Maqsood Ahmad

Android-based applications are widely used by almost everyone around the globe. Due to the availability of the Internet almost everywhere at no charge, almost half of the globe is engaged with social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, which is one of the most popular Internet application stores, users are encouraged to download thousands of applications and various types of software. In this research study, we have scraped thousands of user reviews and the ratings of different applications. We scraped 148 application reviews from 14 different categories. A total of 506,259 reviews were accumulated and assessed. Based on the semantics of reviews of the applications, the results of the reviews were classified as negative, positive or neutral. In this research, different machine-learning algorithms such as logistic regression, random forest and naïve Bayes were tuned and tested. We also evaluated the outcome of term frequency (TF) and inverse document frequency (IDF), measured different parameters such as accuracy, precision, recall and F1 score (F1) and presented the results in the form of a bar graph. In conclusion, we compared the outcome of each algorithm and found that logistic regression is one of the best algorithms for the review-analysis of the Google Play Store from an accuracy perspective. Furthermore, we were able to prove and demonstrate that logistic regression is better in terms of speed, accuracy, recall and F1 score. This conclusion was achieved after preprocessing a number of data values from these data sets.


2020 ◽  
Vol 30 (1) ◽  
pp. 192-208 ◽  
Author(s):  
Hamza Aldabbas ◽  
Abdullah Bajahzar ◽  
Meshrif Alruily ◽  
Ali Adil Qureshi ◽  
Rana M. Amir Latif ◽  
...  

Abstract To maintain a competitive edge in the mobile application market, evaluating the quality needs of an app is essential. Users' feedback on these applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people an opportunity to interact, express their reviews, rate applications and share feedback about them. In this paper we have scraped 506,259 user reviews and application ratings from the Google Play Store across 14 different categories. The statistical information in the results was measured using common machine learning algorithms such as Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes. Different parameters including accuracy, precision, recall, and F1 score were used to evaluate bigram, trigram, and N-gram features, and the statistical results of these algorithms were compared. The analysis of each algorithm was performed one by one, and the results were evaluated. It is concluded that logistic regression is the best algorithm for review analysis of Google Play Store applications, achieving the highest accuracy when analyzing reviews across three classes, i.e., positive, negative, and neutral.
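A minimal sketch of the bigram/trigram/N-gram comparison these metrics describe, assuming the same kind of review dataset as in the earlier sketch (the file and column names are illustrative assumptions):

```python
# Sketch of the bigram/trigram/N-gram feature comparison described above.
# File and column names are illustrative assumptions, not the authors' data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("play_store_reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42)

# Vary the n-gram range of the vectorizer and re-fit the same classifier each time.
for label, ngram_range in {"Bigram": (2, 2), "Trigram": (3, 3), "N-gram (1-3)": (1, 3)}.items():
    vec = TfidfVectorizer(ngram_range=ngram_range)
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
    print(label)
    print(classification_report(y_test, clf.predict(vec.transform(X_test))))
```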


2021 ◽  
Vol 143 (2) ◽  
Author(s):  
Joaquin E. Moran ◽  
Yasser Selima

Abstract Fluidelastic instability (FEI) in tube arrays has been studied extensively experimentally and theoretically for the last 50 years, due to its potential to cause significant damage in short periods. Incidents similar to those observed at San Onofre Nuclear Generating Station indicate that the problem is not yet fully understood, probably due to the large number of factors affecting the phenomenon. In this study, a new approach for the analysis and interpretation of FEI data using machine learning (ML) algorithms is explored. FEI data for both single and two-phase flows have been collected from the literature and utilized for training a machine learning algorithm in order to either provide estimates of the reduced velocity (single and two-phase) or indicate if the bundle is stable or unstable under certain conditions (two-phase). The analysis included the use of logistic regression as a classification algorithm for two-phase flow problems to determine if specific conditions produce a stable or unstable response. The results of this study provide some insight into the capability and potential of logistic regression models to analyze FEI if appropriate quantities of experimental data are available.
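As a purely illustrative sketch of the classification idea described above (not the authors' dataset or feature set), a logistic regression can be trained to label bundle conditions as stable or unstable; the CSV file, feature names and the "unstable" label column below are assumptions.

```python
# Illustrative sketch: logistic regression as a stable/unstable classifier for
# two-phase fluidelastic instability data. File, features and labels are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("fei_two_phase.csv")   # hypothetical compiled literature data
features = ["pitch_ratio", "mass_damping", "void_fraction", "reduced_velocity"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["unstable"], test_size=0.25, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression())  # scale, then classify
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))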


Scientific knowledge and electronic devices are growing day by day. In this context, many expert systems in the healthcare industry use machine learning algorithms. Deep neural networks outperform classical machine learning techniques and often take raw, unrefined data to calculate the target output. Deep learning, or feature learning, focuses on the most important features and gives a complete understanding of the generated model. The existing methodology used a data mining technique, a rule-based classification algorithm, and a machine learning algorithm, a hybrid logistic regression algorithm, to preprocess data and extract meaningful insights from it; this, however, relies on supervised (labelled) data. The proposed work is based on unsupervised data, i.e., there is no labelled data, and deep neural techniques are deployed to obtain the target output. The machine learning algorithms are compared with the proposed deep learning techniques, implemented with TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology was tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed approach based on deep learning offered better performance, improving the results compared with the state-of-the-art machine learning approaches.
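The abstract does not specify the network architecture, so the following is only a minimal Keras sketch of a classifier for the four beat classes mentioned; the beat length, layer sizes and the prepared training arrays are all assumptions for illustration, not the paper's actual model.

```python
# Minimal Keras sketch for classifying four kinds of arrhythmia beats.
# Beat length, layers and data arrays are illustrative assumptions only.
from tensorflow.keras import layers, models

num_classes = 4          # four kinds of abnormal beats
beat_length = 187        # samples per segmented heartbeat (assumed)

model = models.Sequential([
    layers.Input(shape=(beat_length, 1)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# x_train: (n_beats, beat_length, 1) ECG segments; y_train: integer class labels 0-3
# model.fit(x_train, y_train, epochs=20, batch_size=128, validation_split=0.1)
```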


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Zhang ◽  
Xia Zhe ◽  
Min Tang ◽  
Jing Zhang ◽  
Jialiang Ren ◽  
...  

Purpose. This study aimed to investigate the value of biparametric magnetic resonance imaging (bp-MRI)-based radiomics signatures for the preoperative prediction of prostate cancer (PCa) grade compared with visual assessments by radiologists based on the Prostate Imaging Reporting and Data System Version 2.1 (PI-RADS V2.1) scores of multiparametric MRI (mp-MRI). Methods. This retrospective study included 142 consecutive patients with histologically confirmed PCa who underwent mp-MRI before surgery. MRI images were scored and evaluated by two independent radiologists using PI-RADS V2.1. The radiomics workflow was divided into five steps: (a) image selection and segmentation, (b) feature extraction, (c) feature selection, (d) model establishment, and (e) model evaluation. Three machine learning algorithms (random forest (RF), logistic regression, and support vector machine (SVM)) were constructed to differentiate high-grade from low-grade PCa. Receiver operating characteristic (ROC) analysis was used to compare the machine learning-based analysis of bp-MRI radiomics models with PI-RADS V2.1. Results. In all, 8 stable radiomics features out of 804 extracted features based on T2-weighted imaging (T2WI) and ADC sequences were selected. Radiomics signatures successfully categorized high-grade and low-grade PCa cases (P < 0.05) in both the training and test datasets. The radiomics models based on RF (area under the curve, AUC: 0.982 training; 0.918 test), logistic regression (AUC: 0.886; 0.886), and SVM (AUC: 0.943; 0.913) all had better diagnostic performance than PI-RADS V2.1 (AUC: 0.767; 0.813) when predicting PCa grade. Conclusions. The results of this clinical study indicate that machine learning-based analysis of bp-MRI radiomics models may be helpful for distinguishing high-grade from low-grade PCa and outperformed the PI-RADS V2.1 scores based on mp-MRI; among the models, the RF model was slightly better.
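Steps (d) and (e) of the workflow, model establishment and ROC-based comparison of the three classifiers, can be sketched as follows; the feature matrix and labels here are random placeholders standing in for the 8 selected radiomics features and the high-/low-grade labels.

```python
# Sketch of workflow steps (d)-(e): fit RF, logistic regression and SVM on selected
# radiomics features and compare test-set ROC AUCs. Data below are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(142, 8))            # 142 patients x 8 selected features (placeholder)
y = rng.integers(0, 2, size=142)         # 1 = high-grade, 0 = low-grade (placeholder)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```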


2021 ◽  
Author(s):  
Sangil Lee ◽  
Brianna Mueller ◽  
W. Nick Street ◽  
Ryan M. Carnahan

Abstract Introduction: Delirium is a cerebral dysfunction seen commonly in the acute care setting. Delirium is associated with increased mortality and morbidity and is frequently missed in the emergency department (ED) by clinical gestalt alone. Identifying those at risk of delirium may help prioritize screening and interventions. Objective: Our objective was to identify clinically valuable predictive models for prevalent delirium within the first 24 hours of hospitalization based on the available data by assessing the performance of logistic regression and a variety of machine learning models. Methods: This was a retrospective cohort study to develop and validate a predictive risk model to detect delirium using patient data obtained around an ED encounter. Data from electronic health records for patients hospitalized from the ED between January 1, 2014, and December 31, 2019, were extracted. Eligible patients were aged 65 or older, admitted to an inpatient unit from the emergency department, and had at least one DOSS assessment or CAM-ICU recorded while hospitalized. The outcome measure of this study was delirium within one day of hospitalization determined by a positive DOSS or CAM assessment. We developed the model with and without the Barthel index for activity of daily living, since this was measured after hospital admission. Results: The area under the ROC curves for delirium ranged from .69 to .77 without the Barthel index. Random forest and gradient-boosted machine showed the highest AUC of .77. At the 90% sensitivity threshold, gradient-boosted machine, random forest, and logistic regression achieved a specificity of 35%. After the Barthel index was included, random forest, gradient-boosted machine, and logistic regression models demonstrated the best predictive ability with respective AUCs of .85 to .86. Conclusion: This study demonstrated the use of machine learning algorithms to identify the combination of variables that are predictive of delirium within 24 hours of hospitalization from the ED.
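A minimal sketch of the evaluation described in the Results, assuming a prepared predictor matrix and delirium labels (random placeholders here): fit a gradient-boosted model, compute the AUC, and read off the specificity at the 90% sensitivity operating point.

```python
# Sketch: AUC and specificity at a 90% sensitivity threshold for a gradient-boosted model.
# The feature matrix and outcome labels are random placeholders, not the study data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))                              # placeholder ED-encounter features
y = (X[:, 0] + rng.normal(size=1000) > 1).astype(int)        # placeholder delirium outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, scores)
idx = np.argmax(tpr >= 0.90)                                 # first threshold reaching 90% sensitivity
print(f"AUC = {roc_auc_score(y_test, scores):.2f}")
print(f"Specificity at 90% sensitivity = {1 - fpr[idx]:.2f}")
```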


Author(s):  
Ni Luh Putu Chandra Savitri ◽  
Radya Amirur Rahman ◽  
Reyhan Venyutzky ◽  
Nur Aini Rakhmawati

The COVID-19 pandemic urges countries to limit interaction among their people to reduce transmission. Indonesia requires people to carry out activities at home, one of which is online school. Many people share their thoughts on this through the social media platform Twitter. Therefore, the authors conducted sentiment analysis using supervised machine learning algorithms to determine the distribution of words used in commenting on online school, the relationship between sentence length and sentiment, and the best algorithm for obtaining the most accurate results. In this study, the authors used RapidMiner to crawl data from Twitter, then performed data cleansing and data processing with classification methods using the Random Forest Classifier, Logistic Regression, BernoulliNB and SVC algorithms, and evaluated the results using a confusion matrix, accuracy rate and classification report. The authors found that positive, negative, and neutral sentiments are expressed about the online school implementation through comments. The top three words most used to express positive sentiment are bahagia (happy), rajin (diligent) and senang (glad). For negative sentiment, the top three words are capek (tired), muak (fed up) and bosen (bored). For neutral sentiment, the top three words are tidur (sleep), capek (tired), and buka (open). Lengthy tweets are usually imbued with negative remarks, while the length of positive and neutral tweets is usually more stable. The authors conclude that the weakness of online school is the heavy workload that makes students tired, alongside ineffective teaching methods that make it hard for students to understand the material given by the school. However, on the positive side, some people agree with the policies that are implemented and feel that they have gained some benefits from the implementation. Of the four supervised machine learning algorithms tested, Logistic Regression shows the highest accuracy, 0.87. Overall, the analysis shows that society tends to be neutral towards the implementation of online school.
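The classification and evaluation step can be sketched in scikit-learn as follows; the crawled tweet file and its "text"/"sentiment" columns are assumptions, and the TF-IDF vectorization is illustrative rather than the authors' exact preprocessing.

```python
# Sketch of the four-classifier comparison with confusion matrix and classification report.
# The tweet file and column names are assumptions; vectorization choice is illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

df = pd.read_csv("online_school_tweets.csv")     # assumed columns: "text", "sentiment"
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["sentiment"], test_size=0.2, random_state=42, stratify=df["sentiment"])

vec = TfidfVectorizer()
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

for name, clf in {"Random Forest": RandomForestClassifier(random_state=42),
                  "Logistic Regression": LogisticRegression(max_iter=1000),
                  "BernoulliNB": BernoulliNB(),
                  "SVC": SVC()}.items():
    clf.fit(Xtr, y_train)
    pred = clf.predict(Xte)
    print(name)
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```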


Attacks on users through mobile devices in general, and mobile devices running the Android operating system in particular, have been causing many serious consequences. Research [1] lists the vulnerabilities found in the Android operating system that make it a preferred target of cyberattackers. Report [2] gives statistics on the number of cyberattacks via mobile devices in general and Android devices in particular, and points out the insecurity of information in applications downloaded by users from Android app stores. Therefore, to prevent the attack and distribution of malware through Android apps, it is necessary to research methods of detecting malicious code at the time users download applications to their devices. Recent approaches often rely on static analysis and dynamic analysis to look for unusual behavior in applications. In this paper, we propose the use of static analysis techniques to build a behavioral profile of malicious code in an application, and machine learning algorithms to detect malicious behavior.
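As a purely illustrative sketch of this idea (not the paper's pipeline), statically extracted features such as requested permissions can be encoded as a binary vector per APK and fed to a classifier; the permissions, labels and extraction step below are assumptions for demonstration only.

```python
# Illustrative sketch: each APK represented by a binary vector of statically extracted
# permissions, then classified as malicious or benign. All data here are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score

# One dict per app: statically extracted permissions mapped to 1 (illustrative data).
apps = [
    {"android.permission.INTERNET": 1, "android.permission.SEND_SMS": 1},
    {"android.permission.INTERNET": 1},
    {"android.permission.READ_SMS": 1, "android.permission.SEND_SMS": 1},
    {"android.permission.INTERNET": 1, "android.permission.CAMERA": 1},
]
labels = [1, 0, 1, 0]    # 1 = malicious, 0 = benign (illustrative labels)

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(apps)                       # binary permission matrix
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, labels, cv=2).mean())
```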


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 397 ◽  
Author(s):  
Beibei Niu ◽  
Jinzheng Ren ◽  
Xiaotao Li

Financial institutions use credit scoring to evaluate potential loan default risks. However, insufficient credit information limits the peer-to-peer (P2P) lending platform's capacity to build effective credit scoring. In recent years, many types of data have been used for credit scoring to compensate for the lack of credit history data. Whether social network information can be used to strengthen financial institutions' predictive power has received much attention in industry and academia. The aim of this study is to test the reliability of social network information in predicting loan default. We extract borrowers' social network information from mobile phones and then use logistic regression to test the relationship between social network information and loan default. Three machine learning algorithms, random forest, AdaBoost, and LightGBM, were constructed to demonstrate the predictive performance of social network information. The logistic regression results show that there is a statistically significant correlation between social network information and loan default. The machine learning algorithm results show that social network information can improve loan default prediction performance significantly. The experimental results suggest that social network information is valuable for credit scoring.
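A short sketch of this two-step design, assuming a hypothetical loan data frame with separate credit and social-network feature columns (all file, column and feature names below are illustrative, not the study's variables):

```python
# Sketch of the two-step analysis: logistic regression for significance testing of
# social-network variables, then tree ensembles for predictive performance.
import pandas as pd
import statsmodels.api as sm
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("p2p_loans.csv")    # assumed columns: credit + social-network features, "default"
social_cols = ["contact_count", "call_frequency", "network_overdue_ratio"]   # hypothetical names
base_cols = ["age", "income", "loan_amount"]                                 # hypothetical names

# Step 1: significance of social-network variables in a logistic regression.
logit = sm.Logit(df["default"], sm.add_constant(df[base_cols + social_cols])).fit(disp=0)
print(logit.summary())

# Step 2: predictive performance of the three ensemble models on all features.
X_train, X_test, y_train, y_test = train_test_split(
    df[base_cols + social_cols], df["default"], test_size=0.3, random_state=0)
for name, clf in {"Random Forest": RandomForestClassifier(random_state=0),
                  "AdaBoost": AdaBoostClassifier(random_state=0),
                  "LightGBM": LGBMClassifier(random_state=0)}.items():
    clf.fit(X_train, y_train)
    print(name, roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```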


BMJ Open ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. e036099
Author(s):  
Zain Hussain ◽  
Syed Ahmar Shah ◽  
Mome Mukherjee ◽  
Aziz Sheikh

Introduction: Most asthma attacks and subsequent deaths are potentially preventable. We aim to develop a prognostic tool for identifying patients at high risk of asthma attacks in primary care by leveraging advances in machine learning. Methods and analysis: Current prognostic tools use logistic regression to develop a risk scoring model for asthma attacks. We propose to build on this by systematically applying various well-known machine learning techniques to a large longitudinal deidentified primary care database, the Optimum Patient Care Research Database (OPCRD), and comparatively evaluating their performance against the existing logistic regression model and against each other. Machine learning algorithms vary in their predictive abilities based on the dataset and the approach to analysis employed. We will undertake feature selection, classification (both one-class and two-class classifiers) and performance evaluation. Patients who have had actively treated clinician-diagnosed asthma, aged 8–80 years and with 3 years of continuous data, from 2016 to 2018, will be selected. Risk factors will be obtained from the first year, while the next 2 years will form the outcome period, in which the primary endpoint will be the occurrence of an asthma attack. Ethics and dissemination: We have obtained approval from OPCRD's Anonymous Data Ethics Protocols and Transparency (ADEPT) Committee. We will seek ethics approval from The University of Edinburgh's Research Ethics Group (UREG). We aim to present our findings at scientific conferences and in peer-reviewed journals.
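A toy sketch of the planned comparison under stated assumptions: placeholder risk-factor features and attack labels stand in for the OPCRD data, a two-class comparison of logistic regression against a boosted model, plus a one-class variant trained only on patients without an attack.

```python
# Sketch of the protocol's comparison: logistic regression vs another two-class model,
# plus a one-class classifier. All data here are random placeholders, not OPCRD data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15))                                           # placeholder year-1 risk factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 1.5).astype(int)   # placeholder attack outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

for name, clf in {"Logistic Regression": LogisticRegression(max_iter=1000),
                  "Gradient Boosting": GradientBoostingClassifier(random_state=0)}.items():
    clf.fit(X_train, y_train)
    print(name, roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# One-class variant: train only on patients without an attack, score outliers as at-risk.
occ = OneClassSVM(nu=0.1).fit(X_train[y_train == 0])
print("One-class SVM", roc_auc_score(y_test, -occ.decision_function(X_test)))
```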

