Machine Learning Algorithm's Measurement and Analytical Visualization of User's Reviews for Google Play Store

Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi

The fact is quite transparent that almost everybody around the world uses Android apps: half of the population of this planet is engaged with messaging, social media, gaming, and browsers. This online marketplace provides free and paid access to users. On the Google Play Store, users are encouraged to download countless applications belonging to predefined categories. In this research paper, we have scraped thousands of user reviews and app ratings. We scraped the reviews of 148 apps from 14 categories, collecting 506,259 reviews from the Google Play Store, and subsequently checked the semantics of the reviews to determine whether each review is positive, negative, or neutral. We evaluated the results using different machine learning algorithms, namely Naïve Bayes, Random Forest, and Logistic Regression. We computed Term Frequency (TF) and Inverse Document Frequency (IDF) features, measured different parameters such as accuracy, precision, recall, and F1 score, and compared the statistical results of these algorithms, visualizing them in the form of bar charts. In this paper, the analysis of each algorithm is performed one by one, and the results are compared. Eventually, we found that Logistic Regression is the best algorithm for review analysis of the Google Play Store, achieving the best precision, accuracy, recall, and F1 score both after preprocessing and on the collected data for this dataset.
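The TF-IDF feature extraction and three-way classifier comparison described above can be illustrated with a short scikit-learn sketch. This is a minimal illustration only: the CSV file name and the "review"/"sentiment" column names are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of the TF-IDF + classifier comparison described above.
# The file name and column names ("review", "sentiment") are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

df = pd.read_csv("play_store_reviews.csv")          # review text plus sentiment label
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(stop_words="english")   # TF-IDF weighting of the review text
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)
    pred = model.predict(X_test_tfidf)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="weighted")
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```

The per-model metrics printed here are the kind of values that can then be plotted as bar charts, mirroring the visualization described in the abstract.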

Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 202
Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi ◽  
Maqsood Ahmad

Android-based applications are widely used by almost everyone around the globe. Due to the availability of the Internet almost everywhere at no charge, almost half of the globe is engaged with social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, which is one of the most popular Internet application stores, users are encouraged to download thousands of applications and various types of software. In this research study, we have scraped thousands of user reviews and the ratings of different applications. We scraped 148 application reviews from 14 different categories. A total of 506,259 reviews were accumulated and assessed. Based on the semantics of reviews of the applications, the results of the reviews were classified as negative, positive or neutral. In this research, different machine-learning algorithms such as logistic regression, random forest and naïve Bayes were tuned and tested. We also evaluated the outcome of term frequency (TF) and inverse document frequency (IDF), measured different parameters such as accuracy, precision, recall and F1 score (F1) and presented the results in the form of a bar graph. In conclusion, we compared the outcome of each algorithm and found that logistic regression is one of the best algorithms for the review-analysis of the Google Play Store from an accuracy perspective. Furthermore, we were able to prove and demonstrate that logistic regression is better in terms of speed, accuracy, recall and F1 score. This conclusion was achieved after preprocessing a number of data values from these data sets.


2020 ◽  
Vol 30 (1) ◽  
pp. 192-208 ◽  
Author(s):  
Hamza Aldabbas ◽  
Abdullah Bajahzar ◽  
Meshrif Alruily ◽  
Ali Adil Qureshi ◽  
Rana M. Amir Latif ◽  
...  

Abstract To maintain a competitive edge in the mobile application market, evaluating the quality needs of an app is essential. Users' feedback on these applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people an opportunity to interact, express their reviews, rate applications and share feedback about them. In this paper we have scraped 506,259 user reviews and application ratings from the Google Play Store across 14 different categories. The statistical information in the results was measured using common machine learning algorithms such as Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes. Different parameters including accuracy, precision, recall, and F1 score were used to evaluate bigram, trigram, and N-gram features, and the statistical results of these algorithms were compared. The analysis of each algorithm was performed one by one, and the results were evaluated. It is concluded that logistic regression is the best algorithm for review analysis of Google Play Store applications, achieving the highest accuracy when analyzing reviews across three classes, i.e., positive, negative, and neutral.
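A minimal sketch of the bigram/trigram/N-gram comparison these metrics describe, assuming the same kind of review dataset as in the earlier sketch (the file and column names are illustrative assumptions):

```python
# Sketch of the bigram/trigram/N-gram feature comparison described above.
# File and column names are illustrative assumptions, not the authors' data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("play_store_reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["review"], df["sentiment"], test_size=0.2, random_state=42)

# Vary the n-gram range of the vectorizer and re-fit the same classifier each time.
for label, ngram_range in {"Bigram": (2, 2), "Trigram": (3, 3), "N-gram (1-3)": (1, 3)}.items():
    vec = TfidfVectorizer(ngram_range=ngram_range)
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
    print(label)
    print(classification_report(y_test, clf.predict(vec.transform(X_test))))
```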


2021 ◽  
Vol 143 (2) ◽  
Author(s):  
Joaquin E. Moran ◽  
Yasser Selima

Abstract Fluidelastic instability (FEI) in tube arrays has been studied extensively experimentally and theoretically for the last 50 years, due to its potential to cause significant damage in short periods. Incidents similar to those observed at San Onofre Nuclear Generating Station indicate that the problem is not yet fully understood, probably due to the large number of factors affecting the phenomenon. In this study, a new approach for the analysis and interpretation of FEI data using machine learning (ML) algorithms is explored. FEI data for both single and two-phase flows have been collected from the literature and utilized for training a machine learning algorithm in order to either provide estimates of the reduced velocity (single and two-phase) or indicate if the bundle is stable or unstable under certain conditions (two-phase). The analysis included the use of logistic regression as a classification algorithm for two-phase flow problems to determine if specific conditions produce a stable or unstable response. The results of this study provide some insight into the capability and potential of logistic regression models to analyze FEI if appropriate quantities of experimental data are available.
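As a purely illustrative sketch of the classification idea described above (not the authors' dataset or feature set), a logistic regression can be trained to label bundle conditions as stable or unstable; the CSV file, feature names and the "unstable" label column below are assumptions.

```python
# Illustrative sketch: logistic regression as a stable/unstable classifier for
# two-phase fluidelastic instability data. File, features and labels are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("fei_two_phase.csv")   # hypothetical compiled literature data
features = ["pitch_ratio", "mass_damping", "void_fraction", "reduced_velocity"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["unstable"], test_size=0.25, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression())  # scale, then classify
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))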


Scientific knowledge and electronic devices are growing day by day. In this context, many expert systems in the healthcare industry use machine learning algorithms. Deep neural networks outperform classical machine learning techniques and often take raw, unrefined data to calculate the target output. Deep learning, or feature learning, focuses on the most important features and gives a complete understanding of the generated model. The existing methodology used a data mining technique, a rule-based classification algorithm, and a machine learning algorithm, a hybrid logistic regression algorithm, to preprocess data and extract meaningful insights from it; this, however, relies on supervised (labelled) data. The proposed work is based on unsupervised data, i.e., there is no labelled data, and deep neural techniques are deployed to obtain the target output. The machine learning algorithms are compared with the proposed deep learning techniques, implemented with TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology was tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed approach based on deep learning offered better performance, improving the results compared with the state-of-the-art machine learning approaches.
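The abstract does not specify the network architecture, so the following is only a minimal Keras sketch of a classifier for the four beat classes mentioned; the beat length, layer sizes and the prepared training arrays are all assumptions for illustration, not the paper's actual model.

```python
# Minimal Keras sketch for classifying four kinds of arrhythmia beats.
# Beat length, layers and data arrays are illustrative assumptions only.
from tensorflow.keras import layers, models

num_classes = 4          # four kinds of abnormal beats
beat_length = 187        # samples per segmented heartbeat (assumed)

model = models.Sequential([
    layers.Input(shape=(beat_length, 1)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# x_train: (n_beats, beat_length, 1) ECG segments; y_train: integer class labels 0-3
# model.fit(x_train, y_train, epochs=20, batch_size=128, validation_split=0.1)
```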


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Zhang ◽  
Xia Zhe ◽  
Min Tang ◽  
Jing Zhang ◽  
Jialiang Ren ◽  
...  

Purpose. This study aimed to investigate the value of biparametric magnetic resonance imaging (bp-MRI)-based radiomics signatures for the preoperative prediction of prostate cancer (PCa) grade compared with visual assessments by radiologists based on the Prostate Imaging Reporting and Data System Version 2.1 (PI-RADS V2.1) scores of multiparametric MRI (mp-MRI). Methods. This retrospective study included 142 consecutive patients with histologically confirmed PCa who underwent mp-MRI before surgery. MRI images were scored and evaluated by two independent radiologists using PI-RADS V2.1. The radiomics workflow was divided into five steps: (a) image selection and segmentation, (b) feature extraction, (c) feature selection, (d) model establishment, and (e) model evaluation. Three machine learning algorithms (random forest (RF), logistic regression, and support vector machine (SVM)) were constructed to differentiate high-grade from low-grade PCa. Receiver operating characteristic (ROC) analysis was used to compare the machine learning-based analysis of bp-MRI radiomics models with PI-RADS V2.1. Results. In all, 8 stable radiomics features out of 804 extracted features based on T2-weighted imaging (T2WI) and ADC sequences were selected. Radiomics signatures successfully categorized high-grade and low-grade PCa cases (P < 0.05) in both the training and test datasets. The radiomics models based on RF (area under the curve, AUC: 0.982 training; 0.918 test), logistic regression (AUC: 0.886; 0.886), and SVM (AUC: 0.943; 0.913) all had better diagnostic performance than PI-RADS V2.1 (AUC: 0.767; 0.813) when predicting PCa grade. Conclusions. The results of this clinical study indicate that machine learning-based analysis of bp-MRI radiomics models may be helpful for distinguishing high-grade from low-grade PCa and outperformed the PI-RADS V2.1 scores based on mp-MRI; among the models, the RF model was slightly better.
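Steps (d) and (e) of the workflow, model establishment and ROC-based comparison of the three classifiers, can be sketched as follows; the feature matrix and labels here are random placeholders standing in for the 8 selected radiomics features and the high-/low-grade labels.

```python
# Sketch of workflow steps (d)-(e): fit RF, logistic regression and SVM on selected
# radiomics features and compare test-set ROC AUCs. Data below are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(142, 8))            # 142 patients x 8 selected features (placeholder)
y = rng.integers(0, 2, size=142)         # 1 = high-grade, 0 = low-grade (placeholder)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```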


2021 ◽  
Author(s):  
Sangil Lee ◽  
Brianna Mueller ◽  
W. Nick Street ◽  
Ryan M. Carnahan

Abstract Introduction: Delirium is a cerebral dysfunction seen commonly in the acute care setting. Delirium is associated with increased mortality and morbidity and is frequently missed in the emergency department (ED) by clinical gestalt alone. Identifying those at risk of delirium may help prioritize screening and interventions. Objective: Our objective was to identify clinically valuable predictive models for prevalent delirium within the first 24 hours of hospitalization based on the available data by assessing the performance of logistic regression and a variety of machine learning models. Methods: This was a retrospective cohort study to develop and validate a predictive risk model to detect delirium using patient data obtained around an ED encounter. Data from electronic health records for patients hospitalized from the ED between January 1, 2014, and December 31, 2019, were extracted. Eligible patients were aged 65 or older, admitted to an inpatient unit from the emergency department, and had at least one DOSS assessment or CAM-ICU recorded while hospitalized. The outcome measure of this study was delirium within one day of hospitalization determined by a positive DOSS or CAM assessment. We developed the model with and without the Barthel index for activity of daily living, since this was measured after hospital admission. Results: The area under the ROC curves for delirium ranged from .69 to .77 without the Barthel index. Random forest and gradient-boosted machine showed the highest AUC of .77. At the 90% sensitivity threshold, gradient-boosted machine, random forest, and logistic regression achieved a specificity of 35%. After the Barthel index was included, random forest, gradient-boosted machine, and logistic regression models demonstrated the best predictive ability with respective AUCs of .85 to .86. Conclusion: This study demonstrated the use of machine learning algorithms to identify the combination of variables that are predictive of delirium within 24 hours of hospitalization from the ED.
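A minimal sketch of the evaluation described in the Results, assuming a prepared predictor matrix and delirium labels (random placeholders here): fit a gradient-boosted model, compute the AUC, and read off the specificity at the 90% sensitivity operating point.

```python
# Sketch: AUC and specificity at a 90% sensitivity threshold for a gradient-boosted model.
# The feature matrix and outcome labels are random placeholders, not the study data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))                              # placeholder ED-encounter features
y = (X[:, 0] + rng.normal(size=1000) > 1).astype(int)        # placeholder delirium outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, scores)
idx = np.argmax(tpr >= 0.90)                                 # first threshold reaching 90% sensitivity
print(f"AUC = {roc_auc_score(y_test, scores):.2f}")
print(f"Specificity at 90% sensitivity = {1 - fpr[idx]:.2f}")
```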


Author(s):  
Ni Luh Putu Chandra Savitri ◽  
Radya Amirur Rahman ◽  
Reyhan Venyutzky ◽  
Nur Aini Rakhmawati

The COVID-19 pandemic urges countries to limit interaction among their people to reduce transmission. Indonesia requires people to carry out activities at home, one of which is online school. Many people share their thoughts on this through the social media platform Twitter. Therefore, the authors conducted sentiment analysis using supervised machine learning algorithms to determine the distribution of words used in commenting on online school, the relationship between sentence length and sentiment, and the best algorithm for obtaining the most accurate results. In this study, the authors used RapidMiner to crawl data from Twitter, then performed data cleansing and data processing with classification methods using the Random Forest Classifier, Logistic Regression, BernoulliNB and SVC algorithms, and evaluated the results using a confusion matrix, accuracy rate and classification report. The authors found that positive, negative, and neutral sentiments are expressed about the online school implementation through comments. The top three words most used to express positive sentiment are bahagia (happy), rajin (diligent) and senang (glad). For negative sentiment, the top three words are capek (tired), muak (fed up) and bosen (bored). For neutral sentiment, the top three words are tidur (sleep), capek (tired), and buka (open). Lengthy tweets are usually imbued with negative remarks, while the length of positive and neutral tweets is usually more stable. The authors conclude that the weakness of online school is the heavy workload that makes students tired, alongside ineffective teaching methods that make it hard for students to understand the material given by the school. However, on the positive side, some people agree with the policies that are implemented and feel that they have gained some benefits from the implementation. Of the four supervised machine learning algorithms tested, Logistic Regression shows the highest accuracy, 0.87. Overall, the analysis shows that society tends to be neutral towards the implementation of online school.
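The classification and evaluation step can be sketched in scikit-learn as follows; the crawled tweet file and its "text"/"sentiment" columns are assumptions, and the TF-IDF vectorization is illustrative rather than the authors' exact preprocessing.

```python
# Sketch of the four-classifier comparison with confusion matrix and classification report.
# The tweet file and column names are assumptions; vectorization choice is illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

df = pd.read_csv("online_school_tweets.csv")     # assumed columns: "text", "sentiment"
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["sentiment"], test_size=0.2, random_state=42, stratify=df["sentiment"])

vec = TfidfVectorizer()
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

for name, clf in {"Random Forest": RandomForestClassifier(random_state=42),
                  "Logistic Regression": LogisticRegression(max_iter=1000),
                  "BernoulliNB": BernoulliNB(),
                  "SVC": SVC()}.items():
    clf.fit(Xtr, y_train)
    pred = clf.predict(Xte)
    print(name)
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```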


Attacks on users through mobile devices in general, and mobile devices running the Android operating system in particular, have been causing many serious consequences. Research [1] lists the vulnerabilities found in the Android operating system that make it a preferred target of cyberattackers. Report [2] gives statistics on the number of cyberattacks via mobile devices in general and Android devices in particular, and points out the insecurity of information in applications downloaded by users from Android app stores. Therefore, to prevent the attack and distribution of malware through Android apps, it is necessary to research methods of detecting malicious code at the time users download applications to their devices. Recent approaches often rely on static analysis and dynamic analysis to look for unusual behavior in applications. In this paper, we propose the use of static analysis techniques to build a behavioral profile of malicious code in an application, and machine learning algorithms to detect malicious behavior.
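As a purely illustrative sketch of this idea (not the paper's pipeline), statically extracted features such as requested permissions can be encoded as a binary vector per APK and fed to a classifier; the permissions, labels and extraction step below are assumptions for demonstration only.

```python
# Illustrative sketch: each APK represented by a binary vector of statically extracted
# permissions, then classified as malicious or benign. All data here are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score

# One dict per app: statically extracted permissions mapped to 1 (illustrative data).
apps = [
    {"android.permission.INTERNET": 1, "android.permission.SEND_SMS": 1},
    {"android.permission.INTERNET": 1},
    {"android.permission.READ_SMS": 1, "android.permission.SEND_SMS": 1},
    {"android.permission.INTERNET": 1, "android.permission.CAMERA": 1},
]
labels = [1, 0, 1, 0]    # 1 = malicious, 0 = benign (illustrative labels)

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(apps)                       # binary permission matrix
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, labels, cv=2).mean())
```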


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 397 ◽  
Author(s):  
Beibei Niu ◽  
Jinzheng Ren ◽  
Xiaotao Li

Financial institutions use credit scoring to evaluate potential loan default risks. However, insufficient credit information limits the peer-to-peer (P2P) lending platform's capacity to build effective credit scoring. In recent years, many types of data have been used for credit scoring to compensate for the lack of credit history data. Whether social network information can be used to strengthen financial institutions' predictive power has received much attention in industry and academia. The aim of this study is to test the reliability of social network information in predicting loan default. We extract borrowers' social network information from mobile phones and then use logistic regression to test the relationship between social network information and loan default. Three machine learning algorithms, random forest, AdaBoost, and LightGBM, were constructed to demonstrate the predictive performance of social network information. The logistic regression results show that there is a statistically significant correlation between social network information and loan default. The machine learning algorithm results show that social network information can improve loan default prediction performance significantly. The experimental results suggest that social network information is valuable for credit scoring.
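A short sketch of this two-step design, assuming a hypothetical loan data frame with separate credit and social-network feature columns (all file, column and feature names below are illustrative, not the study's variables):

```python
# Sketch of the two-step analysis: logistic regression for significance testing of
# social-network variables, then tree ensembles for predictive performance.
import pandas as pd
import statsmodels.api as sm
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("p2p_loans.csv")    # assumed columns: credit + social-network features, "default"
social_cols = ["contact_count", "call_frequency", "network_overdue_ratio"]   # hypothetical names
base_cols = ["age", "income", "loan_amount"]                                 # hypothetical names

# Step 1: significance of social-network variables in a logistic regression.
logit = sm.Logit(df["default"], sm.add_constant(df[base_cols + social_cols])).fit(disp=0)
print(logit.summary())

# Step 2: predictive performance of the three ensemble models on all features.
X_train, X_test, y_train, y_test = train_test_split(
    df[base_cols + social_cols], df["default"], test_size=0.3, random_state=0)
for name, clf in {"Random Forest": RandomForestClassifier(random_state=0),
                  "AdaBoost": AdaBoostClassifier(random_state=0),
                  "LightGBM": LGBMClassifier(random_state=0)}.items():
    clf.fit(X_train, y_train)
    print(name, roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```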


BMJ Open ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. e036099
Author(s):  
Zain Hussain ◽  
Syed Ahmar Shah ◽  
Mome Mukherjee ◽  
Aziz Sheikh

Introduction: Most asthma attacks and subsequent deaths are potentially preventable. We aim to develop a prognostic tool for identifying patients at high risk of asthma attacks in primary care by leveraging advances in machine learning. Methods and analysis: Current prognostic tools use logistic regression to develop a risk scoring model for asthma attacks. We propose to build on this by systematically applying various well-known machine learning techniques to a large longitudinal deidentified primary care database, the Optimum Patient Care Research Database (OPCRD), and comparatively evaluating their performance against the existing logistic regression model and against each other. Machine learning algorithms vary in their predictive abilities based on the dataset and the approach to analysis employed. We will undertake feature selection, classification (both one-class and two-class classifiers) and performance evaluation. Patients who have had actively treated clinician-diagnosed asthma, aged 8–80 years and with 3 years of continuous data, from 2016 to 2018, will be selected. Risk factors will be obtained from the first year, while the next 2 years will form the outcome period, in which the primary endpoint will be the occurrence of an asthma attack. Ethics and dissemination: We have obtained approval from OPCRD's Anonymous Data Ethics Protocols and Transparency (ADEPT) Committee. We will seek ethics approval from The University of Edinburgh's Research Ethics Group (UREG). We aim to present our findings at scientific conferences and in peer-reviewed journals.
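A toy sketch of the planned comparison under stated assumptions: placeholder risk-factor features and attack labels stand in for the OPCRD data, a two-class comparison of logistic regression against a boosted model, plus a one-class variant trained only on patients without an attack.

```python
# Sketch of the protocol's comparison: logistic regression vs another two-class model,
# plus a one-class classifier. All data here are random placeholders, not OPCRD data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15))                                           # placeholder year-1 risk factors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 1.5).astype(int)   # placeholder attack outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

for name, clf in {"Logistic Regression": LogisticRegression(max_iter=1000),
                  "Gradient Boosting": GradientBoostingClassifier(random_state=0)}.items():
    clf.fit(X_train, y_train)
    print(name, roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# One-class variant: train only on patients without an attack, score outliers as at-risk.
occ = OneClassSVM(nu=0.1).fit(X_train[y_train == 0])
print("One-class SVM", roc_auc_score(y_test, -occ.decision_function(X_test)))
```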

