Google Play Content Scraping and Knowledge Engineering using Natural Language Processing Techniques with the Analysis of User Reviews

It is evident that Android apps are used almost everywhere in the world. Half of the world's population uses messaging, social media, gaming, and browser apps. The Google Play store, an online marketplace, offers users both free and paid apps and encourages them to download countless applications belonging to predefined categories. In this research paper, we scraped thousands of user reviews and app ratings, covering the reviews of 148 apps from 14 categories. In total, we collected 506,259 reviews from the Google Play store and then analyzed the semantics of users' reviews of selected applications to determine whether each review is positive, negative, or neutral. We evaluated the results using different machine learning algorithms: Naïve Bayes, Random Forest, and Logistic Regression. We calculated Term Frequency (TF) and Inverse Document Frequency (IDF) features, measured accuracy, precision, recall, and F1 score, and compared the statistical results of these algorithms. We visualized these results as bar charts. Each algorithm is analyzed in turn, and the results are compared. We found that Logistic Regression is the best algorithm for review analysis across the Google Play store. We show that Logistic Regression leads in precision, accuracy, recall, and F1 both on the data as collected and after preprocessing.
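The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-class scores were averaged for the three sentiment classes, so macro-averaging is an assumption here, and the labels and predictions below are invented toy data, not the study's results.

```python
LABELS = ("positive", "negative", "neutral")


def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one label, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate(y_true, y_pred, labels=LABELS):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    n = len(labels)
    precision, recall, f1 = (sum(s[i] for s in scores) / n for i in range(3))
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data, invented for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]
result = evaluate(y_true, y_pred)
print(result)
```

Macro-averaging weights each sentiment class equally, which matters when one class (for example, neutral reviews) is much rarer than the others.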

Methodology for Analyzing the Traditional Algorithms Performance of User Reviews Using Machine Learning Techniques

Algorithms ◽

10.3390/a13080202 ◽

2020 ◽

Vol 13 (8) ◽

pp. 202

Author(s):

Abdul Karim ◽

Azhari Azhari ◽

Samir Brahim Belhaouri ◽

Ali Adil Qureshi ◽

Maqsood Ahmad

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Sets ◽

User Reviews ◽

Almost Everywhere ◽

Document Frequency ◽

Learning Techniques ◽

Google Play

Android-based applications are widely used by almost everyone around the globe. Due to the availability of the Internet almost everywhere at no charge, almost half of the globe is engaged with social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, which is one of the most popular Internet application stores, users are encouraged to download thousands of applications and various types of software. In this research study, we scraped thousands of user reviews and the ratings of different applications. We scraped the reviews of 148 applications from 14 different categories. A total of 506,259 reviews were accumulated and assessed. Based on the semantics of the reviews, each review was classified as negative, positive or neutral. In this research, different machine-learning algorithms such as logistic regression, random forest and naïve Bayes were tuned and tested. We also evaluated the outcome of term frequency (TF) and inverse document frequency (IDF), measured different parameters such as accuracy, precision, recall and F1 score (F1), and presented the results in the form of a bar graph. In conclusion, we compared the outcome of each algorithm and found that logistic regression is one of the best algorithms for the review analysis of the Google Play Store from an accuracy perspective. Furthermore, we were able to demonstrate that logistic regression is better in terms of speed, accuracy, recall and F1. This conclusion was reached after preprocessing a number of data values from these data sets.
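As a rough illustration of the TF and IDF features mentioned above, the sketch below computes TF-IDF weights for a few invented review strings. The abstract does not give the exact formula, so the unsmoothed variant tf × ln(N / df) and the simple whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation (assumed)."""
    tokens = (word.strip(".,!?").lower() for word in text.split())
    return [t for t in tokens if t]


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented example reviews, not taken from the scraped data set.
reviews = ["Great app, love it", "Terrible app, crashes", "Love the new update"]
vectors = tf_idf(reviews)
print(vectors[0])
```

In the first review, "great" appears in only one document and so receives a higher weight than "app", which appears in two. A term present in every document would receive weight zero under this unsmoothed formula.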

A Tracer Study of ICT Graduate Students at Mzuzu University, Malawi

Mousaion ◽

10.25159/2663-659x/5227 ◽

2019 ◽

Vol 36 (3) ◽

Author(s):

Chimango Nyasulu ◽

Winner Chawinga ◽

George Chipeta

Keyword(s):

Language Processing ◽

Mobile Application ◽

First Century ◽

Application Development ◽

Job Requirements ◽

Level Of Satisfaction ◽

Information And Communication ◽

Quantitative Design ◽

The Right ◽

The University

Governments the world over are increasingly challenging universities to produce human resources with the right skill sets and knowledge required to drive their economies in this twenty-first century. It therefore becomes important for universities to produce graduates who make tangible and meaningful contributions to their economies. Graduate tracer studies are hailed as one of the ways in which universities can respond and reposition themselves to the actual needs of the industry. It is against this background that this study was conducted to establish the relevance of the Department of Information and Communication Technology at Mzuzu University to the Malawian economy by systematically investigating the occupations of its former students after graduating from the University. The study adopted a quantitative design by distributing an online questionnaire with predominantly closed-ended questions. The study focused on three key objectives: to identify key employing sectors of ICT graduates, to gauge the relevance of the ICT programme to its former students’ jobs and businesses, and to establish the level of satisfaction with the ICT curriculum from the perspectives of ICT graduates. The key finding from the study is that the ICT programme is relevant to the industry. However, some respondents were of the view that the curriculum should be strengthened by adding courses such as Mobile Application Development, Machine Learning, Natural Language Processing, Data Mining, and LINUX Administration to keep abreast of ever-changing ICT trends and job requirements. The study strongly recommends regular reviews of the curriculum so that it continually responds to and matches the needs of the industry.

Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents

Feature Dimension Reduction for Content-Based Image Identification - Advances in Multimedia and Interactive Technologies ◽

10.4018/978-1-5225-5775-3.ch007 ◽

2018 ◽

pp. 122-139

Author(s):

Saugata Bose ◽

Ritambhra Korpal

Keyword(s):

Machine Learning ◽

Language Processing ◽

Confusion Matrix ◽

False Negative ◽

False Negative Rate ◽

Search Space ◽

Machine Learning Algorithms ◽

C4.5 Decision Tree ◽

N Gram ◽

Four Levels

In this chapter, an approach is proposed that combines natural language processing (NLP) techniques with supervised machine learning algorithms to detect external plagiarism. The major emphasis is on constructing a framework that detects plagiarism in monolingual texts using an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using simple NLP approaches. Filter metrics are then applied to select the most relevant features, and a supervised classification algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity, sentence similarity, and search space reduction methods.
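The idea of comparing n-gram frequencies between a suspicious text and a source text can be sketched as follows. This is not the chapter's 120-feature framework: word trigrams and cosine similarity are assumed choices for illustration, and the sentences are invented.

```python
import math
from collections import Counter


def word_ngrams(text, n=3):
    """Frequency profile of word n-grams (trigrams by default, an assumed choice)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two n-gram frequency profiles."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented sentences for illustration.
source = "the quick brown fox jumps over the lazy dog"
suspicious = "the quick brown fox leaps over the lazy dog"
unrelated = "machine learning models need labelled data"

similar_score = cosine_similarity(word_ngrams(source), word_ngrams(suspicious))
unrelated_score = cosine_similarity(word_ngrams(source), word_ngrams(unrelated))
print(round(similar_score, 3), unrelated_score)
```

A single substituted word breaks three of the seven trigrams, so the near-copy still shares four trigrams with the source while the unrelated sentence shares none. A score like this could serve as one input feature to a classifier rather than as a decision on its own.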

Deep Learning Technique to Predict Heart Disease using IoT Based ECG Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7166.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2559-2562

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Rule Based ◽

Learning Techniques ◽

Learning Technique ◽

Logistic Regression Algorithm ◽

Target Output

Scientific knowledge and electronic devices are growing day by day. Accordingly, many expert systems in the healthcare industry use machine learning algorithms. Deep neural networks often outperform traditional machine learning techniques and can take raw, unrefined data to calculate the target output. Deep learning, or feature learning, is used to focus on the most important features and gives a more complete understanding of the generated model. Existing methodology used data mining techniques such as rule-based classification and machine learning algorithms such as hybrid logistic regression to preprocess data and extract meaningful insights. These methods, however, rely on supervised (labelled) data. The proposed work is based on unsupervised data, that is, data with no labels, and deep neural techniques are deployed to obtain the target output. Machine learning algorithms are compared with the proposed deep learning techniques, implemented using TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology is tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed deep learning approach offered better performance than state-of-the-art machine learning approaches.

Classification of Google Play Store Application Reviews Using Machine Learning

10.20944/preprints202007.0646.v1 ◽

2020 ◽

Author(s):

Abdul Karim ◽

Azhari Azhari ◽

Meshrif Alruily ◽

Hamza Aldabbas ◽

Samir Brahim Belhaouri ◽

...

Keyword(s):

Machine Learning ◽

Text Classification ◽

Mobile Application ◽

Mobile App ◽

User Preferences ◽

User Reviews ◽

Star Rating ◽

Single Feature ◽

Google Play

The Google Play store allows users to download mobile applications (apps), and users are influenced by an app's ratings and reviews. A recent study finds that user preferences, user suggestions for improvement, user sentiment about particular features, and detailed descriptions of experiences are very useful for application developers. However, the reviews of many applications are very large in volume and difficult to process manually. A star rating is given to the application as a whole, so the developer cannot analyze a single feature. In this research, we scraped 282,231 user reviews using different data scraping techniques. We applied text classification to these user reviews, using different algorithms and measuring precision, accuracy, F1 score, and recall. From the evaluated results, we also identify the best algorithm.

An Optimized Classification of Apps Reviews for Improving Requirement Engineering

Recent Patents on Computer Science ◽

10.2174/2213275912666190716114919 ◽

2019 ◽

Vol 12 ◽

Cited By ~ 1

Author(s):

MPS Bhatia ◽

Akshi Kumar ◽

Rohit Beniwal

Keyword(s):

Logistic Regression ◽

Language Processing ◽

Requirement Engineering ◽

Great Effort ◽

Bug Report ◽

Feature Request ◽

Nature Inspired Algorithm ◽

Google Play ◽

Source Of Information

Background: App stores such as Google Play and the Apple App Store provide a platform that allows users to give feedback on apps in the form of reviews. An app review typically includes a star rating followed by a comment. Recent studies have shown that these reviews are a vital source of information that app developers and vendors can use to improve future versions of an app. However, in most cases, these reviews are unstructured, and extracting useful information from them requires great effort. Objective: This article provides an optimized classification approach that automatically classifies reviews into bug reports, feature requests, and shortcoming & improvement requests relevant to Requirement Engineering. Method: Our methodology merges three techniques, namely (1) Text Analysis, (2) Natural Language Processing, and (3) Sentiment Analysis, to extract a feature set, which is then used to automatically classify app reviews into their relevant categories. Results: We achieved the best results, a precision of 67.8% and a recall of 41.5%, with the Logistic Regression machine learning technique. We further optimized it with the nature-inspired PSO algorithm (Logistic Regression + PSO), resulting in a precision of 74.4% and a recall of 45.0%. Conclusion: This optimized automatic classification improves Requirement Engineering, as developers can see directly what to improve in the app concerned.
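The abstract reports precision and recall but no F1 score. As a worked illustration only, the harmonic mean of the two reported pairs can be computed as below; the resulting F1 values are derived here from the reported figures and are not results stated by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
baseline = f1_score(0.678, 0.415)   # Logistic Regression
optimized = f1_score(0.744, 0.450)  # Logistic Regression + PSO

print(f"{baseline:.3f} {optimized:.3f}")  # prints 0.515 0.561
```

Because F1 is a harmonic mean, it stays closer to the lower of the two values, which is why both derived scores sit nearer to recall than to precision.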

Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance (Preprint)

10.2196/preprints.11990 ◽

2018 ◽

Author(s):

Jinying Chen ◽

John Lalor ◽

Weisong Liu ◽

Emily Druhl ◽

Edgard Granillo ◽

...

Keyword(s):

Logistic Regression ◽

Language Processing ◽

Improve Patient Safety ◽

Processing System ◽

Machine Learning Algorithms ◽

Support Vector ◽

Secure Messaging ◽

Cost Sensitive Learning ◽

Data Imbalance ◽

Patient Reported

BACKGROUND Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. OBJECTIVE We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. METHODS An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. RESULTS The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity=0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. CONCLUSIONS Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.
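One common way to make a classifier cost-sensitive under this kind of imbalance is to weight each class inversely to its frequency. The sketch below uses the class counts from the abstract (114 positive out of 3000) with the "balanced" heuristic total / (n_classes × count). That heuristic and the weighted log loss are assumptions for illustration; the abstract does not state which weighting HypoDetect used.

```python
import math


def balanced_class_weights(class_counts):
    """Weight each class by total / (n_classes * count); an assumed heuristic."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


def weighted_log_loss(labels, probabilities, weights):
    """Mean log loss where each example is scaled by the weight of its true class.

    labels: 1 for a hypoglycemia report, 0 otherwise.
    probabilities: predicted probability of class 1 for each example.
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        likelihood = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(likelihood)
    return total / len(labels)


# Class counts taken from the abstract: 114 positive threads out of 3000.
weights = balanced_class_weights({1: 114, 0: 3000 - 114})
print(weights)

# A confidently missed positive is penalized far more than an equally
# confident false alarm.
missed_positive = weighted_log_loss([1], [0.1], weights)
false_alarm = weighted_log_loss([0], [0.9], weights)
print(missed_positive > false_alarm)
```

With these counts the positive class receives a weight roughly 25 times that of the negative class, which pushes a model trained on this loss toward higher sensitivity at some cost in specificity.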

Classification of Google Play Store Application Reviews Using Machine Learning

10.20944/preprints202003.0231.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Abdul Karim ◽

SAMIR BRAHIM BELHAOUARI ◽

Azhari SN ◽

Ali Adil Qureshi

Keyword(s):

Machine Learning ◽

Text Classification ◽

Mobile Application ◽

Mobile App ◽

User Preferences ◽

User Reviews ◽

Star Rating ◽

Single Feature ◽

Google Play

The Google Play store allows users to download mobile applications (apps), and users are influenced by an app's ratings and reviews. A recent study finds that user preferences, user suggestions for improvement, user sentiment about particular features, and detailed descriptions of experiences are very useful for application developers. However, the reviews of many applications are very large in volume and difficult to process manually. A star rating is given to the application as a whole, so the developer cannot analyze a single feature. In this research, we scraped 282,231 user reviews using different data scraping techniques. We applied text classification to these user reviews, using different algorithms and measuring precision, accuracy, F1 score, and recall. From the evaluated results, we identify the best algorithm.

Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents

Scholarly Ethics and Publishing ◽

10.4018/978-1-5225-8057-7.ch021 ◽

2019 ◽

pp. 442-458

Author(s):

Saugata Bose ◽

Ritambhra Korpal

Keyword(s):

Machine Learning ◽

Language Processing ◽

Confusion Matrix ◽

False Negative ◽

False Negative Rate ◽

Search Space ◽

Machine Learning Algorithms ◽

C4.5 Decision Tree ◽

N Gram ◽

Four Levels

In this chapter, an approach is proposed that combines natural language processing (NLP) techniques with supervised machine learning algorithms to detect external plagiarism. The major emphasis is on constructing a framework that detects plagiarism in monolingual texts using an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using simple NLP approaches. Filter metrics are then applied to select the most relevant features, and a supervised classification algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity, sentence similarity, and search space reduction methods.
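The confusion-matrix step for a four-level classifier can be sketched as follows. The abstract does not name the four plagiarism levels, so the level names below are hypothetical, and the labels and predictions are invented toy data.

```python
from collections import Counter

# Hypothetical level names; the abstract only says there are four levels.
LEVELS = ("none", "light", "heavy", "copy")


def confusion_matrix(y_true, y_pred, labels=LEVELS):
    """Rows are true levels, columns are predicted levels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def false_positives_and_negatives(matrix, labels=LEVELS):
    """Per-level false positive and false negative counts from the matrix."""
    result = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        false_negatives = sum(matrix[i]) - true_positives
        false_positives = sum(row[i] for row in matrix) - true_positives
        result[label] = {"fp": false_positives, "fn": false_negatives}
    return result


# Toy data, invented for illustration only.
y_true = ["none", "none", "light", "heavy", "copy", "copy", "light", "heavy"]
y_pred = ["none", "light", "light", "heavy", "copy", "heavy", "light", "copy"]

matrix = confusion_matrix(y_true, y_pred)
errors = false_positives_and_negatives(matrix)
for level, row in zip(LEVELS, matrix):
    print(f"{level:>5}: {row}  {errors[level]}")
```

In a multi-class setting, a false negative for one level is always a false positive for another, so reading the counts per level shows which pairs of levels the classifier confuses.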
