Google Play Content Scraping and Knowledge Engineering using Natural Language Processing Techniques with the Analysis of User Reviews

2020 ◽  
Vol 30 (1) ◽  
pp. 192-208 ◽  
Author(s):  
Hamza Aldabbas ◽  
Abdullah Bajahzar ◽  
Meshrif Alruily ◽  
Ali Adil Qureshi ◽  
Rana M. Amir Latif ◽  
...  

Abstract Maintaining a competitive edge in the mobile application market requires evaluating what users need from a quality app. User feedback on these applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people an opportunity to interact, post reviews, rate applications, and share their feedback. In this paper, we scraped 506,259 user reviews and application ratings from the Google Play Store across 14 different categories. We measured the results using several common machine learning algorithms: Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes. Accuracy, precision, recall, and F1 score were used to evaluate bigram, trigram, and n-gram features, and the statistical results of these algorithms were compared. Each algorithm was analyzed in turn and its results evaluated. We conclude that logistic regression is the best algorithm for review analysis of Google Play Store applications. The results were checked by examining the accuracy of the logistic regression algorithm when classifying reviews into three classes: positive, negative, and neutral.
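The bigram and trigram features mentioned in this abstract can be illustrated with a minimal sketch. The tokenization (lowercasing and whitespace splitting), the helper name `ngrams`, and the sample review are assumptions made for illustration; the abstract does not describe the authors' actual preprocessing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples found in `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical review text, not taken from the scraped dataset.
review = "this app is not good and not useful"
tokens = review.lower().split()

bigrams = ngrams(tokens, 2)    # pairs of adjacent words
trigrams = ngrams(tokens, 3)   # triples of adjacent words

# Frequency counts like these could serve as classifier features.
bigram_counts = Counter(bigrams)
```

Bigrams such as `("not", "good")` keep the negation next to the word it modifies, which single-word features would lose.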

Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi

Almost everybody around the world now uses Android apps. Half of the world's population is engaged with messaging, social media, gaming, and browsers. The Google Play Store is an online marketplace that gives users free and paid access to apps, and users are encouraged to download countless applications belonging to predefined categories. In this research paper, we scraped thousands of user reviews and app ratings, covering reviews of 148 apps from 14 categories. We collected 506,259 reviews from the Google Play Store and then checked the semantics of users' reviews of these applications to determine whether each review is positive, negative, or neutral. We evaluated the results using different machine learning algorithms: Naïve Bayes, Random Forest, and Logistic Regression. We calculated Term Frequency (TF) and Inverse Document Frequency (IDF), measured accuracy, precision, recall, and F1, and compared the statistical results of these algorithms. We visualized these statistical results as a bar chart. Each algorithm was analyzed in turn, and the results were compared. We found that Logistic Regression is the best algorithm for review analysis of the Google Play Store. Logistic Regression leads in precision, accuracy, recall, and F1 both after data collection and after preprocessing of this dataset.
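As a rough illustration of the evaluation measures named above, the following sketch computes accuracy and per-class precision, recall, and F1 for a three-class sentiment task. The label lists are hypothetical and the function names are chosen for this example; nothing here reproduces the paper's figures.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier predictions.
y_true = ["positive", "negative", "neutral", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "negative"]

overall_accuracy = accuracy(y_true, y_pred)
negative_scores = per_class_metrics(y_true, y_pred, "negative")
```

Reporting the scores per class matters for three-class sentiment data, because a model can reach high accuracy while rarely predicting the neutral class correctly.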


Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 202
Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Samir Brahim Belhaouri ◽  
Ali Adil Qureshi ◽  
Maqsood Ahmad

Android-based applications are widely used by almost everyone around the globe. Due to the availability of the Internet almost everywhere at no charge, almost half of the globe is engaged with social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, which is one of the most popular Internet application stores, users are encouraged to download thousands of applications and various types of software. In this research study, we scraped thousands of user reviews and the ratings of different applications. We scraped reviews of 148 applications from 14 different categories. A total of 506,259 reviews were accumulated and assessed. Based on the semantics of the reviews, each review was classified as negative, positive or neutral. In this research, different machine-learning algorithms such as logistic regression, random forest and naïve Bayes were tuned and tested. We also evaluated the outcome of term frequency (TF) and inverse document frequency (IDF), measured different parameters such as accuracy, precision, recall and F1 score (F1), and presented the results in the form of a bar graph. In conclusion, we compared the outcome of each algorithm and found that logistic regression is one of the best algorithms for the review analysis of the Google Play Store from an accuracy perspective. Furthermore, we were able to demonstrate that logistic regression is better in terms of speed, accuracy, recall and F1. This conclusion was reached after preprocessing a number of data values from these data sets.
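The TF and IDF weighting mentioned in this abstract can be sketched as follows. This uses one common textbook form (term count divided by document length, multiplied by log(N / document frequency)); the abstract does not say which variant the authors used, and the three sample reviews and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    TF is the term count divided by the document length and
    IDF is log(N / df), where df counts documents containing the term.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical reviews, already tokenized by whitespace.
docs = [
    "great app love it".split(),
    "bad app crashes often".split(),
    "app is okay".split(),
]
weights = tf_idf(docs)
```

In this toy corpus the word "app" occurs in every review, so its IDF is log(1) = 0 and it carries no weight, while a distinctive word such as "great" keeps a positive weight.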


Mousaion ◽  
2019 ◽  
Vol 36 (3) ◽  
Author(s):  
Chimango Nyasulu ◽  
Winner Chawinga ◽  
George Chipeta

Governments the world over are increasingly challenging universities to produce human resources with the right skill sets and knowledge required to drive their economies in the twenty-first century. It therefore becomes important for universities to produce graduates who make tangible and meaningful contributions to those economies. Graduate tracer studies are hailed as one of the ways in which universities can respond and reposition themselves to the actual needs of the industry. It is against this background that this study was conducted to establish the relevance of the Department of Information and Communication Technology at Mzuzu University to the Malawian economy by systematically investigating the occupations of its former students after graduating from the University. The study adopted a quantitative design by distributing an online questionnaire with predominantly closed-ended questions. The study focused on three key objectives: to identify key employing sectors of ICT graduates, to gauge the relevance of the ICT programme to its former students’ jobs and businesses, and to establish the level of satisfaction with the ICT curriculum from the perspective of ICT graduates. The key finding from the study is that the ICT programme is relevant to the industry. However, some respondents were of the view that the curriculum should be strengthened by adding courses such as Mobile Application Development, Machine Learning, Natural Language Processing, Data Mining, and LINUX Administration to keep abreast of ever-changing ICT trends and job requirements. The study strongly recommends regular reviews of the curriculum so that it continually responds to and matches the needs of the industry.


Author(s):  
Saugata Bose ◽  
Ritambhra Korpal

In this chapter, an initiative is proposed in which natural language processing (NLP) techniques and supervised machine learning algorithms are combined to detect external plagiarism. The main emphasis is on constructing a framework that detects plagiarism in monolingual texts by implementing an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using a simple NLP approach. Filter metrics are then applied to select the most relevant features, and a supervised classification algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity method, the sentence similarity method, and the search space reduction method.
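One way to picture an n-gram frequency comparison is to build a word-trigram frequency profile for each text and compare the profiles with cosine similarity. This is only a sketch under stated assumptions: the chapter's actual 120 features and comparison measure are not given in the abstract, and the sample sentences and function names are hypothetical.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Count the word n-grams in `text` after lowercasing and splitting."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (0.0 if empty)."""
    dot = sum(a[gram] * b[gram] for gram in a if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical source, lightly edited copy, and unrelated text.
source = ngram_profile("the quick brown fox jumps over the lazy dog")
suspect = ngram_profile("the quick brown fox leaps over the lazy dog")
unrelated = ngram_profile("stock prices fell sharply after the announcement")

copy_score = cosine_similarity(source, suspect)
unrelated_score = cosine_similarity(source, unrelated)
```

A single substituted word breaks three of the seven trigrams, so the lightly edited copy still scores well above the unrelated text; a score like this could be one input feature to a supervised classifier.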


Scientific knowledge and electronic devices are growing day by day. As a result, many expert systems in the healthcare industry use machine learning algorithms. Deep neural networks outperform these machine learning techniques and often take raw, unrefined data to calculate the target output. Deep learning, or feature learning, focuses on the features that matter most and gives a complete understanding of the generated model. The existing methodology used a data mining technique (a rule-based classification algorithm) and a machine learning algorithm (a hybrid logistic regression algorithm) to preprocess data and extract meaningful insights from it. That approach, however, relies on supervised data. The proposed work is based on unsupervised data, that is, data with no labels, and deep neural techniques are deployed to get the target output. Machine learning algorithms are compared with the proposed deep learning techniques, implemented using TensorFlow and Keras, in terms of accuracy. The deep learning methodology outperforms the existing rule-based classification and hybrid logistic regression algorithms in terms of accuracy. The designed methodology is tested on the public MIT-BIH arrhythmia database, classifying four kinds of abnormal beats. The proposed deep learning approach offered better performance, improving on the results of state-of-the-art machine learning approaches.


Author(s):  
Abdul Karim ◽  
Azhari Azhari ◽  
Meshrif Alruily ◽  
Hamza Aldabbas ◽  
Samir Brahim Belhaouri ◽  
...  

The Google Play Store allows users to download mobile applications (apps), and users are influenced by an app's ratings and reviews. A recent study finds that user preferences, user suggestions for improvement, user sentiment about particular features, and detailed descriptions of experiences are very useful to application developers. However, many application reviews are very long and difficult to process manually. A star rating is given to the whole application, so the developer cannot analyze a single feature. In this research, we scraped 282,231 user reviews using different data scraping techniques. We applied text classification to these user reviews. We applied different algorithms and measured precision, accuracy, F1 score and recall. From the evaluated results, we also identify the best algorithm.


Author(s):  
MPS Bhatia ◽  
Akshi Kumar ◽  
Rohit Beniwal

Background: App stores such as Google Play and the Apple App Store provide a platform that allows users to give feedback on apps in the form of reviews. An app review typically includes a star rating followed by a comment. Recent studies have shown that these reviews are a vital source of information that app developers and vendors can use to improve future versions of an app. However, in most cases these reviews are unstructured, and extracting useful information from them requires great effort. Objective: This article provides an optimized classification approach that automatically classifies reviews into bug reports, feature requests, and shortcoming & improvement requests relevant to Requirement Engineering. Method: Our methodology merges three techniques, namely (1) Text Analysis, (2) Natural Language Processing, and (3) Sentiment Analysis, to extract a feature set, which is then used to automatically classify app reviews into their relevant categories. Results: We achieved our best results, a precision of 67.8 % and a recall of 41.5 %, with the Logistic Regression machine learning technique, which we further optimized with the nature-inspired PSO algorithm (Logistic Regression + PSO), resulting in a precision of 74.4 % and a recall of 45.0 %. Conclusion: This optimized automatic classification improves Requirement Engineering because the developer directly knows what to improve in the app concerned.


2018 ◽  
Author(s):  
Jinying Chen ◽  
John Lalor ◽  
Weisong Liu ◽  
Emily Druhl ◽  
Edgard Granillo ◽  
...  

BACKGROUND Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. OBJECTIVE We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. METHODS An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. RESULTS The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity=0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. CONCLUSIONS Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.
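Cost-sensitive learning on imbalanced data is often implemented by weighting each class inversely to its frequency, so that errors on the rare class cost more during training. The sketch below applies one such heuristic, n_samples / (n_classes * class_count), to the class counts stated in the abstract (114 positive out of 3000). The weighting scheme and the function name are assumptions for illustration; the abstract does not say which costs HypoDetect used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Class counts from the abstract: 114 positive threads out of 3000.
# Label 1 marks a reported hypoglycemia incident, label 0 marks none.
labels = [1] * 114 + [0] * (3000 - 114)
weights = balanced_class_weights(labels)
```

Under this heuristic the two classes contribute equal total weight to the loss, which is one way to keep a classifier from ignoring the roughly 4% of positive messages.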


Author(s):  
Abdul Karim ◽  
SAMIR BRAHIM BELHAOUARI ◽  
Azhari SN ◽  
Ali Adil Qureshi

The Google Play Store allows users to download mobile applications (apps), and users are influenced by an app's ratings and reviews. A recent study finds that user preferences, user suggestions for improvement, user sentiment about particular features, and detailed descriptions of experiences are very useful to application developers. However, many application reviews are very long and difficult to process manually. A star rating is given to the whole application, so the developer cannot analyze a single feature. In this research, we scraped 282,231 user reviews using different data scraping techniques. We applied text classification to these user reviews. We applied different algorithms and measured precision, accuracy, F1 score and recall. From the evaluated results, we identify the best algorithm.



