Classifying Mobile Applications Using Word Embeddings

2022 ◽  
Vol 31 (2) ◽  
pp. 1-30
Author(s):  
Fahimeh Ebrahimi ◽  
Miroslav Tushev ◽  
Anas Mahmoud

Modern application stores enable developers to classify their apps by choosing from a set of generic categories, or genres, such as health, games, and music. These categories are typically static; new categories do not necessarily emerge over time to reflect innovations in the mobile software landscape. With thousands of apps classified under each category, locating apps that match a specific consumer interest can be a challenging task. To overcome this challenge, in this article, we propose an automated approach for classifying mobile apps into more focused categories of functionally related application domains. Our aim is to enhance app visibility and discoverability. Specifically, we employ word embeddings to generate numeric semantic representations of app descriptions. These representations are then classified to generate more cohesive categories of apps. Our empirical investigation is conducted using a dataset of 600 apps, sampled from the Education, Health & Fitness, and Medical categories of the Apple App Store. The results show that our classification algorithms achieve their best performance when app descriptions are vectorized using GloVe, a count-based model of word embeddings. Our findings are further validated using a dataset of Sharing Economy apps, and the results are evaluated by 12 human subjects. The results show that GloVe combined with Support Vector Machines can produce app classifications that are aligned to a large extent with human-generated classifications.
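The kind of pipeline this abstract describes (pretrained GloVe vectors to represent app descriptions, then an SVM classifier) can be sketched roughly as follows; the GloVe file path, the toy descriptions, and the labels are illustrative assumptions, not the authors' data or code.

```python
# Sketch: average pretrained GloVe vectors per app description and classify
# the resulting document vectors with an SVM (scikit-learn).
import numpy as np
from sklearn.svm import SVC

def load_glove(path):
    """Parse a GloVe .txt file (token followed by its vector) into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed(text, vectors, dim=100):
    """Average the vectors of all in-vocabulary tokens; zeros if none match."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(dim, dtype=np.float32)
    return np.mean([vectors[t] for t in tokens], axis=0)

glove = load_glove("glove.6B.100d.txt")   # assumed local copy of pretrained GloVe
descriptions = [                          # toy app descriptions, not the study dataset
    "track your daily workouts and running distance",
    "guided meditation sessions to reduce stress",
    "log workouts and count calories burned",
    "breathing exercises and sleep meditation",
]
labels = ["fitness tracking", "meditation", "fitness tracking", "meditation"]

X = np.vstack([embed(d, glove) for d in descriptions])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(np.vstack([embed("calorie counter with workout plans", glove)])))
```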

2008 ◽  
Vol 2 (4) ◽  
Author(s):  
Claudio Castellini

In critical human/robotic interactions such as teleoperation by a disabled master or teleoperation with insufficient bandwidth, it is highly desirable to have semi-autonomous robotic artifacts interact with a human being. Semi-autonomous grasping, for instance, consists of having a smart slave able to guess the master’s intentions and initiate a grasping sequence whenever the master wants to grasp an object in the slave’s workspace. In this paper we investigate the possibility of building such an intelligent robotic artifact by training a machine learning system on data gathered from several human subjects while trying to grasp objects in a teleoperation setup. In particular, we investigate the usefulness of gaze tracking in such a scenario. The resulting system must be light enough to be usable on-line and flexible enough to adapt to different masters, e.g., elderly and/or slow ones. The outcome of the experiment is that such a system, based upon Support Vector Machines, meets all the requirements, being (a) highly accurate, (b) compact and fast, and (c) largely unaffected by the subjects’ diversity. It is also clearly shown that gaze tracking significantly improves both the accuracy and compactness of the obtained models compared with the use of the hand position alone. The system can be trained with about 3.5 minutes of human data in the worst case.
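A rough illustration of the comparison reported here, an SVM intention classifier trained with hand features only versus hand plus gaze features; the feature layout and the synthetic data are assumptions rather than the paper's teleoperation setup.

```python
# Sketch: binary "grasp intention" classifier from hand and gaze features,
# comparing hand-only features against hand + gaze (synthetic toy data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
hand = rng.normal(size=(n, 3))   # hypothetical hand position (x, y, z)
gaze = rng.normal(size=(n, 2))   # hypothetical gaze point (x, y)

# Made-up labeling rule: intention to grasp when both hand and gaze are near
# the object located at the origin.
y = ((np.linalg.norm(hand, axis=1) < 1.5) & (np.linalg.norm(gaze, axis=1) < 1.0)).astype(int)

for name, X in [("hand only", hand), ("hand + gaze", np.hstack([hand, gaze]))]:
    acc = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```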


2016 ◽  
Author(s):  
Ludymila L. A. Gomes ◽  
Awdren L. Fontão ◽  
Allan J. S. Bezerra ◽  
Arilo C. Dias-Neto

The growth of mobile platforms in recent years has changed the software development scenario and challenged developers around the world to build successful mobile applications (apps). Users are the core of a mobile software ecosystem (MSECO). Thus, the quality of an app is related to user satisfaction, which can be measured by the app's popularity in an app store. In this paper, we describe the results of a mapping study that identified and analyzed how metrics on app popularity have been addressed in the technical literature. Eighteen metrics were identified as related to app popularity, with user rating and downloads the most cited. After that, we conducted a survey with 47 developers working within the main MSECOs (Android, iOS, and Windows) in order to evaluate these 18 metrics regarding their usefulness for characterizing an app's popularity. As a result, we observed that developers perceive the importance of metrics for indicating app popularity differently from what the current research suggests.


Webology ◽  
2020 ◽  
Vol 17 (2) ◽  
pp. 945-956
Author(s):  
R. Harikrishnan ◽  
R. Jebakumar ◽  
S. Ganesh Kumar ◽  
Amutha

The insurance industry enables users to access information easily in their work without repeatedly entering or remembering multiple passwords. Current technology draws insurers toward improving the authentication process. Identity authentication processes require customers to jump through many hoops, which creates an unpleasant customer experience. The proposed method reduces these challenges for insurance business data by applying classification algorithms based on the support vector machine (SVM) to mobile applications, since the growing trend toward mobile apps makes access easier for users. Seasonal variations and correlations in this financial time series data are examined using statistical methods, ultimately generating trading signals for the insurance data. The feature extraction process increases user security, and the classification process distinguishes different levels of user identity. The support vector machine speeds up the data validation process. Finally, the proposed work enhances the user authentication process. The framework is implemented in MATLAB R2014, and results were simulated for mobile apps.
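As a loose illustration of the time-series part of this abstract, the sketch below decomposes a synthetic monthly insurance series into trend and seasonal components and checks the seasonal autocorrelation; the synthetic series and the statsmodels-based approach are assumptions, not the paper's implementation.

```python
# Sketch: seasonal decomposition and autocorrelation on a synthetic monthly
# insurance time series.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
months = np.arange(60)
# Toy series: upward trend + yearly seasonality + noise.
claims = 100 + 0.5 * months + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 2, 60)
series = pd.Series(claims, index=pd.date_range("2015-01-01", periods=60, freq="MS"))

decomp = seasonal_decompose(series, model="additive", period=12)
print(decomp.seasonal.iloc[:12])   # estimated monthly seasonal pattern
print(acf(series, nlags=12)[12])   # autocorrelation at the seasonal lag
```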


Linguamática ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 41-53
Author(s):  
Alessandra Harumi Iriguti ◽  
Valéria Delisandra Feltrim

Rhetorical structure classification is an NLP task that seeks to identify the rhetorical components of a discourse and their relationships. In this work, we sought to automatically identify the sentence-level categories that make up the rhetorical structure of scientific abstracts. Specifically, the goal was to evaluate the impact of different feature sets on the implementation of rhetorical classifiers for scientific abstracts written in Portuguese. To this end, we used surface features (extracted as TF-IDF values and selected with the chi-squared test), morphosyntactic features (produced by the AZPort classifier), and features extracted from word embedding models (Word2Vec, Wang2Vec, and GloVe, all pretrained). These feature sets, as well as their combinations, were used to train classifiers with the following supervised learning algorithms: Support Vector Machines, Naive Bayes, K-Nearest Neighbors, Decision Trees, and Conditional Random Fields (CRF). The classifiers were evaluated through cross-validation over three corpora composed of thesis and dissertation abstracts. The best result, an F1 of 94%, was obtained by the CRF classifier with the following feature combinations: (i) Wang2Vec Skip-gram with dimension 100 combined with the AZPort features; (ii) Wang2Vec Skip-gram and GloVe with dimension 300 combined with the AZPort features; and (iii) TF-IDF, AZPort, and embeddings extracted with the Wang2Vec Skip-gram models of dimensions 100 and 300 and GloVe of dimension 300. From these results, we conclude that the features provided by the AZPort classifier were fundamental to the good performance of the CRF classifier, while the combination with word embeddings proved useful for improving the results.
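One of the feature sets evaluated here, TF-IDF values filtered with the chi-squared test and fed to a supervised classifier, can be sketched as follows; the toy Portuguese sentences, the labels, the value of k, and the choice of a linear SVM are assumptions, not the study's corpora or configuration.

```python
# Sketch: sentence-level rhetorical classification with TF-IDF features and
# chi-squared feature selection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "Este trabalho propõe um método de classificação de sentenças.",
    "Os resultados mostram um F1 de 94% no melhor cenário.",
    "O objetivo é avaliar diferentes conjuntos de atributos.",
    "Concluímos que os atributos do AZPort foram fundamentais.",
]
labels = ["purpose", "result", "purpose", "conclusion"]   # toy rhetorical categories

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    SelectKBest(chi2, k=10),   # keep the k terms most associated with the labels
    LinearSVC(),
)
clf.fit(sentences, labels)
print(clf.predict(["Avaliamos o impacto de diferentes conjuntos de atributos."]))
```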


10.2196/19133 ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. e19133 ◽  
Author(s):  
Andrea Ferrario ◽  
Burcu Demiray ◽  
Kristina Yordanova ◽  
Minxia Luo ◽  
Mike Martin

Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations.

Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts.

Methods: The methods in this study comprise (1) collecting and coding transcripts of older adults’ conversations in German, (2) preprocessing the transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), average precision (AP), precision, recall, F1 score, and specificity on the test data, for all combinations of NLP features, algorithms, and learning strategies.

Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embedding features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower than that of the other strategies, with highly imbalanced precision-recall trade-offs.

Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and for classifying its functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in cases of dysfunctional reminiscence, which can undermine physical and mental health.
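A minimal sketch of two of the imbalance strategies compared here, class-weighted learning and SMOTE oversampling, on bag-of-words features; the toy German snippets, labels, and the imbalanced-learn pipeline are assumptions, not the authors' code.

```python
# Sketch: class-weighted SVM vs. SMOTE oversampling on bag-of-words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline as make_imb_pipeline

texts = [
    "damals sind wir jeden sommer ans meer gefahren",   # reminiscence
    "weißt du noch, wie wir früher getanzt haben",      # reminiscence
    "heute regnet es schon den ganzen tag",
    "ich muss nachher noch einkaufen gehen",
    "der kaffee ist schon wieder kalt geworden",
    "morgen kommt meine tochter zu besuch",
]
y = [1, 1, 0, 0, 0, 0]   # 1 = reminiscence (toy, highly imbalanced)

# Strategy 1: class-weighted learning; errors on the minority class cost more.
weighted = make_pipeline(CountVectorizer(), SVC(class_weight="balanced"))
weighted.fit(texts, y)

# Strategy 2: SMOTE; synthesize minority samples in bag-of-words space before
# fitting the SVM (k_neighbors reduced for this tiny toy set).
augmented = make_imb_pipeline(CountVectorizer(), SMOTE(k_neighbors=1, random_state=0), SVC())
augmented.fit(texts, y)

print(weighted.predict(["weißt du noch unser erstes auto"]))
print(augmented.predict(["weißt du noch unser erstes auto"]))
```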


Author(s):  
Nikola Ljubešić ◽  
Nataša Logar ◽  
Iztok Kosem

Collocations play a very important role in language description, especially in identifying the meanings of words. In modern lexicography, an inevitable part of meaning deduction is a list of collocates ranked by some statistical measure. In this paper, we present a comparison between two approaches to the ranking of collocates: (a) the logDice method, which is frequency-based and currently dominant, and (b) the fastText word embeddings method, which is newer and semantic-based. The comparison was made on two Slovene datasets, one representing general-language headwords and their collocates, and the other representing headwords and their collocates extracted from a language-for-special-purposes corpus. In the experiment, two methods were used: for the quantitative part of the evaluation, we used supervised machine learning with the area under the ROC curve (AUC) score and the support vector machine (SVM) algorithm, and in the qualitative part the ranking results of the two methods were evaluated by lexicographers. The results were somewhat inconsistent: while the quantitative evaluation confirmed that the machine-learning-based approach produced better collocate rankings than the frequency-based one, lexicographers in most cases considered the collocate lists of the two methods very similar.
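For the frequency-based side of the comparison, logDice (Rychlý, 2008) scores a collocate y of a headword x as 14 + log2(2 f(x, y) / (f(x) + f(y))). A small sketch with hypothetical Slovene counts follows; the corpus extraction itself is not shown and the counts are invented for illustration.

```python
# Sketch: rank candidate collocates of a headword by logDice.
import math

def log_dice(f_xy, f_x, f_y):
    """logDice association score for headword x and collocate y."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

f_headword = 1200                       # hypothetical frequency of "knjiga" ('book')
candidates = {                          # collocate: (co-occurrence freq, collocate freq)
    "brati": (300, 2500),
    "debela": (40, 400),
    "nova": (150, 9000),
}
ranked = sorted(
    ((c, log_dice(f_xy, f_headword, f_y)) for c, (f_xy, f_y) in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for collocate, score in ranked:
    print(f"{collocate}: {score:.2f}")
```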


2018 ◽  
Vol 18 ◽  
pp. 03002
Author(s):  
Anton Kukanov ◽  
Elena Andrianova

Nowadays the mobile app market is experiencing unprecedented growth. The number of mobile applications offered for installation has exceeded 6 million. As a result, it is difficult for ordinary consumers to choose a safe, high-quality product from such a large selection. The proposed independent rating is intended to help ordinary consumers. It is based on a dedicated standard of quality requirements for mobile apps and a group of test procedures that allow the quality of mobile software to be evaluated.


2020 ◽  
Author(s):  
Lewis Mervin ◽  
Avid M. Afzal ◽  
Ola Engkvist ◽  
Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling, Isotonic Regression, and Venn-ABERS, in calibrating prediction scores for ligand-target prediction models built with the Naïve Bayes, Support Vector Machine, and Random Forest algorithms on bioactivity data available at AstraZeneca (40 million data points (compound-target pairs) across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
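Two of the three calibration methods compared here, Platt scaling (the "sigmoid" method) and isotonic regression, are available in scikit-learn via CalibratedClassifierCV; the sketch below applies both to a random forest on synthetic imbalanced data. Venn-ABERS is not part of scikit-learn and is omitted, and the data and model choice are assumptions, not the study's setup.

```python
# Sketch: compare Platt scaling and isotonic regression for calibrating
# classifier scores into probabilities, judged by the Brier score.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for method in ("sigmoid", "isotonic"):   # "sigmoid" corresponds to Platt scaling
    calibrated = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                                        method=method, cv=3)
    calibrated.fit(X_train, y_train)
    probs = calibrated.predict_proba(X_test)[:, 1]
    print(method, round(brier_score_loss(y_test, probs), 4))
```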

