Development of a Model and Algorithm for Data Aggregation and Classification for a Personalized Nutrition Recommendation System

The article presents the design and implementation of a data aggregation algorithm for a future recommendation system in the field of personalized nutrition. The work draws on theoretical materials on machine learning methods in natural language processing, as well as tutorials on building classification models with the Keras library. A distinctive feature of the classifier implemented in this project is that it accepts images and text data simultaneously as input, with the aim of obtaining more accurate and balanced predictions. The implementation of the data aggregation algorithm is considered in detail, and the tools and approaches chosen at the various stages of aggregation are reviewed. Metrics are defined for evaluating the predictions of the geographic label classification model and for analyzing the average sentiment of user reviews, and the results are visualized. The predicted geo tags and detected comment sentiments were added to the main data frame as additional features.

A Survey of Arabic Text Classification Models

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i6.pp4352-4355 ◽

2018 ◽

Vol 8 (6) ◽

pp. 4352 ◽

Cited By ~ 1

Author(s):

Ahed M. F. Al-Sbou

Keyword(s):

Language Processing ◽

Text Classification ◽

Arabic Language ◽

Arabic Text ◽

Classification Models ◽

Natural Languages ◽

Text Organization ◽

Arabic Text Classification ◽

Arabic Language Processing

A huge amount of Arabic text is available online, and these texts need to be organized. As a result, there are many natural language processing (NLP) applications concerned with text organization. One of them is text classification (TC). TC makes it easier to deal with unorganized text by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also presents a comparison among different methods for classifying Arabic texts, where Arabic text is regarded as complex because of its vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases. Research in Arabic language processing is scarce compared with English. As a result, these problems pose challenges for the classification and organization of specific Arabic text. Text classification helps to access documents or information that have already been classified into one or more classes or categories. In addition, document classification helps search engines reduce the number of documents, which makes searching and matching against queries easier.

Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular domains of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity in a way similar to human beings. This research article presents an approach of this kind. Concepts from natural language processing are considered for text representation, and a novel word embedding model is then proposed for effective classification of the data. TF-IDF and Common BoW representation models were considered for representing the text data. The importance of these models is discussed in the respective sections. The proposed model is tested using IMDB datasets. A split of 50% training and 50% testing, with three random shufflings of the datasets, is used to evaluate the model.
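The TF-IDF representation and the repeated 50/50 evaluation split mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the plain `log(N / df)` weighting, the function names `tf_idf` and `half_splits`, and the toy reviews are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: term frequency normalised by document length,
    multiplied by idf = log(N / df). Other variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def half_splits(items, repeats=3, seed=0):
    """Yield (train, test) halves after each of `repeats` random shuffles."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        yield shuffled[:mid], shuffled[mid:]


# Hypothetical toy reviews, used only to exercise the functions.
reviews = [
    "a great movie with a great cast",
    "a dull movie",
    "great fun",
]
vectors = tf_idf(reviews)
```

In the second review, "dull" occurs in only one document and so receives a higher weight than "movie", which occurs in two. A term that appears in every document gets weight zero under this particular idf variant.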

Efficient Weighted Semantic Score Based on the Huffman Coding Algorithm and Knowledge Bases for Word Sequences Embedding

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2020040107 ◽

2020 ◽

Vol 16 (2) ◽

pp. 126-142

Author(s):

Nada Ben-Lhachemi ◽

El Habib Nfaoui

Keyword(s):

Language Processing ◽

Recommendation System ◽

Semantic Relatedness ◽

Knowledge Bases ◽

Word Embedding ◽

Huffman Coding ◽

Text Representation ◽

Text Data ◽

New Feature ◽

Embedding Methods

Learning text representations is at the core of numerous natural language processing applications. Word embedding is a type of text representation that allows words with similar meanings to have similar representations. Word embedding techniques categorize semantic similarities between linguistic items based on their distributional properties in large samples of text data. Although these techniques are very efficient, handling semantic and pragmatic ambiguity with high accuracy is still a challenging research task. In this article, we propose a new feature, a semantic score, that handles ambiguities between words. We use external knowledge bases and the Huffman coding algorithm to compute this score, which depicts the semantic relatedness between all fragments composing a given text. We combine this feature with word embedding methods to improve text representation. We evaluate our method on a hashtag recommendation system for Twitter, where text is noisy and short. The experimental results demonstrate that, compared with state-of-the-art algorithms, our method achieves good results.
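The abstract does not say how the Huffman coding algorithm is turned into a semantic score, so the sketch below shows only the standard Huffman code construction that such a score could build on. The function name `huffman_codes` and the word frequencies are hypothetical.

```python
import heapq


def huffman_codes(freqs):
    """Map each symbol to a Huffman bit string, given {symbol: frequency}."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical token frequencies from a small corpus of short posts.
word_freqs = {"the": 8, "food": 4, "quinoa": 1, "kale": 1}
codes = huffman_codes(word_freqs)
```

Frequent tokens end up with short codes and rare tokens with long ones, which is the property a frequency-aware score can exploit.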

Amrita-CEN-Senti-DB:Twitter Dataset for Sentimental Analysis and Application of Classical Machine Learning and Deep Learning

10.36227/techrxiv.12058968 ◽

2020 ◽

Author(s):

vinayakumar R

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Language Processing ◽

Learning Algorithm ◽

Support Vector ◽

Text Data ◽

Inverse Document Frequency ◽

Logistic Regression ◽

Document Frequency

Social media is a platform on which vast amounts of text are generated every day. The data is so large that it cannot be easily understood, which has paved the way for a new field in information technology: natural language processing. In this paper, the text data used for classification consists of tweets, and the classification determines the state of a person according to sentiment, which is positive, negative, or neutral. Emotions are the expression of a person’s feelings and have a high influence on decision-making tasks. Here we propose text representations, Term Frequency Inverse Document Frequency (TF-IDF) and Keras embedding, along with machine learning and deep learning algorithms for sentiment classification. Among these, Logistic Regression performs well when a limited number of features is used; as the number of features increases, the Support Vector Machine (SVM), which is also a machine learning algorithm, performs best, setting a benchmark accuracy of 75.8% for this dataset. For research purposes, the dataset has been made publicly available.

Enhancing Personalized Ads Using Interest Category Classification of SNS Users Based on Deep Neural Networks

Sensors ◽

10.3390/s21010199 ◽

2020 ◽

Vol 21 (1) ◽

pp. 199

Author(s):

Taekeun Hong ◽

Jin-A Choi ◽

Kiho Lim ◽

Pankoo Kim

Keyword(s):

Neural Network ◽

Deep Neural Networks ◽

Recommendation System ◽

Critical Role ◽

Social Networking Site ◽

Classification Models ◽

User Interest ◽

Hybrid Neural Network ◽

Textual Data

The classification and recommendation system for identifying social networking site (SNS) users’ interests plays a critical role in various industries, particularly advertising. Personalized advertisements help brands stand out from the clutter of online advertisements while enhancing relevance to consumers to generate favorable responses. Although most user interest classification studies have focused on textual data, the combined analysis of images and texts on user-generated posts can more precisely predict a consumer’s interests. Therefore, this research classifies SNS users’ interests by utilizing both texts and images. Consumers’ interests were defined using the Curlie directory, and various convolutional neural network (CNN)-based models and recurrent neural network (RNN)-based models were tested for our user interest classification system. In our hybrid neural network (NN) model, CNN-based classification models were used to classify images from users’ SNS postings while RNN-based classification models were used to classify textual data. The results of our extensive experiments show that the classification of users’ interests performed best when using texts and images together, at 96.55%, versus texts only, 41.38%, or images only, 93.1%. Our proposed system provides insights into personalized SNS advertising research and informs marketers on making (1) interest-based recommendations, (2) ranked-order recommendations, and (3) real-time recommendations.
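The abstract reports that combining the image classifier and the text classifier outperforms either one alone, but it does not state how the two outputs are combined. One common option is late fusion of class probabilities, sketched here as an assumption rather than as the authors' method. The function names, the equal weighting, the category labels, and the probability values are all hypothetical.

```python
def late_fusion(image_probs, text_probs, image_weight=0.5):
    """Blend two class-probability vectors with a weighted average."""
    if len(image_probs) != len(text_probs):
        raise ValueError("both models must score the same set of classes")
    fused = [
        image_weight * p + (1.0 - image_weight) * q
        for p, q in zip(image_probs, text_probs)
    ]
    total = sum(fused)
    # Renormalise so the result is again a probability distribution.
    return [value / total for value in fused]


def predict(labels, probs):
    """Return the label with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


# Hypothetical interest categories and model outputs for one post.
labels = ["sports", "food", "travel"]
image_probs = [0.2, 0.5, 0.3]  # stand-in for a CNN-based image model
text_probs = [0.1, 0.3, 0.6]   # stand-in for an RNN-based text model
fused = late_fusion(image_probs, text_probs)
```

Here the image model alone would pick "food", while the fused scores favour "travel" because the text model is more confident. Choosing `image_weight` would normally be done on held-out data.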

Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources?

Methods of Information in Medicine ◽

10.3414/me16-01-0084 ◽

2017 ◽

Vol 56 (04) ◽

pp. 308-318 ◽

Cited By ~ 2

Author(s):

Asli Bostanci ◽

Murat Turhan ◽

Selen Bozkurt

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

True Positive Rate ◽

Classification Models ◽

True Positive ◽

Learning Methods ◽

Machine Learning Methods ◽

Obstructive Sleep ◽

Positive Rate

Summary. Objectives: The goal of this study is to evaluate the results of machine learning methods for the classification of OSA severity of patients with suspected sleep disorder breathing as normal, mild, moderate and severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. Methods: In order to produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation and to evaluate classification performances of all methods, true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under Receiver Operating Characteristics curve (ROC-AUC) were used. Results: Results of 10-fold cross validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified, using non-polysomnographic features, with 0.71 true positive rate as the highest and, 0.15 false positive rate as the lowest, respectively. Moreover, the test results of different variables settings revealed that the accuracy of the classification models was significantly improved when physical examination variables were added to the model. Conclusions: Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea and such approaches may improve accurate initial OSA screening and help referring only the suspected moderate or severe OSA patients to sleep laboratories for the expensive tests.
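The evaluation measures named in this abstract (TPR, FPR, PPV and F measure) follow directly from confusion-matrix counts. The sketch below computes them for one severity class treated one-vs-rest. The helper names and the toy labels are hypothetical, and the F measure is assumed to be the balanced F1 score.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, PPV and F1; undefined ratios are reported as 0.0."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return {"tpr": tpr, "fpr": fpr, "ppv": ppv, "f1": f1}


# Hypothetical labels for eight patients, not data from the study.
y_true = ["normal", "mild", "moderate", "severe",
          "severe", "mild", "severe", "normal"]
y_pred = ["normal", "mild", "severe", "severe",
          "moderate", "mild", "severe", "mild"]

counts = one_vs_rest_counts(y_true, y_pred, positive="severe")
metrics = binary_metrics(*counts)
```

In a 10-fold cross-validation, the same counts would be computed on each held-out fold and then averaged or pooled across folds.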

Amrita-CEN-Senti-DB:Twitter Dataset for Sentimental Analysis and Application of Classical Machine Learning and Deep Learning

10.36227/techrxiv.12058968.v1 ◽

2020 ◽

Author(s):

vinayakumar R

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Language Processing ◽

Learning Algorithm ◽

Support Vector ◽

Text Data ◽

Inverse Document Frequency ◽

Logistic Regression ◽

Document Frequency

Sentiment Classification of News Text Data Using Intelligent Model

Frontiers in Psychology ◽

10.3389/fpsyg.2021.758967 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shitao Zhang

Keyword(s):

Language Processing ◽

Dictionary Learning ◽

Learning Algorithm ◽

Sentiment Classification ◽

Training Data ◽

Text Data ◽

Discriminative Performance ◽

Traffic Jam ◽

Intelligent Model

Text sentiment classification is a fundamental sub-area in natural language processing. The sentiment classification algorithm is highly domain-dependent. For example, the phrase “traffic jam” expresses negative sentiment in the sentence “I was stuck in a traffic jam on the elevated for 2 h.” But in the domain of transportation, the phrase “traffic jam” in the sentence “Bread and water are essential terms in traffic jams” is without any sentiment. The most common method is to use the domain-specific data samples to classify the text in this domain. However, text sentiment analysis based on machine learning relies on sufficient labeled training data. Aiming at the problem of sentiment classification of news text data with insufficient label news data and the domain adaptation of text sentiment classifiers, an intelligent model, i.e., transfer learning discriminative dictionary learning algorithm (TLDDL) is proposed for cross-domain text sentiment classification. Based on the framework of dictionary learning, the samples from the different domains are projected into a subspace, and a domain-invariant dictionary is built to connect two different domains. To improve the discriminative performance of the proposed algorithm, the discrimination information preserved term and principal component analysis (PCA) term are combined into the objective function. The experiments are performed on three public text datasets. The experimental results show that the proposed algorithm improves the sentiment classification performance of texts in the target domain.

A survey of Arabic text classification models

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp25-28 ◽

2019 ◽

Vol 8 (1) ◽

pp. 25

Author(s):

Ahed M. F. Al Sbou

Keyword(s):

Language Processing ◽

Text Classification ◽

Arabic Language ◽

Arabic Text ◽

Classification Models ◽

Natural Languages ◽

Text Organization ◽

Arabic Text Classification ◽

Arabic Language Processing

A huge amount of Arabic text is available online, and these texts need to be organized. As a result, there are many natural language processing (NLP) applications concerned with text organization. One of them is text classification (TC). TC makes it easier to deal with unorganized text by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also presents a comparison among different methods for classifying Arabic texts, where Arabic text is regarded as complex because of its vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases. Research in Arabic language processing is scarce compared with English. As a result, these problems pose challenges for the classification and organization of specific Arabic text. Text classification helps to access documents or information that have already been classified into one or more classes or categories. In addition, document classification helps search engines reduce the number of documents, which makes searching and matching against queries easier.

Effectiveness of Supervised Classification Models for Hate Speech on Twitter

10.36227/techrxiv.13140281 ◽

2020 ◽

Author(s):

Kunal Srivastava ◽

Ryan Tabrizi ◽

Ayaan Rahim ◽

Lauryn Nakamitsu

Keyword(s):

Supervised Classification ◽

Hate Speech ◽

The Internet ◽

Classification Models ◽

Personal Attack ◽

Political Beliefs ◽

The Many ◽

Primary Medium ◽

Offensive Speech

The ceaseless connectivity imposed by the internet has made many vulnerable to offensive comments, be it their physical appearance, political beliefs, or religion. Some define hate speech as any kind of personal attack on one’s identity or beliefs. Of the many sites that grant the ability to spread such offensive speech, Twitter has arguably become the primary medium for individuals and groups to spread these hurtful comments. Such comments typically fail to be detected by Twitter’s anti-hate system and can linger online for hours before finally being taken down. Through sentiment analysis, this algorithm is able to distinguish hate speech effectively through the classification of sentiment.
