Youtube Video Ranking: A NLP based System

2019 ◽  
Vol 8 (4) ◽  
pp. 1370-1375

YouTube is a widely acclaimed source of video content on the web among the various social media sites. Users share, comment on, and like or dislike videos while new videos are continuously uploaded in real time. Generally, the quality, popularity, and relevance of the results returned for a search query are determined by a rating system. Occasionally, irrelevant and substandard videos are ranked higher because they have more views and likes. To address this issue, we put forth a sentiment analysis approach, based on Natural Language Processing, that is applied to user comments. The suggested analysis helps provide more desirable results for a search query. The effectiveness of the system is demonstrated in terms of accuracy using a data-driven approach.
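
The abstract does not say how comment sentiment is combined with the existing rating signals. The following is a minimal sketch, assuming a tiny hand-made polarity lexicon (`POSITIVE`, `NEGATIVE`) and a hypothetical blending weight `alpha`; it only illustrates how averaged comment sentiment could demote videos that are popular but poorly received. It is not the paper's method.

```python
import math
import re
from dataclasses import dataclass, field

# Toy polarity lexicon: an illustrative assumption, not the paper's resource.
POSITIVE = {"great", "helpful", "clear", "love", "excellent"}
NEGATIVE = {"clickbait", "useless", "fake", "boring", "misleading"}


@dataclass
class Video:
    title: str
    views: int
    likes: int
    comments: list[str] = field(default_factory=list)


def comment_sentiment(comment: str) -> int:
    """Classify one comment as +1, -1 or 0 by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def video_sentiment(video: Video) -> float:
    """Mean comment polarity in [-1, 1]; 0.0 when a video has no comments."""
    if not video.comments:
        return 0.0
    return sum(comment_sentiment(c) for c in video.comments) / len(video.comments)


def rank(videos: list[Video], alpha: float = 0.7) -> list[Video]:
    """Sort videos by a blend of log-scaled popularity and comment sentiment.

    alpha is a hypothetical weight: 0 ranks by popularity only, 1 by sentiment only.
    """
    if not videos:
        return []
    max_pop = max(math.log1p(v.views + v.likes) for v in videos) or 1.0

    def score(v: Video) -> float:
        popularity = math.log1p(v.views + v.likes) / max_pop  # scaled to [0, 1]
        sentiment = (video_sentiment(v) + 1) / 2               # [-1, 1] -> [0, 1]
        return (1 - alpha) * popularity + alpha * sentiment

    return sorted(videos, key=score, reverse=True)
```

For example, given a video with 1,000,000 views whose comments call it "fake" and "clickbait", and a 20,000-view tutorial whose comments call it "great" and "helpful", `rank` places the tutorial first at the default `alpha=0.7`. The log scaling keeps raw view counts from swamping the sentiment term.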

Author(s):  
Irina Wedel ◽  
Michael Palk ◽  
Stefan Voß

Abstract Social media enable companies to assess consumers’ opinions, complaints, and needs. The systematic, data-driven analysis of social media to generate business value is summarized under the term Social Media Analytics, which includes statistical, network-based, and language-based approaches. We focus on textual data and investigate which conversation topics arise on Twitter during the introduction of a new product and what the overall sentiment is during and after the event. The analysis, conducted with Natural Language Processing tools, covers two languages and four countries, so that cultural differences in tonality and customer needs regarding the product can be identified. Different methods of sentiment analysis and topic modeling are compared to assess their usability on social media in the respective languages, English and German. Furthermore, we illustrate the importance of preprocessing steps when applying these methods and identify relevant product insights.
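
The abstract stresses preprocessing but does not list the steps used. A minimal sketch of common tweet cleaning for English and German follows. It removes URLs and @mentions, keeps hashtag words, lowercases, and drops stopwords. The stopword lists are deliberately tiny placeholders (an assumption), and `preprocess` is a hypothetical helper, not the authors' pipeline.

```python
import re

# Tiny illustrative stopword lists (assumption); real pipelines use full lists.
STOPWORDS = {
    "en": {"the", "a", "an", "and", "is", "of", "to", "it"},
    "de": {"der", "die", "das", "und", "ist", "ein", "eine", "zu", "es"},
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including umlauts and ß


def preprocess(tweet: str, lang: str) -> list[str]:
    """Drop URLs and @mentions, keep hashtag words, lowercase, remove stopwords."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # '#Apple' -> 'Apple'
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS[lang]]
```

For instance, `preprocess("Das neue Handy ist großartig! @shop https://t.co/xyz #Apple", "de")` yields `["neue", "handy", "großartig", "apple"]`. Keeping the hashtag text rather than discarding it matters for topic modeling, since product names often appear only as hashtags.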


2018 ◽  
Vol 17 (03) ◽  
pp. 883-910 ◽  
Author(s):  
P. D. Mahendhiran ◽  
S. Kannimuthu

Contemporary research in Multimodal Sentiment Analysis (MSA) using deep learning is becoming popular in Natural Language Processing. An enormous amount of data is available every day from social media such as Facebook, WhatsApp, YouTube, Twitter, and microblogs. With such large volumes of multimodal data, identifying the relevant information on social media websites is difficult. Hence, there is a need for a more intelligent MSA. Here, deep learning is used to improve the understanding and performance of MSA. Deep learning provides automatic feature extraction and helps achieve the best performance in a combined model that integrates linguistic, acoustic, and video information extraction. This paper focuses on the various techniques used for classifying a given portion of natural language text, audio, and video according to the thoughts, feelings, or opinions expressed in it, i.e., whether the general attitude is Neutral, Positive, or Negative. The results show that the deep learning classification algorithm gives better results than other machine learning classifiers such as KNN, Naive Bayes, Random Forest, Random Tree, and a Neural Net model. The proposed deep learning MSA identifies sentiment in web videos; preliminary proof-of-concept experiments using the ICT-YouTube dataset show that the proposed multimodal system achieves an accuracy of 96.07%.
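
The abstract does not state how the linguistic, acoustic, and video streams are fused. One common option is decision-level (late) fusion. The sketch below assumes that each modality's classifier already outputs probabilities over the three classes, and that the per-modality weights are hypothetical tuning parameters. It is a swapped-in illustration, not the paper's deep model.

```python
LABELS = ("Negative", "Neutral", "Positive")


def late_fusion(modal_probs: dict[str, list[float]],
                weights: dict[str, float]) -> str:
    """Weighted average of per-modality class probabilities; returns the argmax label.

    Each probability list must follow the order of LABELS.
    """
    total_w = sum(weights[m] for m in modal_probs)
    fused = [
        sum(weights[m] * probs[i] for m, probs in modal_probs.items()) / total_w
        for i in range(len(LABELS))
    ]
    return LABELS[max(range(len(LABELS)), key=fused.__getitem__)]
```

The weights can change the decision. If text gives `[0.1, 0.2, 0.7]`, audio `[0.3, 0.5, 0.2]`, and video `[0.2, 0.6, 0.2]`, equal weights yield "Neutral". Weighting text at 0.5 and the other two at 0.25 yields "Positive". Feature-level (early) fusion, which concatenates modality features before a single classifier, is the usual alternative.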


2018 ◽  
Vol 4 (4) ◽  
pp. 487-501 ◽  
Author(s):  
Kun Kuang ◽  
Meng Jiang ◽  
Peng Cui ◽  
Hengliang Luo ◽  
Shiqiang Yang

2020 ◽  
Vol 08 (01) ◽  
pp. 113-131
Author(s):  
Ridouane Tachicart ◽  
Karim Bouzoubaa

With the increase in Web use in Morocco today, the Internet has become an important source of information. Specifically, on social media, Moroccan people communicate in several languages, leaving behind unstructured user-generated text (UGT) that presents several opportunities for Natural Language Processing. Among the languages found in these data, Moroccan Arabic (MA) stands out for its substantial content and distinctive features. In this paper, we investigate online written text generated by Moroccan users on social media, with an emphasis on Moroccan Arabic. For this purpose, we follow several steps, using tools such as a language identification system, to conduct an in-depth study of these data. The most interesting findings are the use of code-switching, multiple scripts, and a small number of words in Moroccan UGT. Moreover, we used the investigated data to build a new Moroccan language resource: a lexicon of orthographic variants of Moroccan words, built with an unsupervised approach using character-level neural embeddings. This lexicon can be useful for several NLP tasks such as spelling normalization.
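
The variant lexicon is built from character neural embeddings, but the model is not described here. The sketch below therefore swaps in plain character-bigram count vectors as a stand-in for learned embeddings. Words whose cosine similarity reaches a hypothetical `threshold` are linked, and linked words form one variant group (single-link grouping). The Latin-script spellings in the example are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def char_ngrams(word: str, n: int = 2) -> Counter:
    """Character n-gram counts with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def variant_groups(words: list[str], threshold: float = 0.5) -> list[set[str]]:
    """Single-link grouping: words whose n-gram cosine >= threshold share a group."""
    parent = {w: w for w in words}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    vectors = {w: char_ngrams(w) for w in words}
    for a, b in combinations(words, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups: dict[str, set[str]] = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())
```

With the words `wakha`, `wakhha`, `waxa`, `bzaf`, `bzzaf`, and `bezaf` and the default threshold, the function returns two groups: `{wakha, wakhha, waxa}` and `{bzaf, bzzaf, bezaf}`. Single linkage lets `waxa` join through its moderate similarity to `wakha`. A stricter threshold would trade recall for precision.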


2015 ◽  
Vol 7 (1) ◽  
Author(s):  
Paula Carvalho ◽  
Mário J. Silva

This paper describes the main characteristics of SentiLex-PT, a sentiment lexicon designed for extracting sentiment and opinions about human entities in Portuguese texts. The potential of this resource is illustrated by its application to two types of corpora: SentiCorpus-PT, a social media corpus consisting of user comments on news articles, and a literary work of the early twentieth century, The Poor (Os Pobres), by Raul Brandão. The data were processed with UNITEX, a natural language processing system based on dictionaries and grammars.
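
The abstract describes dictionary-and-grammar processing in UNITEX. The Python sketch below is not UNITEX and does not use SentiLex-PT data. It only illustrates entity-targeted lexicon lookup: it sums the polarity of lexicon words found within a small token `window` around each human-entity noun. Both word lists and the `window` parameter are hypothetical toy values.

```python
import re

# Toy entries standing in for a sentiment lexicon (assumption, not SentiLex-PT data).
POLARITY = {"honesto": 1, "generoso": 1, "corrupto": -1, "arrogante": -1}
# Toy list of nouns that refer to human entities (assumption).
HUMAN_NOUNS = {"ministro", "deputado", "pobre", "homem"}


def entity_sentiment(sentence: str, window: int = 2) -> dict[str, int]:
    """Sum lexicon polarity of tokens within `window` positions of each human-entity noun."""
    tokens = re.findall(r"\w+", sentence.lower())
    scores: dict[str, int] = {}
    for i, tok in enumerate(tokens):
        if tok in HUMAN_NOUNS:
            nearby = tokens[max(0, i - window): i + window + 1]
            scores[tok] = scores.get(tok, 0) + sum(POLARITY.get(t, 0) for t in nearby)
    return scores
```

For "O ministro é corrupto e arrogante, mas o deputado é honesto." the result is `{"ministro": -1, "deputado": 1}`. A fixed window is a crude proxy for the syntactic patterns that local grammars can capture. With a wider window, "arrogante" would also count against "deputado".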


2019 ◽  
Vol 141 (12) ◽  
Author(s):  
Dedy Suryadi ◽  
Harrison M. Kim

Abstract This paper proposes a data-driven methodology to automatically identify product usage contexts from online customer reviews. Product usage context is one of the factors that affect product design, consumer behavior, and consumer satisfaction. Previous works identify usage contexts with survey-based methods or determine them subjectively. The proposed methodology, on the other hand, uses machine learning and Natural Language Processing tools to identify and cluster usage contexts from a large volume of customer reviews. Furthermore, aspect sentiment analysis is applied to capture the sentiment toward a particular usage context in a sentence. The methodology is applied to data sets for two products: laptops and tablets. The results show that the methodology captures relevant product usage contexts and clusters bigrams that refer to similar usage contexts. The aspect sentiment analysis enables the observation of a product’s position relative to its competitors for a particular usage context. For a product designer, this observation may indicate a need to improve the product. It may also indicate a possible market opportunity in a usage context in which most current products are perceived negatively by customers. Finally, it is shown that the overall rating might not be a strong indicator of customer sentiment toward a particular usage context, because the linear correlation between the two is only moderate for most usage contexts in the case study.
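
The final claim rests on the linear correlation between overall rating and usage-context sentiment. The abstract does not name the coefficient, so this sketch assumes Pearson's r, the standard measure of linear correlation, and computes it from scratch. The ratings and sentiment values in the example are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)
```

For five hypothetical products with star ratings `[4.5, 4.0, 3.5, 4.2, 3.8]` and mean sentiment toward one usage context of `[0.2, 0.6, 0.1, -0.1, 0.4]`, `round(pearson(ratings, sentiment), 2)` is `-0.12`. Here a product's star rating says almost nothing about how customers feel about that specific context.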


Author(s):  
Emad Badawi ◽  
Guy-Vincent Jourdan ◽  
Gregor Bochmann ◽  
Iosif-Viorel Onut

The “Game Hack” Scam (GHS) is a mostly unreported cyberattack in which attackers attempt to convince victims that they will be provided with free, unlimited “resources” or other advantages for their favorite game. The scammers’ endgame ranges from monetizing the victims’ time and resources for themselves by having them click through endless “surveys” and fill out “market research” forms, to collecting personal information, getting the victims to subscribe to questionable services, and even installing questionable executable files on their machines. Other scams such as the “Technical Support Scam”, the “Survey Scam”, and the “Romance Scam” have been analyzed before, but to the best of our knowledge, GHS has not been well studied so far and is indeed mostly unknown. In this paper, we aim to investigate and gain more knowledge about this type of scam by following a data-driven approach: we formulate GHS-related search queries and use multiple search engines to collect data about the websites to which GHS victims are directed when they search online for various game hacks and tricks. We analyze the collected data to provide new insight into GHS and research the extent of this scam. We show that despite its low profile, the click traffic generated by the scam is in the hundreds of millions. We also show that GHS attackers use social media, streaming sites, blogs, and even unrelated sites such as change.org or jeuxvideo.com to carry out their attacks and reach a large number of victims. Our data collection spans a year; in that time, we uncovered 65,905 different GHS URLs, mapped onto over 5,900 unique domains. We were able to link attacks to attackers and found that they routinely target a vast array of games. Furthermore, we find that GHS instances are on the rise, and so is the number of victims. Our low-end estimation is that these attacks have been clicked at least 150 million times in the last five years. 
Finally, in keeping with similar large-scale scam studies, we find that the current public blacklists are inadequate and suggest that our method is more effective at detecting these attacks.
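
The abstract reports URLs mapped onto unique domains but not how hosts were normalized. This sketch deduplicates URLs and counts distinct URLs per hostname using the standard library's `urllib.parse.urlsplit`. Stripping a leading "www." is an assumption, and the example URLs are placeholders. Grouping by registrable domain (for example, via the Public Suffix List) would be more precise but is not attempted here.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_counts(urls: list[str]) -> Counter:
    """Count distinct URLs per hostname (lowercased, leading 'www.' removed)."""
    counts: Counter = Counter()
    for url in set(urls):  # duplicate URLs are counted once
        host = (urlsplit(url).hostname or "").removeprefix("www.")
        if host:
            counts[host] += 1
    return counts
```

Given five collected URLs, where two are identical and one differs from another only by "www.", the result has two domains. The count of unique domains is then simply `len(domain_counts(urls))`.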


JAMIA Open ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Fuchiang R Tsui ◽  
Lingyun Shi ◽  
Victor Ruiz ◽  
Neal D Ryan ◽  
Candice Biernesser ◽  
...  

Abstract Objective Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data. Methods This case-control study included patients aged 10–75 years who were seen in emergency departments and inpatient units between 2007 and 2016. Cases were first-time suicide attempts identified from coded diagnoses; controls were randomly selected from patients without suicide attempts, regardless of demographics, at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2 years of historical EHR data prior to the suicide attempt or control index visit, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort). Results The study cohort included 45 238 patients (5099 cases, 40 139 controls) comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy than structured data alone (area under the curve [AUC]: 0.932 vs. 0.901, P < .001). The best-predicting model used 1726 variables, with AUC = 0.932 (95% CI, 0.922–0.941). The model was robust across multiple prediction windows and across subgroups defined by demographics, point of most recent historical clinical contact, and depression diagnosis history. Conclusions Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust prediction of first-time suicide attempts and has the potential to be deployed across various populations and clinical settings.
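
The results are reported as AUC. As a reference point, this sketch computes AUC directly from its pairwise definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. This equals the area under the ROC curve. The labels and scores in the example are invented. The quadratic pairwise loop is for illustration only. For a cohort of tens of thousands, a rank-based method or a library routine such as scikit-learn's `roc_auc_score` would be used instead.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.2, 0.1]`, five of the six case-control pairs are ordered correctly, giving an AUC of 5/6 ≈ 0.833. Because AUC depends only on ranking, it is unaffected by the 1:9 case-control ratio, unlike raw accuracy.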

