Identifying Narrative Contexts in Brazilian Popular Music Lyrics Using Sparse Topic Models: A Comparison Between Human-Based and Machine-Based Classification

2019 ◽  
Author(s):  
André Dalmora ◽  
Tiago Tavares

Music lyrics convey a great part of the meaning of popular songs. That meaning helps listeners relate songs to typical narratives, such as romantic interests or life stories, and this understanding is part of the affective response that guides choosing songs for particular situations. This paper analyzes the effectiveness of text mining tools for classifying lyrics according to their narrative contexts. To this end, we built a dataset of Brazilian popular music lyrics that raters voted on online according to their narrative context and valence. We approached the problem with a machine learning pipeline in which lyrics are projected into a vector space and then classified using general-purpose algorithms. We experimented with document representations based on sparse topic models [11, 12, 13, 14], which aim to find groups of words that typically appear together in the dataset, and we also extracted part-of-speech tags for each lyric and used their histogram as features in the classification process. We compared the classification results to those of a typical human, and we contrasted the problem of identifying narrative contexts with that of identifying lyric valence. Our results indicate that narrative contexts can be identified more consistently than valence. We also show that human-based classification typically does not reach high accuracy, which suggests an upper bound for automatic classification.
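
As a rough illustration of the pipeline described above, the sketch below projects a toy corpus into a TF-IDF vector space, extracts topic features with scikit-learn's NMF (used here as a stand-in for the sparse topic models cited as [11, 12, 13, 14]), and feeds them to a general-purpose classifier. The corpus, labels, and model settings are placeholders, not the authors' data or code.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder corpus; the real dataset is Brazilian lyrics with voted labels.
lyrics = [
    "amor coracao paixao saudade",
    "amor beijo paixao romance",
    "vida luta trabalho historia",
    "vida estrada caminho historia",
]
contexts = ["romance", "romance", "life_story", "life_story"]

pipeline = make_pipeline(
    TfidfVectorizer(),                      # project lyrics into a vector space
    NMF(n_components=2, init="nndsvda",     # topics: groups of co-occurring words
        random_state=0),
    LogisticRegression(max_iter=1000),      # general-purpose classifier
)
pipeline.fit(lyrics, contexts)
print(pipeline.predict(["paixao e romance"]))
```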

Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task, so it seems natural to expect general-purpose machine learning algorithms to handle it well; indeed, the current state-of-the-art solutions use deep neural networks (DNNs). We describe an alternative approach that uses probabilities to construct a weighted lexicon of sentiment terms, then modifies the lexicon and calculates an optimal threshold for each class. We show that this approach outperforms DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data you are trying to learn from can be more important than trying ever more powerful general-purpose machine learning algorithms.
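
A minimal sketch of a weighted-lexicon classifier in the spirit of this approach, assuming simple whitespace tokenization: word weights come from class-conditional frequencies, documents are scored against the lexicon, and a per-class threshold is tuned on held-out scores. The helper names and toy data are illustrative, not the authors' implementation.

```python
from collections import Counter
import numpy as np

def build_lexicon(docs, labels, target):
    """Weight each word by P(word | target class) - P(word | other classes)."""
    in_c = Counter(w for d, ls in zip(docs, labels) if target in ls for w in d.split())
    out_c = Counter(w for d, ls in zip(docs, labels) if target not in ls for w in d.split())
    n_in, n_out = max(sum(in_c.values()), 1), max(sum(out_c.values()), 1)
    return {w: in_c[w] / n_in - out_c[w] / n_out for w in set(in_c) | set(out_c)}

def score(doc, lexicon):
    words = doc.split()
    return sum(lexicon.get(w, 0.0) for w in words) / max(len(words), 1)

def best_threshold(scores, truth):
    """Choose the per-class threshold that maximizes accuracy on held-out data."""
    return max(sorted(set(scores)),
               key=lambda t: np.mean((np.array(scores) >= t) == np.array(truth)))

# Tiny illustration with one sentiment label per document.
docs = ["joy happy bright", "sad gloomy rain", "happy sun", "gloomy sad day"]
labels = [{"joy"}, {"sadness"}, {"joy"}, {"sadness"}]
lex = build_lexicon(docs, labels, "joy")
scores = [score(d, lex) for d in docs]
print(best_threshold(scores, [("joy" in ls) for ls in labels]))
```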


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 218-239 ◽  
Author(s):  
Ravikumar Patel ◽  
Kalpdrum Passi

In the derived approach, Twitter data for the 2014 soccer World Cup held in Brazil are analyzed to detect the sentiment of people throughout the world using machine learning techniques. After filtering, sentiment polarity is calculated from the emotion words detected in user tweets. The dataset is normalized for use by machine learning algorithms and prepared with natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. The approach is implemented in Python using the Natural Language Toolkit (NLTK). A derived algorithm extracts emotional words using WordNet, taking each word's POS to select the sense that fits the current context, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity labels are further analyzed using naïve Bayes, support vector machine (SVM), k-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
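
The lexicon-based polarity step can be sketched with NLTK and SentiWordNet roughly as follows; this is a simplified illustration, not the paper's exact algorithm.

```python
# Requires: nltk.download(["punkt", "averaged_perceptron_tagger",
#                          "wordnet", "sentiwordnet"])
import nltk
from nltk.corpus import sentiwordnet as swn

PENN_TO_WN = {"J": "a", "N": "n", "R": "r", "V": "v"}  # adj, noun, adverb, verb

def tweet_polarity(text):
    polarity = 0.0
    for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
        wn_pos = PENN_TO_WN.get(tag[0])
        if wn_pos is None:
            continue  # skip words whose POS has no WordNet counterpart
        senti = list(swn.senti_synsets(word.lower(), wn_pos))
        if senti:  # take the most frequent sense for this POS
            polarity += senti[0].pos_score() - senti[0].neg_score()
    return polarity

print(tweet_polarity("What a brilliant goal, amazing match!"))  # positive score
```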


2019 ◽  
pp. 030573561987160 ◽  
Author(s):  
Manuel Anglada-Tort ◽  
Amanda E Krause ◽  
Adrian C North

The present study investigated how the gender distribution of the United Kingdom’s most popular artists has changed over time and the extent to which these changes might relate to popular music lyrics. Using data mining and machine learning techniques, we analyzed all songs that reached the UK weekly top 5 sales charts from 1960 to 2015 (4,222 songs). DICTION software facilitated a computerized analysis of the lyrics, measuring a total of 36 lyrical variables per song. Results showed a significant inequality in gender representation on the charts. However, the presence of female musicians increased significantly over the time span. The most critical inflection points leading to changes in the prevalence of female musicians were in 1968, 1976, and 1984. Linear mixed-effects models showed that the total number of words and the use of self-reference in popular music lyrics changed significantly as a function of musicians’ gender distribution over time, particularly around the three critical inflection points identified. Irrespective of gender, there was a significant trend toward increasing repetition in the lyrics over time. Results are discussed in terms of the potential advantages of using machine learning techniques to study naturalistic singles sales charts data.
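
For readers unfamiliar with this model family, the sketch below fits a linear mixed-effects model of the general kind described, using statsmodels on synthetic data; the variable names (word_count, pct_female, artist) are hypothetical stand-ins, not the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
charts = pd.DataFrame({
    "year": rng.integers(1960, 2016, n),       # chart year
    "pct_female": rng.uniform(0.0, 1.0, n),    # share of female musicians
    "artist": rng.integers(0, 40, n),          # grouping factor
})
# Synthetic outcome: lyric word count drifting with time and gender mix.
charts["word_count"] = (200 + 2 * (charts["year"] - 1960)
                        + 30 * charts["pct_female"] + rng.normal(0, 20, n))

# Fixed effects for time, gender distribution, and their interaction;
# a random intercept for each artist.
model = smf.mixedlm("word_count ~ year * pct_female", data=charts, groups="artist")
print(model.fit().summary())
```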


Author(s):  
Parag Jain

Most popular machine learning algorithms, such as k-nearest neighbour, k-means, and SVM, use a metric to measure the distance (or similarity) between data instances, and the performance of these algorithms depends heavily on the metric being used. In the absence of prior knowledge about the data, we can only use general-purpose metrics such as Euclidean distance, cosine similarity, or Manhattan distance, but these metrics often fail to capture the true behaviour of the data, which directly affects the performance of the learning algorithm. The solution is to tune the metric to the data and the problem; however, manually deriving a metric for high-dimensional data, which is often difficult even to visualize, is tedious and error-prone. This motivates metric learning, which aims to satisfy the geometry of the data: the goal of a metric learning algorithm is to learn a metric that assigns small distances to similar points and relatively large distances to dissimilar points.
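
As a concrete illustration of the gap between a general-purpose metric and a learned one, the sketch below compares k-nearest-neighbour accuracy under the Euclidean metric with a metric learned by scikit-learn's Neighborhood Components Analysis (one metric learning algorithm among many; this is not the note's own method).

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: k-NN with the general-purpose Euclidean metric.
euclidean = make_pipeline(StandardScaler(), KNeighborsClassifier(3))
# Learned: NCA fits a linear transform so that nearby points share labels.
learned = make_pipeline(StandardScaler(),
                        NeighborhoodComponentsAnalysis(random_state=0),
                        KNeighborsClassifier(3))

for name, clf in [("euclidean k-NN", euclidean), ("NCA + k-NN", learned)]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```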


Author(s):  
Zahra Bokaee Nezhad ◽  
Mohammad Ali Deihimi

Sarcasm is a form of communication in which the individual states the opposite of what is implied, so detecting a sarcastic tone is complicated by its ambiguous nature. At the same time, identifying sarcasm is vital to various natural language processing tasks such as sentiment analysis and text summarisation. However, research on sarcasm detection in Persian is very limited. This paper investigates sarcasm detection on Persian tweets by combining deep learning-based and machine learning-based approaches. Four sets of features that cover different types of sarcasm were proposed: deep polarity, sentiment, part-of-speech, and punctuation features. These features were used to classify the tweets as sarcastic or non-sarcastic. The deep polarity feature was obtained by conducting sentiment analysis with a deep neural network architecture. In addition, to extract the sentiment feature, a Persian sentiment dictionary consisting of four sentiment categories was developed. The study also used a new Persian proverb dictionary in the preparation step to enhance the accuracy of the proposed model. The performance of the model was analysed using several standard machine learning algorithms. The experiments showed that the method outperformed the baseline method and reached an accuracy of 80.82%. The study also examined the importance of each proposed feature set and evaluated its added value to the classification.
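
A hedged sketch of how heterogeneous feature sets of this kind can be concatenated and fed to standard classifiers; the extractors below are simplified placeholders (the deep polarity score, for instance, stands in for the output of the trained deep network), not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def deep_polarity(tweet):
    return 0.0  # placeholder for the DNN-based polarity score

def sentiment_feature(tweet, lexicon):
    return sum(lexicon.get(w.strip("!?.,"), 0) for w in tweet.split())

def punctuation_features(tweet):
    return [tweet.count("!"), tweet.count("?"), tweet.count("...")]

def features(tweet, lexicon):
    # POS features omitted for brevity; the paper uses them as a fourth set.
    return ([deep_polarity(tweet), sentiment_feature(tweet, lexicon)]
            + punctuation_features(tweet))

lexicon = {"great": 1, "terrible": -1}            # placeholder sentiment dictionary
tweets = ["great, another terrible Monday!!!", "the match starts at nine"]
labels = [1, 0]                                    # 1 = sarcastic, 0 = non-sarcastic

X = np.array([features(t, lexicon) for t in tweets])
for clf in (SVC(), RandomForestClassifier(random_state=0)):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))
```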


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools for structuring large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve markup performance, especially for elements with sparse training examples.


2021 ◽  
Author(s):  
Luca Carbone ◽  
Jonathan Jan Benjamin Mijs

Economic inequality is on the rise in Western societies and ‘meritocracy’ remains a widespread narrative used to justify it. An emerging literature has documented the impact of meritocratic narratives in media, mostly focusing on newspapers. In this paper, we study music as a potential source of cultural frames about economic inequality. We construct an original dataset combining user data from Spotify with lyrics from Genius to inductively explore whether popular music features themes of economic inequality. In order to do so, we employ unsupervised computational text analysis to classify the content of the 3,660 most popular songs across 23 European countries. Informed by Lizardo’s enculturation framework, we study popular music lyrics through the lens of public culture and explore its links with individual beliefs about inequality as a reflection of private culture. We find that, in more unequal societies, songs that frame inequalities as a structural issue (songs about “Struggle” or omnipresent “Risks”) are more popular than those adopting a meritocratic frame (songs we describe as “Bragging Rights” or those telling a “Rags to Riches” tale). Moreover, we find that the presence in public culture of a certain frame is associated with the expression of frame-consistent individual beliefs about inequality (private culture). We conclude by offering reflections on the promise of automatic text classification for the study of music lyrics and on the theorized role of popular music in the study of culture, and by proposing avenues for future research.


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7294
Author(s):  
Adrián Campazas-Vega ◽  
Ignacio Samuel Crespo-Martínez ◽  
Ángel Manuel Guerrero-Higueras ◽  
Camino Fernández-Llamas

Advanced persistent threats (APTs) are a growing concern in cybersecurity, and many companies and governments have reported incidents related to these threats. Throughout the life cycle of an APT, one of the most commonly used techniques for gaining access is a network attack. Tools based on machine learning are effective in detecting such attacks; however, researchers usually have problems finding suitable datasets for fitting their models, and the problem is even harder when flow data are required. In this paper, we describe a framework for gathering flow datasets using a NetFlow sensor, and we present its implementation, the Docker-based framework for gathering NetFlow data (DOROTHEA). This tool aims to easily generate taggable network traffic for building datasets suitable for fitting classification models. To demonstrate that datasets gathered with DOROTHEA can be used to fit classification models for malicious-traffic detection, several models were built using the model evaluator (MoEv), a general-purpose tool for training machine learning algorithms. After carrying out the experiments, four models obtained detection rates higher than 93%, demonstrating the validity of the datasets gathered with the tool.
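
To illustrate the final step, the sketch below fits a classification model to a synthetic stand-in for a tagged flow dataset; the NetFlow-style feature columns and labeling rule are illustrative assumptions, not DOROTHEA's actual schema or MoEv's code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a tagged flow dataset gathered with the framework.
rng = np.random.default_rng(0)
n = 1000
flows = pd.DataFrame({
    "duration": rng.exponential(1.0, n),       # flow duration (s)
    "packets": rng.integers(1, 100, n),        # packets per flow
    "bytes": rng.integers(40, 100_000, n),     # bytes per flow
    "dst_port": rng.integers(1, 65_535, n),    # destination port
})
flows["malicious"] = (flows["dst_port"] < 1024).astype(int)  # toy tagging rule

X, y = flows.drop(columns="malicious"), flows["malicious"]
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("mean CV detection accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```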


2020 ◽  
Vol 62 (5) ◽  
pp. 578-598 ◽  
Author(s):  
Samer Muthana Sarsam ◽  
Hosam Al-Samarraie ◽  
Ahmed Ibrahim Alzahrani ◽  
Bianca Wright

Recognizing both literal and figurative meanings is crucial to understanding users’ opinions on various topics or events in social media. Detecting sarcastic posts on social media has received much attention recently, particularly because sarcastic comments in the form of tweets often include positive words that represent negative or undesirable characteristics. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used to understand the application of different machine learning algorithms for sarcasm detection on Twitter. Extensive database searching led to the inclusion of 31 studies classified into two groups: Adapted Machine Learning Algorithms (AMLA) and Customized Machine Learning Algorithms (CMLA). The review results revealed that Support Vector Machine (SVM) was the best-performing and most commonly used AMLA for sarcasm detection on Twitter. In addition, combining Convolutional Neural Network (CNN) and SVM was found to offer high prediction accuracy. Moreover, our results showed that using lexical, pragmatic, frequency, and part-of-speech tagging features can contribute to the performance of SVM, whereas both lexical and personal features can enhance the performance of CNN-SVM. This work also addressed the main challenges faced by prior scholars when predicting sarcastic tweets. Such knowledge can be useful for future researchers or machine learning developers considering the major issues of classifying sarcastic posts in social media.


Author(s):  
Aryan Karn

Computer vision is an area of research concerned with helping computers see. At the most abstract level, computer vision problems aim to infer something about the world from observed image data. It is a multidisciplinary subject that may be loosely classified as a branch of artificial intelligence and machine learning, and it may involve both specialized techniques and general-purpose learning methods. As an interdisciplinary field of research, it can seem disorganized, with methods borrowed and reused from various engineering and computer science disciplines. While one specific vision problem may be readily solved with a hand-crafted statistical technique, another may require a large and sophisticated ensemble of generic machine learning algorithms. Computer vision as a discipline is at the cutting edge of science. As with any frontier, it is thrilling and chaotic, often with no trustworthy authority to turn to. Many useful concepts lack a theoretical foundation, some theories prove ineffective in practice, the well-developed areas are widely scattered, and often one appears completely unreachable from another.

