scholarly journals Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion

Author(s):  
Fika Hastarita Rachman ◽  
Riyanarto Sarno ◽  
Chastine Fatichah

Music has lyrics and audio. That’s components can be a feature for music emotion classification. Lyric features were extracted from text data and audio features were extracted from audio signal data.In the classification of emotions, emotion corpus is required for lyrical feature extraction. Corpus Based Emotion (CBE) succeed to increase the value of F-Measure for emotion classification on text documents. The music document has an unstructured format compared with the article text document. So it requires good preprocessing and conversion process before classification process. We used MIREX Dataset for this research. Psycholinguistic and stylistic features were used as lyrics features. Psycholinguistic feature was a feature that related to the category of emotion. In this research, CBE used to support the extraction process of psycholinguistic feature. Stylistic features related with usage of unique words in the lyrics, e.g. ‘ooh’, ‘ah’, ‘yeah’, etc. Energy, temporal and spectrum features were extracted for audio features.The best test result for music emotion classification was the application of Random Forest methods for lyrics and audio features. The value of F-measure was 56.8%.

2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering is performed for organizing the set of text documents into the desired number of coherent and meaningful sub-clusters. Modeling the text documents in terms of topics derivations is a vital task in text data clustering. Each tweet is considered as a text document, and various topic models perform modeling of tweets. In existing topic models, the clustering tendency of tweets is assessed initially based on Euclidean dissimilarity features. Cosine metric is more suitable for more informative assessment, especially of text clustering. Thus, this paper develops a novel cosine based external and interval validity assessment of cluster tendency for improving the computational efficiency of tweets data clustering. In the experimental, tweets data clustering results are evaluated using cluster validity indices measures. Experimentally proved that cosine based internal and external validity metrics outperforms the other using benchmarked and Twitter-based datasets.


Author(s):  
Junzo Watada ◽  
◽  
Keisuke Aoki ◽  
Masahiro Kawano ◽  
Muhammad Suzuri Hitam ◽  
...  

The availability of multimedia text document information has disseminated text mining among researchers. Text documents, integrate numerical and linguistic data, making text mining interesting and challenging. We propose text mining based on a fuzzy quantification model and fuzzy thesaurus. In text mining, we focus on: 1) Sentences included in Japanese text that are broken down into words. 2) Fuzzy thesaurus for finding words matching keywords in text. 3) Fuzzy multivariate analysis to analyze semantic meaning in predefined case studies. We use a fuzzy thesaurus to translate words using Chinese and Japanese characters into keywords. This speeds up processing without requiring a dictionary to separate words. Fuzzy multivariate analysis is used to analyze such processed data and to extract latent mutual related structures in text data, i.e., to extract otherwise obscured knowledge. We apply dual scaling to mining library and Web page text information, and propose integrating the result in Kansei engineering for possible application in sales, marketing, and production.


Author(s):  
Adam Piotr Idczak

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i. e. text classification in sentiment analysis, which focuses on determining document’s sentiment. Lack of defined structure of the text makes this problem more challenging. This has led to development of various techniques used in determining document’s sentiment. In this paper the comparative analysis of two methods in sentiment classification: naive Bayes classifier and logistic regression was conducted. Analysed texts are written in Polish language and come from banks. Classification was conducted by means of bag-of-n-grams approach where text document is presented as set of terms and each term consists of n words. The results show that logistic regression performed better.


2021 ◽  
Vol 21 (3) ◽  
pp. 3-10
Author(s):  
Petr ŠALOUN ◽  
◽  
Barbora CIGÁNKOVÁ ◽  
David ANDREŠIČ ◽  
Lenka KRHUTOVÁ ◽  
...  

For a long time, both professionals and the lay public showed little interest in informal carers. Yet these people deals with multiple and common issues in their everyday lives. As the population is aging we can observe a change of this attitude. And thanks to the advances in computer science, we can offer them some effective assistance and support by providing necessary information and connecting them with both professional and lay public community. In this work we describe a project called “Research and development of support networks and information systems for informal carers for persons after stroke” producing an information system visible to public as a web portal. It does not provide just simple a set of information but using means of artificial intelligence, text document classification and crowdsourcing further improving its accuracy, it also provides means of effective visualization and navigation over the content made by most by the community itself and personalized on a level of informal carer’s phase of the care-taking timeline. In can be beneficial for informal carers as it allows to find a content specific to their current situation. This work describes our approach to classification of text documents and its improvement through crowdsourcing. Its goal is to test text documents classifier based on documents similarity measured by N-grams method and to design evaluation and crowdsourcing-based classification improvement mechanism. Interface for crowdsourcing was created using CMS WordPress. In addition to data collection, the purpose of interface is to evaluate classification accuracy, which leads to extension of classifier test data set, thus the classification is more successful.


Author(s):  
Nilupulee Nathawitharana ◽  
Damminda Alahakoon ◽  
Sumith Matharage

Humans are used to expressing themselves with written language and language provides a medium with which we can describe our experiences in detail incorporating individuality. Even though documents provide a rich source of information, it becomes very difficult to identify, extract, summarize and search when vast amounts of documents are collected especially over time. Document clustering is a technique that has been widely used to group documents based on similarity of content represented by the words used. Once key groups are identified further drill down into sub-groupings is facilitated by the use of hierarchical clustering. Clustering and hierarchical clustering are very useful when applied to numerical and categorical data and cluster accuracy and purity measures exist to evaluate the outcomes of a clustering exercise. Although the same measures have been applied to text clustering, text clusters are based on words or terms which can be repeated across documents associated with different topics. Therefore text data cannot be considered as a direct ‘coding’ of a particular experience or situation in contrast to numerical and categorical data and term overlap is a very common characteristic in text clustering. In this paper we propose a new technique and methodology for term overlap capture from text documents, highlighting the different situations such overlap could signify and discuss why such understanding is important for obtaining value from text clustering. Experiments were conducted using a widely used text document collection where the proposed methodology allowed exploring the term diversity for a given document collection and obtain clusters with minimum term overlap.


2021 ◽  
Vol 9 (4) ◽  
pp. 555
Author(s):  
Ni Made Sinta Wahyuni ◽  
Ngurah Agus Sanjaya ER

Named Entity Recognition (NER) is a process to identify words or phrases as a named entity, such as a person, location, time expression, or organization. In this research, we are interested in developing a NER which able to identify the time expression entity in Balinese text documents. The time expression entity becomes an important component in the text because it is usually followed by important facts and information. NER was built using a rules-based approach. The rules are built based on direct observation of documents and pay attention to the morphological and contextual structures. Based on the experiments conducted, the average results of the precision, recall, and f-measure values were 0.85, 0.87, and 0.85.


Author(s):  
Shaymaa H. Mohammed ◽  
Salam Al-augby

<p>With the rapid growth of information technology, the amount of unstructured text data in digital libraries is rapidly increased and has become a big challenge in analyzing, organizing and how to classify text automatically in E-research repository to get the benefit from them is the cornerstone. The manual categorization of text documents requires a lot of financial, human resources for management. In order to get so, topic modeling are used to classify documents. This paper addresses a comparison study on scientific unstructured text document classification (e-books) based on the full text where applying the most popular topic modeling approach (LDA, LSA) to cluster the words into a set of topics as important keywords for classification. Our dataset consists of (300) books contain about 23 million words based on full text. In the used topic models (LSA, LDA) each word in the corpus of vocabulary is connected with one or more topics with a probability, as estimated by the model. Many (LDA, LSA) models were built with different values of coherence and pick the one that produces the highest coherence value. The result of this paper showed that LDA has better results than LSA and the best results obtained from the LDA method was (<strong>0.592179</strong>) of coherence value when the number of topics was <strong>20 while</strong> the LSA coherence value was <strong>(0.5773026)</strong> when the number of topics was 10.</p>


Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Considering the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes totally complex due to its large size. Text clustering is a common optimization problem used to manage a large amount of text information into a subset of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely, β-hill climbing, to solve the problem of the text document clustering through modeling the β-hill climbing technique for partitioning the similar documents into the same cluster. Methods: The β parameter is the primary innovation in β-hill climbing technique. It has been introduced in order to perform a balance between local and global search. Local search methods are successfully applied to solve the problem of the text document clustering such as; k-medoid and kmean techniques. Results: Experiments were conducted on eight benchmark standard text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results proved that the proposed β-hill climbing achieved better results in comparison with the original hill climbing technique in solving the text clustering problem. Conclusion: The performance of the text clustering is useful by adding the β operator to the hill climbing.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 27-27
Author(s):  
Ricardo V Ventura ◽  
Rafael Z Lopes ◽  
Lucas T Andrietta ◽  
Fernando Bussiman ◽  
Julio Balieiro ◽  
...  

Abstract The Brazilian gaited horse industry is growing steadily, even after a recession period that affected different economic sectors in the whole country. Recent numbers suggested an increase on the exports, which reveals the relevance of this horse market segment. Horses are classified according to the gait criteria, which divide the horses in two groups associated with the animal movements: lateral (Marcha Picada) or diagonal (Marcha_Batida). These two gait groups usually show remarkable differences related to speed and number of steps per fixed unit of time, among other factors. Audio retrieval refers to the process of information extraction obtained from audio signals. This new data analysis area, in comparison to traditional methods to evaluate and classify gait types (as, for example, human subjective evaluation and video monitoring), provides a potential method to collect phenotypes in a reduced cost manner. Audio files (n = 80) were obtained after extracting audio features from freely available YouTube videos. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida) and thirty animals were used after a quality control filter step. This study aimed to investigate different metrics associated with audio signal processing, in order to first cluster animals according to the gait type and subsequently include additional traits that could be useful to improve accuracy during the identification of genetically superior animals. Twenty-eight metrics, based on frequency or physical audio aspects, were carried out individually or in groups of relative importance to perform Principal Component Analysis (PCA), as well as to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interferences and noises must be further investigated. These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs, aiming to improve horse gait.


Sign in / Sign up

Export Citation Format

Share Document