How Populist are Parties? Measuring Degrees of Populism in Party Manifestos Using Supervised Machine Learning

2021, pp. 1-17
Author(s): Jessica Di Cocco, Bernardo Monechi

One of the main challenges in comparative studies on populism concerns its temporal and spatial measurement within and between a large number of parties and countries. Textual analysis has proved useful for these purposes, and automated methods can further improve research in this direction. Here, we propose a method to derive a score of parties’ levels of populism using supervised machine learning to perform textual analysis on national manifestos. We illustrate the advantages of our approach, which allows populism to be measured for a vast number of parties and countries without resource-intensive human-coding processes and provides accurate, up-to-date information for temporal and spatial comparisons of populism. Furthermore, our method yields a continuous score of populism, which enables more fine-grained analyses of the party landscape while reducing the risk of arbitrary classifications. To illustrate the potential contribution of this score, we use it as a proxy for parties’ levels of populism, analyzing average trends in six European countries from the early 2000s for nearly two decades.
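The core scoring idea (classify each manifesto sentence, then take the share of sentences labelled populist as the party's continuous score) can be sketched in plain Python. Everything below, from the toy sentences to the tiny bag-of-words Naïve Bayes, is invented for illustration and is not the authors' actual pipeline:

```python
import math
from collections import Counter

# Toy training sentences, labelled 1 if drawn from parties the literature
# already classifies as populist and 0 otherwise (all sentences invented)
train = [
    ("the corrupt elite betrays the people", 1),
    ("the people must take back power from the elite", 1),
    ("elites ignore ordinary people", 1),
    ("we propose a balanced budget reform", 0),
    ("our policy invests in public infrastructure", 0),
    ("we support european budget cooperation", 0),
]

counts = {0: Counter(), 1: Counter()}   # word counts per class
docs = Counter()                        # sentences per class
for text, y in train:
    docs[y] += 1
    counts[y].update(text.split())
vocab = set(counts[0]) | set(counts[1])

def log_prob(text, y):
    # Naive Bayes log-probability with Laplace smoothing
    total = sum(counts[y].values())
    lp = math.log(docs[y] / sum(docs.values()))
    for w in text.split():
        lp += math.log((counts[y][w] + 1) / (total + len(vocab)))
    return lp

def is_populist(sentence):
    return log_prob(sentence, 1) > log_prob(sentence, 0)

def party_score(sentences):
    # continuous score: share of manifesto sentences classified as populist
    return sum(map(is_populist, sentences)) / len(sentences)

manifesto = ["the corrupt elite ignores the people", "we propose budget reform"]
score = party_score(manifesto)   # one sentence of two flagged populist -> 0.5
```

A real pipeline would train on thousands of labelled sentences, but the continuous score falls out of the same sentence-share computation.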

2018, Vol 46 (1)

Author(s): Damian Trilling, Jelle Boumans

Automated analysis of Dutch language-based texts: an overview and research agenda

While automated methods of content analysis are increasingly popular in today’s communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly for dictionary-based approaches, research on Dutch is far less prolific than research on English. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts. Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning


AI Magazine, 2015, Vol 36 (1), pp. 75-86
Author(s): Jennifer Sleeman, Tim Finin, Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
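As a drastically simplified, hypothetical stand-in for this idea, one can score an instance's attributes against the attribute sets a background knowledge base records for each type; the supervised mapping is replaced here by a plain Jaccard-overlap heuristic, and the type names and attribute sets are invented:

```python
# Hypothetical background knowledge base: known attribute sets per entity type
kb_types = {
    "Person": {"name", "birthDate", "nationality"},
    "Organization": {"name", "foundedYear", "headquarters"},
    "Place": {"name", "latitude", "longitude"},
}

def jaccard(a, b):
    # overlap between two attribute sets
    return len(a & b) / len(a | b)

def predict_type(attributes):
    # pick the type whose known attributes best overlap the instance's
    return max(kb_types, key=lambda t: jaccard(set(attributes), kb_types[t]))
```

An instance carrying `{"name", "birthDate"}` would be typed `"Person"`; the paper's actual method learns this mapping with a supervised model rather than a fixed similarity.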


2020, pp. 1-26
Author(s): Joshua Eykens, Raf Guns, Tim C.E. Engels

We compare two supervised machine learning algorithms—Multinomial Naïve Bayes and Gradient Boosting—to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chaining model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data and can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
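The chaining idea, where the classifier for label j sees the document features plus the labels for categories 1..j-1, can be sketched as follows; for brevity the base learner is a tiny hand-rolled logistic regression rather than Gradient Boosting, and the data are synthetic:

```python
import math

def train_logistic(X, y, lr=0.5, epochs=200):
    # plain gradient-descent logistic regression; last weight is the bias
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wi * v for wi, v in zip(w, xi + [1.0]))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi - lr * (p - yi) * v for wi, v in zip(w, xi + [1.0])]
    return w

def predict_logistic(w, xi):
    return 1 if sum(wi * v for wi, v in zip(w, xi + [1.0])) > 0 else 0

def train_chain(X, Y):
    # label j's classifier is trained on the features plus labels 1..j-1
    chain = []
    for j in range(len(Y[0])):
        Xj = [xi + [float(row[k]) for k in range(j)] for xi, row in zip(X, Y)]
        chain.append(train_logistic(Xj, [row[j] for row in Y]))
    return chain

def predict_chain(chain, xi):
    # at prediction time, earlier predicted labels feed the later classifiers
    labels = []
    for w in chain:
        labels.append(predict_logistic(w, xi + [float(v) for v in labels]))
    return labels

# synthetic two-label data: both labels fire when the single feature is large
X = [[i / 9.0] for i in range(10)]
Y = [[int(x[0] > 0.45), int(x[0] > 0.45)] for x in X]
chain = train_chain(X, Y)
correct = sum(predict_chain(chain, x) == y for x, y in zip(X, Y))
```

The chain lets the second classifier exploit the correlation with the first label, which is exactly what makes chaining attractive for correlated subdiscipline categories.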


Author(s): Faisal Janjua, Asif Masood, Haider Abbas, Imran Rashid, Zaki Murtaza

2018, Vol 7 (4.1), pp. 47
Author(s): Zarina Kazhmaganbetova, Shnar Imangaliyev, Altynbek Sharipbay

The objective of the work presented in this paper was to optimize communication and detect performance degradation of computing resources [1, 2] using machine learning techniques. Computer networks transmit payload data and meta-data from numerous sources to a vast number of destinations, especially in multi-tenant environments [3, 4]. Meta-data describes the payload data and can be analyzed to detect anomalies in communication patterns. Communication patterns depend on the payload itself and on the technical protocol used. The technical patterns are the research target, as their analysis can spotlight vulnerable behavior, for example unusual traffic or extra transported load. A large dataset was used to train a model with supervised machine learning. The dataset was collected from the network interfaces of a distributed application infrastructure. Machine learning tools were obtained from the cloud services provider Amazon Web Services. The stochastic gradient descent technique was used for model training, so that the model could represent the communication patterns in the system. The learning target parameter was the packet length; regression was performed to understand the relationship between packet meta-data (timestamp, protocol, source server) and packet length. The root mean square error was calculated to evaluate learning efficiency. After the model was prepared on the training dataset, it was tested on the test dataset and then applied to the target dataset (the dataset for prediction) to check whether it was capable of detecting anomalies. The experimental part showed the applicability of machine learning to communication optimization in a distributed application environment.
By means of the trained model, it was possible to predict target parameters of traffic and computing resources usage in order to avoid service degradation. Additionally, anomalies in the traffic transferred between application components could be revealed. The application of these techniques is envisioned in the information security field and in efficient network resource planning. Further research could apply machine learning techniques to more complex distributed environments and enlarge the number of protocols used to prepare communication patterns.
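A stripped-down sketch of such a training loop, plain SGD on a linear model predicting packet length from meta-data features and evaluated with the root mean square error, might look as follows; the synthetic data generator, feature encoding and all constants are invented for illustration:

```python
import math
import random

# Synthetic stand-in for packet records: predict packet length from
# meta-data (here just hour-of-day and a TCP/UDP flag)
random.seed(42)

def record():
    hour, tcp = random.randint(0, 23), random.randint(0, 1)
    length = 200.0 + 40.0 * tcp + 5.0 * hour + random.gauss(0.0, 10.0)
    return [1.0, hour / 23.0, float(tcp)], length   # [bias, scaled hour, tcp]

data = [record() for _ in range(200)]
train, test = data[:150], data[150:]

# stochastic gradient descent on a linear regression model
w = [0.0, 0.0, 0.0]
lr = 0.05
for _ in range(300):                                 # epochs
    for x, y in train:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]

# root mean square error on the held-out test split
rmse = math.sqrt(sum(
    (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in test
) / len(test))
```

With the injected noise of standard deviation 10, a converged model's test RMSE lands near that noise floor; a sudden jump in RMSE on fresh traffic is what would flag an anomaly.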


Materials, 2020, Vol 13 (11), pp. 2427
Author(s): Christian Jaremenko, Emanuela Affronti, Marion Merklein, Andreas Maier

This study proposes a method for the temporal and spatial determination of the onset of local necking, determined by means of a Nakajima test set-up, for a DC04 deep-drawing steel and a DP800 dual-phase steel, as well as an AA6014 aluminum alloy. Furthermore, the focus lies on the observation of the progress of the necking area and its transformation throughout the remainder of the forming process. The strain behavior is learned by a machine learning approach on the basis of the images recorded when the process is close to material failure. These learned failure characteristics are transferred to new forming sequences, so that critical areas indicating material failure can be identified at an early stage, and consequently enable the determination of the beginning of necking and the analysis of the necking area. This improves understanding of the necking behavior and facilitates the determination of the evaluation area for strain paths. The growth behavior and traceability of the necking area are objectified by the proposed weakly supervised machine learning approach, thereby rendering a heuristic-based determination unnecessary. Furthermore, a simultaneous evaluation on image and pixel scale is provided that enables a distinct selection of the failure quantile of the probabilistic forming limit curve.


Author(s): Gilles Jacobs, Véronique Hoste

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive, and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE), so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.


2020, Vol 14 (2), pp. 140-159
Author(s): Anthony-Paul Cooper, Emmanuel Awuni Kolog, Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values which each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
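The evaluation step, checking Precision, Recall and F-measure against the 70% threshold, reduces to a few confusion counts; the gold labels and classifier predictions below are fabricated for illustration:

```python
def precision_recall_f1(y_true, y_pred, positive):
    # confusion counts for the positive class
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# fabricated gold labels and classifier output for ten tweets
y_true = ["church"] * 6 + ["other"] * 4
y_pred = ["church", "church", "church", "church", "church", "other",
          "church", "other", "other", "other"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "church")
acceptable = min(p, r, f1) > 0.70   # the article's 70% threshold
```

Here five of six church tweets are recovered at the cost of one false positive, so Precision, Recall and F-measure all sit at 5/6 and clear the threshold.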


2017
Author(s): Sabrina Jaeger, Simone Fulle, Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity datasets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also easily be used for proteins with low sequence similarities.
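The compound-encoding step, a compound vector as the element-wise sum of its substructure vectors, reduces to a few lines; the substructure names and three-dimensional vectors below are invented stand-ins for the embeddings a trained Mol2vec model would supply:

```python
# Hypothetical substructure embeddings (a real model learns hundreds of
# dimensions for Morgan-identifier "words"; these tiny vectors are invented)
substructure_vecs = {
    "C-ring": [0.2, 0.1, -0.3],
    "OH":     [-0.1, 0.4, 0.2],
    "C=O":    [0.3, -0.2, 0.1],
}

def encode_compound(substructures, dim=3):
    # compound vector = element-wise sum of its substructure vectors;
    # unknown substructures contribute nothing
    vec = [0.0] * dim
    for sub in substructures:
        for i, v in enumerate(substructure_vecs.get(sub, [0.0] * dim)):
            vec[i] += v
    return vec

compound = encode_compound(["C-ring", "OH", "OH"])
```

The resulting dense vector can then serve as the feature representation for a downstream supervised property predictor, which is how the abstract describes the embeddings being used.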

