How Populist are Parties? Measuring Degrees of Populism in Party Manifestos Using Supervised Machine Learning

2021, pp. 1-17
Author(s): Jessica Di Cocco, Bernardo Monechi

One of the main challenges in comparative studies on populism concerns its temporal and spatial measurement within and between a large number of parties and countries. Textual analysis has proved useful for these purposes, and automated methods can further improve research in this direction. Here, we propose a method to derive a score of parties’ levels of populism using supervised machine learning to perform textual analysis on national manifestos. We illustrate the advantages of our approach, which allows populism to be measured for a vast number of parties and countries without resource-intensive human-coding processes and provides accurate, up-to-date information for temporal and spatial comparisons of populism. Furthermore, our method yields a continuous score of populism, which enables more fine-grained analyses of the party landscape while reducing the risk of arbitrary classifications. To illustrate the potential contribution of this score, we use it as a proxy for parties’ levels of populism, analyzing average trends in six European countries from the early 2000s for nearly two decades.
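The core scoring idea (classify each manifesto sentence, then take the share of sentences labelled populist as the party's continuous score) can be sketched in plain Python. Everything below, from the toy sentences to the tiny bag-of-words Naïve Bayes, is invented for illustration and is not the authors' actual pipeline:

```python
import math
from collections import Counter

# Toy training sentences, labelled 1 if drawn from parties the literature
# already classifies as populist and 0 otherwise (all sentences invented)
train = [
    ("the corrupt elite betrays the people", 1),
    ("the people must take back power from the elite", 1),
    ("elites ignore ordinary people", 1),
    ("we propose a balanced budget reform", 0),
    ("our policy invests in public infrastructure", 0),
    ("we support european budget cooperation", 0),
]

counts = {0: Counter(), 1: Counter()}   # word counts per class
docs = Counter()                        # sentences per class
for text, y in train:
    docs[y] += 1
    counts[y].update(text.split())
vocab = set(counts[0]) | set(counts[1])

def log_prob(text, y):
    # Naive Bayes log-probability with Laplace smoothing
    total = sum(counts[y].values())
    lp = math.log(docs[y] / sum(docs.values()))
    for w in text.split():
        lp += math.log((counts[y][w] + 1) / (total + len(vocab)))
    return lp

def is_populist(sentence):
    return log_prob(sentence, 1) > log_prob(sentence, 0)

def party_score(sentences):
    # continuous score: share of manifesto sentences classified as populist
    return sum(map(is_populist, sentences)) / len(sentences)

manifesto = ["the corrupt elite ignores the people", "we propose budget reform"]
score = party_score(manifesto)   # one sentence of two flagged populist -> 0.5
```

A real pipeline would train on thousands of labelled sentences, but the continuous score falls out of the same sentence-share computation.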

2018, Vol 46 (1)

Author(s): Damian Trilling, Jelle Boumans

Automated analysis of Dutch language-based texts: an overview and research agenda

While automated methods of content analysis are increasingly popular in today’s communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly for dictionary-based approaches, research on Dutch is far less prolific than research on English. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts. Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning


AI Magazine, 2015, Vol 36 (1), pp. 75-86
Author(s): Jennifer Sleeman, Tim Finin, Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
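As a drastically simplified, hypothetical stand-in for this idea, one can score an instance's attributes against the attribute sets a background knowledge base records for each type; the supervised mapping is replaced here by a plain Jaccard-overlap heuristic, and the type names and attribute sets are invented:

```python
# Hypothetical background knowledge base: known attribute sets per entity type
kb_types = {
    "Person": {"name", "birthDate", "nationality"},
    "Organization": {"name", "foundedYear", "headquarters"},
    "Place": {"name", "latitude", "longitude"},
}

def jaccard(a, b):
    # overlap between two attribute sets
    return len(a & b) / len(a | b)

def predict_type(attributes):
    # pick the type whose known attributes best overlap the instance's
    return max(kb_types, key=lambda t: jaccard(set(attributes), kb_types[t]))
```

An instance carrying `{"name", "birthDate"}` would be typed `"Person"`; the paper's actual method learns this mapping with a supervised model rather than a fixed similarity.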


2020, pp. 1-26
Author(s): Joshua Eykens, Raf Guns, Tim C.E. Engels

We compare two supervised machine learning algorithms—Multinomial Naïve Bayes and Gradient Boosting—to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chaining model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data and can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.
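The chaining idea, where the classifier for label j sees the document features plus the labels for categories 1..j-1, can be sketched as follows; for brevity the base learner is a tiny hand-rolled logistic regression rather than Gradient Boosting, and the data are synthetic:

```python
import math

def train_logistic(X, y, lr=0.5, epochs=200):
    # plain gradient-descent logistic regression; last weight is the bias
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wi * v for wi, v in zip(w, xi + [1.0]))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi - lr * (p - yi) * v for wi, v in zip(w, xi + [1.0])]
    return w

def predict_logistic(w, xi):
    return 1 if sum(wi * v for wi, v in zip(w, xi + [1.0])) > 0 else 0

def train_chain(X, Y):
    # label j's classifier is trained on the features plus labels 1..j-1
    chain = []
    for j in range(len(Y[0])):
        Xj = [xi + [float(row[k]) for k in range(j)] for xi, row in zip(X, Y)]
        chain.append(train_logistic(Xj, [row[j] for row in Y]))
    return chain

def predict_chain(chain, xi):
    # at prediction time, earlier predicted labels feed the later classifiers
    labels = []
    for w in chain:
        labels.append(predict_logistic(w, xi + [float(v) for v in labels]))
    return labels

# synthetic two-label data: both labels fire when the single feature is large
X = [[i / 9.0] for i in range(10)]
Y = [[int(x[0] > 0.45), int(x[0] > 0.45)] for x in X]
chain = train_chain(X, Y)
correct = sum(predict_chain(chain, x) == y for x, y in zip(X, Y))
```

The chain lets the second classifier exploit the correlation with the first label, which is exactly what makes chaining attractive for correlated subdiscipline categories.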


Author(s): Faisal Janjua, Asif Masood, Haider Abbas, Imran Rashid, Zaki Murtaza

2018, Vol 7 (4.1), pp. 47
Author(s): Zarina Kazhmaganbetova, Shnar Imangaliyev, Altynbek Sharipbay

The objective of the work presented in this paper was to optimize communication and detect performance degradation of computing resources [1, 2] using machine learning techniques. Computer networks transmit payload data and meta-data from numerous sources to a vast number of destinations, especially in multi-tenant environments [3, 4]. Meta-data describes the payload data and can be analyzed to detect anomalies in communication patterns. Communication patterns depend on the payload itself and on the technical protocol used. The technical patterns are the research target, as their analysis can spotlight vulnerable behavior, for example unusual traffic or extra transported load. A large dataset was used to train a model with supervised machine learning. The dataset was collected from the network interfaces of a distributed application infrastructure. Machine learning tools were obtained from the cloud services provider Amazon Web Services. The stochastic gradient descent technique was used for model training, so that the model could represent the communication patterns in the system. The learning target parameter was the packet length; regression was performed to understand the relationship between packet meta-data (timestamp, protocol, source server) and packet length. The root mean square error was calculated to evaluate learning efficiency. After the model was prepared on the training dataset, it was tested on the test dataset and then applied to the target dataset (the dataset for prediction) to check whether it was capable of detecting anomalies. The experimental part showed the applicability of machine learning to communication optimization in a distributed application environment.
By means of the trained model, it was possible to predict target parameters of traffic and computing resources usage in order to avoid service degradation. Additionally, anomalies in the traffic transferred between application components could be revealed. The application of these techniques is envisioned in the information security field and in efficient network resource planning. Further research could apply machine learning techniques to more complex distributed environments and enlarge the number of protocols used to prepare communication patterns.
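A stripped-down sketch of such a training loop, plain SGD on a linear model predicting packet length from meta-data features and evaluated with the root mean square error, might look as follows; the synthetic data generator, feature encoding and all constants are invented for illustration:

```python
import math
import random

# Synthetic stand-in for packet records: predict packet length from
# meta-data (here just hour-of-day and a TCP/UDP flag)
random.seed(42)

def record():
    hour, tcp = random.randint(0, 23), random.randint(0, 1)
    length = 200.0 + 40.0 * tcp + 5.0 * hour + random.gauss(0.0, 10.0)
    return [1.0, hour / 23.0, float(tcp)], length   # [bias, scaled hour, tcp]

data = [record() for _ in range(200)]
train, test = data[:150], data[150:]

# stochastic gradient descent on a linear regression model
w = [0.0, 0.0, 0.0]
lr = 0.05
for _ in range(300):                                 # epochs
    for x, y in train:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]

# root mean square error on the held-out test split
rmse = math.sqrt(sum(
    (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in test
) / len(test))
```

With the injected noise of standard deviation 10, a converged model's test RMSE lands near that noise floor; a sudden jump in RMSE on fresh traffic is what would flag an anomaly.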


Materials, 2020, Vol 13 (11), pp. 2427
Author(s): Christian Jaremenko, Emanuela Affronti, Marion Merklein, Andreas Maier

This study proposes a method for the temporal and spatial determination of the onset of local necking, determined by means of a Nakajima test set-up, for a DC04 deep-drawing steel and a DP800 dual-phase steel, as well as an AA6014 aluminum alloy. Furthermore, the focus lies on the observation of the progress of the necking area and its transformation throughout the remainder of the forming process. The strain behavior is learned by a machine learning approach on the basis of the images recorded when the process is close to material failure. These learned failure characteristics are transferred to new forming sequences, so that critical areas indicating material failure can be identified at an early stage, and consequently enable the determination of the beginning of necking and the analysis of the necking area. This improves understanding of the necking behavior and facilitates the determination of the evaluation area for strain paths. The growth behavior and traceability of the necking area are objectified by the proposed weakly supervised machine learning approach, thereby rendering a heuristic-based determination unnecessary. Furthermore, a simultaneous evaluation on image and pixel scale is provided that enables a distinct selection of the failure quantile of the probabilistic forming limit curve.


Author(s): Gilles Jacobs, Véronique Hoste

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive, and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE), so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.


2020, Vol 14 (2), pp. 140-159
Author(s): Anthony-Paul Cooper, Emmanuel Awuni Kolog, Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values which each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
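The evaluation step, checking Precision, Recall and F-measure against the 70% threshold, reduces to a few confusion counts; the gold labels and classifier predictions below are fabricated for illustration:

```python
def precision_recall_f1(y_true, y_pred, positive):
    # confusion counts for the positive class
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# fabricated gold labels and classifier output for ten tweets
y_true = ["church"] * 6 + ["other"] * 4
y_pred = ["church", "church", "church", "church", "church", "other",
          "church", "other", "other", "other"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "church")
acceptable = min(p, r, f1) > 0.70   # the article's 70% threshold
```

Here five of six church tweets are recovered at the cost of one false positive, so Precision, Recall and F-measure all sit at 5/6 and clear the threshold.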


2017
Author(s): Sabrina Jaeger, Simone Fulle, Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity datasets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also easily be used for proteins with low sequence similarities.
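The compound-encoding step, a compound vector as the element-wise sum of its substructure vectors, reduces to a few lines; the substructure names and three-dimensional vectors below are invented stand-ins for the embeddings a trained Mol2vec model would supply:

```python
# Hypothetical substructure embeddings (a real model learns hundreds of
# dimensions for Morgan-identifier "words"; these tiny vectors are invented)
substructure_vecs = {
    "C-ring": [0.2, 0.1, -0.3],
    "OH":     [-0.1, 0.4, 0.2],
    "C=O":    [0.3, -0.2, 0.1],
}

def encode_compound(substructures, dim=3):
    # compound vector = element-wise sum of its substructure vectors;
    # unknown substructures contribute nothing
    vec = [0.0] * dim
    for sub in substructures:
        for i, v in enumerate(substructure_vecs.get(sub, [0.0] * dim)):
            vec[i] += v
    return vec

compound = encode_compound(["C-ring", "OH", "OH"])
```

The resulting dense vector can then serve as the feature representation for a downstream supervised property predictor, which is how the abstract describes the embeddings being used.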

