Cross-Domain Topic Classification for Political Texts

2021 ◽  
pp. 1-22
Author(s):  
Moritz Osnabrügge ◽  
Elliott Ash ◽  
Massimo Morelli

Abstract We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be targeted more specifically to a research question, the resulting topics are easier to validate, and they are easier to interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
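The cross-domain pipeline described above can be sketched in a few lines; a minimal illustration assuming scikit-learn, with made-up platform snippets and topic labels standing in for the authors' data and model:

```python
# Sketch of cross-domain topic classification: learn topics on a labeled
# source corpus (party platforms), then extrapolate to an unlabeled target
# corpus (parliamentary speeches). All texts/labels below are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

source_texts = [
    "we will cut taxes for working families",
    "protect the environment and reduce emissions",
    "increase defence spending for national security",
    "lower income tax and reward enterprise",
]
source_topics = ["economy", "environment", "defence", "economy"]

# Learn to classify topics in the labeled source domain ...
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(source_texts, source_topics)

# ... then predict topics in the unlabeled target domain.
target_speeches = ["the honourable member asks about tax relief for families"]
print(clf.predict(target_speeches))
```

Validation in the cross-domain setting would then proceed as the abstract describes: hand-label a subset of target-corpus documents and compare against these predictions.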

Author(s):  
Marco Valeri ◽  
Leslie Fadlon

The aim of this paper is to verify whether the relationship between a tourist destination and the tourism firms operating within it can be described as co-evolutionary in nature. The paper continues our previous research on destination management and destination governance. The research question underlying its theoretical framework is: in the national tourism scenario, do models of tourist hospitality exist that can be regarded as examples of co-evolution between the tourist destination and the territory? In a tourism context that has long since become complex, firms increasingly deal with tourists, both Italian and foreign, who care about the quality of the leisure time they devote to the tourist experience and about rediscovering the authenticity of the territory they visit. The need to satisfy the most diverse requirements has favoured the emergence and development of distinctive tourism business formulas that are sustainable and consistent with the evolving needs of tourists. In this respect, intercepting and governing the dynamics emerging in the tourism sector requires starting from an analysis of the governance and management problems of the destination and of the tourism firm. In this paper, the co-evolutionary perspective proves to be the most appropriate lens for qualifying the nature of the relationship between the tourist destination and tourism firms. According to this perspective, tourism firms co-evolve with tourist destinations in the pursuit of lasting competitive advantages: tourism firms are regarded as critical resources for the development of the territory, and vice versa. The co-evolution process presupposes identifying a governing body capable of enhancing the endowment and systemic components available to the territory and of stimulating the organizational behaviour of the various tourism firms.
The absence of case studies is a limitation of the paper. Future research will therefore extend the proposed analysis with empirical evidence, which we consider useful for fuelling the debate on the topic and for the resulting entrepreneurial and managerial implications.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data, which remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only on the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
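The FixMatch step at the heart of this approach can be sketched as follows; a toy numpy illustration in which the classifier and the weak/strong augmentations are stand-ins (the paper's audio models and augmentation selection are not reproduced here):

```python
# Sketch of the FixMatch unlabeled-data step: pseudo-label a weakly
# augmented input if the model is confident, then train the model to
# predict that pseudo-label on a strongly augmented version.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3))  # stand-in classifier weights (16-dim input, 3 classes)

def model(x):
    # Stand-in classifier: softmax over a random linear projection.
    logits = x @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Stand-in augmentations; for audio these would be task-specific
# spectrogram transforms, not plain noise.
weak_aug = lambda x: x + rng.normal(scale=0.01, size=x.shape)
strong_aug = lambda x: x + rng.normal(scale=0.30, size=x.shape)

def fixmatch_unlabeled_loss(x, threshold=0.95):
    probs = model(weak_aug(x))           # predict on the weakly augmented input
    if probs.max() < threshold:          # keep only confident pseudo-labels
        return 0.0
    pseudo = probs.argmax()
    strong_probs = model(strong_aug(x))  # cross-entropy vs. the pseudo-label
    return -np.log(strong_probs[pseudo] + 1e-12)

x_unlabeled = rng.normal(size=16)
loss = fixmatch_unlabeled_loss(x_unlabeled)
print(loss)
```

In training, this unlabeled loss is added to the usual supervised loss on the labeled fraction of the data.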


Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

For the past few years, deep learning (DL) robustness (i.e., the ability to maintain the same decision when inputs are subject to perturbations) has become a question of paramount importance, in particular in settings where misclassification can have dramatic consequences. To address this question, authors have proposed different approaches, such as adding regularizers or training using noisy examples. In this paper we introduce a regularizer based on the Laplacian of similarity graphs obtained from the representation of training data at each layer of the DL architecture. This regularizer penalizes large changes (across consecutive layers in the architecture) in the distance between examples of different classes, and as such enforces smooth variations of the class boundaries. We provide theoretical justification for this regularizer and demonstrate its effectiveness in improving robustness on classical supervised learning vision datasets for various types of perturbations. We also show it can be combined with existing methods to increase overall robustness.
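The regularizer's main ingredient, the Laplacian smoothness of class signals on a per-layer similarity graph, can be sketched as follows; a toy numpy illustration with random stand-in representations, not the authors' implementation:

```python
# Sketch: measure how smoothly class membership varies over a similarity
# graph built from one layer's representations, then penalize large
# changes in that smoothness between consecutive layers.
import numpy as np

def laplacian_smoothness(features, labels, sigma=1.0):
    """Trace of S^T L S for one-hot class signals S on a Gaussian
    similarity graph; small values mean classes sit in separate
    regions of the graph."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))   # Gaussian similarity graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W          # (combinatorial) graph Laplacian
    onehot = np.eye(labels.max() + 1)[labels]
    return np.trace(onehot.T @ L @ onehot)

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])
layer_a = rng.normal(size=(4, 8))      # stand-in representations at layer l
layer_b = rng.normal(size=(4, 8))      # stand-in representations at layer l+1

# The regularizer penalizes large changes in smoothness across layers.
penalty = abs(laplacian_smoothness(layer_b, labels)
              - laplacian_smoothness(layer_a, labels))
print(penalty)
```

During training this penalty would be added to the task loss, with the graphs rebuilt from the current mini-batch representations.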


2021 ◽  
Vol 17 (3) ◽  
pp. 1-20
Author(s):  
Vanh Khuyen Nguyen ◽  
Wei Emma Zhang ◽  
Adnan Mahmood

Intrusive Load Monitoring (ILM) is a method to measure and collect the energy consumption data of individual appliances via smart plugs or smart sockets. A major challenge of ILM is automatic appliance identification, in which the system automatically determines a label for the active appliance connected to the smart device. Existing ILM techniques depend on labels input by end-users and usually follow the supervised learning scheme. In practice, however, labeling by end-users is laborious, leaving insufficient training data to fit the supervised learning models. In this work, we propose a semi-supervised learning (SSL) method that leverages rich signals from the unlabeled dataset and jointly learns a classification loss for the labeled dataset and a consistency training loss for the unlabeled dataset. The samples used for consistency learning are generated by a transformation built upon weighted versions of the DTW Barycenter Averaging (DBA) algorithm. The work is inspired by two recent advances in SSL in computer vision and combines the advantages of the two. We evaluate our method on a dataset collected from our Internet-of-Things-based energy monitoring system in a smart home environment. We also examine the method's performance on 10 benchmark datasets. The proposed method outperforms other methods on our smart appliance datasets and most of the benchmark datasets, while showing competitive results on the remaining ones.
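The joint objective, a supervised loss on labeled data plus a consistency loss on unlabeled data, can be sketched as follows; a toy numpy illustration in which the classifier is a stand-in and simple jitter replaces the paper's DBA-based transformation:

```python
# Sketch of the joint semi-supervised objective: cross-entropy on labeled
# load signatures plus a consistency term that asks predictions on an
# unlabeled sample and its augmented version to agree.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 4))  # stand-in appliance classifier (32-dim, 4 classes)

def predict(x):
    logits = x @ W
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def joint_loss(x_lab, y_lab, x_unlab, augment):
    # Supervised cross-entropy on the labeled load signatures ...
    p = predict(x_lab)
    sup = -np.log(p[np.arange(len(y_lab)), y_lab] + 1e-12).mean()
    # ... plus consistency between unlabeled samples and their augmentations.
    cons = ((predict(x_unlab) - predict(augment(x_unlab))) ** 2).mean()
    return sup + cons

# Simple jitter as a placeholder for the weighted-DBA transformation.
jitter = lambda x: x + rng.normal(scale=0.05, size=x.shape)

x_lab = rng.normal(size=(6, 32))
y_lab = rng.integers(0, 4, size=6)
x_unlab = rng.normal(size=(10, 32))
loss = joint_loss(x_lab, y_lab, x_unlab, jitter)
print(loss)
```

The paper's actual transformation averages several training series under DTW alignment with weights, which preserves appliance-specific waveform shape far better than jitter.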


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Huu-Thanh Duong ◽  
Tram-Anh Nguyen-Thi

Abstract In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets large enough for the domain at hand. Building such datasets is tedious, expensive, and time-consuming, and the resulting models handle unseen data poorly. This paper approaches semi-supervised learning for Vietnamese sentiment analysis, where labeled datasets are limited. We summarize a number of preprocessing techniques performed to clean and normalize the data, along with negation and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In experiments, we evaluate various aspects of the approach and obtain competitive results that may motivate further work.
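Augmentation of the kind described, generating new training sentences from the original without user intervention, can be sketched with simple random deletion and swapping; an illustrative stand-in, not the paper's exact techniques:

```python
# Sketch of text data augmentation: derive a new training sentence from
# an existing one by randomly deleting and swapping words, enriching the
# labeled set without any manual effort.
import random

def augment(sentence, p_delete=0.1, n_swaps=1, seed=0):
    rng = random.Random(seed)
    words = sentence.split()
    # Random deletion (fall back to the original if everything is dropped).
    words = [w for w in words if rng.random() > p_delete] or words
    # Random swaps of word positions.
    for _ in range(n_swaps):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

original = "the food was great but the service was slow"
out = augment(original)
print(out)
```

For sentiment data, such operations are usually applied conservatively so the augmented sentence keeps the polarity of its source label.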


2013 ◽  
Vol 427-429 ◽  
pp. 2309-2312
Author(s):  
Hai Bin Mei ◽  
Ming Hua Zhang

Alert classifiers built with the supervised classification technique require large amounts of labeled training alerts. Preparing such training data is very difficult and expensive, which greatly restricts the accuracy and feasibility of current classifiers. This paper employs semi-supervised learning to build an alert classification model and thereby reduce the number of labeled training alerts needed. Alert context properties are also introduced to improve classification performance. Experiments have demonstrated the accuracy and feasibility of our approach.
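One common semi-supervised scheme of this kind, self-training, can be sketched with scikit-learn's SelfTrainingClassifier; the toy alert features below are synthetic, and the paper's actual model and alert context properties are not reproduced:

```python
# Sketch of semi-supervised alert classification via self-training:
# a base classifier is fit on a few labeled alerts, then iteratively
# pseudo-labels the unlabeled alerts it is confident about.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Synthetic alert feature vectors: two clusters standing in for
# true alerts vs. false positives.
X = np.vstack([rng.normal(0, 0.5, size=(20, 4)),
               rng.normal(3, 0.5, size=(20, 4))])
y = np.array([0] * 20 + [1] * 20)

# Keep only four labels; -1 marks unlabeled alerts.
y_semi = np.full(40, -1)
y_semi[[0, 1, 20, 21]] = y[[0, 1, 20, 21]]

clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
clf.fit(X, y_semi)
acc = (clf.predict(X) == y).mean()
print(acc)
```

The alert context properties mentioned in the abstract would enter as additional feature columns in `X`.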


Author(s):  
Filip Cyuńczyk

The main goal of the article is to conduct case studies of CEE memory policies introduced after the fall of communism and to present them as an interesting field for examining the instrumentalization of law. The primary research question is: do case studies of several memory policies implemented in post-communist states help to examine the theoretical concept of the instrumentalization of law? In this paper, I intend to show the hidden potential of such studies. I present some of the specific elements of new constitutionalization attempts in CEE, which included narratives of memory in several constitutions in the region. I also show their relation to the concept of the instrumentalization of law. Finally, I describe some political acts of instrumentalization of law in the field of collective memory.


2020 ◽  
Vol 34 (05) ◽  
pp. 9193-9200
Author(s):  
Shaolei Wang ◽  
Wanxiang Che ◽  
Qi Liu ◽  
Pengda Qin ◽  
Ting Liu ◽  
...  

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach achieves competitive performance compared to previous systems (trained using the full dataset) while using less than 1% (1,000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
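The pseudo-training-data construction for the tagging task, inserting random words into a clean sentence and tagging the insertions, can be sketched as follows (the vocabulary and sentence are illustrative):

```python
# Sketch of pseudo data for the self-supervised tagging task: insert
# random filler words into a clean sentence and label each token as
# original (0) or added noise (1) -- no manual annotation needed.
import random

def make_pseudo_example(sentence, vocab, seed=0):
    rng = random.Random(seed)
    tokens = [(w, 0) for w in sentence.split()]       # 0 = original word
    for _ in range(rng.randint(1, 2)):
        pos = rng.randrange(len(tokens) + 1)
        tokens.insert(pos, (rng.choice(vocab), 1))    # 1 = added noise
    words, tags = zip(*tokens)
    return list(words), list(tags)

words, tags = make_pseudo_example("the cabinet met on tuesday",
                                  vocab=["uh", "well", "you", "know"])
print(words, tags)
```

A tagger pre-trained to recover these labels is then close in spirit to detecting real disfluencies, which is why fine-tuning on a small annotated set suffices.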


In Hungary, there is a large stock of built heritage. Of these sites, the current research focuses on castles. Nowadays, castles can serve many functions, such as schools, lodging houses, hospitals, or residential buildings. The optimal form of usage is tourism utilization, for example as museums, hotels, or event venues. Organizing festivals is one such tool: it generates revenue for the castle and makes it widely accessible, enhancing the visibility and recognition of the venue. A festival is also intended to meet the needs of tourists and local people alike, and these castles provide a suitable setting for it. Thus, the current research aims to present and evaluate the form of castle utilization in which festivals are organized. Through several case studies, the study aims to answer the research question of whether festivals contribute to the survival of castles by generating revenue and creating more attractive destinations. To answer this question, a primary research method is needed, in which interviews with the owners of the venues and the festival directors come to the fore. In addition, the available secondary data are required to numerically support both the generated revenue and the number of visitors. In the end, the research will be carried out where both the utilization of the castles and festival tourism are of paramount importance, and a joint impact assessment will be implemented.


2021 ◽  
Author(s):  
Haibin Di ◽  
Chakib Kada Kloucha ◽  
Cen Li ◽  
Aria Abubakar ◽  
Zhun Li ◽  
...  

Abstract Delineating seismic stratigraphic features and depositional facies is important to successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation faces two major challenges. The first is to maximally automate the process, particularly given the increasing size of seismic data and the complexity of target stratigraphies, while the second is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly the convolutional neural network (CNN), has been introduced to assist seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such supervised CNNs. Moreover, most of the existing CNN implementations are based on amplitude only, which fails to use necessary structural information, such as faults, to constrain the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through an unsupervised convolutional autoencoder (CAE), while the second is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted by the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. The two components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB learning to rely on features common to the entire study area rather than only those at the limited training locations; correspondingly, the risk of overfitting is greatly reduced.
More innovatively, the fault constraint is introduced by equipping the SMB CNN with two output branches, one matching the target stratigraphies and the other reconstructing the input fault, so that the faults continue to contribute to the SMB learning process. The performance of this fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset: the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.
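The two-branch wiring described above, a shared encoder feeding a stratigraphy head and a fault-reconstruction head, can be sketched as follows; a toy numpy forward pass with random weights and assumed patch sizes, not the authors' network:

```python
# Sketch of the SMB wiring: one shared encoder (taken from the SFE CAE)
# feeds two output branches, one predicting stratigraphic classes and
# one reconstructing the input fault, keeping faults in the loss.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# Assumed sizes: each sample is a flattened 64-value seismic+fault patch,
# encoded to 16 features, with 5 stratigraphic classes.
W_enc = rng.normal(scale=0.1, size=(64, 16))    # shared SFE encoder weights
W_strat = rng.normal(scale=0.1, size=(16, 5))   # branch 1: stratigraphy classes
W_fault = rng.normal(scale=0.1, size=(16, 64))  # branch 2: fault reconstruction

def encoder(x):
    # Features learned area-wide during CAE pre-training, reused here.
    return relu(x @ W_enc)

def smb_forward(x):
    z = encoder(x)
    return z @ W_strat, z @ W_fault

patch = rng.normal(size=(1, 64))
strat_logits, fault_recon = smb_forward(patch)
print(strat_logits.shape, fault_recon.shape)
```

Training would combine a classification loss on `strat_logits` with a reconstruction loss on `fault_recon`, which is how the fault constraint stays active during SMB learning.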

