Authorship Attribution with Topic Models

2014 ◽  
Vol 40 (2) ◽  
pp. 269-310 ◽  
Author(s):  
Yanir Seroussi ◽  
Ingrid Zukerman ◽  
Fabian Bohnert

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, because such texts tend to be short, and the number of candidate authors is often larger than in traditional settings. We address this challenge by using topic models to obtain author representations. In addition to exploring novel ways of applying two popular topic models to this task, we test our new model that projects authors and documents to two disjoint topic spaces. Utilizing our model in authorship attribution yields state-of-the-art performance on several data sets, containing either formal texts written by a few authors or informal texts generated by tens to thousands of on-line users. We also present experimental results that demonstrate the applicability of topical author representations to two other problems: inferring the sentiment polarity of texts, and predicting the ratings that users would give to items such as movies.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Changyong Li ◽  
Yongxian Fan ◽  
Xiaodong Cai

Background: With the development of deep learning (DL), a growing number of DL-based methods have been proposed that achieve state-of-the-art performance in biomedical image segmentation. However, these methods are usually complex and require powerful computing resources, which are impractical to provide in clinical settings. It is therefore important to develop accurate DL-based biomedical image segmentation methods that can run on resource-constrained hardware. Results: A lightweight, multiscale network called PyConvU-Net is proposed for low-resource computing. In strictly controlled experiments, PyConvU-Net performs well on three biomedical image segmentation tasks with the fewest parameters. Conclusions: Our experimental results preliminarily demonstrate the potential of the proposed PyConvU-Net for biomedical image segmentation under resource-constrained computing.


2019 ◽  
Vol 9 (13) ◽  
pp. 2684 ◽  
Author(s):  
Hongyang Li ◽  
Lizhuang Liu ◽  
Zhenqi Han ◽  
Dan Zhao

Fibre peeling is an indispensable step in the production of preserved Szechuan pickle, and its accuracy significantly influences product quality. This work therefore studies contour-based fibre detection, the core algorithm of an automatic peeling device. The fibre contour is a non-salient contour, characterized by large intra-class differences and small inter-class differences, meaning that its features are not discriminative. A method called dilated-holistically-nested edge detection (Dilated-HED), built on the HED network and dilated convolution, is proposed to detect the fibre contour. Experimental results on our dataset show a Pixel Accuracy (PA) of 99.52% and a Mean Intersection over Union (MIoU) of 49.99%, achieving state-of-the-art performance.
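
The two metrics named in this abstract can be made concrete with a small sketch. The code below is a toy illustration of the standard definitions of Pixel Accuracy and Mean IoU only; the label maps are made up, the function names are hypothetical, and nothing here reproduces the paper's evaluation. It shows that, on heavily imbalanced labels, the two metrics can differ widely for the same prediction.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the true label."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)


def mean_iou(pred, truth, num_classes):
    """Average per-class intersection-over-union, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical flattened label map: 1000 pixels, only 5 of them on a contour (class 1).
truth = [1] * 5 + [0] * 995
# A made-up predictor that never marks a contour pixel.
pred = [0] * 1000

pa = pixel_accuracy(pred, truth)
miou = mean_iou(pred, truth, num_classes=2)
print(f"PA = {pa:.4f}, MIoU = {miou:.4f}")
```

Because PA counts every pixel equally while MIoU averages over classes, a rare class has little influence on the first and a large influence on the second.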


2019 ◽  
Vol 277 ◽  
pp. 01012 ◽  
Author(s):  
Clare E. Matthews ◽  
Paria Yousefi ◽  
Ludmila I. Kuncheva

Many existing methods for video summarisation are not suitable for on-line applications, where computational and memory constraints mean that feature extraction and frame selection must be simple and efficient. Our proposed method uses RGB moments to represent frames, and a control-chart procedure to identify shots from which keyframes are then selected. The new method produces summaries of higher quality than two state-of-the-art on-line video summarisation methods identified as the best among nine such methods in our previous study. The summary quality is measured against an objective ideal for synthetic data sets, and compared to user-generated summaries of real videos.
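
The pipeline described in this abstract (RGB moments per frame, a control chart to find shots, then keyframe selection) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not give the moments used, the chart limits, or the keyframe rule, so the mean and standard deviation per channel, the 3-sigma limit with a warm-up and a floor, and the "closest to the shot mean" rule are all illustrative choices, and every function name is hypothetical.

```python
import math
import statistics


def rgb_moments(frame):
    """Mean and population std of each colour channel: a 6-element feature vector."""
    features = []
    for ch in range(3):
        values = [pixel[ch] for pixel in frame]
        features.append(statistics.fmean(values))
        features.append(statistics.pstdev(values))
    return features


def detect_shot_starts(features, sigmas=3.0, warmup=3, floor=1.0):
    """Indices where a new shot starts, using a Shewhart-style upper control limit
    on the distance between consecutive frames (assumed parameters)."""
    starts = [0]
    dists = []  # consecutive-frame distances seen so far inside the current shot
    for i in range(1, len(features)):
        d = math.dist(features[i - 1], features[i])
        if len(dists) >= warmup:
            spread = max(statistics.pstdev(dists), floor)
            if d > statistics.fmean(dists) + sigmas * spread:
                starts.append(i)  # out-of-control point: treat as a shot boundary
                dists = []
                continue
        dists.append(d)
    return starts


def pick_keyframes(features, starts):
    """For each shot, pick the frame closest to the shot's mean feature vector."""
    keyframes = []
    bounds = starts + [len(features)]
    for begin, end in zip(bounds, bounds[1:]):
        centre = [statistics.fmean(col) for col in zip(*features[begin:end])]
        keyframes.append(min(range(begin, end),
                             key=lambda i: math.dist(features[i], centre)))
    return keyframes


def flat_frame(level):
    """Hypothetical 4-pixel frame of a single grey level."""
    return [(level, level, level)] * 4


# Synthetic video: six dark frames followed by six bright frames.
frames = [flat_frame(10 + i % 2) for i in range(6)] + [flat_frame(200 + i % 2) for i in range(6)]
features = [rgb_moments(f) for f in frames]
starts = detect_shot_starts(features)
keyframes = pick_keyframes(features, starts)
print(starts, keyframes)
```

Each frame is visited once and only the distances within the current shot are kept, which fits the on-line setting the abstract emphasises.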


Symmetry ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1486
Author(s):  
Zhinan Gou ◽  
Zheng Huo ◽  
Yuanzhen Liu ◽  
Yi Yang

Supervised topic modeling has been successfully applied to document classification and tag recommendation in recent years. However, most existing models neglect the fact that topic terms have the ability to distinguish topics. In this paper, we propose a term frequency-inverse topic frequency (TF-ITF) method for constructing a supervised topic model, in which the weight of each topic term indicates its ability to distinguish topics. We conduct a series of experiments with both symmetric and asymmetric Dirichlet prior parameters. Experimental results demonstrate that a supervised topic model with TF-ITF outperforms several state-of-the-art supervised topic models.


Author(s):  
Rina Refianti ◽  
Achmad Benny Mutiara ◽  
Asep Juarna ◽  
Adang Suhendra

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, in which p is modeled on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. In the MAAP procedure, the adaptive p-scanning algorithm of the original Adaptive-AP (AAP) procedure is omitted. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP on the random non-partition data set, and (ii) on the random 4-partition data set and real data sets, the proposed algorithm succeeds in identifying clusters that match the number of true labels in each data set, with execution times comparable to those of the original AP. In addition, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
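
To make the two AP inputs concrete, the sketch below builds a similarity matrix and derives a shared preference from the distribution of similarities. It is only an illustration under assumptions: the abstract does not specify the authors' model of p, so the median and minimum of the off-diagonal similarities (two commonly used starting points for AP) stand in for it, negative squared Euclidean distance is assumed as the similarity, and all function names are hypothetical.

```python
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean distance between every pair of points."""
    return [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def preference_candidates(sim):
    """Summaries of the off-diagonal similarity distribution usable as preference p."""
    n = len(sim)
    off_diag = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return {"median": statistics.median(off_diag), "minimum": min(off_diag)}


def with_preference(sim, p):
    """Copy of the similarity matrix with p on the diagonal, where AP reads it."""
    return [[p if i == j else value for j, value in enumerate(row)]
            for i, row in enumerate(sim)]


# Hypothetical data: two tight pairs of points far apart from each other.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
sim = similarity_matrix(points)
prefs = preference_candidates(sim)
ap_input = with_preference(sim, prefs["median"])
print(prefs)
```

In AP, a larger preference makes more points willing to serve as exemplars and so tends to yield more clusters, which is why the choice of p matters for the number of clusters found.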


2019 ◽  
Vol 9 (16) ◽  
pp. 3389 ◽  
Author(s):  
Biqing Zeng ◽  
Heng Yang ◽  
Ruyang Xu ◽  
Wu Zhou ◽  
Xuli Han

Aspect-based sentiment classification (ABSC) aims to predict the sentiment polarities of different aspects within sentences or documents. Many previous studies have addressed this problem, but they overlook the correlation between an aspect’s sentiment polarity and its local context. In this paper, a Local Context Focus (LCF) mechanism based on Multi-head Self-Attention (MHSA) is proposed for aspect-based sentiment classification. This mechanism, called the LCF design, uses Context Features Dynamic Mask (CDM) and Context Features Dynamic Weighted (CDW) layers to pay more attention to local context words. Moreover, a BERT-shared layer is added to the LCF design to capture the internal long-term dependencies of the local and global contexts. Experiments are conducted on three common ABSC datasets: the laptop and restaurant datasets of SemEval-2014 and the ACL Twitter dataset. Experimental results demonstrate that the LCF baseline model achieves considerable performance. In addition, we conduct ablation experiments to demonstrate the significance and effectiveness of the LCF design. In particular, by incorporating the BERT-shared layer, the LCF-BERT model sets new state-of-the-art performance on all three benchmark datasets.


2021 ◽  
Vol 15 ◽  
Author(s):  
Yibing Yu ◽  
Shuang Shi ◽  
Yifei Wang ◽  
Xinkang Lian ◽  
Jing Liu ◽  
...  

At present, most departments in colleges have their own official accounts, which have become the primary channel for announcements and news. On these official accounts, the popularity of articles is influenced by many factors, such as the content of the articles and the aesthetics of the layout. This paper studies how to learn a computational model that predicts page views on college official accounts from quality-aware features extracted from pictures. First, we built a new picture database by collecting 1,000 pictures from the official accounts of nine well-known universities in Beijing. Then, we proposed a new model for predicting page views that uses a selective ensemble technique to fuse three sets of quality-aware features representing how a picture looks. Experimental results show that the proposed model achieves competitive performance against relevant state-of-the-art models on the task of inferring page views from pictures on college official accounts.


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that deviate from the rest of the data, using some labeled examples. This issue is especially important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
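
The first step above, extracting reliable negatives with a kNN-based algorithm, could look roughly like the sketch below. The abstract does not describe the actual algorithm, so this is an assumption throughout: points are scored by the mean distance to their k nearest neighbours, and the fraction with the lowest scores (those lying in dense regions) is kept as reliable negatives. The function names and the `k` and `keep` parameters are hypothetical, and the fuzzy clustering step is not shown.

```python
import math


def knn_score(index, points, k):
    """Mean distance from points[index] to its k nearest other points."""
    dists = sorted(math.dist(points[index], q)
                   for j, q in enumerate(points) if j != index)
    return sum(dists[:k]) / k


def reliable_negatives(points, k=2, keep=0.5):
    """Indices of the `keep` fraction of points with the smallest kNN score."""
    ranked = sorted(range(len(points)), key=lambda i: knn_score(i, points, k))
    return sorted(ranked[: int(len(points) * keep)])


# Hypothetical unlabeled data: a dense square of four points plus two isolated points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (-8, 9)]
negatives = reliable_negatives(points, k=2, keep=0.5)
print(negatives)
```

Keeping only the densest fraction trades recall for precision: some normal points are left unlabeled, but the isolated points are unlikely to be mislabeled as negatives.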


2012 ◽  
Vol 21 (01) ◽  
pp. 1250007 ◽  
Author(s):  
LIANGXIAO JIANG ◽  
DIANHONG WANG ◽  
ZHIHUA CAI

Many approaches have been proposed to improve naive Bayes by weakening its conditional independence assumption. In this paper, we work on the approach of instance weighting and propose an improved naive Bayes algorithm based on discriminative instance weighting. We call it Discriminatively Weighted Naive Bayes. In each iteration, training instances are discriminatively assigned different weights according to the estimated conditional probability loss. Experimental results on a large number of UCI data sets validate its effectiveness in terms of classification accuracy and AUC. In addition, running-time experiments show that Discriminatively Weighted Naive Bayes performs almost as efficiently as the state-of-the-art Discriminative Frequency Estimate learning method, and significantly more efficiently than Boosted Naive Bayes. Finally, we apply the idea of discriminatively weighted learning to some state-of-the-art naive Bayes text classifiers, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, and achieve remarkable improvements.


2015 ◽  
Author(s):  
Stéphane Pesant ◽  
Fabrice Not ◽  
Marc Picheral ◽  
Stefanie Kandels-Lewis ◽  
Noan Le Bescot ◽  
...  

The Tara Oceans expedition (2009-2013) sampled contrasting ecosystems of the world's oceans, collecting environmental data and plankton, from viruses to metazoans, for later analysis using modern sequencing and state-of-the-art imaging technologies. It surveyed 210 ecosystems in 20 biogeographic provinces, collecting over 35,000 samples of seawater and plankton. Interpreting such an extensive collection of samples in their ecological context requires means to explore, assess and access raw and validated data sets. To address this challenge, the Tara Oceans Consortium offers open science resources, including the use of open access archives for nucleotides (ENA) and for environmental, biogeochemical, taxonomic and morphological data (PANGAEA), and the development of on-line discovery tools and collaborative annotation tools for sequences and images. Here, we present an overview of Tara Oceans Data, and we provide detailed registries (data sets) of all campaigns (from port to port), stations and sampling events.

