On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification

2020 ◽  
Vol 62 (2) ◽  
pp. 99-115
Author(s):  
Janek Bevendorff ◽  
Tobias Wenzel ◽  
Martin Potthast ◽  
Matthias Hagen ◽  
Benno Stein

Abstract: Authorship verification is the task of determining whether two texts were written by the same author based on a writing style analysis. Author obfuscation is the adversarial task of preventing a successful verification by altering a text's style so that it no longer resembles that of its original author. This paper introduces new algorithms for both tasks and reports on a comprehensive evaluation of the ability of state-of-the-art authorship verification to withstand obfuscation. After introducing a new generalization of the well-known unmasking algorithm for short texts, thus completing our collection of state-of-the-art verification algorithms, we introduce an approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author's writing style in a sophisticated manner using heuristic search. For obfuscation, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficiently high Jensen-Shannon distance at minimal cost in terms of text quality loss. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive text length-invariant thresholds for termination, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats the presented state-of-the-art verification approaches while keeping text changes to a minimum. As a final contribution, we discuss and experimentally evaluate a reverse obfuscation attack against our obfuscation approach, as well as possible remedies.
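The style-difference measure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are mine, and it computes the Jensen-Shannon distance (the square root of the base-2 Jensen-Shannon divergence) between two character n-gram distributions.

```python
from collections import Counter
from math import log2

def char_ngram_dist(text, n=3):
    """Relative-frequency distribution of character n-grams in a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the base-2 JS divergence.

    With base-2 logarithms the divergence lies in [0, 1], so the
    distance does too; 0 means identical distributions.
    """
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a[k] == 0 contribute 0.
        return sum(a[k] * log2(a[k] / b[k]) for k in a if a[k] > 0)
    return (0.5 * kl(p, m) + 0.5 * kl(q, m)) ** 0.5
```

An obfuscator in this spirit would paraphrase the text until `js_distance` between the original and the paraphrase exceeds a threshold.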

2013 ◽  
Vol 2013 ◽  
pp. 1-6
Author(s):  
Dunbo Cai ◽  
Sheng Xu ◽  
Tongzhou Zhao ◽  
Yanduo Zhang

Pruning techniques and heuristics are two keys to heuristic search-based planning. The helpful actions pruning (HAP) strategy and relaxed-plan-based heuristics are two representatives among those methods and remain popular in state-of-the-art planners. Here, we present new analyses of the properties of HAP. Specifically, we show new reasons for which HAP can cause incompleteness of a search procedure. We prove that, in general, HAP is incomplete for planning with conditional effects if factored expansions of actions are used. To preserve completeness, we propose a pruning strategy based on relevance analysis and confrontation, and we show that both relevance analysis and confrontation are necessary. We call it the confrontation and goal relevant actions pruning (CGRAP) strategy. However, CGRAP is computationally hard to compute exactly, so we suggest practical approximations from the literature.


Author(s):  
Daniel Höller ◽  
Pascal Bercher ◽  
Gregor Behnke ◽  
Susanne Biundo

Planning is the task of finding a sequence of actions that achieves the goal(s) of an agent. It is solved based on a model describing the environment and how to change it. There are several approaches to solving planning tasks; two of the most popular are classical planning and hierarchical planning. Solvers are often based on heuristic search, but especially regarding domain-independent heuristics, the techniques in classical planning are more sophisticated. However, due to the different problem classes, it is difficult to use them in hierarchical planning. In this paper we describe how to use arbitrary classical heuristics in hierarchical planning and show that the resulting system outperforms the state of the art in hierarchical planning.


2016 ◽  
Vol 57 ◽  
pp. 229-271 ◽  
Author(s):  
Marcel Steinmetz ◽  
Jörg Hoffmann ◽  
Olivier Buffet

Unavoidable dead-ends are common in many probabilistic planning problems, e.g. when actions may fail or when operating under resource constraints. An important objective in such settings is MaxProb: determining the maximal probability with which the goal can be reached, and a policy achieving that probability. Yet algorithms for MaxProb probabilistic planning are severely underexplored, to the extent that there is scant evidence of what the empirical state of the art actually is. We close this gap with a comprehensive empirical analysis. We design and explore a large space of heuristic search algorithms, systematizing known algorithms and contributing several new algorithm variants. We consider MaxProb, as well as weaker objectives that we baptize AtLeastProb (requiring to achieve a given goal probability threshold) and ApproxProb (requiring to compute the maximum goal probability up to a given accuracy). We explore both the general case where there may be 0-reward cycles, and the practically relevant special case of acyclic planning, such as planning with a limited action-cost budget. We design suitable termination criteria, search algorithm variants, dead-end pruning methods using classical planning heuristics, and node selection strategies. We design a benchmark suite comprising more than 1000 instances adapted from the IPPC, resource-constrained planning, and simulated penetration testing. Our evaluation clarifies the state of the art, characterizes the behavior of a wide range of heuristic search algorithms, and demonstrates significant benefits of our new algorithm variants.
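The MaxProb objective can be illustrated with a small sketch, well short of the heuristic search algorithms the paper studies: plain value iteration on a goal-probability MDP. All names here are illustrative. Maximal goal-reachability probability is the least fixed point of the Bellman operator, so iterating from V = 0 converges to it; dead-ends (states with no applicable actions) keep value 0.

```python
def max_prob(states, goal, transitions, iters=1000, eps=1e-10):
    """Value iteration for MaxProb: the maximal probability of
    reaching `goal` from each state.

    transitions[s] is a list of actions; each action is a list of
    (probability, successor) pairs.  States absent from `transitions`
    (or with no actions) are dead-ends and keep value 0.
    """
    v = {s: 1.0 if s == goal else 0.0 for s in states}
    for _ in range(iters):
        delta = 0.0
        for s in states:
            if s == goal or not transitions.get(s):
                continue
            # Bellman backup: best action by expected successor value.
            new = max(sum(p * v[t] for p, t in a) for a in transitions[s])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps:
            break
    return v
```

For example, an action reaching the goal with probability 0.5 and a dead-end otherwise yields MaxProb 0.5, while an action that otherwise loops back to the same state (a 0-reward cycle that can be retried) yields MaxProb 1.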


2018 ◽  
Vol 24 (5) ◽  
pp. 649-676 ◽  
Author(s):  
XURI TANG

AbstractThis paper reviews the state-of-the-art of one emergent field in computational linguistics—semantic change computation. It summarizes the literature by proposing a framework that identifies five components in the field: diachronic corpus, diachronic word sense characterization, change modelling, evaluation and data visualization. Despite its potentials, the review shows that current studies are mainly focused on testifying hypotheses of semantic change from theoretical linguistics and that several core issues remain to be tackled: the need of diachronic corpora for languages other than English, the comparison and development of approaches to diachronic word sense characterization and change modelling, the need of comprehensive evaluation data and further exploration of data visualization techniques for hypothesis justification.


2006 ◽  
Vol 32 (4) ◽  
pp. 527-549 ◽  
Author(s):  
José B. Mariño ◽  
Rafael E. Banchs ◽  
Josep M. Crego ◽  
Adrià de Gispert ◽  
Patrik Lambert ◽  
...  

This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance at the level of the state of the art is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS).
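The log-linear combination mentioned in this abstract has a simple general form: a candidate translation is scored by a weighted sum of log feature values, and the decoder keeps the highest-scoring hypothesis. The sketch below is a generic illustration of that scoring scheme, not the paper's tuple-based system; the feature names and weights are invented for the example.

```python
from math import log

def log_linear_score(feature_values, weights):
    """Score of one hypothesis: sum of lambda_i * log h_i."""
    return sum(weights[f] * log(p) for f, p in feature_values.items())

def best_candidate(candidates, weights):
    """Pick the highest-scoring translation hypothesis."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))
```

In a real system the features would include the tuple n-gram translation model and the additional feature functions, with weights tuned on held-out data.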


2016 ◽  
Vol 6 (1) ◽  
pp. 168-177
Author(s):  
Gabriela Andrejková ◽  
Abdulwahed Almarimi

Abstract: Texts (books, novels, papers, short messages) are sequences of sentences, words, or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes obtained from texts. Text verification is a case of authorship verification in which we have a text and analyze whether all parts of it were written by the same (known or unknown) author. In this paper, we analyze and compare the results of two methods developed for text verification, one based on n-grams of symbols and one on local histograms of words. The results of the symbol n-gram method and the word-histogram method for finding dissimilarities among the parts of each text are analyzed and evaluated. The dissimilarities found draw attention to whether the text parts were written by the same author; this depends on parameters selected in the experiments. The results illustrate the usability of both methods for finding dissimilarities among text parts.
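The symbol n-gram method described above can be sketched as follows, under my own assumptions about the details: split the text into equal-sized parts, build a character n-gram profile per part, and score each part's dissimilarity to the rest of the text (here via cosine similarity of the profiles; the paper's exact distance and parameters may differ).

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Character n-gram counts of a text fragment."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_sim(a, b):
    """Cosine similarity of two count profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def part_dissimilarities(text, parts=4, n=3):
    """Score each equal-sized part of `text` by its dissimilarity
    (1 - cosine similarity) to the concatenation of the other parts.
    High scores flag parts whose style deviates from the rest."""
    size = max(1, len(text) // parts)
    chunks = [text[i:i + size] for i in range(0, size * parts, size)]
    scores = []
    for i, c in enumerate(chunks):
        rest = "".join(chunks[:i] + chunks[i + 1:])
        scores.append(1.0 - cosine_sim(ngram_profile(c, n), ngram_profile(rest, n)))
    return scores
```

Parts whose score exceeds a chosen threshold would be flagged as possibly written by a different author.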


2015 ◽  
Vol 2 (1) ◽  
pp. 42
Author(s):  
Ruben Dorado

ONTARE. REVISTA DE INVESTIGACIÓN DE LA FACULTAD DE INGENIERÍA

This article describes the task known as smoothing for statistical language representation. It also reviews some state-of-the-art methods that improve the statistical representation of language; specifically, the reported methods improve the statistical models known as n-gram models. The paper also shows a method for measuring models in order to compare them.
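One of the simplest smoothing methods for n-gram models, and a standard way to measure them, can be sketched as follows; this is a generic textbook illustration (add-k / Lidstone smoothing plus perplexity), not necessarily one of the methods the article reviews.

```python
from collections import Counter
from math import log2

def train_bigrams(tokens):
    """Bigram and unigram counts from a token sequence."""
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)

def addk_prob(bigram, bigrams, unigrams, vocab_size, k=1.0):
    """Add-k (Lidstone) smoothed bigram probability: every event,
    seen or unseen, gets k pseudo-counts, so no probability is zero."""
    prev = bigram[0]
    return (bigrams[bigram] + k) / (unigrams[prev] + k * vocab_size)

def perplexity(tokens, bigrams, unigrams, vocab_size, k=1.0):
    """Perplexity of a token sequence under the smoothed model;
    lower is better, and smoothing keeps it finite on unseen bigrams."""
    pairs = list(zip(tokens, tokens[1:]))
    h = -sum(log2(addk_prob(p, bigrams, unigrams, vocab_size, k))
             for p in pairs)
    return 2 ** (h / len(pairs))
```

Perplexity computed on held-out text is the usual yardstick for comparing differently smoothed models.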


Author(s):  
Hanning Gao ◽  
Lingfei Wu ◽  
Po Hu ◽  
Fangli Xu

The task of RDF-to-text generation is to generate a corresponding descriptive text given a set of RDF triples. Most previous approaches either cast this task as a sequence-to-sequence problem or employ a graph-based encoder to model the RDF triples and decode a text sequence. However, none of these methods can explicitly model both local and global structure information between and within the triples. To address these issues, we propose to jointly learn local and global structure information by combining two new graph-augmented structural neural encoders (i.e., a bidirectional graph encoder and a bidirectional graph-based meta-paths encoder) for the input triples. Experimental results on two different WebNLG datasets show that our proposed model outperforms the state-of-the-art baselines. Furthermore, we perform a human evaluation that demonstrates the effectiveness of the proposed method by evaluating generated text quality using various subjective metrics.


Author(s):  
Yuheng Hu

Viewers often use social media platforms like Twitter to express their views about televised programs and events like the presidential debate, the Oscars, and the State of the Union speech. Although this promises tremendous opportunities to analyze the feedback on a program or an event using viewer-generated content on social media, there are significant technical challenges to doing so. Specifically, given a televised event and related tweets about this event, we need methods to effectively align these tweets with the corresponding event. In turn, this raises many questions, such as how to segment the event and how to classify a tweet based on whether it is generally about the entire event or specifically about one particular event segment. In this paper, we propose and develop a novel joint Bayesian model that aligns an event and its related tweets based on the influence of the event's topics. Our model performs automated event segmentation and tweet classification concurrently. We present an efficient inference method for this model and a comprehensive evaluation of its effectiveness compared with state-of-the-art methods. We find that the topics, segments, and alignment provided by our model are significantly more accurate and robust.


2017 ◽  
Author(s):  
André G. Pereira ◽  
Luciana S. Buriol ◽  
Marcus Ritt

Moving-blocks problems are extremely hard to solve and a representative abstraction of many applications. Despite their importance, the known computational complexity results are limited to a few versions of these problems. In addition, there are no effective methods to optimally solve them. We address both of these issues. This thesis proves the PSPACE-completeness of many versions of moving-blocks problems. Moreover, we propose new methods to optimally solve these problems based on heuristic search with admissible heuristic functions and tie-breaking strategies. Our methods advance the state of the art, create new lines of research, and improve the results of applications.
