Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution

Source code attribution classifiers have recently become powerful. We consider the possibility that an adversary could craft code with the intention of causing a misclassification, i.e., creating a forgery of another author’s programming style in order to hide the forger’s own identity or blame the other author. We find that it is possible for a non-expert adversary to defeat such a system. In order to inform the design of adversarially resistant source code attribution classifiers, we conduct two studies with C/C++ programmers to explore the potential tactics and capabilities both of such adversaries and, conversely, of human analysts doing source code authorship attribution. Through the quantitative and qualitative analysis of these studies, we (1) evaluate a state-of-the-art machine classifier against forgeries, (2) evaluate programmers as human analysts/forgery detectors, and (3) compile a set of modifications made to create forgeries. Based on our analyses, we then suggest features that future source code attribution systems might incorporate in order to be adversarially resistant.
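To make the forgery threat concrete, the sketch below attributes C/C++ snippets using a handful of purely layout-based features (tab indentation, braces on their own line, line length, spacing around operators) and a nearest-centroid rule. The feature set and the `layout_features`/`attribute` helpers are hypothetical illustrations, not the state-of-the-art classifier evaluated in the study.

```python
import math
import re


def layout_features(src: str) -> list[float]:
    """Hypothetical layout features of a C/C++ source string."""
    lines = [ln for ln in src.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    tab_indented = sum(1 for ln in lines if ln.startswith("\t")) / n
    brace_own_line = sum(1 for ln in lines if ln.strip() == "{") / n
    avg_len = sum(len(ln) for ln in lines) / n
    # Binary operators written with a space on each side, e.g. "a + 1".
    spaced_ops = len(re.findall(r" [=+\-*/<>] ", src)) / n
    return [tab_indented, brace_own_line, avg_len / 80.0, spaced_ops]


def attribute(sample: str, corpus: dict[str, list[str]]) -> str:
    """Nearest-centroid attribution: the author whose mean feature vector is closest."""
    x = layout_features(sample)
    best, best_dist = "", math.inf
    for author, files in corpus.items():
        vecs = [layout_features(f) for f in files]
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        d = math.dist(x, centroid)
        if d < best_dist:
            best, best_dist = author, d
    return best
```

With a tiny corpus where one author uses tabs and Allman braces and another uses K&R braces and tight operators, re-indenting the second author's function in the first author's layout is enough to flip this toy classifier's decision. A classifier relying only on such surface features is therefore easy to forge, which is the kind of weakness the study probes.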

Spatiotemporal Super-Resolution with Cross-Task Consistency and Its Semi-supervised Extension

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/86 ◽

2020 ◽

Author(s):

Han-Yi Lin ◽

Pi-Cheng Hsiu ◽

Tei-Wei Kuo ◽

Yen-Yu Lin

Keyword(s):

High Resolution ◽

State Of The Art ◽

Source Code ◽

Super Resolution ◽

The Other ◽

Training Data ◽

Frame Rate ◽

Stream Network ◽

Network Training ◽

Temporal Dimensions

Spatiotemporal super-resolution (SR) aims to upscale both the spatial and temporal dimensions of input videos, producing videos with higher frame resolutions and frame rates. It involves two essential sub-tasks: spatial SR and temporal SR. In this work, we design a two-stream network for spatiotemporal SR. One stream contains a temporal SR module followed by a spatial SR module, while the other stream has the same two modules in the reverse order. Because the two sub-tasks can be performed in either order, the two network streams should produce consistent spatiotemporal SR results. We therefore present a cross-stream consistency that enforces similarity between the outputs of the two streams. In this way, the training of the two streams is correlated, which allows the two SR modules to share their supervisory signals and improve each other. In addition, the proposed cross-stream consistency does not consume labeled training data and can guide network training in an unsupervised manner. We leverage this property to carry out semi-supervised spatiotemporal SR. As a result, our method makes the most of the training data and can derive an effective model from few high-resolution, high-frame-rate videos, achieving state-of-the-art performance. The source code of this work is available at https://hankweb.github.io/STSRwithCrossTask/.
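The cross-stream idea can be illustrated with NumPy on a grey-scale video of shape (frames, height, width). In this sketch the learned SR modules are replaced by hypothetical fixed stand-ins (frame averaging for temporal SR, nearest-neighbour upsampling for spatial SR), and the consistency is measured as a mean absolute difference; the paper's actual networks and loss form may differ.

```python
import numpy as np


def temporal_sr(video):
    """Stand-in temporal SR: insert the average of each pair of adjacent frames."""
    t, h, w = video.shape
    out = np.empty((2 * t - 1, h, w), dtype=float)
    out[0::2] = video
    out[1::2] = 0.5 * (video[:-1] + video[1:])
    return out


def spatial_sr(video, scale=2):
    """Stand-in spatial SR: nearest-neighbour upsampling of every frame."""
    return video.repeat(scale, axis=1).repeat(scale, axis=2)


def cross_stream_consistency(video, t_sr=temporal_sr, s_sr=spatial_sr):
    """Mean absolute difference between the two stream orderings (needs no labels)."""
    video = np.asarray(video, dtype=float)
    stream_ts = s_sr(t_sr(video))  # temporal SR, then spatial SR
    stream_st = t_sr(s_sr(video))  # spatial SR, then temporal SR
    return float(np.mean(np.abs(stream_ts - stream_st)))
```

These two linear stand-ins commute, so the consistency term is zero. Making one module nonlinear (for example squaring the output of `spatial_sr`) yields a positive value. During training this is the signal that couples the two learned modules without needing high-resolution, high-frame-rate ground truth.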

The Colony Predation Algorithm

Journal of Bionic Engineering ◽

10.1007/s42235-021-0050-y ◽

2021 ◽

Vol 18 (3) ◽

pp. 674-710

Author(s):

Jiaze Tu ◽

Huiling Chen ◽

Mingjing Wang ◽

Amir H. Gandomi

Keyword(s):

Mathematical Model ◽

State Of The Art ◽

Source Code ◽

The Other ◽

Superior Performance ◽

Design Problems ◽

Optimal Position ◽

Cross Border ◽

Engineering Design Problems ◽

Engineering Problems

This paper proposes a new stochastic optimizer called the Colony Predation Algorithm (CPA), based on the cooperative predation of animals in nature. CPA uses a mathematical mapping of the strategies used by animal hunting groups, such as dispersing prey, encircling prey, supporting the most likely successful hunter, and seeking another target. Moreover, CPA introduces a unique mathematical model that uses a success rate to adjust the strategy and simulate hunting animals’ selective abandonment behavior. This paper also presents a new way to deal with cross-border situations, in which a value that crosses the search boundary is replaced by the corresponding optimal position value to improve the algorithm’s exploitation ability. CPA was compared with state-of-the-art metaheuristics on a comprehensive set of benchmark functions for performance verification, and on five classical engineering design problems to evaluate its efficacy in optimizing engineering problems. The results show that the proposed algorithm exhibits competitive and superior performance over the other algorithms in different search landscapes. Moreover, the source code of CPA will be made publicly available after publication.
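One reading of the cross-border rule is this: any coordinate of a candidate solution that leaves the search bounds is replaced by the same coordinate of the best position found so far, rather than being clipped to the boundary. The sketch below assumes that interpretation and assumes the best position itself lies inside the bounds. The function name `repair_cross_border` is hypothetical.

```python
import numpy as np


def repair_cross_border(positions, best, lower, upper):
    """Replace out-of-bound coordinates with the best-so-far position's coordinates.

    positions: (n_agents, dim) array; best: (dim,) array assumed to be in bounds;
    lower/upper: scalars or (dim,) arrays giving the search box.
    """
    positions = np.asarray(positions, dtype=float)
    best = np.asarray(best, dtype=float)
    out_of_bounds = (positions < lower) | (positions > upper)
    # np.where broadcasts `best` across all agents.
    return np.where(out_of_bounds, best, positions)
```

Unlike clipping, which piles stray agents onto the boundary, this pulls them towards the current optimum. That matches the abstract's stated aim of improving exploitation.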

Computational stylistics and authorship attribution: what it measures and why it works

10.31237/osf.io/pcnvj ◽

2020 ◽

Author(s):

Jeremi Ochab

Keyword(s):

English Language ◽

State Of The Art ◽

The Other ◽

Supervised Machine Learning ◽

Authorship Attribution ◽

Statistical Distributions ◽

Clustering Methods ◽

Parts Of Speech ◽

Part Of Speech ◽

Automatic Grouping

The topic of this thesis is computational methods for measuring authorial style and algorithms for authorship attribution. The first aim of the thesis was to attempt a quantifiable separation of various layers of authorial style (in the present case, the lexical and grammatical layers) in order to estimate their influence on the results of a chosen authorship attribution method. Within the scope of these studies, I compared the distance, the so-called Burrows's Delta, between a pair of English novels by two chosen authors and automatically generated texts whose statistical distributions of parts of speech were borrowed from one author and whose vocabulary was borrowed from the other; additionally, in the artificial texts I retained the first author's words if they belonged to a particular part of speech. This procedure made it possible to create a hybrid text that was attributed to the first author, even though the majority of its lexical items came from the second author. The second aim was to identify the influence of the style and language of the original on the style of the translation. This part of the research involved, among other things, adapting the Polish and English part-of-speech tag sets to form a common translation tag set. Besides making a couple of simple observations concerning the distributions and co-occurrences of parts of speech in the two languages, I was able to determine some features of the selected translation corpus that lie on the fringes of what seems to be the norm for Polish. The third aim was to test the accuracy of state-of-the-art (unsupervised) clustering methods for automatically grouping texts by author. The results show that these methods recognise authorship less accurately than the known supervised machine learning methods. In the thesis I used corpora totalling around 550 digitised English-language novels and 100 Polish ones, as well as a parallel corpus of 39 novels by a single English author together with their translations by a single Polish translator. The research involved using existing part-of-speech taggers (for both English and Polish), authorship attribution programmes, and graph clustering programmes.
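Burrows's Delta, mentioned above, works as follows. It takes the most frequent words of a reference corpus and turns each word's relative frequency in a text into a z-score using that word's mean and standard deviation across the corpus. Delta is then the mean absolute difference between two texts' z-scores. The sketch below is a plain-Python illustration on pre-tokenised texts. Using the population standard deviation and falling back to 1.0 when a word's frequency does not vary are choices made for this example, and the thesis's own tooling may differ.

```python
import statistics
from collections import Counter


def rel_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def burrows_delta(text_a, text_b, corpus, n_words=30):
    """Mean absolute difference of z-scored frequencies of the corpus's top words."""
    all_tokens = [tok for doc in corpus for tok in doc]
    vocab = [w for w, _ in Counter(all_tokens).most_common(n_words)]
    profiles = [rel_freqs(doc, vocab) for doc in corpus]
    means = [statistics.mean(col) for col in zip(*profiles)]
    # Guard against words whose frequency is identical in every document.
    stdevs = [statistics.pstdev(col) or 1.0 for col in zip(*profiles)]

    def z(tokens):
        return [(f - m) / s for f, m, s in zip(rel_freqs(tokens, vocab), means, stdevs)]

    za, zb = z(text_a), z(text_b)
    return sum(abs(a - b) for a, b in zip(za, zb)) / len(vocab)
```

Because Delta only sees function-word-like frequency profiles, a hybrid text that keeps one author's high-frequency words can score closest to that author even when most of its content words come from someone else. This is the effect the first study exploits.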

Video captioning with stacked attention and semantic hard pull

PeerJ Computer Science ◽

10.7717/peerj-cs.664 ◽

2021 ◽

Vol 7 ◽

pp. e664

Author(s):

Md. Mushfiqur Rahman ◽

Thasin Abedin ◽

Khondokar S.S. Prottoy ◽

Ayana Moshruba ◽

Fazlul Hasan Siddiqui

Keyword(s):

Qualitative Analysis ◽

Language Processing ◽

State Of The Art ◽

The Other ◽

Video Sequences ◽

Automated Scoring ◽

Video Captioning ◽

Human Evaluation ◽

Novel Approaches ◽

Evaluation Metric

Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. Generating a semantically accurate description of a video is a complex task. Considering the complexity of the problem, the results obtained in recent research are praiseworthy; however, there is plenty of scope for further investigation. This paper addresses that scope and proposes a novel solution. Most video captioning models comprise two sequential/recurrent layers: one as a video-to-context encoder and the other as a context-to-caption decoder. This paper proposes a novel architecture, Semantically Sensible Video Captioning (SSVC), which modifies the context generation mechanism using two novel approaches: “stacked attention” and “spatial hard pull”. As there are no exclusive metrics for evaluating video captioning models, we emphasize both quantitative and qualitative analysis of our model. We therefore use the BLEU scoring metric for quantitative analysis and propose a human evaluation metric for qualitative analysis, the Semantic Sensibility (SS) scoring metric. The SS Score overcomes the shortcomings of common automated scoring metrics. This paper reports that these novelties improve the performance of state-of-the-art architectures.
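For reference, BLEU combines clipped n-gram precisions with a brevity penalty. The sketch below is an unsmoothed, sentence-level BLEU with uniform weights over 1- to 4-grams and support for multiple reference captions. It is a generic illustration of the metric, not the evaluation script used in the paper. The human-judged SS Score is not modelled here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence BLEU: geometric mean of clipped precisions times brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # element-wise max of counts across references
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Closest reference length (shorter one on ties) for the brevity penalty.
    r = min((len(ref) for ref in references), key=lambda L: (abs(L - c), L))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

Clipping is what stops a degenerate caption such as "the the the the" from scoring a perfect unigram precision against "the cat": only one "the" is credited, giving 0.25. Such surface-level behaviour is part of why the authors complement BLEU with a human semantic score.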

Computer Vision and robotics in postal automation

Human Systems Management ◽

10.3233/hsm-1999-183-411 ◽

1999 ◽

Vol 18 (3-4) ◽

pp. 265-273

Author(s):

Giovanni B. Garibotto

Keyword(s):

Image Processing ◽

Computer Vision ◽

Pattern Recognition ◽

Material Handling ◽

State Of The Art ◽

Short Description ◽

The Other ◽

Functional Requirements ◽

Postal Automation ◽

And Robotics

The paper provides an overview of advanced robotic technologies in the context of Postal Automation services. The main functional requirements of the application are briefly outlined, along with the state of the art and new emerging solutions. Image Processing and Pattern Recognition have always played a fundamental role in Address Interpretation and mail sorting, and the new challenging objective is now off-line handwritten cursive recognition, in order to handle all kinds of addresses in a uniform way. On the other hand, advanced electromechanical and robotic solutions are extremely important for solving the problems of mail storage, transportation and distribution, as well as for material handling and logistics. Finally, a short description of new Postal Automation services is given, covering emerging services such as hybrid mail and paper-to-electronic conversion.

Occupant pre-crash kinematics in rotated seat arrangements

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/09544070211004504 ◽

2021 ◽

pp. 095440702110045

Author(s):

Alexander Diederich ◽

Christophe Bastien ◽

Karthikeyan Ekambaram ◽

Alexis Wilson

Keyword(s):

State Of The Art ◽

The Other ◽

Human Model ◽

Multi Objective ◽

Emergency Braking ◽

Seating Arrangement ◽

Current State ◽

Occupant Comfort ◽

Occupant Kinematics ◽

Belt System

The introduction of automated L5 driving technologies will revolutionise the design of vehicle interiors and seating configurations, improving occupant comfort and experience. Pre-crash emergency braking and swerving manoeuvres are expected to affect occupant posture, which could lead to an interaction with a deploying airbag. This research addresses the urgent safety need to define the occupant’s kinematic envelope during that pre-crash phase, considering rotated seat arrangements and different seatbelt configurations. The research used two different sets of volunteer tests experiencing L5 vehicle manoeuvres: the first is based on 22 fit 50th-percentile males wearing a lap belt (OM4IS), while the other is based on 87 volunteers with a BMI range of 19 to 67 kg/m² wearing a 3-point belt (UMTRI). Unique biomechanical kinematic corridors were then defined, as a function of belt configuration and vehicle manoeuvre, to calibrate an Active Human Model (AHM) using a multi-objective optimisation coupled with a Correlation and Analysis (CORA) rating. The research improved the AHM’s omnidirectional kinematic response over the current state of the art in a generic lap-belted environment. The AHM was then tested in a rotated seating arrangement under extreme braking, showing that maximum lateral and frontal motions are comparable regardless of the belt system, while the asymmetry of the 3-point belt increased the occupant’s motion towards the seatbelt buckle. The frontal occupant kinematics were observed to decrease by 200 mm compared with a lap-belted configuration. This improved omnidirectional AHM is the first step towards designing safer future L5 vehicle interiors.

Image Restoration by Learning Morphological Opening-Closing Network

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2020-0103 ◽

2020 ◽

Vol 4 (1) ◽

pp. 87-107

Author(s):

Ranjan Mondal ◽

Moni Shankar Dey ◽

Bhabatosh Chanda

Keyword(s):

Neural Network ◽

Image Restoration ◽

State Of The Art ◽

Source Code ◽

Back Propagation ◽

Image Features ◽

Main Difficulty ◽

The Right ◽

Right Order ◽

Morphological Opening

Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm is deciding the order of the operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers, in the right order and together with a linear combination of their outputs, are useful for extracting and processing image features. The structuring elements in the network are learned by back-propagation, guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two image restoration problems, namely de-raining and de-hazing. Results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found at https://github.com/ranjanZ/Mophological-Opening-Closing-Net
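The building blocks of such a network are grey-scale dilation and erosion with a (possibly non-flat) structuring element. The NumPy sketch below uses a fixed SE instead of a learned one and does not show back-propagation. Here `dilate` is written in correlation form, so `opening` and `closing` pass the reflected SE to keep dilation and erosion an adjoint pair. That is a convention chosen for this example, not necessarily the one in the authors' code.

```python
import numpy as np


def _windows(img, k, pad_value):
    """Yield (dy, dx, shifted view) for every offset of a k x k structuring element."""
    r = k // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            yield dy, dx, padded[dy:dy + h, dx:dx + w]


def dilate(img, se):
    """Grey-scale dilation: neighbourhood max of image + SE weight (pads with -inf)."""
    img = np.asarray(img, dtype=float)
    return np.max([win + se[dy, dx] for dy, dx, win in _windows(img, se.shape[0], -np.inf)], axis=0)


def erode(img, se):
    """Grey-scale erosion: neighbourhood min of image - SE weight (pads with +inf)."""
    img = np.asarray(img, dtype=float)
    return np.min([win - se[dy, dx] for dy, dx, win in _windows(img, se.shape[0], np.inf)], axis=0)


def opening(img, se):
    """Erosion then dilation: removes bright details smaller than the SE."""
    return dilate(erode(img, se), se[::-1, ::-1])


def closing(img, se):
    """Dilation then erosion: fills dark details smaller than the SE."""
    return erode(dilate(img, se[::-1, ::-1]), se)
```

With a flat 3×3 SE, an opening erases an isolated bright pixel and a closing fills an isolated dark one. This is the kind of small-scale artefact (rain streaks, for instance) that a learned chain of such layers can suppress. For any SE, the opening never exceeds the input and the closing never falls below it.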

All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3460122 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-22

Author(s):

Jerzy Proficz

Keyword(s):

Experimental Evaluation ◽

Data Exchange ◽

State Of The Art ◽

Monitoring And Evaluation ◽

The Other ◽

Early Data ◽

Cluster Architecture ◽

Novel Algorithms

Two novel algorithms for the all-gather operation that are resilient to imbalanced process arrival patterns (PAPs) are presented. The first, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and uses an auxiliary background thread for early data exchange from faster processes to accelerate the all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), builds on the existing PAP-aware gather algorithm, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast that distributes the gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with PAP monitoring and evaluation. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was run over 2,000 times on a typical HPC cluster architecture with homogeneous compute nodes. The results are analyzed for different PAPs, data sizes, and numbers of processes, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce all-gather elapsed times, in our case by up to a factor of 1.9, or 47%, compared with the best state-of-the-art solution.
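For context, the regular ring all-gather that BDR starts from can be simulated in a few lines. Process i begins with its own block; in each of the p - 1 steps, every process forwards to its right neighbour the block it received in the previous step. This sketch models only the data movement of the baseline ring, not BDR's background thread or any timing.

```python
def ring_allgather(blocks):
    """Simulate the ring all-gather on len(blocks) processes.

    In step s (0 <= s < p - 1) process i sends block (i - s) mod p to (i + 1) mod p;
    that block is the one it received in step s - 1 (its own block when s = 0).
    Returns, for every process, the list of all blocks in rank order.
    """
    p = len(blocks)
    have = [{i: blocks[i]} for i in range(p)]
    for step in range(p - 1):
        # Collect every send first so all processes exchange simultaneously.
        sends = [((i + 1) % p, (i - step) % p, have[i][(i - step) % p]) for i in range(p)]
        for dst, idx, data in sends:
            have[dst][idx] = data
    return [[have[i][j] for j in range(p)] for i in range(p)]
```

Because each forwarded block depends on the previous step's receive, a process that arrives late stalls its successor and, step by step, the rest of the ring. This dependency chain suggests why letting faster processes exchange data early, as BDR does in the background, can pay off under imbalanced arrival patterns.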

Development of Multiple Behaviors in Evolving Robots

Robotics ◽

10.3390/robotics10010001 ◽

2020 ◽

Vol 10 (1) ◽

pp. 1

Author(s):

Victor Massagué Respall ◽

Stefano Nolfi

Keyword(s):

Evolutionary Algorithms ◽

State Of The Art ◽

Evolutionary Robotics ◽

The Other ◽

Evolutionary Strategies ◽

Multiple Behaviors ◽

Pareto Fronts ◽

Expected Fitness

We investigate whether standard evolutionary robotics methods can be extended to support the evolution of multiple behaviors by forcing the retention of variations that are adaptive with respect to all required behaviors. This is realized by selecting the individuals located in the first Pareto fronts of the multidimensional fitness space in the case of standard evolutionary algorithms, and by computing and using multiple gradients of the expected fitness in the case of modern evolutionary strategies, which move the population in the direction of the fitness gradient. The results collected on two extended versions of state-of-the-art benchmark problems indicate that the latter method makes it possible to evolve robots capable of producing the required multiple behaviors in the majority of the replications, and produces significantly better results than all the other methods considered.
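Selecting individuals from the first Pareto fronts means ranking the population by non-domination, one fitness component per required behavior (maximised here), and filling the selected set front by front. In this minimal sketch, ties inside the last front are truncated by index for simplicity. NSGA-II, for example, would use crowding distance instead. The helper names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(fitness):
    """Naive non-dominated sorting: list of fronts (indices), first front first."""
    remaining = set(range(len(fitness)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(fitness[j], fitness[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def select(fitness, k):
    """Keep k individuals, taking whole Pareto fronts in order."""
    chosen = []
    for front in pareto_fronts(fitness):
        chosen.extend(front)
        if len(chosen) >= k:
            break
    return chosen[:k]
```

An individual that excels at one behavior but fails the other, such as fitness (5, 1), survives alongside a balanced (3, 3). This is how the method retains variations that are adaptive for any of the required behaviors.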

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions because of the complexity of exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection and perform well in many applications. However, most existing works use the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both a theoretical and a practical solution for estimating multiple dense subtensors in tensor data, with a higher lower bound on their density. In particular, we prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach showing that there are multiple dense subtensors whose density is guaranteed to be greater than the lower bound used in state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
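The greedy 2-approximation referred to above is, for graphs, usually the peeling algorithm for the density |E|/|V|. It repeatedly deletes a minimum-degree vertex and keeps the densest intermediate subgraph, which is guaranteed to reach at least half the optimal density. The sketch below assumes that this is the baseline meant. It finds a single subgraph only, which is exactly the limitation the paper addresses, and it uses a simple O(|V|²) vertex scan rather than a bucket queue.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling for the densest subgraph, density = |E| / |V|.

    Returns (vertex set, density) of the densest subgraph seen while repeatedly
    removing a minimum-degree vertex.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return set(), 0.0
    nodes = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    best_nodes, best_density = set(nodes), m / len(nodes)
    while len(nodes) > 1:
        u = min(nodes, key=lambda x: len(adj[x]))
        m -= len(adj[u])
        for v in adj.pop(u):
            adj[v].discard(u)
        nodes.remove(u)
        density = m / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density
    return best_nodes, best_density
```

On a 4-clique with a two-edge tail attached, peeling first strips the tail and recovers the clique with density 6/4 = 1.5. The whole graph has only 8/6 ≈ 1.33.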
