Towards Improving Code Stylometry Analysis in Underground Forums

2021 · Vol 2022 (1) · pp. 126-147
Author(s): Michal Tereszkowski-Kaminski, Sergio Pastrana, Jorge Blasco, Guillermo Suarez-Tangil

Code stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where posts tend to contain small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. Our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels, a novelty in this setting. We find that identifying unreliable code snippets is paramount to generating high-accuracy predictions, a task at which traditional learning settings fail. Overall, our method performs twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). With a smaller number of authors (i.e., 20), it achieves high accuracy (89%). We also evaluate our work under an open-world assumption and find that our method is more effective at retaining samples.
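
To make the Conformal Prediction step concrete, the following is a minimal sketch of inductive conformal prediction used as a rejection mechanism. The classifier, calibration split, and confidence level are illustrative assumptions, not the authors' forum-specific pipeline.

```python
# Minimal sketch of inductive conformal prediction as a rejection mechanism.
# Classifier, data, and epsilon are illustrative, not the paper's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 8)), rng.integers(0, 3, size=300)
X_train, y_train = X[:200], y[:200]   # proper training set
X_cal, y_cal = X[200:], y[200:]       # held-out calibration set

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Nonconformity score: 1 - probability assigned to the true class.
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

def conformal_predict(x, epsilon=0.1):
    """Return a single label at confidence 1 - epsilon, or None (reject)."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    labels = []
    for label, p in enumerate(probs):
        score = 1.0 - p  # nonconformity of assigning `label` to x
        # p-value: share of calibration scores at least as nonconforming.
        p_value = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
        if p_value > epsilon:
            labels.append(label)
    return labels[0] if len(labels) == 1 else None  # ambiguous/empty -> reject

print(conformal_predict(X[0]))
```

A rejected snippet (a `None` return) is one for which no single author can be asserted at the requested confidence level, which is how unreliable code fragments can be filtered out before prediction.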

Nanomaterials · 2021 · Vol 11 (3) · pp. 560
Author(s): Alexandra Carvalho, Mariana C. F. Costa, Valeria S. Marangoni, Pei Rou Ng, Thi Le Hang Nguyen, ...

We show that the degree of oxidation of graphene oxide (GO) can be obtained by using a combination of state-of-the-art ab initio computational modeling and X-ray photoemission spectroscopy (XPS). We show that the shift of the XPS C1s peak relative to pristine graphene, $\Delta E_{\mathrm{C1s}}$, can be described with high accuracy by $\Delta E_{\mathrm{C1s}} = A(c_O - c_l)^2 + E_0$, where $c_O$ is the oxygen concentration, $A = 52.3$ eV, $c_l = 0.122$, and $E_0 = 1.22$ eV. Our results demonstrate a precise determination of the oxygen content of GO samples.
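
As a worked example, the fitted relation can be inverted to estimate the oxygen concentration from a measured C1s shift. The branch choice $c_O \geq c_l$ is an assumption made here for illustration, since the quadratic has two roots.

```python
# Worked example: inverting the fitted XPS relation
#   dE_C1s = A * (c_O - c_l)**2 + E_0
# to estimate oxygen concentration c_O from a measured C1s shift.
# Constants come from the abstract; taking the c_O >= c_l branch is an
# illustrative assumption, since the quadratic has two roots.
import math

A   = 52.3    # eV
c_l = 0.122   # oxygen concentration at the minimum shift (dimensionless)
E_0 = 1.22    # eV

def oxygen_concentration(delta_e_c1s):
    if delta_e_c1s < E_0:
        raise ValueError("Shift below the model minimum E_0")
    return c_l + math.sqrt((delta_e_c1s - E_0) / A)

print(oxygen_concentration(2.5))  # ~0.28 for a 2.5 eV measured shift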


Author(s): Jonas Austerjost, Robert Söldner, Christoffer Edlund, Johan Trygg, David Pollard, ...

Machine vision is a powerful technology that has become increasingly popular and accurate during the last decade due to rapid advances in the field of machine learning. The majority of machine vision applications are currently found in consumer electronics, automotive applications, and quality control, yet the potential for bioprocessing applications is tremendous. For instance, detecting and controlling foam emergence is important for all upstream bioprocesses, but the lack of robust foam sensing often leads to batch failures from foam-outs or over-addition of antifoam agents. Here, we report a new low-cost, flexible, and reliable foam sensor concept for bioreactor applications. The concept applies convolutional neural networks (CNNs), a state-of-the-art class of machine learning models for image processing. The implemented method shows high accuracy for both binary foam detection (foam/no foam) and fine-grained classification of foam levels.
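
As an illustration of the kind of classifier described, here is a minimal PyTorch sketch of a CNN over bioreactor camera frames. The layer sizes and foam-level classes are assumptions, not the architecture from the paper.

```python
# Minimal PyTorch sketch of a CNN classifying frames into foam levels.
# Layer sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class FoamCNN(nn.Module):
    def __init__(self, num_classes=4):  # e.g., none / low / medium / high;
        super().__init__()               # num_classes=2 gives foam/no-foam
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),     # global pooling -> size-agnostic input
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = FoamCNN()(torch.randn(1, 3, 128, 128))  # one RGB camera frame
```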


Entropy · 2020 · Vol 22 (9) · pp. 993
Author(s): Bin Yang, Dingyi Gan, Yongchuan Tang, Yan Lei

Quantifying uncertainty is a hot topic in uncertain information processing within the framework of evidence theory, but there is limited research on belief entropy under the open world assumption. In this paper, an uncertainty measurement method based on Deng entropy, named Open Deng entropy (ODE), is proposed. Under the open world assumption, the frame of discernment (FOD) may be incomplete, and ODE can reasonably and effectively quantify uncertain, incomplete information. Building on Deng entropy, ODE combines the mass value of the empty set, the cardinality of the FOD, and the natural constant e to construct a new uncertainty factor that models the uncertainty in the FOD. A numerical example shows that, under the closed world assumption, ODE degenerates to Deng entropy. An ODE-based information fusion method is also proposed for sensor data fusion in uncertain environments. By applying it to a sensor data fusion experiment, the rationality and effectiveness of ODE and its application to uncertain information fusion are verified.
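
For reference, classical Deng entropy, which ODE degenerates to in the closed world, can be computed as below. The open-world factor built from the empty-set mass, the FOD cardinality, and e is specific to the paper and is deliberately not reproduced here.

```python
# Sketch of classical Deng entropy for a basic probability assignment (BPA):
#   E_d(m) = -sum m(A) * log2( m(A) / (2**|A| - 1) )  over nonempty focal A.
# ODE's open-world factor (empty-set mass, FOD cardinality, e) is
# paper-specific and not reproduced here.
import math

def deng_entropy(bpa):
    """bpa: dict mapping frozenset focal elements to mass values."""
    h = 0.0
    for focal, mass in bpa.items():
        if mass > 0 and len(focal) > 0:  # empty set is handled by ODE, not here
            h -= mass * math.log2(mass / (2 ** len(focal) - 1))
    return h

# Toy BPA with a nonzero empty-set mass, as allowed in the open world.
m = {frozenset({'a'}): 0.6, frozenset({'a', 'b'}): 0.3, frozenset(): 0.1}
print(deng_entropy(m))  # ~1.44 bits over the nonempty focal elements
```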


2009 · pp. 257-281
Author(s): Cristiano Fugazza, Stefano David, Anna Montesanto, Cesare Rocchi

There are different approaches to modeling a computational system, each providing different semantics. We present a comparison among different approaches to semantics, aiming to identify which peculiarities are needed to provide a system with uniquely interpretable semantics. We discuss Description Logics, Artificial Neural Networks, and relational database management systems, and identify classification (the process of building a taxonomy) as a common trait. However, in this chapter we also argue that classification alone is not enough to provide a system with semantics, which emerges only when relations among classes are established and used among instances. Our contribution also analyses additional features of the formalisms that distinguish the approaches: the closed versus open world assumption, the dynamic versus static nature of knowledge, the management of knowledge, and the learning process.
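
The closed- versus open-world distinction the chapter draws can be illustrated with a toy query. The triple representation below is ours, not the chapter's.

```python
# Toy illustration of the closed- vs. open-world assumption: a database
# treats an absent fact as false, a Description Logic reasoner as unknown.
facts = {("alice", "knows", "bob")}

def cwa_holds(triple):
    return triple in facts                      # absent => False (databases)

def owa_holds(triple):
    return True if triple in facts else None    # absent => unknown (DL/OWL)

q = ("alice", "knows", "carol")
print(cwa_holds(q))  # False: the database closes the world
print(owa_holds(q))  # None: the reasoner cannot conclude either way
```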


Author(s): Yasir Hussain, Zhiqiu Huang, Yu Zhou, Senzhang Wang

In recent years, deep learning models have shown great potential in source code modeling and analysis. Generally, deep learning-based approaches are problem-specific and data-hungry; a challenging issue is that they require training from scratch for each new related problem. In this work, we propose a transfer learning-based approach that significantly improves the performance of deep learning-based source code models. In contrast to traditional learning paradigms, transfer learning transfers the knowledge learned in solving one problem to another related problem. First, we present two recurrent neural network-based models (RNN and GRU) for transfer learning in the domain of source code modeling. Next, via transfer learning, these pre-trained models are used as feature extractors. The extracted features are then fed into an attention learner for different downstream tasks; the attention learner leverages the learned knowledge of the pre-trained models and fine-tunes it for a specific downstream task. We evaluate the proposed approach through extensive experiments on the source code suggestion task. The results indicate that it outperforms the state-of-the-art models in terms of accuracy, precision, recall, and F-measure without training the models from scratch.
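
A minimal PyTorch sketch of this pattern follows, assuming a frozen pre-trained GRU encoder and a small attention head fine-tuned per task. The dimensions, vocabulary size, and head design are illustrative, not the paper's models.

```python
# Sketch of the transfer-learning pattern: a pre-trained GRU over code tokens
# is frozen as a feature extractor; only a small attention head is fine-tuned.
# Dimensions and vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    def __init__(self, vocab_size=10_000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):
        out, _ = self.gru(self.embed(tokens))
        return out  # (batch, seq_len, dim) contextual features

class AttentionHead(nn.Module):
    def __init__(self, dim=128, vocab_size=10_000):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, vocab_size)  # e.g., next-token suggestion

    def forward(self, feats):
        weights = torch.softmax(self.score(feats), dim=1)  # attention over time
        context = (weights * feats).sum(dim=1)
        return self.out(context)

encoder = PretrainedEncoder()      # weights would come from pre-training
for p in encoder.parameters():
    p.requires_grad = False        # freeze; only the head is fine-tuned
head = AttentionHead()
logits = head(encoder(torch.randint(0, 10_000, (2, 32))))
```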


Author(s): Gopalendu Pal, Anquan Wang, Michael F. Modest

k-distribution-based approaches are promising models for radiation calculations in strongly nongray participating media. Advanced k-distribution methods have been found to achieve close to benchmark line-by-line (LBL) accuracy for strongly inhomogeneous multi-phase media at several orders of magnitude lower computational cost. In this paper, a k-distribution-based portable spectral module is developed, incorporating several state-of-the-art k-distribution methods along with compact, high-accuracy databases of k-distributions. The module's construction is flexible: the user can choose among various k-distribution methods, with their relevant k-distribution databases, to carry out accurate radiation calculations. The spectral module is portable in that it can be coupled to any flow solver code with its own grid structure, discretization scheme, and solver libraries. This open source module is made available free of charge for all noncommercial purposes. This article outlines in detail the design and use of the spectral module. The k-distribution methods included in the module are briefly described, with a discussion of their advantages, disadvantages, and domains of applicability. Examples are provided of various sample radiation calculations in multi-phase mixtures using the new spectral module, and the results are compared with LBL calculations.
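
A hypothetical Python sketch of the portable-module idea: a common interface the host flow solver calls, with interchangeable k-distribution methods behind it. All names and the toy k(g) curve are assumptions, not the module's actual API.

```python
# Hypothetical sketch of a 'portable spectral module': one facade for the
# flow solver, pluggable k-distribution methods behind it. Names and the
# toy k(g) curve are illustrative, not the module's real API.
from abc import ABC, abstractmethod

class KDistributionMethod(ABC):
    @abstractmethod
    def absorption_coefficient(self, g, T):
        """Return k(g) [1/m] at quadrature point g and temperature T [K]."""

class TabulatedFSK(KDistributionMethod):
    """Stand-in for a database-backed full-spectrum k-distribution."""
    def absorption_coefficient(self, g, T):
        return 0.1 + 5.0 * g ** 4 * (T / 1000.0)  # toy monotone k(g) curve

class SpectralModule:
    """Facade the host flow solver couples to, independent of its own grid."""
    def __init__(self, method: KDistributionMethod):
        self.method = method

    def mean_absorption(self, T, quadrature):
        # Integrate over a handful of g-points instead of millions of lines.
        return sum(w * self.method.absorption_coefficient(g, T)
                   for g, w in quadrature)

quad = [(0.1, 0.25), (0.4, 0.25), (0.7, 0.25), (0.95, 0.25)]  # (g, weight)
print(SpectralModule(TabulatedFSK()).mean_absorption(1200.0, quad))
```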


Author(s): Faiz Maazouzi, Hafed Zarzour, Yaser Jararweh

With the enormous amount of information circulating on the Web, it is becoming increasingly difficult to find necessary and useful information quickly and efficiently. The emergence of recommender systems in the 1990s, however, made reducing information overload far more tractable. In recent years, many recommender systems have employed collaborative filtering, which has proven to be one of the most successful techniques in the field. Nevertheless, the latest generation of collaborative filtering methods still requires further improvement to make recommendations more efficient and accurate. The objective of this article is therefore to propose a new, effective recommender system for TED talks that first groups users according to their preferences and then provides a powerful mechanism to improve the quality of recommendations. In this context, the authors used the Pearson Correlation Coefficient (PCC) method and TED talk data to create a TED user-user matrix. They then used the k-means clustering method to group similar users into clusters and create a predictive model. Finally, they used this model to make relevant recommendations to other users. The experimental results on a real dataset show that their approach significantly outperforms the state-of-the-art methods in terms of RMSE, precision, recall, and F1 scores.
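
A compact sketch of the described pipeline, using a toy rating matrix: PCC for user-user similarity, k-means for grouping, and cluster peers for prediction. The matrix shape, k, and the fallback rule are illustrative choices.

```python
# Sketch of the described pipeline: Pearson user-user similarity, k-means
# clustering of users, prediction from cluster peers. Toy data throughout.
import numpy as np
from sklearn.cluster import KMeans

ratings = np.random.rand(50, 200)   # 50 users x 200 TED talks (toy data)
pcc = np.corrcoef(ratings)          # user-user Pearson correlation matrix

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(ratings)

def predict(user, item):
    """Predict a rating from same-cluster peers, weighted by |PCC|."""
    peers = [u for u in range(len(ratings))
             if clusters[u] == clusters[user] and u != user]
    weights = np.abs(np.array([pcc[user, u] for u in peers]))
    if not peers or np.allclose(weights.sum(), 0):
        return ratings[:, item].mean()   # fall back to the item average
    return np.average([ratings[u, item] for u in peers], weights=weights)

print(predict(0, 10))
```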


2019 · Vol 8 (9) · pp. 365
Author(s): Jetlund, Onstein, Huang

This study aims to improve the implementation of models of geospatial information in the Web Ontology Language (OWL). Large amounts of geospatial information are maintained in Geographic Information Systems (GIS) based on models conforming to the Unified Modeling Language (UML) and standards from ISO/TC 211 and the Open Geospatial Consortium (OGC). Sharing models and geospatial information in the Semantic Web will increase the usability and value of models and information, and enable linking with spatial and non-spatial information from other domains. Methods for conversion from UML to OWL have been studied and evaluated for basic concepts used in models of geospatial information. Primary conversion challenges have been identified, with specific attention to whether adapted rules for UML modelling could contribute to improved conversions. Results indicated that restrictions related to abstract classes, unions, compositions, and code lists in UML are challenging under the Open World Assumption (OWA) on which OWL is based. Two conversion challenges are addressed by adding more semantics to UML models: global properties and reuse of external concepts. The proposed solution is formalized in a UML profile supported by rules and recommendations, and demonstrated with a UML model based on the Intelligent Transport Systems (ITS) standard ISO 14825 Geographic Data Files (GDF). The scope of the resulting ontology will determine to what degree the restrictions should be maintained in OWL, and different conversion methods are needed for different scopes.
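
The global-property challenge can be made concrete with rdflib: UML attributes belong to one class, whereas OWL properties are first-class and global. The namespace and names below are illustrative, not taken from the proposed UML profile.

```python
# Sketch of the 'global property' issue in UML-to-OWL conversion using
# rdflib. The namespace and names are illustrative assumptions.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/gdf#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Road, RDF.type, OWL.Class))
g.add((EX.name, RDF.type, OWL.DatatypeProperty))
# Declaring a domain makes the property effectively local to Road: any
# resource carrying ex:name would be inferred to be a Road.
g.add((EX.name, RDFS.domain, EX.Road))
# A reusable 'global' property instead omits the domain axiom, so several
# feature classes can share it without unintended inferences.
g.add((EX.geometry, RDF.type, OWL.ObjectProperty))

print(g.serialize(format="turtle"))
```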


2019 · Vol 11 (12) · pp. 1417
Author(s): Sina Ghassemi, Enrico Magli

A cloud screening unit on a satellite platform for Earth observation can play an important role in optimizing communication resources by selecting images with interesting content while skipping those that are heavily contaminated by clouds. In this study, we address the cloud screening problem by investigating an encoder-decoder convolutional neural network (CNN). CNNs usually employ millions of parameters to provide high accuracy; on the other hand, the satellite platform imposes hardware constraints on the processing unit. Hence, to allow an onboard implementation, we experimentally investigate several solutions to reduce the CNN's resource consumption while preserving its classification accuracy. We explore approaches such as halving the computation precision, using fewer spectral bands, reducing the input size, decreasing the number of network filters, and using shallower networks, under the constraint that the resulting CNN must have a memory footprint small enough to fit a low-power accelerator for embedded systems. The trade-off between network performance and resource consumption is studied on the publicly available SPARCS dataset. Finally, we show that the proposed network can be implemented on the satellite board while performing with reasonably high accuracy compared with the state-of-the-art.
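
Two of the explored reductions, halved compute precision and fewer filters and bands, can be sketched as follows. The layers are illustrative, not the paper's cloud-screening network.

```python
# Sketch of two explored reductions: fp16 precision and fewer filters/bands.
# Layer sizes are illustrative, not the paper's cloud-screening network.
import torch
import torch.nn as nn

def footprint_bytes(model):
    """Weight memory: parameter count times bytes per element (4 fp32, 2 fp16)."""
    return sum(p.numel() * p.element_size() for p in model.parameters())

def make_encoder(filters=64, in_bands=10):  # fewer bands -> smaller input depth
    return nn.Sequential(
        nn.Conv2d(in_bands, filters, 3, padding=1), nn.ReLU(),
        nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(),
    )

full = make_encoder(filters=64, in_bands=10)
slim = make_encoder(filters=16, in_bands=4).half()  # fp16 + fewer filters/bands

print(footprint_bytes(full))  # baseline weight memory
print(footprint_bytes(slim))  # substantially smaller from these two changes
```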

