Towards Improving Code Stylometry Analysis in Underground Forums

2021 · Vol 2022 (1) · pp. 126-147
Author(s): Michal Tereszkowski-Kaminski, Sergio Pastrana, Jorge Blasco, Guillermo Suarez-Tangil

Code stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where posts tend to contain small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. Our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels, a novelty in this setting. We find that identifying unreliable code snippets is paramount to generating high-accuracy predictions, a task at which traditional learning settings fail. Overall, our method performs twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). With a smaller number of authors (i.e., 20), it achieves high accuracy (89%). We also evaluate our work under an open-world assumption and find that our method is more effective at retaining samples.
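
To make the Conformal Prediction step concrete, the following is a minimal sketch of inductive conformal prediction used as a rejection mechanism. The classifier, calibration split, and confidence level are illustrative assumptions, not the authors' forum-specific pipeline.

```python
# Minimal sketch of inductive conformal prediction as a rejection mechanism.
# Classifier, data, and epsilon are illustrative, not the paper's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 8)), rng.integers(0, 3, size=300)
X_train, y_train = X[:200], y[:200]   # proper training set
X_cal, y_cal = X[200:], y[200:]       # held-out calibration set

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Nonconformity score: 1 - probability assigned to the true class.
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

def conformal_predict(x, epsilon=0.1):
    """Return a single label at confidence 1 - epsilon, or None (reject)."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    labels = []
    for label, p in enumerate(probs):
        score = 1.0 - p  # nonconformity of assigning `label` to x
        # p-value: share of calibration scores at least as nonconforming.
        p_value = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
        if p_value > epsilon:
            labels.append(label)
    return labels[0] if len(labels) == 1 else None  # ambiguous/empty -> reject

print(conformal_predict(X[0]))
```

A rejected snippet (a `None` return) is one for which no single author can be asserted at the requested confidence level, which is how unreliable code fragments can be filtered out before prediction.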

Nanomaterials · 2021 · Vol 11 (3) · pp. 560
Author(s): Alexandra Carvalho, Mariana C. F. Costa, Valeria S. Marangoni, Pei Rou Ng, Thi Le Hang Nguyen, ...

We show that the degree of oxidation of graphene oxide (GO) can be obtained by using a combination of state-of-the-art ab initio computational modeling and X-ray photoemission spectroscopy (XPS). We show that the shift of the XPS C1s peak relative to pristine graphene, $\Delta E_{\mathrm{C1s}}$, can be described with high accuracy by $\Delta E_{\mathrm{C1s}} = A(c_O - c_l)^2 + E_0$, where $c_O$ is the oxygen concentration, $A = 52.3$ eV, $c_l = 0.122$, and $E_0 = 1.22$ eV. Our results demonstrate a precise determination of the oxygen content of GO samples.
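
As a worked example, the fitted relation can be inverted to estimate the oxygen concentration from a measured C1s shift. The branch choice $c_O \geq c_l$ is an assumption made here for illustration, since the quadratic has two roots.

```python
# Worked example: inverting the fitted XPS relation
#   dE_C1s = A * (c_O - c_l)**2 + E_0
# to estimate oxygen concentration c_O from a measured C1s shift.
# Constants come from the abstract; taking the c_O >= c_l branch is an
# illustrative assumption, since the quadratic has two roots.
import math

A   = 52.3    # eV
c_l = 0.122   # oxygen concentration at the minimum shift (dimensionless)
E_0 = 1.22    # eV

def oxygen_concentration(delta_e_c1s):
    if delta_e_c1s < E_0:
        raise ValueError("Shift below the model minimum E_0")
    return c_l + math.sqrt((delta_e_c1s - E_0) / A)

print(oxygen_concentration(2.5))  # ~0.28 for a 2.5 eV measured shift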


Author(s): Jonas Austerjost, Robert Söldner, Christoffer Edlund, Johan Trygg, David Pollard, ...

Machine vision is a powerful technology that has become increasingly popular and accurate during the last decade due to rapid advances in the field of machine learning. The majority of machine vision applications are currently found in consumer electronics, automotive applications, and quality control, yet the potential for bioprocessing applications is tremendous. For instance, detecting and controlling foam emergence is important for all upstream bioprocesses, but the lack of robust foam sensing often leads to batch failures from foam-outs or over-addition of antifoam agents. Here, we report a new low-cost, flexible, and reliable foam sensor concept for bioreactor applications. The concept applies convolutional neural networks (CNNs), a state-of-the-art class of machine learning models for image processing. The implemented method shows high accuracy for both binary foam detection (foam/no foam) and fine-grained classification of foam levels.
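
As an illustration of the kind of classifier described, here is a minimal PyTorch sketch of a CNN over bioreactor camera frames. The layer sizes and foam-level classes are assumptions, not the architecture from the paper.

```python
# Minimal PyTorch sketch of a CNN classifying frames into foam levels.
# Layer sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class FoamCNN(nn.Module):
    def __init__(self, num_classes=4):  # e.g., none / low / medium / high;
        super().__init__()               # num_classes=2 gives foam/no-foam
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),     # global pooling -> size-agnostic input
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = FoamCNN()(torch.randn(1, 3, 128, 128))  # one RGB camera frame
```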


Entropy · 2020 · Vol 22 (9) · pp. 993
Author(s): Bin Yang, Dingyi Gan, Yongchuan Tang, Yan Lei

Quantifying uncertainty is a hot topic in uncertain information processing within the framework of evidence theory, but there is limited research on belief entropy under the open world assumption. In this paper, an uncertainty measurement method based on Deng entropy, named Open Deng entropy (ODE), is proposed. Under the open world assumption, the frame of discernment (FOD) may be incomplete, and ODE can reasonably and effectively quantify uncertain, incomplete information. Building on Deng entropy, ODE combines the mass value of the empty set, the cardinality of the FOD, and the natural constant e to construct a new uncertainty factor that models the uncertainty in the FOD. A numerical example shows that, under the closed world assumption, ODE degenerates to Deng entropy. An ODE-based information fusion method is also proposed for sensor data fusion in uncertain environments. By applying it to a sensor data fusion experiment, the rationality and effectiveness of ODE and its application to uncertain information fusion are verified.
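
For reference, classical Deng entropy, which ODE degenerates to in the closed world, can be computed as below. The open-world factor built from the empty-set mass, the FOD cardinality, and e is specific to the paper and is deliberately not reproduced here.

```python
# Sketch of classical Deng entropy for a basic probability assignment (BPA):
#   E_d(m) = -sum m(A) * log2( m(A) / (2**|A| - 1) )  over nonempty focal A.
# ODE's open-world factor (empty-set mass, FOD cardinality, e) is
# paper-specific and not reproduced here.
import math

def deng_entropy(bpa):
    """bpa: dict mapping frozenset focal elements to mass values."""
    h = 0.0
    for focal, mass in bpa.items():
        if mass > 0 and len(focal) > 0:  # empty set is handled by ODE, not here
            h -= mass * math.log2(mass / (2 ** len(focal) - 1))
    return h

# Toy BPA with a nonzero empty-set mass, as allowed in the open world.
m = {frozenset({'a'}): 0.6, frozenset({'a', 'b'}): 0.3, frozenset(): 0.1}
print(deng_entropy(m))  # ~1.44 bits over the nonempty focal elements
```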


2009 · pp. 257-281
Author(s): Cristiano Fugazza, Stefano David, Anna Montesanto, Cesare Rocchi

There are different approaches to modeling a computational system, each providing different semantics. We present a comparison among different approaches to semantics, aiming to identify which peculiarities are needed to provide a system with uniquely interpretable semantics. We discuss Description Logics, Artificial Neural Networks, and relational database management systems, and identify classification (the process of building a taxonomy) as a common trait. However, in this chapter we also argue that classification alone is not enough to provide a system with semantics, which emerges only when relations among classes are established and used among instances. Our contribution also analyses additional features of the formalisms that distinguish the approaches: the closed versus open world assumption, the dynamic versus static nature of knowledge, the management of knowledge, and the learning process.
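
The closed- versus open-world distinction the chapter draws can be illustrated with a toy query. The triple representation below is ours, not the chapter's.

```python
# Toy illustration of the closed- vs. open-world assumption: a database
# treats an absent fact as false, a Description Logic reasoner as unknown.
facts = {("alice", "knows", "bob")}

def cwa_holds(triple):
    return triple in facts                      # absent => False (databases)

def owa_holds(triple):
    return True if triple in facts else None    # absent => unknown (DL/OWL)

q = ("alice", "knows", "carol")
print(cwa_holds(q))  # False: the database closes the world
print(owa_holds(q))  # None: the reasoner cannot conclude either way
```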


Author(s): Yasir Hussain, Zhiqiu Huang, Yu Zhou, Senzhang Wang

In recent years, deep learning models have shown great potential in source code modeling and analysis. Generally, deep learning-based approaches are problem-specific and data-hungry; a challenging issue is that they require training from scratch for each new related problem. In this work, we propose a transfer learning-based approach that significantly improves the performance of deep learning-based source code models. In contrast to traditional learning paradigms, transfer learning transfers the knowledge learned in solving one problem to another related problem. First, we present two recurrent neural network-based models (RNN and GRU) for transfer learning in the domain of source code modeling. Next, via transfer learning, these pre-trained models are used as feature extractors. The extracted features are then fed into an attention learner for different downstream tasks; the attention learner leverages the learned knowledge of the pre-trained models and fine-tunes it for a specific downstream task. We evaluate the proposed approach through extensive experiments on the source code suggestion task. The results indicate that it outperforms the state-of-the-art models in terms of accuracy, precision, recall, and F-measure without training the models from scratch.
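
A minimal PyTorch sketch of this pattern follows, assuming a frozen pre-trained GRU encoder and a small attention head fine-tuned per task. The dimensions, vocabulary size, and head design are illustrative, not the paper's models.

```python
# Sketch of the transfer-learning pattern: a pre-trained GRU over code tokens
# is frozen as a feature extractor; only a small attention head is fine-tuned.
# Dimensions and vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    def __init__(self, vocab_size=10_000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):
        out, _ = self.gru(self.embed(tokens))
        return out  # (batch, seq_len, dim) contextual features

class AttentionHead(nn.Module):
    def __init__(self, dim=128, vocab_size=10_000):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, vocab_size)  # e.g., next-token suggestion

    def forward(self, feats):
        weights = torch.softmax(self.score(feats), dim=1)  # attention over time
        context = (weights * feats).sum(dim=1)
        return self.out(context)

encoder = PretrainedEncoder()      # weights would come from pre-training
for p in encoder.parameters():
    p.requires_grad = False        # freeze; only the head is fine-tuned
head = AttentionHead()
logits = head(encoder(torch.randint(0, 10_000, (2, 32))))
```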


Author(s): Gopalendu Pal, Anquan Wang, Michael F. Modest

k-distribution-based approaches are promising models for radiation calculations in strongly nongray participating media. Advanced k-distribution methods have been found to achieve close to benchmark line-by-line (LBL) accuracy for strongly inhomogeneous multi-phase media at several orders of magnitude lower computational cost. In this paper, a k-distribution-based portable spectral module is developed, incorporating several state-of-the-art k-distribution methods along with compact, high-accuracy databases of k-distributions. The module's construction is flexible: the user can choose among various k-distribution methods, with their relevant k-distribution databases, to carry out accurate radiation calculations. The spectral module is portable in that it can be coupled to any flow solver code with its own grid structure, discretization scheme, and solver libraries. This open source module is made available free of charge for all noncommercial purposes. This article outlines in detail the design and use of the spectral module. The k-distribution methods included in the module are briefly described, with a discussion of their advantages, disadvantages, and domains of applicability. Examples are provided of various sample radiation calculations in multi-phase mixtures using the new spectral module, and the results are compared with LBL calculations.
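
A hypothetical Python sketch of the portable-module idea: a common interface the host flow solver calls, with interchangeable k-distribution methods behind it. All names and the toy k(g) curve are assumptions, not the module's actual API.

```python
# Hypothetical sketch of a 'portable spectral module': one facade for the
# flow solver, pluggable k-distribution methods behind it. Names and the
# toy k(g) curve are illustrative, not the module's real API.
from abc import ABC, abstractmethod

class KDistributionMethod(ABC):
    @abstractmethod
    def absorption_coefficient(self, g, T):
        """Return k(g) [1/m] at quadrature point g and temperature T [K]."""

class TabulatedFSK(KDistributionMethod):
    """Stand-in for a database-backed full-spectrum k-distribution."""
    def absorption_coefficient(self, g, T):
        return 0.1 + 5.0 * g ** 4 * (T / 1000.0)  # toy monotone k(g) curve

class SpectralModule:
    """Facade the host flow solver couples to, independent of its own grid."""
    def __init__(self, method: KDistributionMethod):
        self.method = method

    def mean_absorption(self, T, quadrature):
        # Integrate over a handful of g-points instead of millions of lines.
        return sum(w * self.method.absorption_coefficient(g, T)
                   for g, w in quadrature)

quad = [(0.1, 0.25), (0.4, 0.25), (0.7, 0.25), (0.95, 0.25)]  # (g, weight)
print(SpectralModule(TabulatedFSK()).mean_absorption(1200.0, quad))
```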


Author(s): Faiz Maazouzi, Hafed Zarzour, Yaser Jararweh

With the enormous amount of information circulating on the Web, it is becoming increasingly difficult to find necessary and useful information quickly and efficiently. The emergence of recommender systems in the 1990s, however, made reducing information overload far more tractable. In recent years, many recommender systems have employed collaborative filtering, which has proven to be one of the most successful techniques in the field. Nevertheless, the latest generation of collaborative filtering methods still requires further improvement to make recommendations more efficient and accurate. The objective of this article is therefore to propose a new, effective recommender system for TED talks that first groups users according to their preferences and then provides a powerful mechanism to improve the quality of recommendations. In this context, the authors used the Pearson Correlation Coefficient (PCC) method and TED talk data to create a TED user-user matrix. They then used the k-means clustering method to group similar users into clusters and create a predictive model. Finally, they used this model to make relevant recommendations to other users. The experimental results on a real dataset show that their approach significantly outperforms the state-of-the-art methods in terms of RMSE, precision, recall, and F1 scores.
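
A compact sketch of the described pipeline, using a toy rating matrix: PCC for user-user similarity, k-means for grouping, and cluster peers for prediction. The matrix shape, k, and the fallback rule are illustrative choices.

```python
# Sketch of the described pipeline: Pearson user-user similarity, k-means
# clustering of users, prediction from cluster peers. Toy data throughout.
import numpy as np
from sklearn.cluster import KMeans

ratings = np.random.rand(50, 200)   # 50 users x 200 TED talks (toy data)
pcc = np.corrcoef(ratings)          # user-user Pearson correlation matrix

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(ratings)

def predict(user, item):
    """Predict a rating from same-cluster peers, weighted by |PCC|."""
    peers = [u for u in range(len(ratings))
             if clusters[u] == clusters[user] and u != user]
    weights = np.abs(np.array([pcc[user, u] for u in peers]))
    if not peers or np.allclose(weights.sum(), 0):
        return ratings[:, item].mean()   # fall back to the item average
    return np.average([ratings[u, item] for u in peers], weights=weights)

print(predict(0, 10))
```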


2019 · Vol 8 (9) · pp. 365
Author(s): Jetlund, Onstein, Huang

This study aims to improve the implementation of models of geospatial information in the Web Ontology Language (OWL). Large amounts of geospatial information are maintained in Geographic Information Systems (GIS) based on models conforming to the Unified Modeling Language (UML) and standards from ISO/TC 211 and the Open Geospatial Consortium (OGC). Sharing models and geospatial information in the Semantic Web will increase the usability and value of models and information, and enable linking with spatial and non-spatial information from other domains. Methods for conversion from UML to OWL have been studied and evaluated for basic concepts used in models of geospatial information. Primary conversion challenges have been identified, with specific attention to whether adapted rules for UML modelling could contribute to improved conversions. Results indicated that restrictions related to abstract classes, unions, compositions, and code lists in UML are challenging under the Open World Assumption (OWA) on which OWL is based. Two conversion challenges are addressed by adding more semantics to UML models: global properties and reuse of external concepts. The proposed solution is formalized in a UML profile supported by rules and recommendations, and demonstrated with a UML model based on the Intelligent Transport Systems (ITS) standard ISO 14825 Geographic Data Files (GDF). The scope of the resulting ontology will determine to what degree the restrictions should be maintained in OWL, and different conversion methods are needed for different scopes.
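
The global-property challenge can be made concrete with rdflib: UML attributes belong to one class, whereas OWL properties are first-class and global. The namespace and names below are illustrative, not taken from the proposed UML profile.

```python
# Sketch of the 'global property' issue in UML-to-OWL conversion using
# rdflib. The namespace and names are illustrative assumptions.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/gdf#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Road, RDF.type, OWL.Class))
g.add((EX.name, RDF.type, OWL.DatatypeProperty))
# Declaring a domain makes the property effectively local to Road: any
# resource carrying ex:name would be inferred to be a Road.
g.add((EX.name, RDFS.domain, EX.Road))
# A reusable 'global' property instead omits the domain axiom, so several
# feature classes can share it without unintended inferences.
g.add((EX.geometry, RDF.type, OWL.ObjectProperty))

print(g.serialize(format="turtle"))
```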


2019 · Vol 11 (12) · pp. 1417
Author(s): Sina Ghassemi, Enrico Magli

A cloud screening unit on a satellite platform for Earth observation can play an important role in optimizing communication resources by selecting images with interesting content while skipping those that are heavily contaminated by clouds. In this study, we address the cloud screening problem by investigating an encoder-decoder convolutional neural network (CNN). CNNs usually employ millions of parameters to provide high accuracy; on the other hand, the satellite platform imposes hardware constraints on the processing unit. Hence, to allow an onboard implementation, we experimentally investigate several solutions to reduce the CNN's resource consumption while preserving its classification accuracy. We explore approaches such as halving the computation precision, using fewer spectral bands, reducing the input size, decreasing the number of network filters, and using shallower networks, under the constraint that the resulting CNN must have a memory footprint small enough to fit a low-power accelerator for embedded systems. The trade-off between network performance and resource consumption is studied on the publicly available SPARCS dataset. Finally, we show that the proposed network can be implemented on the satellite board while performing with reasonably high accuracy compared with the state-of-the-art.
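
Two of the explored reductions, halved compute precision and fewer filters and bands, can be sketched as follows. The layers are illustrative, not the paper's cloud-screening network.

```python
# Sketch of two explored reductions: fp16 precision and fewer filters/bands.
# Layer sizes are illustrative, not the paper's cloud-screening network.
import torch
import torch.nn as nn

def footprint_bytes(model):
    """Weight memory: parameter count times bytes per element (4 fp32, 2 fp16)."""
    return sum(p.numel() * p.element_size() for p in model.parameters())

def make_encoder(filters=64, in_bands=10):  # fewer bands -> smaller input depth
    return nn.Sequential(
        nn.Conv2d(in_bands, filters, 3, padding=1), nn.ReLU(),
        nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(),
    )

full = make_encoder(filters=64, in_bands=10)
slim = make_encoder(filters=16, in_bands=4).half()  # fp16 + fewer filters/bands

print(footprint_bytes(full))  # baseline weight memory
print(footprint_bytes(slim))  # substantially smaller from these two changes
```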

