Cross-Compiler Bipartite Vulnerability Search

Open-source libraries are widely used in software development, and functions from these libraries may contain security vulnerabilities that give attackers a gateway. This paper proposes Cross-Compiler Bipartite Vulnerability Search (CCBVS), a function similarity technique for identifying vulnerable functions in compiled programs. CCBVS combines a novel training process with bipartite matching, which filters false positives from the SVM model and so improves the quality of similar-function identification. The research uses debug symbols in programs compiled from open-source software products to generate the ground truth; this automatic extraction allows experimentation with a wide range of programs. The results show that an SVM model trained on a wide variety of programs compiled for Windows and Linux, on x86 and Intel 64 architectures, can predict function similarity, and that bipartite matching substantially improves function similarity matching performance.
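The abstract does not say which bipartite matching algorithm CCBVS uses, so the following is only a minimal sketch of the general idea: treat each pair the SVM flags as similar as an edge between a function in one binary and a function in the other, then keep a one-to-one matching so that surplus positives are dropped. The function names, scores and the 0.5 threshold are hypothetical, and the augmenting-path routine (Kuhn's algorithm, which maximises the number of matched pairs rather than their total score) is an assumed stand-in, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def bipartite_filter(scores: Dict[Tuple[str, str], float],
                     threshold: float = 0.5) -> Dict[str, str]:
    """Reduce classifier positives to a one-to-one matching.

    `scores` maps (function in binary A, function in binary B) to a
    hypothetical similarity score produced by a classifier.
    """
    # Candidate edges per left-hand function, best score first.
    candidates: Dict[str, List[str]] = {}
    for (left, right), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            candidates.setdefault(left, []).append(right)

    match_of_right: Dict[str, str] = {}

    def try_assign(left: str, visited: Set[str]) -> bool:
        # Augmenting-path search: take a free right-hand function, or
        # re-route the function currently holding it to another candidate.
        for right in candidates.get(left, []):
            if right in visited:
                continue
            visited.add(right)
            holder = match_of_right.get(right)
            if holder is None or try_assign(holder, visited):
                match_of_right[right] = left
                return True
        return False

    for left in candidates:
        try_assign(left, set())
    return {left: right for right, left in match_of_right.items()}


# Five hypothetical classifier positives, three of which must be wrong
# because each function can correspond to at most one counterpart.
svm_positives = {
    ("memcpy_a", "memcpy_b"): 0.95,
    ("memcpy_a", "memmove_b"): 0.80,
    ("memmove_a", "memmove_b"): 0.90,
    ("memmove_a", "memcpy_b"): 0.70,
    ("strlen_a", "memcpy_b"): 0.60,
}
matches = bipartite_filter(svm_positives)
```

In this toy input, five flagged pairs shrink to two one-to-one matches. A score-aware variant could use a maximum-weight assignment instead; which of these is closer to CCBVS is not stated in the abstract.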

An open-source R-package and web application for high-quality probabilistic predictions in hydrology

DOI: 10.5194/egusphere-egu21-8549, 2021

Author(s): Jason Hunter, Mark Thyer, Dmitri Kavetski, David McInerney

Keyword(s): Open Source, Web Application, R Package, Error Model, Objective Functions, High Quality, Wide Range, Probabilistic Error

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model.

We present an open-source R-package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. It is able to do so only because it includes an innovation that resolves a long-standing issue with model assumptions, which previously prevented this broad application.

We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, the open-source software and the web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.

DeepMAsED: evaluating the quality of metagenomic assemblies

Bioinformatics, DOI: 10.1093/bioinformatics/btaa124, 2020, Vol 36 (10), pp. 3011-3017, Cited By ~ 5

Author(s): Olga Mineeva, Mateo Rojas-Carulla, Ruth E Ley, Bernhard Schölkopf, Nicholas D Youngblut

Keyword(s): Large Scale, State Of The Art, Ground Truth, Supplementary Information, Learning Approach, Wide Range, Metagenome Assembly, Model Training, Reference Genomes

Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.

Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.

Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

Availability and implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

Supplementary information: Supplementary data are available at Bioinformatics online.

IMPROVEMENT OF DEM GENERATION FROM ASTER IMAGES USING SATELLITE JITTER ESTIMATION AND OPEN SOURCE IMPLEMENTATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, DOI: 10.5194/isprsarchives-xl-1-w5-249-2015, 2015, Vol XL-1-W5, pp. 249-253, Cited By ~ 2

Author(s): L. Girod, C. Nuth, A. Kääb

Keyword(s): Open Source, Thermal Emission, Ground Truth, Three Dimensions, Glacier Mass Balance, Mountain Glacier, Ground Control Points, Rational Polynomial, Cross Track

The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) system on board the Terra (EOS AM-1) satellite has been a source of stereoscopic images covering the whole globe at 15 m resolution and consistent quality for over 15 years. The potential of this data for geomorphological analysis and change detection in three dimensions is unrivaled and needs to be exploited. However, the DEMs and ortho-images currently delivered by NASA (ASTER DMO products) are often of insufficient quality for a number of applications, such as mountain glacier mass balance. For this study, the use of Ground Control Points (GCPs) or other ground truth was rejected because of the global “big data” type of processing that we hope to perform on the ASTER archive. We have therefore developed a tool to compute Rational Polynomial Coefficient (RPC) models from the ASTER metadata, and a method that improves the quality of the matching by identifying and correcting jitter-induced cross-track parallax errors. Our method outputs more accurate DEMs with fewer unmatched areas and reduced overall noise. The algorithms were implemented in the open-source photogrammetric library and software suite MicMac.

Convolutional neural networks for improving image quality with noisy PET data

EJNMMI Research, DOI: 10.1186/s13550-020-00695-1, 2020, Vol 10 (1)

Author(s): Josh Schaefferkoetter, Jianhua Yan, Claudia Ortega, Andrew Sertic, Eli Lechtman, ...

Keyword(s): Neural Network, Ground Truth, Lesion Detection, Machine Learning Techniques, Noise Levels, Physician Performance, Learning Techniques, Wide Range, Image Ranking

Goal: PET is a relatively noisy process compared to other imaging modalities, and sparsity of acquisition data leads to noise in the images. Recent work has focused on machine learning techniques to improve PET images, and this study investigates a deep learning approach to improve the quality of reconstructed image volumes through denoising by a 3D convolutional neural network. Potential improvements were evaluated within a clinical context by physician performance in a reading task.

Methods: A wide range of controlled noise levels was emulated from a set of chest PET data in patients with lung cancer, and a convolutional neural network was trained to denoise the reconstructed images using the full-count reconstructions as the ground truth. The benefits over conventional Gaussian smoothing were quantified across all noise levels by observer performance in an image ranking and lesion detection task.

Results: The CNN-denoised images were generally ranked by the physicians equal to or better than the Gaussian-smoothed images for all count levels, with the largest effects observed in the lowest-count image sets. For the CNN-denoised images, overall lesion contrast recovery was 60% and 90% at the 1 and 20 million count levels, respectively. Notwithstanding the reduced lesion contrast recovery in noisy data, the CNN-denoised images also yielded better lesion detectability at low count levels. For example, at 1 million true counts, the average true positive detection rate was around 40% for the CNN-denoised images and 30% for the smoothed images.

Conclusion: Significant improvements were found for CNN denoising of very noisy images, and to some degree for all noise levels. However, the technique presented here offered limited benefit for detection performance in images at the count levels routinely encountered in the clinic.

Function Similarity Using Family Context

Electronics, DOI: 10.3390/electronics9071163, 2020, Vol 9 (7), pp. 1163

Author(s): Paul Black, Iqbal Gondal, Peter Vamplew, Arun Lakhotia

Keyword(s): False Positive Rate, Family Context, Support Vector, Unexpected Finding, Initial Experiment, Unrelated Pair, Svm Model, Positive Rate, A New Technique, Function Similarity

Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and in the identification of new malware capabilities. This paper presents a new technique for this problem called Function Similarity using Family Context (FSFC). FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself but also from other functions with which it has a caller or callee relationship. We present the results of an initial experiment showing that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass to clean false positives. The more surprising finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property opens the possibility of creating generic similar-function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra.
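The idea of describing a function by its own features plus those of its callers and callees can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the two numeric features per function, the toy call graph, and the choice to sum the features of callers and callees before concatenating them. The abstract does not state which features FSFC extracts or how it combines them.

```python
from typing import Dict, List

Features = List[float]


def sum_features(names: List[str], features: Dict[str, Features],
                 width: int) -> Features:
    """Element-wise sum of the feature vectors of the named functions."""
    total = [0.0] * width
    for name in names:
        for i, value in enumerate(features[name]):
            total[i] += value
    return total


def family_context_vector(name: str, features: Dict[str, Features],
                          call_graph: Dict[str, List[str]]) -> Features:
    """Concatenate own features, summed caller features, summed callee features."""
    width = len(features[name])
    callees = call_graph.get(name, [])
    callers = [f for f, targets in call_graph.items() if name in targets]
    return (features[name]
            + sum_features(callers, features, width)
            + sum_features(callees, features, width))


# Hypothetical per-function features: [instruction count, constant count].
features = {
    "main": [10.0, 2.0],
    "parse": [40.0, 5.0],
    "read_byte": [8.0, 1.0],
    "log": [12.0, 1.0],
}
# Hypothetical call graph: caller -> list of callees.
call_graph = {"main": ["parse", "log"], "parse": ["read_byte", "log"]}

vector = family_context_vector("parse", features, call_graph)
```

For `parse`, the vector holds its own two features, then those of its single caller `main`, then the summed features of its callees `read_byte` and `log`. Vectors built this way for a pair of functions could then be fed to a classifier such as an SVM.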

OP2A: How to Improve the Quality of the Web Portal of Open Source Software Products

Lecture Notes in Business Information Processing - Web Information Systems and Technologies, DOI: 10.1007/978-3-642-28082-5_11, 2012, pp. 149-162, Cited By ~ 3

Author(s): Luigi Lavazza, Sandro Morasca, Davide Taibi, Davide Tosi

Keyword(s): Open Source, Open Source Software, Web Portal, Software Products, The Web

DeepMAsED: Evaluating the quality of metagenomic assemblies

DOI: 10.1101/763813, 2019, Cited By ~ 1

Author(s): Mateo Rojas-Carulla, Ruth E. Ley, Bernhard Schölkopf, Nicholas D. Youngblut

Keyword(s): Large Scale, State Of The Art, Ground Truth, Learning Approach, Wide Range, Metagenome Assembly, Model Training, Modelling Assumptions, Reference Genomes

Motivation/background: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.

Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates close to a 5% contig misassembly rate in two recent large-scale metagenome assembly publications.

Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modelling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

Availability: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

FastTrack: An open-source software for tracking varying numbers of deformable objects

PLoS Computational Biology, DOI: 10.1371/journal.pcbi.1008697, 2021, Vol 17 (2), pp. e1008697

Author(s): Benjamin Gallois, Raphaël Candelier

Keyword(s): Open Source, Cell Tracking, Ad Hoc, Ground Truth, General Purpose, Two Dimensions, Deformable Objects, Tracking Accuracy, Link Type, Wide Range

Analyzing the dynamical properties of mobile objects requires extracting trajectories from recordings, which is often done by tracking movies. We compiled a database of two-dimensional movies for very different biological and physical systems spanning a wide range of length scales, and developed FastTrack, a general-purpose, optimized, open-source, cross-platform, self-updating software that is easy to install and use. It can handle a changing number of deformable objects in a region of interest, and is particularly suitable for animal and cell tracking in two dimensions. Furthermore, we introduce the probability of incursions as a new measure of a movie’s trackability that does not require knowledge of ground truth trajectories, since it is resilient to small amounts of errors and can be computed on the basis of an ad hoc tracking. We also leveraged the versatility and speed of FastTrack to implement an iterative algorithm that determines a set of nearly optimized tracking parameters, further reducing the amount of human intervention, and we demonstrate that FastTrack can be used to explore the space of tracking parameters to optimize the number of swaps for a batch of similar movies. A benchmark shows that FastTrack is orders of magnitude faster than state-of-the-art tracking algorithms, with comparable tracking accuracy. The source code is available under the GNU GPLv3 at https://github.com/FastTrackOrg/FastTrack, and pre-compiled binaries for Windows, Mac and Linux are available at http://www.fasttrack.sh.

OP2A - Assessing the Quality of the Portal of Open Source Software Products

Proceedings of the 7th International Conference on Web Information Systems and Technologies, DOI: 10.5220/0003275201840193, 2011

Keyword(s): Open Source, Open Source Software, Software Products

INNOVATIVE MANAGEMENT SYSTEMS ADVERTISING AND PR PROCESSES ON THE NETWORK INTERNET

Integrated communications, DOI: 10.28925/2524-2644.2019.7.4, 2019, Vol 25242644, pp. 33-37

Author(s): Oleksandr Kurban

Keyword(s): Information And Communication Technologies, Communication Technologies, Web 3.0, Software Products, Internet Users, Wide Range, Innovative Management, Information And Communication, Wide Access

The article examines information and communication technologies used on the Internet for advertising and PR. It presents a promising trend in content use in the network space that involves web 3.0 technologies (automated management of information processes). The tools defined and characterized by the author are innovative and have a wide range of uses in theoretical and methodological research as well as in practice. These technologies make it possible to reduce material costs, improve the quality of work with content, and accelerate its distribution. The tools presented are of great importance for improving traditional marketing communications such as advertising and PR. The systems characterized by the author are online services widely accessible to Internet users, either free of charge or for a fee. Among the most popular services today are SMM Box, RePublic, NovaPress Publisher, SMM Aero, Kuku.io, Publbox and Postmypost. These software products have a convenient interface and a simple navigation system, which makes them accessible even to inexperienced users.
