Generating realistic data sets for combinatorial auctions

Author(s):  
A. Bonaccorsi ◽  
B. Codenotti ◽  
N. Dimitri ◽  
M. Leoncini ◽  
G. Resta ◽  
...  


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Stinus Lindgreen ◽  
Karen L. Adair ◽  
Paul P. Gardner

Abstract Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html
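As a concrete illustration of what "errors in the predicted community composition" mean in practice, the minimal Python sketch below compares a true and a predicted taxonomic abundance profile using an L1 distance; this metric and the example profiles are chosen for illustration only and are not necessarily those used in the benchmark.

```python
def composition_error(true_abund, pred_abund):
    """L1 distance between true and predicted relative abundances over the
    union of taxa (0 = perfect agreement, 2 = completely disjoint profiles).
    Illustrative metric only; the benchmark's own evaluation may differ."""
    taxa = set(true_abund) | set(pred_abund)
    t_sum = sum(true_abund.values())
    p_sum = sum(pred_abund.values())
    return sum(abs(true_abund.get(k, 0.0) / t_sum - pred_abund.get(k, 0.0) / p_sum)
               for k in taxa)

# Hypothetical example: a tool misassigns 10% of reads from one genus to another.
true_profile = {"Bacteroides": 0.5, "Faecalibacterium": 0.5}
pred_profile = {"Bacteroides": 0.4, "Faecalibacterium": 0.5, "Prevotella": 0.1}
print(composition_error(true_profile, pred_profile))  # 0.2
```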


2009 ◽  
Vol 31 (1) ◽  
pp. 7-11 ◽  
Author(s):  
Robert N. Goldman ◽  
John D. McKenzie Jr.

Author(s):  
Siddhartha V. Jayanti ◽  
Robert E. Tarjan

Abstract We develop and analyze concurrent algorithms for the disjoint set union (“union-find”) problem in the shared memory, asynchronous multiprocessor model of computation, with CAS (compare and swap) or DCAS (double compare and swap) as the synchronization primitive. We give a deterministic bounded wait-free algorithm that uses DCAS and has a total work bound of $O\bigl(m \cdot \bigl(\log\bigl(\frac{np}{m} + 1\bigr) + \alpha\bigl(n, \frac{m}{np}\bigr)\bigr)\bigr)$ for a problem with n elements and m operations solved by p processes, where $\alpha$ is a functional inverse of Ackermann’s function. We give two randomized algorithms that use only CAS and have the same work bound in expectation. The analysis of the second randomized algorithm is valid even if the scheduler is adversarial. Our DCAS and randomized algorithms take $O(\log n)$ steps per operation, worst-case for the DCAS algorithm, high-probability for the randomized algorithms. Our work and step bounds grow only logarithmically with p, making our algorithms truly scalable. We prove that for a class of symmetric algorithms that includes ours, no better step or work bound is possible. Our work is theoretical, but Alistarh et al (In search of the fastest concurrent union-find algorithm, 2019), Dhulipala et al (A framework for static and incremental parallel graph connectivity algorithms, 2020) and Hong et al (Exploring the design space of static and incremental graph connectivity algorithms on gpus, 2020) have implemented some of our algorithms on CPUs and GPUs and experimented with them. On many realistic data sets, our algorithms run as fast or faster than all others.
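As a point of reference for the abstract above, the Python sketch below shows only the underlying sequential data structure: union-find with linking by index and path splitting. It is an illustration, not the authors' algorithm; the concurrent algorithms in the paper replace the plain parent-pointer writes with CAS or DCAS on a shared parent array, which Python cannot express directly.

```python
class UnionFind:
    """Sequential sketch of disjoint set union with linking by index and path
    splitting. The paper's concurrent algorithms perform the parent-pointer
    updates below with CAS or DCAS on a shared array instead of plain writes."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, u):
        # Path splitting: while walking to the root, repoint each visited
        # node to its grandparent.
        while self.parent[u] != u:
            self.parent[u], u = self.parent[self.parent[u]], self.parent[u]
        return u

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if ru < rv:               # link by index: the larger index becomes the root
            ru, rv = rv, ru
        self.parent[rv] = ru      # concurrent version: CAS(parent[rv], rv, ru)
        return True
```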


2020 ◽  
Vol 21 (12) ◽  
pp. 4380
Author(s):  
Viet-Khoa Tran-Nguyen ◽  
Didier Rognan

Developing realistic data sets for evaluating virtual screening methods is a task that has been tackled by the cheminformatics community for many years. Numerous artificially constructed data collections have been developed, such as DUD, DUD-E, or DEKOIS. However, they all suffer from multiple drawbacks, one of which is the absence of experimental results confirming the inactivity of presumably inactive molecules, leading to possible false negatives in the ligand sets. In light of this problem, the PubChem BioAssay database, an open-access repository providing the bioactivity information of compounds that were already tested on a biological target, is now a recommended source for data set construction. Nevertheless, there exist several issues with the use of such data that need to be properly addressed. In this article, an overview of benchmarking data collections built upon experimental PubChem BioAssay input is provided, along with a thorough discussion of noteworthy issues that one must consider during the design of new ligand sets from this database. The points raised in this review are expected to guide future developments in this regard, in hopes of offering better evaluation tools for novel in silico screening procedures.
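As a hedged illustration of the kind of ligand-set construction the review discusses, the pandas snippet below splits a PubChem BioAssay activity table into confirmed actives and confirmed inactives. The column names (PUBCHEM_CID, PUBCHEM_ACTIVITY_OUTCOME) follow the usual CSV export layout but should be verified against the actual file, and the filtering rule shown is an assumption rather than the authors' protocol.

```python
import pandas as pd

def split_actives_inactives(csv_path):
    """Split a PubChem BioAssay data table (CSV export) into confirmed active
    and confirmed inactive compound IDs. Column names are assumptions based on
    the usual export format and should be checked against the actual file."""
    df = pd.read_csv(csv_path)
    outcome = df["PUBCHEM_ACTIVITY_OUTCOME"].astype(str).str.strip().str.lower()
    actives = set(df.loc[outcome == "active", "PUBCHEM_CID"].dropna().astype(int))
    inactives = set(df.loc[outcome == "inactive", "PUBCHEM_CID"].dropna().astype(int))
    # Inconclusive or unspecified outcomes are deliberately discarded, since
    # presumed inactives without experimental support are a source of false
    # negatives, as the review points out.
    return actives, inactives
```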


Author(s):  
Beatriz Andrés ◽  
Raquel Sanchis ◽  
Raúl Poler ◽  
Manuel Díaz-Madroñero ◽  
Josefa Mula

The goal of the C2NET European H2020 funded project is the creation of cloud-enabled tools for supporting the supply network optimisation of SMEs' manufacturing and logistics assets, based on collaborative demand, production and delivery plans. Within the scope of the C2NET project, and particularly its Optimisation module (C2NET OPT), this paper proposes a novel holistic mixed integer linear programming (MILP) model to optimise injection sequencing in a multi-machine setting. The results of the MILP support the production planner's decision-making process by calculating (i) the mould setups on each machine and (ii) the amount of each product to produce, so as to minimise setup, inventory and backorder costs. The designed MILP forms part of the algorithms repository created in the C2NET project to solve realistic industry planning problems. The MILP is verified on realistic data comprising three data sets of different sizes, in order to test its computational efficiency.
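The MILP itself is not given in the abstract; the toy PuLP model below only sketches the general shape of a lot-sizing formulation with setup, inventory and backorder costs under a shared capacity limit. All data, parameter values and variable names are invented for illustration, and the actual C2NET model, which covers multiple machines and mould assignments, is not reproduced here.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

# Toy data (invented): two products, four periods, one aggregate capacity.
products, periods = ["A", "B"], range(4)
demand = {(p, t): 40 if p == "A" else 30 for p in products for t in periods}
capacity, setup_c, hold_c, back_c, big_m = 90, 100, 1, 5, 1000

x = LpVariable.dicts("prod", (products, periods), lowBound=0)      # units produced
y = LpVariable.dicts("setup", (products, periods), cat=LpBinary)   # setup performed
inv = LpVariable.dicts("inv", (products, periods), lowBound=0)     # end inventory
back = LpVariable.dicts("back", (products, periods), lowBound=0)   # backorders

model = LpProblem("toy_injection_lot_sizing", LpMinimize)
model += lpSum(setup_c * y[p][t] + hold_c * inv[p][t] + back_c * back[p][t]
               for p in products for t in periods)

for p in products:
    for t in periods:
        prev_inv = inv[p][t - 1] if t > 0 else 0
        prev_back = back[p][t - 1] if t > 0 else 0
        # Inventory balance with backorders allowed.
        model += prev_inv - prev_back + x[p][t] - demand[p, t] == inv[p][t] - back[p][t]
        # Production of p in period t requires a setup of its mould.
        model += x[p][t] <= big_m * y[p][t]

for t in periods:
    model += lpSum(x[p][t] for p in products) <= capacity

model.solve()
```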


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Francesco Tudisco ◽  
Desmond J. Higham

Abstract Network scientists have shown that there is great value in studying pairwise interactions between components in a system. From a linear algebra point of view, this involves defining and evaluating functions of the associated adjacency matrix. Recent work indicates that there are further benefits from accounting directly for higher order interactions, notably through a hypergraph representation where an edge may involve multiple nodes. Building on these ideas, we motivate, define and analyze a class of spectral centrality measures for identifying important nodes and hyperedges in hypergraphs, generalizing existing network science concepts. By exploiting the latest developments in nonlinear Perron-Frobenius theory, we show how the resulting constrained nonlinear eigenvalue problems have unique solutions that can be computed efficiently via a nonlinear power method iteration. We illustrate the measures on realistic data sets.
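A minimal sketch of the kind of alternating nonlinear power iteration the abstract refers to is given below, using simple elementwise power nonlinearities on the node-hyperedge incidence matrix. The paper's framework is considerably more general, so this should be read as an assumption-laden illustration rather than the authors' method.

```python
import numpy as np

def hypergraph_centralities(B, p=2.0, q=2.0, tol=1e-10, max_iter=1000):
    """Alternating nonlinear power iteration on a nonnegative n x m
    node-hyperedge incidence matrix B. Returns node centralities x and
    hyperedge centralities y, each normalised to sum to one. Simplified
    illustration with fixed power nonlinearities."""
    n, m = B.shape
    x, y = np.ones(n), np.ones(m)
    for _ in range(max_iter):
        x_new = (B @ (y ** q)) ** (1.0 / q)    # nodes inherit weight from their hyperedges
        y_new = (B.T @ (x ** p)) ** (1.0 / p)  # hyperedges inherit weight from their nodes
        x_new /= x_new.sum()
        y_new /= y_new.sum()
        if np.abs(x_new - x).sum() + np.abs(y_new - y).sum() < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

# Example: three nodes, two hyperedges ({0,1,2} and {1,2}); node 0 is in one edge only.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0]])
print(hypergraph_centralities(B))
```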


2020 ◽  
Author(s):  
J. Griffié ◽  
T.A. Pham ◽  
C. Sieben ◽  
R. Lang ◽  
V. Cevher ◽  
...  

Abstract Although single molecule localisation microscopy (SMLM) enables the visualisation of cells' nanoscale organisation, its dissemination remains limited, mainly due to the complexity of the associated image acquisition, which impacts the reliability and reproducibility of its outputs. We propose here the first all-in-one fully virtual environment for SMLM acquisition, Virtual-SMLM, including on-the-fly interactivity and real-time display. It relies on a novel realistic approach to simulating fluorophore photo-physics based on independent pseudo-continuous emission traces. It also facilitates user-specific experimental and optical environment design. As such, it constitutes a unique tool for the training of both users and machine learning approaches to automated SMLM, as well as for experimental validation, whilst providing realistic data sets for the development of image reconstruction algorithms and data analysis software.
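To give a flavour of what an "independent pseudo-continuous emission trace" might look like in its simplest form, the Python sketch below simulates a single fluorophore as a two-state on/off process with exponential dwell times and bins the fraction of "on" time per camera frame. This toy model and its parameters are assumptions for illustration and do not reproduce Virtual-SMLM's actual photo-physics engine.

```python
import numpy as np

def emission_trace(t_total, mean_on, mean_off, frame_time, rng=None):
    """Toy fluorophore blinking model: alternate exponentially distributed
    'off' and 'on' dwell times, then record the fraction of each camera frame
    during which the emitter was on. Times are in the same (arbitrary) unit."""
    rng = np.random.default_rng() if rng is None else rng
    n_frames = int(t_total / frame_time)
    frac_on = np.zeros(n_frames)
    t, on = 0.0, False                      # start in the dark state
    while t < t_total:
        dwell = rng.exponential(mean_on if on else mean_off)
        if on:
            # Spread this on-interval over the frames it overlaps.
            start, end = t, min(t + dwell, t_total)
            f0 = int(start / frame_time)
            f1 = int(min(end / frame_time, n_frames - 1))
            for f in range(f0, f1 + 1):
                lo, hi = f * frame_time, (f + 1) * frame_time
                frac_on[f] += max(0.0, min(end, hi) - max(start, lo)) / frame_time
        t += dwell
        on = not on
    return frac_on

# Example: 10 s trace, 50 ms frames, short bright bursts between long dark periods.
print(emission_trace(t_total=10.0, mean_on=0.02, mean_off=0.5, frame_time=0.05))
```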

