scalable algorithm
Recently Published Documents


TOTAL DOCUMENTS: 225 (five years: 67)

H-INDEX: 20 (five years: 5)

EBioMedicine ◽  
2022 ◽  
Vol 76 ◽  
pp. 103759
Author(s):  
Sudhir Jadhao ◽  
Candice L. Davison ◽  
Eileen V. Roulis ◽  
Elizna M. Schoeman ◽  
Mayur Divate ◽  
...  

Author(s):  
Dimitris Bertsimas ◽  
Ryan Cory-Wright

The sparse portfolio selection problem is one of the most famous and frequently studied problems in the optimization and financial economics literatures. In a universe of risky assets, the goal is to construct a portfolio with maximal expected return and minimum variance, subject to an upper bound on the number of positions, linear inequalities, and minimum investment constraints. Existing certifiably optimal approaches to this problem have not been shown to converge within a practical amount of time at real-world problem sizes with more than 400 securities. In this paper, we propose a more scalable approach. By imposing a ridge regularization term, we reformulate the problem as a convex binary optimization problem, which is solvable via an efficient outer-approximation procedure. We propose various techniques for improving the performance of the procedure, including a heuristic that supplies high-quality warm starts, and a second heuristic that strengthens the root relaxation by generating additional cuts. We also study the problem’s continuous relaxation, establish that it is second-order cone representable, and supply a sufficient condition for its tightness. In numerical experiments, we establish that combining ridge regularization with the outer-approximation procedure yields dramatic speedups for sparse portfolio selection problems.
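To make the problem concrete, here is a toy sketch in Python. It is not the paper's outer-approximation method: it enumerates all supports of size at most k (exponential, so only viable for tiny universes) and solves the ridge-regularized mean-variance problem restricted to each support via its KKT conditions. The function name is mine, and the minimum-investment and linear-inequality constraints from the paper are omitted for brevity.

```python
import itertools
import numpy as np

def sparse_portfolio_bruteforce(mu, Sigma, k, gamma=1.0, ridge=0.1):
    """Enumerate supports of size <= k and, for each, solve the restricted
    ridge-regularized problem: maximize mu'w - (gamma/2) w'Sigma w - (ridge/2)||w||^2
    subject to sum(w) = 1.  Toy illustration only; the paper avoids this
    enumeration via a convex binary reformulation and outer approximation."""
    n = len(mu)
    best_val, best_w = -np.inf, None
    for size in range(1, k + 1):
        for S in itertools.combinations(range(n), size):
            S = list(S)
            # KKT system of the equality-constrained quadratic program:
            # Q w = mu_S - lam * 1, with lam chosen so that sum(w) = 1.
            Q = gamma * Sigma[np.ix_(S, S)] + ridge * np.eye(size)
            ones = np.ones(size)
            Qinv = np.linalg.inv(Q)
            lam = (ones @ Qinv @ mu[S] - 1.0) / (ones @ Qinv @ ones)
            w_S = Qinv @ (mu[S] - lam * ones)
            w = np.zeros(n)
            w[S] = w_S
            val = mu @ w - 0.5 * gamma * w @ Sigma @ w - 0.5 * ridge * w @ w
            if val > best_val:
                best_val, best_w = val, w
    return best_w, best_val
```

The returned portfolio always satisfies the budget constraint exactly and has at most k nonzero positions, which is the cardinality constraint the paper's procedure enforces at scale.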


IEEE Access ◽  
2022 ◽  
pp. 1-1
Author(s):  
Juhyun Lee ◽  
Sangsung Park ◽  
Junseok Lee

Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1979
Author(s):  
Francesco Musacchia ◽  
Marianthi Karali ◽  
Annalaura Torella ◽  
Steve Laurie ◽  
Valeria Policastro ◽  
...  

Homozygous deletions (HDs) may be the cause of rare diseases and cancer, and their discovery in targeted sequencing is a challenging task. Several tools have been developed for HD discovery, but a sufficiently sensitive caller is still lacking. We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth-of-coverage for the detection of rare homozygous and hemizygous single-exon deletions. To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project, obtaining higher sensitivity than state-of-the-art algorithms, each of which missed at least one event. We then applied our tool to targeted sequencing data from patients with Inherited Retinal Dystrophies and solved five cases that still lacked a genetic diagnosis. We provide VarGenius-HZD either stand-alone or integrated within our recently developed software, enabling the automated selection of samples using the internal database. Hence, it could be extremely useful for both diagnostic and research purposes.
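The breadth-of-coverage idea can be sketched as follows: normalize each sample's read depth, compare each exon against the cohort baseline, and flag exons whose normalized depth collapses to a small fraction of that baseline. This is a minimal sketch of the general approach, not VarGenius-HZD's actual implementation; the function name and the ratio threshold are mine.

```python
import numpy as np

def call_single_exon_hds(cov, sample_names, exon_names, ratio_thresh=0.1):
    """cov: exons x samples matrix of raw read depth.
    Each sample is normalized by its median depth (to remove library-size
    effects), then each exon is compared against the cohort median for that
    exon; a near-zero ratio suggests a homozygous/hemizygous deletion."""
    cov = np.asarray(cov, dtype=float)
    norm = cov / np.median(cov, axis=0, keepdims=True)    # per-sample normalization
    exon_median = np.median(norm, axis=1, keepdims=True)  # cohort baseline per exon
    ratio = norm / np.maximum(exon_median, 1e-9)
    calls = []
    for i, j in zip(*np.where(ratio < ratio_thresh)):
        calls.append((sample_names[j], exon_names[i]))
    return calls
```

Using the cohort median as the baseline is what makes single-exon events detectable: a deletion present in one sample barely shifts the median, so the affected exon stands out in that sample alone.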


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ying Hu ◽  
Chunhua Yan ◽  
Qingrong Chen ◽  
Daoud Meerzaman

Abstract
Background: Next-generation sequencing platforms allow us to sequence millions of small fragments of DNA simultaneously, revolutionizing cancer research. Sequence analysis has revealed that cancer driver genes operate across multiple intricate pathways and networks, with mutations often occurring in a mutually exclusive pattern. Currently, low-frequency mutations are understudied as cancer-relevant genes, especially in the context of networks.
Results: Here we describe a tool, gcMECM, that enables us to visualize the functionality of mutually exclusive genes in the subnetworks derived from mutation associations, gene–gene interactions, and graph clustering. These subnetworks have revealed crucial biological components in the canonical pathway, especially those mutated at low frequency. Examining the subnetwork, and not just the impact of a single gene, significantly increases the statistical power of clinical analysis and enables us to build models to better predict how and why cancer develops.
Conclusions: gcMECM uses a computationally efficient and scalable algorithm to identify subnetworks in a canonical pathway with mutually exclusive mutation patterns and distinct biological functions.
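A minimal sketch of the core mutual-exclusivity computation on a binary mutation matrix: a gene pair is flagged when both genes are mutated in the cohort but almost never in the same sample. This toy criterion (co-occurrence fraction of the union) is an assumption of mine; gcMECM combines exclusivity with pathway knowledge, gene–gene interactions, and graph clustering, which are not reproduced here.

```python
import numpy as np
from itertools import combinations

def mutually_exclusive_pairs(M, genes, max_cooccur_frac=0.05):
    """M: samples x genes binary (0/1) mutation matrix.
    Flags gene pairs whose mutations rarely co-occur relative to how often
    either gene is mutated -- a simple surrogate for mutual exclusivity."""
    M = np.asarray(M)
    pairs = []
    for i, j in combinations(range(M.shape[1]), 2):
        a, b = M[:, i], M[:, j]
        both = np.sum(a & b)     # samples where both genes are mutated
        either = np.sum(a | b)   # samples where at least one is mutated
        if either > 0 and a.sum() and b.sum() and both / either <= max_cooccur_frac:
            pairs.append((genes[i], genes[j]))
    return pairs
```

The flagged pairs can then be treated as edges of a graph whose connected components (or clusters) play the role of the subnetworks the abstract describes.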


2021 ◽  
Author(s):  
Alberto Vera ◽  
Siddhartha Banerjee ◽  
Samitha Samaranayake

Motivated by the needs of modern transportation service platforms, we study the problem of computing constrained shortest paths (CSP) at scale via preprocessing techniques. Our work makes two contributions in this regard: 1) We propose a scalable algorithm for CSP queries and show how its performance can be parametrized in terms of a new network primitive, the constrained highway dimension. This development extends recent work that established the highway dimension as the appropriate primitive for characterizing the performance of unconstrained shortest-path (SP) algorithms. Our main theoretical contribution is deriving conditions relating the two notions, thereby providing a characterization of networks where CSP and SP queries are of comparable hardness. 2) We develop practical algorithms for scalable CSP computation, augmenting our theory with additional network clustering heuristics. We evaluate these algorithms on real-world data sets to validate our theoretical findings. Our techniques are orders of magnitude faster than existing approaches while requiring only limited additional storage and preprocessing.
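For reference, a textbook baseline for CSP queries (without the paper's preprocessing) is a label-setting search that keeps Pareto-optimal (cost, resource) labels per node. The sketch below is that baseline, not the authors' accelerated algorithm; the function name and graph encoding are mine.

```python
import heapq

def constrained_shortest_path(adj, src, dst, budget):
    """adj: node -> list of (neighbor, edge_cost, edge_resource).
    Returns the minimum cost of a src-dst path whose total resource
    consumption stays within budget, or None if no feasible path exists.
    Labels are expanded in cost order, so the first label popped at dst
    is optimal; dominated labels (worse in both cost and resource) are pruned."""
    labels = {}           # node -> list of non-dominated (cost, resource) labels
    pq = [(0, 0, src)]    # (cost, resource, node)
    while pq:
        c, r, u = heapq.heappop(pq)
        if u == dst:
            return c
        if any(c2 <= c and r2 <= r for c2, r2 in labels.get(u, [])):
            continue      # dominated by a previously settled label
        labels.setdefault(u, []).append((c, r))
        for v, ec, er in adj.get(u, []):
            if r + er <= budget:
                heapq.heappush(pq, (c + ec, r + er, v))
    return None
```

Tightening the budget forces the search away from the cheapest path, which is exactly the behavior that makes CSP harder than plain shortest paths and motivates the preprocessing studied in the paper.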


Author(s):  
Л.Б. Соколинский ◽  
И.М. Соколинская

The paper presents and evaluates a scalable algorithm for validating solutions to linear programming (LP) problems on cluster computing systems. The main idea of the method is to generate a regular set of points (the validation set) on a small-radius hypersphere centered at the solution point submitted for validation. The objective function is computed at each point of the validation set that belongs to the feasible region. If all the values are less than or equal to the value of the objective function at the point being validated, then the point is accepted as a correct solution. The parallel implementation of the VaLiPro algorithm is written in C++ using the parallel BSF-skeleton, which encapsulates, in the problem-independent part of its code, all aspects related to the MPI-based parallelization of the program. We provide the results of large-scale computational experiments on a cluster computing system to study the scalability of the VaLiPro algorithm.
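A toy serial version of the VaLiPro idea, for the problem "maximize c'x subject to Ax <= b": sample points on a small hypersphere around the candidate and check that no feasible point beats its objective value. Note the assumptions: the actual algorithm generates a regular (not random) point set and runs in parallel via MPI/BSF; the random sampling, function name, and tolerances below are mine.

```python
import numpy as np

def validate_lp_solution(c, A, b, x_star, radius=1e-3, n_points=1000, seed=0):
    """Accept x_star as a solution of max c'x s.t. Ax <= b if no feasible
    point on a small hypersphere around it has a larger objective value."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_points, len(c)))
    pts = x_star + radius * d / np.linalg.norm(d, axis=1, keepdims=True)
    feasible = np.all(pts @ A.T <= b + 1e-12, axis=1)   # keep points inside Ax <= b
    return bool(np.all(pts[feasible] @ c <= c @ x_star + 1e-12))
```

An interior (non-optimal) candidate fails immediately: some sphere points are feasible and improve the objective, so the check returns False.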


2021 ◽  
Author(s):  
Guangliang Chen

Chen (2018) proposed a scalable spectral clustering algorithm for cosine similarity to handle the task of clustering large data sets. It runs extremely fast, with linear complexity in the size of the data, and achieves state-of-the-art accuracy. This paper conducts a perturbation analysis of the algorithm to understand the effect of discarding a perturbation term in an eigendecomposition step. Our results show that the approximation accuracy of the scalable algorithm depends on the connectivity, separation, and sizes of the clusters, and that the algorithm is especially accurate for large data sets.
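The reason cosine similarity admits linear complexity is that the affinity matrix factors: with row-normalized data Xn, the affinity is A = Xn Xn', so its top eigenvectors are the left singular vectors of Xn and can be obtained from an SVD of the (much smaller) n x d matrix without ever forming the n x n affinity. The sketch below shows just this factorization trick; the actual algorithm also subtracts the perturbation term whose effect this paper analyzes, which is omitted here.

```python
import numpy as np

def scalable_spectral_embed(X, k):
    """Spectral embedding under the cosine-similarity kernel, computed from
    the SVD of the row-normalized data instead of the n x n affinity matrix.
    Returns (embedding, eigenvalues): U[:, :k] are the top-k eigenvectors of
    A = Xn @ Xn.T and s[:k]**2 the corresponding eigenvalues."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit rows -> cosine kernel
    U, s, _ = np.linalg.svd(Xn, full_matrices=False)
    return U[:, :k], s[:k] ** 2
```

The embedding rows can then be fed to k-means to obtain cluster labels, as in standard spectral clustering.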


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1824
Author(s):  
Claudiu Popescu ◽  
Lacrimioara Grama ◽  
Corneliu Rusu

The paper describes a convex optimization formulation of the extractive text summarization problem and a simple, scalable algorithm to solve it. The optimization program is constructed as a convex relaxation of an intuitive but computationally hard integer programming problem. The objective function is highly symmetric, being invariant under unitary transformations of the text representations. Another key idea is to replace the constraint on the number of sentences in the summary with a convex surrogate. To solve the program, we designed a specific projected gradient descent algorithm and analyzed its performance in terms of execution time and quality of the approximation. Using the DUC 2005 and Cornell Newsroom Summarization datasets, we have shown empirically that the algorithm can provide competitive results for single-document summarization and multi-document query-based summarization. On the Cornell Newsroom Summarization Dataset, it ranked second among the unsupervised methods tested. For the more challenging task of multi-document query-based summarization, the method was tested on the DUC 2005 Dataset. Our algorithm surpassed the other reported methods with respect to the ROUGE-SU4 metric, and it was within 0.01 of the top-performing algorithms with respect to the ROUGE-1 and ROUGE-2 metrics.


2021 ◽  
Author(s):  
Klaus Johannsen ◽  
Nadine Goris ◽  
Bjørnar Jensen ◽  
Jerry Tjiputra

Abstract Optimization problems can be found in many areas of science and technology. Often, not only the global optimum but also a (larger) number of near-optima are of interest. This gives rise to so-called multimodal optimization problems. In most cases, the number and quality of the optima are unknown, and assumptions about the objective function cannot be made. In this paper, we focus on continuous, unconstrained optimization in moderately high-dimensional continuous spaces (dimension ≤ 10). We present a scalable algorithm with virtually no parameters, which performs well for general objective functions (non-convex, discontinuous). It is based on two well-established algorithms (CMA-ES, deterministic crowding). Novel elements of the algorithm are the detection of seed points for local searches and collision avoidance, both based on nearest neighbors, and a strategy for semi-sequential optimization to realize scalability. The performance of the proposed algorithm is numerically evaluated on the CEC2013 niching benchmark suite for 1- to 20-dimensional functions and on a nine-dimensional real-world problem from constrained optimization in climate research. The algorithm shows good performance on the CEC2013 benchmarks and falls short only on higher-dimensional and strongly anisotropic problems. In the case of the climate-related problem, the algorithm is able to find a large number (150) of optima that are of relevance to climate research. The proposed algorithm does not require special configuration for the optimization problems considered in this paper, i.e., it shows good black-box behavior.
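The nearest-neighbor seed detection mentioned above can be sketched simply: among a set of sampled points, a point becomes a seed for a local search if none of its k nearest neighbors has a better objective value, so each basin of attraction contributes roughly one seed. This is a minimal sketch of the idea (function name and the fixed-k rule are my assumptions), not the authors' implementation.

```python
import numpy as np

def detect_seeds(points, fitness, k=5):
    """Return indices of points that are local 'winners': no point among
    their k nearest neighbors has a strictly lower (better) fitness.
    Such points serve as seed points for subsequent local searches."""
    points = np.asarray(points)
    fitness = np.asarray(fitness)
    seeds = []
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the point itself (distance 0)
        if np.all(fitness[i] <= fitness[nn]):
            seeds.append(i)
    return seeds
```

On a sample covering a bimodal function, exactly the two basin minima survive the neighbor test, giving one local-search start per mode.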

