The Power of Linear-Time Data Reduction for Maximum Matching

Algorithmica ◽  
2020 ◽  
Vol 82 (12) ◽  
pp. 3521-3565
Author(s):  
George B. Mertzios ◽  
André Nichterlein ◽  
Rolf Niedermeier

Abstract Finding maximum-cardinality matchings in undirected graphs is arguably one of the most central graph primitives. For m-edge and n-vertex graphs, it is well known to be solvable in $$O(m\sqrt{n})$$ time; however, for several applications this running time is still too slow. We investigate how linear-time (and almost linear-time) data reduction, used as preprocessing, can alleviate the situation. More specifically, we focus on linear-time kernelization. We initiate a deeper and systematic study for both general graphs and bipartite graphs. Our data reduction algorithms readily combine (as a preprocessing step) with every solution strategy (exact, approximate, heuristic), making them attractive in various settings.
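As a concrete illustration of the kind of linear-time data reduction the abstract refers to, the sketch below applies two classical rules in Python: drop degree-0 vertices and greedily match degree-1 vertices. It is a minimal sketch under our own naming and data-structure assumptions (adjacency as a dict of neighbour sets), not the paper's full rule set.

```python
from collections import deque

def reduce_matching_instance(adj):
    """Minimal sketch of two classical linear-time reduction rules for
    maximum-cardinality matching: remove degree-0 vertices and greedily
    match degree-1 vertices. 'adj' maps each vertex to a set of neighbours."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    forced = []                                       # matching edges forced by the degree-1 rule
    queue = deque(v for v in adj if len(adj[v]) <= 1)

    while queue:
        v = queue.popleft()
        if v not in adj:
            continue
        if not adj[v]:                    # degree-0 rule: v cannot be matched
            del adj[v]
            continue
        u = next(iter(adj[v]))            # degree-1 rule: some maximum matching contains {v, u}
        forced.append((v, u))
        for w in adj.pop(u):              # delete u and update its neighbours' degrees
            adj[w].discard(u)
            if len(adj[w]) <= 1:
                queue.append(w)
        adj.pop(v, None)                  # v was among u's neighbours and is now gone too

    return forced, adj                    # forced edges plus the reduced (kernel) graph

# Example: a path 1-2-3-4 plus an isolated vertex 5
print(reduce_matching_instance({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: set()}))
# -> ([(1, 2), (4, 3)], {}): the rules already solve this instance completely.
```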

2021 ◽  
Vol 26 ◽  
pp. 1-30
Author(s):  
Tomohiro Koana ◽  
Viatcheslav Korenwein ◽  
André Nichterlein ◽  
Rolf Niedermeier ◽  
Philipp Zschoche

Finding a maximum-cardinality or maximum-weight matching in (edge-weighted) undirected graphs is among the most prominent problems of algorithmic graph theory. For n-vertex and m-edge graphs, the best-known algorithms run in Õ(m√n) time. We build on recent theoretical work on linear-time data reduction rules for finding maximum-cardinality matchings and complement those results by presenting and analyzing (employing the kernelization methodology of parameterized complexity analysis) new (near-)linear-time data reduction rules for both the unweighted and the positive-integer-weighted case. Moreover, we experimentally demonstrate that these data reduction rules provide significant speedups over state-of-the-art implementations for computing matchings in real-world graphs: the average speedup factor is 4.7 in the unweighted case and 12.72 in the weighted case.
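The rules themselves are spelled out in the paper; purely to illustrate how such data reduction slots in as preprocessing in front of an off-the-shelf solver, the hedged sketch below removes isolated vertices (a trivially safe reduction in both the unweighted and the weighted setting) and then calls NetworkX's max_weight_matching. The actual (near-)linear-time rules evaluated in the experiments are considerably stronger.

```python
import networkx as nx

def matching_with_preprocessing(G):
    """Illustrative pipeline only: strip isolated vertices (safe for both
    maximum-cardinality and maximum-weight matching) and hand the reduced
    graph to an off-the-shelf solver."""
    H = G.copy()
    H.remove_nodes_from(list(nx.isolates(H)))          # degree-0 rule
    return nx.max_weight_matching(H, maxcardinality=True)

# Small weighted example (edge weights 5, 1, 4); vertex 99 is isolated.
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 5), (2, 3, 1), (3, 4, 4)])
G.add_node(99)
print(matching_with_preprocessing(G))                   # e.g. {(1, 2), (4, 3)}; edge orientation may vary
```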


Algorithmica ◽  
2022 ◽  
Author(s):  
Boris Klemz ◽  
Günter Rote

Abstract A bipartite graph $$G=(U,V,E)$$ is convex if the vertices in V can be linearly ordered such that for each vertex $$u\in U$$, the neighbors of u are consecutive in the ordering of V. An induced matching H of G is a matching for which no edge of E connects endpoints of two different edges of H. We show that in a convex bipartite graph with n vertices and m weighted edges, an induced matching of maximum total weight can be computed in $$O(n+m)$$ time. An unweighted convex bipartite graph has a representation of size O(n) that records for each vertex $$u\in U$$ the first and last neighbor in the ordering of V. Given such a compact representation, we compute an induced matching of maximum cardinality in O(n) time. In convex bipartite graphs, maximum-cardinality induced matchings are dual to minimum chain covers. A chain cover is a covering of the edge set by chain subgraphs, that is, subgraphs that do not contain induced matchings of more than one edge. Given a compact representation, we compute a representation of a minimum chain cover in O(n) time. If no compact representation is given, the cover can be computed in $$O(n+m)$$ time. All of our algorithms achieve optimal linear running time for the respective problem and model, and they improve and generalize the previous results in several ways: The best algorithms for the unweighted problem versions had a running time of $$O(n^2)$$ (Brandstädt et al., Theor. Comput. Sci. 381(1–3):260–265, 2007, doi:10.1016/j.tcs.2007.04.006). The weighted case has not been considered before.
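To make the O(n) compact representation mentioned above concrete, the sketch below encodes a convex bipartite graph by storing, for each u in U, only the interval of its (consecutive) neighbors in the ordering of V, and expands it back to the full edge list when needed. The type alias and example graph are hypothetical and illustrate only the representation, not the paper's matching or chain-cover algorithms.

```python
from typing import Dict, Tuple, Iterator

# Compact O(n) representation of a convex bipartite graph: V is identified
# with 0..|V|-1 in its convex ordering, and each u in U stores only the
# interval [first, last] of its (consecutive) neighbours.
CompactConvexGraph = Dict[str, Tuple[int, int]]

def edges(G: CompactConvexGraph) -> Iterator[Tuple[str, int]]:
    """Expand the compact representation into the full edge list; the
    expansion costs O(n + m), while the representation itself has size O(n)."""
    for u, (first, last) in G.items():
        for v in range(first, last + 1):
            yield (u, v)

# Hypothetical example with U = {a, b, c} and V = {0, 1, 2, 3}
G: CompactConvexGraph = {"a": (0, 1), "b": (1, 3), "c": (2, 2)}
print(list(edges(G)))   # [('a', 0), ('a', 1), ('b', 1), ('b', 2), ('b', 3), ('c', 2)]
```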


Author(s):  
Atheer Alahmed ◽  
Amal Alrasheedi ◽  
Maha Alharbi ◽  
Norah Alrebdi ◽  
Marwan Aleasa ◽  
...  

2017 ◽  
Vol 27 (04) ◽  
pp. 277-296 ◽  
Author(s):  
Vincent Froese ◽  
Iyad Kanj ◽  
André Nichterlein ◽  
Rolf Niedermeier

We study the General Position Subset Selection problem: given a set of points in the plane, find a maximum-cardinality subset of points in general position. We prove that General Position Subset Selection is NP-hard and APX-hard, and we present several fixed-parameter tractability results for the problem as well as a subexponential running-time lower bound based on the Exponential Time Hypothesis.
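Here "general position" means that no three of the selected points lie on a common line. The brute-force check below (our own O(n³) sketch, not one of the paper's algorithms) makes the feasibility condition of the problem concrete:

```python
from itertools import combinations

def in_general_position(points):
    """Check that no three of the given planar points are collinear,
    i.e. that the point set is in general position."""
    def collinear(p, q, r):
        # zero cross product <=> p, q, r lie on a common line
        return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])
    return not any(collinear(p, q, r) for p, q, r in combinations(points, 3))

print(in_general_position([(0, 0), (1, 1), (2, 3)]))   # True
print(in_general_position([(0, 0), (1, 1), (2, 2)]))   # False: all three lie on y = x
```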


2019 ◽  
Author(s):  
Jaclyn Marjorie Smith ◽  
Melvin Lathara ◽  
Hollis Wright ◽  
Brian Hill ◽  
Nalini Ganapati ◽  
...  

Abstract
Background: The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and a research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options depends on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we premise that platforms supporting precision medicine analysis should maintain data in their optimal data stores, should support distributed storage and query mechanisms, and should scale as more samples are added to the system.
Results: We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling for array size (storage), import time (data construction), and query time for an increasing number of samples. We further show worst-case linear time for both clinical-data import and aggregate query execution within a distributed environment.
Conclusions: This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis by clinicians and researchers at the UCLA David Geffen School of Medicine.


Author(s):  
Marwa F. Mohamed ◽  
Abd El-Rahman Shabayek ◽  
Mahmoud El-Gayyar ◽  
Hamed Nassar

Author(s):  
Beda Büchel ◽  
Francesco Corman

Understanding the variability of bus travel time is a key issue in the optimization of schedules, transit reliability, route choice analysis, and transit simulation. The statistical modeling of bus travel time data is of growing importance given the increasing availability of such data. In this paper, we introduce a novel approach to modeling the day-to-day variability of urban bus running times at the section level. First, the explanatory power of conventionally used distributions is examined based on likelihood and effect size. We show that a mixture model is a powerful tool for increasing fitting performance, but the components used need to be justified. To overcome this issue, we propose a novel model consisting of two individual characteristic distributions representing off-peak and peak-hour dynamics. The observed running time distribution at every hour of the day can then be described as a combination (mixture) of these two dynamics. The proposed time-varying model uses a small set of parameters that are physically interpretable and capable of accurately describing running time distributions. With our modeling approach, we reduce the complexity of mixture models while increasing explanatory power and fit compared with conventional models.
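Concretely, such a model evaluates, for each hour t, a density of the form f_t(x) = w(t) · f_peak(x) + (1 − w(t)) · f_off-peak(x), where only the mixture weight w(t) varies over the day. The sketch below illustrates this with lognormal components; the component families, parameter values, and function names are our own assumptions for illustration, not the distributions fitted in the paper.

```python
import numpy as np
from scipy import stats

def running_time_density(x, w_peak, peak_params, offpeak_params):
    """Two-component mixture density for bus running times:
    f_t(x) = w_peak * f_peak(x) + (1 - w_peak) * f_offpeak(x).
    Lognormal components are assumed here purely for illustration."""
    f_peak = stats.lognorm.pdf(x, *peak_params)       # peak-hour component
    f_off = stats.lognorm.pdf(x, *offpeak_params)     # off-peak component
    return w_peak * f_peak + (1.0 - w_peak) * f_off

# Hypothetical parameters: (shape, loc, scale) for scipy's lognorm, running times in seconds
peak = (0.4, 0.0, 120.0)      # peak hours: larger scale, heavier spread
offpeak = (0.25, 0.0, 90.0)   # off-peak hours: faster, tighter running times
x = np.linspace(40, 300, 5)
print(running_time_density(x, w_peak=0.7, peak_params=peak, offpeak_params=offpeak))
```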

