Towards Computing a Near-Maximum Weighted Independent Set on Massive Graphs

Author(s):  
Jiewei Gu ◽  
Weiguo Zheng ◽  
Yuzheng Cai ◽  
Peng Peng
Author(s):  
Shaowei Cai ◽  
Wenying Hou ◽  
Jinkun Lin ◽  
Yuanjie Li

The minimum weight vertex cover (MWVC) problem is an important combinatorial optimization problem with various real-world applications. Because the problem is NP-hard, most work on MWVC focuses on heuristic algorithms that return a good-quality solution in reasonable time. In this work, we propose two dynamic strategies that adjust the behavior of the algorithm during search. We use them to improve FastWVC, a state-of-the-art local search algorithm for MWVC, resulting in two local search algorithms called DynWVC1 and DynWVC2. Previous MWVC algorithms were evaluated on graphs with random or hand-crafted weights. Here, we evaluate the algorithms on vertex-weighted graphs obtained from an important real-world problem, the map labeling problem. Experiments show that our algorithm obtains better results than previous algorithms for MWVC and maximum weight independent set (MWIS) on these real-world instances. We also test our algorithms on massive graphs studied in previous works and show significant improvements there.
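The abstract above evaluates on both MWVC and MWIS because the two problems are complementary: a vertex set is a cover exactly when its complement is an independent set, so a light cover yields a heavy independent set. The following is a minimal sketch of that relationship using a simple greedy heuristic with redundancy removal. It is not FastWVC, DynWVC1, or DynWVC2; the function names and the tiny example graph are assumptions made for illustration, and the graph is assumed to be simple (no self-loops).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def is_vertex_cover(edges: List[Edge], cover: Set[int]) -> bool:
    """Every edge must have at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


def is_independent_set(edges: List[Edge], vertices: Set[int]) -> bool:
    """No edge may have both endpoints in the set."""
    return not any(u in vertices and v in vertices for u, v in edges)


def greedy_weighted_cover(weights: Dict[int, float], edges: List[Edge]) -> Set[int]:
    """Illustrative greedy heuristic (hypothetical, not the paper's method)."""
    cover: Set[int] = set()
    uncovered = set(edges)
    while uncovered:
        # Count how many still-uncovered edges each vertex would cover.
        gain: Dict[int, int] = {}
        for u, v in uncovered:
            gain[u] = gain.get(u, 0) + 1
            gain[v] = gain.get(v, 0) + 1
        # Pick the vertex with the lowest weight per newly covered edge.
        best = min(gain, key=lambda x: (weights[x] / gain[x], x))
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    # Drop vertices that turned out to be redundant, heaviest first.
    for v in sorted(cover, key=lambda x: (-weights[x], x)):
        if is_vertex_cover(edges, cover - {v}):
            cover.remove(v)
    return cover


# Path graph 1-2-3-4 with light vertices 1 and 3.
weights = {1: 1.0, 2: 5.0, 3: 1.0, 4: 5.0}
edges = [(1, 2), (2, 3), (3, 4)]
cover = greedy_weighted_cover(weights, edges)
independent = set(weights) - cover  # complement of a cover
```

On this example the complement of the returned cover is an independent set, which is how a cover heuristic can be scored on MWIS instances as well.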


2020 ◽  
Vol 25 (40) ◽  
pp. 4296-4302 ◽  
Author(s):  
Yuan Zhang ◽  
Zhenyan Han ◽  
Qian Gao ◽  
Xiaoyi Bai ◽  
Chi Zhang ◽  
...  

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.


2020 ◽  
Vol 15 ◽  
Author(s):  
Chun Qiu ◽  
Sai Li ◽  
Shenghui Yang ◽  
Lin Wang ◽  
Aihui Zeng ◽  
...  

Aim: To search for genes related to the mechanisms underlying the occurrence of glioma and to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, the goals of many investigations on gliomas are mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis or treatment measures. Methods: First, the gene expression profiles derived from GEO were downloaded. Then, differentially expressed genes (DEGs) in the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of DEGs were performed by DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to the selection of key DEGs. In addition, the classification model between the glioblastoma samples and the controls was built by a support vector machine (SVM) based on the selected key genes. Results and Discussion: Thirty-six DEGs, including 17 upregulated and 19 downregulated genes, were selected by the CFS method as the feature genes to build the classification model between the glioma samples and the control samples. The accuracies of the classification model in a 10-fold cross-validation test and an independent set test were 76.25% and 70.3%, respectively. In addition, PPP2R2B and CYBB can also be found in the top 5 hub genes screened by the protein–protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool to identify key genes in glioblastomas. In addition, we also predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.


Author(s):  
Kyuhan Lee ◽  
Hyeonsoo Jo ◽  
Jihoon Ko ◽  
Sungsu Lim ◽  
Kijung Shin

2021 ◽  
pp. 1-40
Author(s):  
NICK GILL ◽  
BIANCA LODÀ ◽  
PABLO SPIGA

Let $G$ be a permutation group on a set $\Omega$ of size $t$. We say that $\Lambda \subseteq \Omega$ is an independent set if its pointwise stabilizer is not equal to the pointwise stabilizer of any proper subset of $\Lambda$. We define the height of $G$ to be the maximum size of an independent set, and we denote this quantity $\textrm{H}(G)$. In this paper, we study $\textrm{H}(G)$ for the case when $G$ is primitive. Our main result asserts that either $\textrm{H}(G) < 9\log t$ or else $G$ is in a particular well-studied family (the primitive large-base groups). An immediate corollary of this result is a characterization of primitive permutation groups with large relational complexity, the latter quantity being a statistic introduced by Cherlin in his study of the model theory of permutation groups. We also study $\textrm{I}(G)$, the maximum length of an irredundant base of $G$, in which case we prove that if $G$ is primitive, then either $\textrm{I}(G) < 7\log t$ or else, again, $G$ is in a particular family (which includes the primitive large-base groups as well as some others).
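The definition of height above can be checked by brute force on very small groups. The sketch below is only an illustration of the definition, not a method from the paper: permutations are represented as tuples on the points 0..t-1, the group is enumerated from generators, and all function names are hypothetical. It relies on the fact that pointwise stabilizers can only grow when points are removed, so it is enough to compare a set's stabilizer with the stabilizers obtained by removing a single point.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Perm = Tuple[int, ...]  # p[i] is the image of point i


def compose(p: Perm, q: Perm) -> Perm:
    """Return the permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))


def generate_group(generators: List[Perm]) -> Set[Perm]:
    """Enumerate the finite group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def pointwise_stabilizer(group: Set[Perm], points: FrozenSet[int]) -> FrozenSet[Perm]:
    """Elements of the group that fix every point in `points`."""
    return frozenset(g for g in group if all(g[x] == x for x in points))


def is_independent(group: Set[Perm], points: FrozenSet[int]) -> bool:
    """Check the definition via single-point removals (sufficient by monotonicity)."""
    stab = pointwise_stabilizer(group, points)
    return all(pointwise_stabilizer(group, points - {x}) != stab for x in points)


def height(group: Set[Perm]) -> int:
    """Maximum size of an independent set; exponential time, small t only."""
    t = len(next(iter(group)))
    best = 0
    for k in range(1, t + 1):
        if any(is_independent(group, frozenset(c)) for c in combinations(range(t), k)):
            best = k
    return best


sym3 = generate_group([(1, 0, 2), (1, 2, 0)])  # symmetric group on 3 points
cyc3 = generate_group([(1, 2, 0)])             # cyclic group of order 3
```

Here `height(sym3)` evaluates to 2 and `height(cyc3)` to 1. The search enumerates every subset of the points, so it is useful only for sanity checks on tiny examples.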


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates for the number of occurrences of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete theoretical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
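The base layer mentioned above is a standard reservoir sample of edges. As a minimal sketch of that single building block, the following implements the classic one-pass reservoir sampling scheme with a fixed memory of m edges. It does not implement the additional tiers or the motif-count estimators of Tiered Sampling; the function name and parameters are assumptions for illustration.

```python
import random
from typing import Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def reservoir_sample(
    stream: Iterable[Edge], m: int, rng: Optional[random.Random] = None
) -> List[Edge]:
    """Keep a uniform random sample of at most m edges from a one-pass stream."""
    rng = rng or random.Random()
    sample: List[Edge] = []
    for t, edge in enumerate(stream, start=1):
        if t <= m:
            # The first m edges fill the reservoir.
            sample.append(edge)
        else:
            # The t-th edge replaces a random slot with probability m / t.
            j = rng.randrange(t)
            if j < m:
                sample[j] = edge
    return sample


# Example: a stream of 100 edges sampled into a reservoir of size 10.
stream_edges = [(i, i + 1) for i in range(100)]
sampled = reservoir_sample(stream_edges, 10, random.Random(0))
```

After processing t edges, each edge seen so far is in the reservoir with probability min(1, m/t), which is the property that makes unbiased count estimates possible from such a sample.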


Author(s):  
Can Lu ◽  
Jeffrey Xu Yu ◽  
Hao Wei ◽  
Yikai Zhang

Author(s):  
Yanfang Liu ◽  
Hong Zhao ◽  
William Zhu

Rough set theory is mainly concerned with the approximation of objects through an equivalence relation on a universe. Matroids generalize concepts from linear algebra and graph theory. Recently, a matroidal structure of rough sets was established and applied to the problem of attribute reduction, which is an important application of rough set theory. In this paper, we propose a new matroidal structure of rough sets and call it a parametric matroid. On the one hand, for an equivalence relation on a universe, a parametric set family, with any subset of the universe as its parameter, is defined through the lower approximation operator. This parametric set family is proved to satisfy the independent set axiom of matroids; therefore, a matroid is generated, which we call a parametric matroid of the rough set. Through the lower approximation operator, three equivalent representations of the parametric set family are obtained. Moreover, the parametric matroid of the rough set is proved to be the direct sum of a partition-circuit matroid and a free matroid. On the other hand, partition-circuit matroids are well studied through the lower approximation number, which we then use to investigate the parametric matroid of the rough set. Several characteristics of the parametric matroid of the rough set, such as independent sets, bases, circuits, the rank function and the closure operator, are expressed by the lower approximation number.
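For readers unfamiliar with the lower approximation operator referred to above, here is a small sketch of the standard rough set approximations, with the equivalence relation given as a partition of the universe into equivalence classes. It illustrates only the textbook operators, not the parametric set family or the lower approximation number from the paper, whose exact definitions the abstract does not give. The function names and example data are assumptions.

```python
from typing import FrozenSet, List, Set


def lower_approximation(partition: List[FrozenSet[int]], target: Set[int]) -> Set[int]:
    """Union of the equivalence classes fully contained in the target set."""
    result: Set[int] = set()
    for block in partition:
        if block <= target:
            result |= block
    return result


def upper_approximation(partition: List[FrozenSet[int]], target: Set[int]) -> Set[int]:
    """Union of the equivalence classes that intersect the target set."""
    result: Set[int] = set()
    for block in partition:
        if block & target:
            result |= block
    return result


# Universe {1, ..., 5} with equivalence classes {1, 2}, {3}, {4, 5}.
classes = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
x = {1, 2, 3, 4}
lower = lower_approximation(classes, x)
upper = upper_approximation(classes, x)
```

In this example the class {4, 5} is only partly inside the target set, so it is excluded from the lower approximation {1, 2, 3} and included in the upper approximation {1, 2, 3, 4, 5}.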

