A Fast and Complete Enumeration of Pseudo-Cliques for Large Graphs

Author(s):  
Hongjie Zhai ◽  
Makoto Haraguchi ◽  
Yoshiaki Okubo ◽  
Etsuji Tomita
2021 ◽  
Vol 15 (4) ◽  
Author(s):  
Yun Peng ◽  
Xin Lin ◽  
Byron Choi ◽  
Bingsheng He

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yuta Yoshino ◽  
Hiroshi Kumon ◽  
Takaaki Mori ◽  
Taku Yoshida ◽  
Ayumi Tachibana ◽  
...  

Abstract. Background: Alanine:glyoxylate aminotransferase 2 (AGXT2; EC 2.6.1.44) is the only enzyme that degrades the R-form of 3-aminoisobutyrate, an intermediate metabolite of thymine. AGXT2, like dimethylarginine dimethylaminohydrolase 1 (DDAH1; EC 3.5.3.18), also degrades asymmetric dimethylarginine (ADMA), which competitively inhibits the nitric oxide synthase family. Thus, the activities of these two enzymes may change vascular vulnerability for a lifetime via the nitric oxide (NO) system. We investigated the association between polymorphisms of these two genes and vascular conditions and diseases such as hypertension and diabetes mellitus (DM) in 750 older Japanese subjects (mean age ± standard deviation, 77.0 ± 7.6 years) recruited using the complete enumeration survey method in the Nakayama study. Demographic and biochemical data, such as blood pressure (BP) and casual blood sugar (CBS), were obtained. Four functional single nucleotide polymorphisms (SNPs; rs37370, rs37369, rs180749, and rs16899974) of AGXT2 and one functional insertion/deletion polymorphism in the promoter region with four SNPs (rs307894, rs669173, rs997251, and rs13373844) of DDAH1 were investigated. Plasma ADMA was also analyzed in 163 subjects. Results: Multiple regression analysis showed that a loss of the functional haplotype of AGXT2, CAAA, was significantly positively correlated with BP (systolic BP, p = 0.034; diastolic BP, p = 0.025) and CBS (p = 0.021). No correlation was observed between DDAH1 and either BP or CBS. ADMA concentrations were significantly elevated in subjects with two CAAA haplotypes compared with subjects without the CAAA haplotype (p = 0.033). Conclusions: Missense variants of AGXT2, but not DDAH1, may be related to vulnerability to vascular diseases such as hypertension and DM via the NO system.


2021 ◽  
Vol 15 (6) ◽  
pp. 1-27
Author(s):  
Marco Bressan ◽  
Stefano Leucci ◽  
Alessandro Panconesi

We address the problem of computing the distribution of induced connected subgraphs, aka graphlets or motifs, in large graphs. The current state-of-the-art algorithms estimate the motif counts via uniform sampling by leveraging the color coding technique by Alon, Yuster, and Zwick. In this work, we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, which is times larger than the state of the art. We then introduce a novel adaptive sampling scheme that breaks the “additive error barrier” of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs, but also extremely rare ones. For instance, on one graph we accurately count nearly 10,000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would literally take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.
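
As a rough illustration of the color coding idea named in this abstract, the sketch below counts simple paths on k vertices, not general graphlets, and is not the authors' optimized algorithm. Each vertex receives one of k colors, and a dynamic program over (end vertex, set of colors used) counts the "colorful" paths, those whose vertices all have distinct colors. Under a uniform random coloring a fixed k-vertex path is colorful with probability k!/k^k, so rescaling by that factor gives an unbiased estimate. The function names, the adjacency-dict input format, and the default trial count are assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def count_colorful_paths(adj, color, k):
    """Count undirected simple paths on k >= 2 vertices whose colors are all distinct.

    adj   : dict mapping each vertex to an iterable of its neighbours (undirected graph)
    color : mapping from vertex to a color in range(k)
    """
    # level[(v, mask)] = number of colorful paths ending at v that use color set `mask`
    level = {(v, 1 << color[v]): 1 for v in adj}
    for _ in range(k - 1):
        nxt = defaultdict(int)
        for (v, mask), ways in level.items():
            for u in adj[v]:
                bit = 1 << color[u]
                if not mask & bit:  # extend only with a color not yet used
                    nxt[(u, mask | bit)] += ways
        level = nxt
    # Every undirected path was built once from each of its two endpoints.
    return sum(level.values()) // 2


def estimate_paths(adj, k, trials=200, seed=0):
    """Estimate the number of k-vertex paths by averaging over random colorings."""
    rng = random.Random(seed)
    p_colorful = math.factorial(k) / k ** k
    total = 0
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, color, k)
    return total / trials / p_colorful
```

Because distinct colors imply distinct vertices, the dynamic program never has to remember which vertices a path has visited, only which colors; that is what keeps the table size at most n · 2^k entries.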


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M, which can be magnitudes smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrences of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analytical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4- and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
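
The base layer described above is a standard reservoir sample of edges. A minimal sketch of that building block (the classic single-pass reservoir algorithm) might look as follows; it does not implement the tiered scheme itself, and the function name, the `rng` parameter, and the representation of edges as tuples are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, m, rng=None):
    """Keep a uniform random sample of at most m items from a single pass over `stream`."""
    rng = rng or random.Random()
    sample = []
    for t, item in enumerate(stream, start=1):
        if t <= m:
            # The first m items fill the reservoir.
            sample.append(item)
        else:
            # Item t replaces a uniformly chosen slot with probability m / t.
            j = rng.randrange(t)
            if j < m:
                sample[j] = item
    return sample


# Usage: sample 3 edges from a stream of 10 edges of a path graph.
edges = [(i, i + 1) for i in range(10)]
print(reservoir_sample(edges, 3, random.Random(42)))
```

After t items have been seen, each of them is in the reservoir with probability m / t, which is the property that lets an estimator rescale counts found in the sample.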


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Michał Ławniczak ◽  
Adam Sawicki ◽  
Małgorzata Białous ◽  
Leszek Sirko

Abstract: We identify and investigate isoscattering strings of concatenating quantum graphs possessing n units and 2n infinite external leads. We give an insight into the principles of designing large graphs and networks for which the isoscattering properties are preserved for $n \rightarrow \infty$. The theoretical predictions are confirmed experimentally using $n=2$ units, four-lead microwave networks. In an experimental and mathematical approach, our work goes beyond prior results by demonstrating that, using a trace function, one can address the hitherto unsettled problem of whether scattering properties of open complex graphs and networks with many external leads are uniquely connected to their shapes. The application of the trace function reduces the number of required entries of the $2n \times 2n$ scattering matrices $\hat{S}$ of the systems to 2n diagonal elements, while the old measures of isoscattering require all $(2n)^2$ entries. The studied problem generalizes a famous question of Mark Kac, “Can one hear the shape of a drum?”, originally posed in the case of isospectral dissipationless systems, to the case of infinite strings of open graphs and networks.


2005 ◽  
Vol 11 (4) ◽  
pp. 457-468 ◽  
Author(s):  
E.R. Gansner ◽  
Y. Koren ◽  
S.C. North

2015 ◽  
Vol 8 (3) ◽  
pp. 183-202 ◽  
Author(s):  
Danai Koutra ◽  
U Kang ◽  
Jilles Vreeken ◽  
Christos Faloutsos

1961 ◽  
Vol 24 (2) ◽  
pp. 307-325 ◽  
Author(s):  
N. A. Jairazbhoy

In the Saṅgītaratnākara, a thirteenth-century musical text by Śārṅgadeva, listed under svaraprastāra (lit. extension of notes) is a complete enumeration of all the possible combinations of the 7 notes of the Indian musical scale. This enumeration begins with the single note (ārcika) and is followed by all the possible combinations of two notes (gāthika), three notes (sāmika), four notes (svarāntara), five notes (auḍuva), six notes (ṣāḍava), and seven notes (pūrṇa). Each of these series of kūṭatānas (series of notes in which the continuity of the sequence is broken) develops in the same logical order based on the precedence of the ascending line over the descending line. In Śārṅgadeva's arrangement the first of the 7 note series is the straight ascending line, sa ri ga ma pa dha ni, which, for easy comprehension, will be rendered as 1 2 3 4 5 6 7 in this paper; and the last of the 7 note series is the straight descending line, ni dha pa ma ga ri sa, rendered 7 6 5 4 3 2 1 here. The changes in the order of the notes take place from the beginning of the series, at first involving only the first two notes, then the first three notes, then the first four notes, and so on. In fact, the progression for the 7 note series includes the progressions for all the smaller series within it. Thus the 7th note of the 7 note series remains constant until the progressions of one, two, three, four, five, and six notes have been exhausted. Only then is the 7th note replaced by the 6th. The chart on p. 308 is an abbreviation showing the nature of the progression. The 2 and 3 note series involving the first 2 and 3 notes respectively are complete. Beginning from the 4 note series, the chart is abbreviated as follows. The 4 note series is divided into four groups determined by the terminal note, each involves change in the first 3 notes, and each of these groups corresponds to the previous 3 note series, which is in fact the first group of the 4 note series.
Of the remaining groups only the first and last sequences are given, with an indication as to the number of sequences comprising that group. Similar abbreviations are used in the longer series that follow. Commas have been placed to indicate that the preceding numbers now replace the original ascending sequence (mūlakrama) and that the progressions which follow in that group involve change in only these preceding numbers. For example, if one wishes to determine the complete series from 1 2 4,3 5 6 7 to the end of its particular group, 4 2 1 3 5 6 7, the comma after 4 indicates that only the first three numbers change.
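
The ordering described above can be sketched as a short recursion. This is a reconstruction from the description in the abstract only, not taken from the chart on p. 308: the terminal note stays fixed while every ordering of the remaining notes is exhausted, and is then replaced by the next lower note. The function name `prastara` and the use of integers for the notes are assumptions made for this example.

```python
def prastara(notes):
    """Yield all orderings of `notes`, keeping the terminal note fixed for as long as possible."""
    notes = sorted(notes)
    if len(notes) <= 1:
        yield tuple(notes)
        return
    # Groups are determined by the terminal note, taken from highest to lowest.
    for terminal in reversed(notes):
        rest = [n for n in notes if n != terminal]
        # Within a group, only the notes before the terminal note change.
        for head in prastara(rest):
            yield head + (terminal,)


# Usage: the 3 note series.
print(list(prastara([1, 2, 3])))
# [(1, 2, 3), (2, 1, 3), (1, 3, 2), (3, 1, 2), (2, 3, 1), (3, 2, 1)]
```

Under this reconstruction the 7 note series starts at 1 2 3 4 5 6 7 and ends at 7 6 5 4 3 2 1, and the group of the 4 note series with terminal note 3 runs from 1 2 4 3 to 4 2 1 3, which agrees with the example in the text.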

