An optimal cardinality estimation algorithm based on order statistics and its full analysis

2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Jérémie Lumbroso

Building on the ideas of Flajolet and Martin (1985), Alon et al. (1987), Bar-Yossef et al. (2002), and Giroire (2005), we develop a new algorithm for cardinality estimation, based on order statistics, which, according to Chassaing and Gerin (2006), is optimal among similar algorithms. This algorithm has a remarkably simple analysis that allows us to take its $\textit{fine-tuning}$ and the $\textit{characterization of its properties}$ further than has been done until now. We prove that, asymptotically, it is $\textit{strictly unbiased}$ (unlike Probabilistic Counting, Loglog, and Hyperloglog), we verify that its relative precision is about $1/\sqrt{m-2}$ when $m$ words of storage are used, and we fully characterize the limit law of the estimates it provides in terms of the gamma distribution; this is the first such algorithm for which the limit law has been established. We also develop a Poisson analysis for the pre-asymptotic regime. In this way, we are able to devise a complete algorithm, covering all cardinality ranges from $0$ to very large.
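
The order-statistics idea can be illustrated with a minimal sketch. It is not the paper's complete algorithm: it assumes SHA-256 as a stand-in for an ideal hash, splits the stream into $m$ buckets, keeps the minimum hashed value per bucket, and returns $m(m-1)/\sum_i M_i$, where $M_i$ is the minimum of bucket $i$. The function names are hypothetical, and the pre-asymptotic (Poisson) correction for small cardinalities mentioned in the abstract is deliberately left out.

```python
import hashlib


def hash_to_bucket_and_unit(item, m):
    """Map an item to (bucket index, pseudo-uniform value in [0, 1)).

    Assumption: SHA-256 stands in for an ideal uniform hash function.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % m
    value = int.from_bytes(digest[8:16], "big") / 2**64
    return bucket, value


def estimate_cardinality(items, m=256):
    """Estimate the number of distinct items from m per-bucket minima.

    Sketch only: no correction for small cardinalities, so the result
    is unreliable when many buckets are still empty.
    """
    minima = [1.0] * m  # an empty bucket keeps the maximal value 1.0
    for item in items:
        bucket, value = hash_to_bucket_and_unit(item, m)
        if value < minima[bucket]:
            minima[bucket] = value
    return m * (m - 1) / sum(minima)


if __name__ == "__main__":
    stream = [f"item-{i}" for i in range(50000)]
    print(round(estimate_cardinality(stream + stream)))
```

Because only minima are stored, repeating elements in the stream leaves the estimate unchanged. With $m = 256$, the abstract's figure of $1/\sqrt{m-2}$ corresponds to a relative precision of roughly 6%.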

2007 ◽  
Vol DMTCS Proceedings vol. AH,... (Proceedings) ◽  
Author(s):  
Philippe Flajolet ◽  
Éric Fusy ◽  
Olivier Gandouet ◽  
Frédéric Meunier

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of $\textit{distinct}$ elements (the cardinality) of very large data ensembles. Using an auxiliary memory of $m$ units (typically, "short bytes"), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
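
A compact sketch of the single-pass estimator described above may help fix ideas. It follows the commonly cited form of HYPERLOGLOG ($m = 2^b$ registers, harmonic mean of $2^{-M_j}$, linear counting for small ranges) and makes several assumptions that are not in the abstract: SHA-256 truncated to 64 bits as the hash, the constant $\alpha_m = 0.7213/(1 + 1.079/m)$ (valid for $m \ge 128$), and no large-range correction. The function names are hypothetical.

```python
import hashlib
import math


def hyperloglog_estimate(items, b=10):
    """Estimate the number of distinct items with m = 2**b registers.

    Sketch only: assumes b >= 7 so the alpha_m formula below applies.
    """
    m = 1 << b
    registers = [0] * m
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        j = x >> (64 - b)                          # first b bits: register index
        w = x & ((1 << (64 - b)) - 1)              # remaining 64 - b bits
        rho = (64 - b) - w.bit_length() + 1        # position of leftmost 1-bit
        if rho > registers[j]:
            registers[j] = rho
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)             # small range: linear counting
    return raw


def standard_error(m):
    """Typical relative accuracy quoted in the abstract: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


if __name__ == "__main__":
    stream = [f"user-{i}" for i in range(50000)]
    print(round(hyperloglog_estimate(stream)), standard_error(1024))
```

The 64% figure is consistent with the standard error of about $1.30/\sqrt{m}$ reported for LOGLOG: matching that accuracy with HYPERLOGLOG needs a fraction $(1.04/1.30)^2 = 0.64$ of the registers.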


1989 ◽  
Vol 257 (2) ◽  
pp. 347-354 ◽  
Author(s):  
P N Sanderson ◽  
T N Huckerby ◽  
I A Nieduszynski

Dermatan sulphates, in which iduronate was the predominant uronate constituent, were partially digested by chondroitinase ABC to produce oligosaccharides of the following structure: delta UA-[GalNAc(4SO3)-IdoA]mGalNAc(4SO3) [where m = 0-5, delta UA represents beta-D-gluco-4-enepyranosyluronate, IdoA represents alpha-L-iduronate and GalNAc(4SO3) represents 2-acetamido-2-deoxy-beta-D-galactose 4-O-sulphate], which were fractionated by gel-permeation chromatography and examined by 100 MHz 13C-n.m.r. and 400/500 MHz 1H-n.m.r. spectroscopy. Experimental conditions were established for the removal of non-reducing terminal unsaturated uronate residues by treatment with HgCl2, and reducing terminal N-acetylgalactosamine residues of the oligosaccharides were reduced with alkaline borohydride. These modifications were shown by 13C-n.m.r. spectroscopy to have proceeded to completion. Assignments of both 13C-n.m.r. and 1H-n.m.r. resonances are reported for the GalNAc(4SO3)-IdoA repeat sequence in the oligosaccharides as well as for the terminal residues resulting from enzyme digestion and subsequent modifications. A full analysis of a trisaccharide derived from dermatan sulphate led to the amendment of published 13C-n.m.r. chemical-shift assignments for the polymer.


2015 ◽  
Vol 17 no. 1 (Graph Theory) ◽  
Author(s):  
Mauricio Soto ◽  
Christopher Thraves-Caro

In this document, we study the scope of the following graph model: each vertex is assigned to a box in $\mathbb{R}^d$ and to a representative element that belongs to that box. Two vertices are connected by an edge if and only if each vertex's box contains the other vertex's representative element. We focus our study on the case where boxes (and therefore representative elements) associated with vertices are spread in $\mathbb{R}$. We give both a combinatorial and an intersection characterization of the model. Based on these characterizations, we determine graph families that contain the model (e.g., boxicity 2 graphs) and others that the new model contains (e.g., rooted directed path). We also study the particular case where each representative element is the center of its respective box. In this particular case, we provide constructive representations for interval, block and outerplanar graphs. Finally, we show that the general and the particular model are not equivalent by constructing a graph family that separates the two cases.
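
The edge rule of this model can be made concrete for the one-dimensional case with a small sketch. It assumes closed intervals as boxes and reads the rule as "each vertex's box contains the other vertex's representative"; the function name and the example data are hypothetical.

```python
from itertools import combinations


def box_representative_edges(boxes, reps):
    """Build the edge set of the model for boxes in the real line.

    boxes: list of closed intervals (left, right), one per vertex.
    reps:  list of representative points, reps[v] lying inside boxes[v].
    Vertices u and v are adjacent iff boxes[u] contains reps[v]
    and boxes[v] contains reps[u].
    """
    edges = set()
    for u, v in combinations(range(len(boxes)), 2):
        (left_u, right_u), (left_v, right_v) = boxes[u], boxes[v]
        if left_u <= reps[v] <= right_u and left_v <= reps[u] <= right_v:
            edges.add((u, v))
    return edges


if __name__ == "__main__":
    boxes = [(0, 4), (1, 5), (6, 8), (0, 10)]
    reps = [2, 3, 7, 9]
    print(sorted(box_representative_edges(boxes, reps)))
```

In this example the box of vertex 3 overlaps every other box, yet vertex 3 is isolated because no other box contains its representative 9; the model is therefore stricter than plain interval intersection.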


1999 ◽  
Vol 3 no. 4 ◽  
Author(s):  
Andrzej Proskurowski ◽  
Jan Arne Telle

We introduce q-proper interval graphs as interval graphs with interval models in which no interval is properly contained in more than q other intervals, and also provide a forbidden induced subgraph characterization of this class of graphs. We initiate a graph-theoretic study of subgraphs of q-proper interval graphs with maximum clique size k+1 and give an equivalent characterization of these graphs by restricted path-decomposition. By allowing the parameter q to vary from 0 to k, we obtain a nested hierarchy of graph families, from graphs of bandwidth at most k to graphs of pathwidth at most k. Allowing both parameters to vary, we have an infinite lattice of graph classes ordered by containment.
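
The defining condition on an interval model can be checked directly, as in the following sketch. It tests one given model only (a graph is q-proper if some model satisfies the condition), and it assumes that "properly contained" means being a strict subset of the other interval. The function names are hypothetical.

```python
def containment_counts(intervals):
    """For each interval, count the other intervals that properly contain it.

    intervals: list of (left, right) pairs with left <= right.
    Assumption: (a, b) is properly contained in (c, d) when
    c <= a, b <= d and the two intervals are not identical.
    """
    counts = []
    for i, (a, b) in enumerate(intervals):
        count = 0
        for j, (c, d) in enumerate(intervals):
            if i != j and c <= a and b <= d and (c, d) != (a, b):
                count += 1
        counts.append(count)
    return counts


def smallest_q(intervals):
    """Smallest q for which this particular model is q-proper."""
    return max(containment_counts(intervals), default=0)


if __name__ == "__main__":
    nested = [(0, 10), (1, 9), (2, 8), (11, 12)]
    print(containment_counts(nested), smallest_q(nested))
```

A model with `smallest_q` equal to 0 is a proper interval model in the usual sense, which is the bottom of the hierarchy described in the abstract.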


2015 ◽  
Vol 11 (1) ◽  
pp. 73-89
Author(s):  
Devendra Kumar

In this paper we consider a general class of distributions. Recurrence relations satisfied by the quotient moments and conditional quotient moments of lower generalized order statistics are derived for this class. Further, the results are deduced for quotient moments of order statistics and lower records, and the class is characterized through the recurrence relation for conditional expectations satisfied by the quotient moments of lower generalized order statistics.

