ordered data

2021 ◽  
Author(s):  
Daniel Bakkelund

Partial orders and directed acyclic graphs are commonly recurring data structures that arise naturally in numerous domains and applications, where they represent ordered relations between entities. Examples include task dependencies in a project plan, transaction order in distributed ledgers, and execution sequences of tasks in computer programs. We study the problem of order-preserving hierarchical clustering of this kind of ordered data. That is, if we have $a<b$ in the original data and denote their respective clusters by $[a]$ and $[b]$, then we shall have $[a]<[b]$ in the produced clustering. The clustering is similarity based, uses standard linkage functions such as single and complete linkage, and is an extension of classical hierarchical clustering. To achieve this, we develop a novel theory that extends classical hierarchical clustering to strictly partially ordered sets. We define the output from running classical hierarchical clustering on strictly ordered data to be partial dendrograms: sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the p-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting. A reference implementation is employed for experiments on both synthetic random data and real-world data from a database of machine parts. When compared to existing methods, the experiments show that our method excels in both cluster quality and order preservation.
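
The order-preservation requirement stated above can be illustrated with a small check. The following is a minimal sketch, not the paper's method: it assumes the strict partial order is given as a list of pairs `(a, b)` meaning `a < b`, and that a flat clustering is given as a mapping from element to cluster label. The function name `is_order_preserving` and both input formats are hypothetical. It only tests whether the relation induced on clusters can be a strict order, that is, no comparable elements share a cluster and the induced cluster relation has no cycle.

```python
from collections import defaultdict, deque


def is_order_preserving(order_pairs, cluster_of):
    """Check that a < b implies [a] < [b] can hold for the given clustering.

    order_pairs: iterable of (a, b) pairs meaning a < b (hypothetical format).
    cluster_of:  dict mapping each element to its cluster label.
    """
    # Relation induced on clusters by the original order.
    edges = {(cluster_of[a], cluster_of[b]) for a, b in order_pairs}

    # Comparable elements in one cluster would force [a] < [a].
    if any(ca == cb for ca, cb in edges):
        return False

    # The induced relation must be acyclic; test with Kahn's algorithm.
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for ca, cb in edges:
        nodes.update((ca, cb))
        successors[ca].add(cb)
        indegree[cb] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(nodes)
```

For example, with `a < b` and `c < d`, merging `a` with `d` and `b` with `c` is rejected, because the two clusters would each have to precede the other.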


Author(s):  
Reza Alizadeh Noughabi ◽  
Adel Mohammadpour

Classical regression approaches are not robust when errors are heavy-tailed or asymmetric, which may be due to the non-existence of the mean or variance of the error distribution. Estimation based on trimmed data, which ignores outliers or leverage points, has a long history and is frequently used. This procedure chooses fixed cut-off points. In this work, we build on this idea, which was recently applied to obtain initial estimates of regression coefficients with heavy-tailed stable errors. We propose an effective procedure to calculate the cut-off points based on the tail index and skewness parameters of the errors. We use the property that some moments of the order statistics of stable distributions exist. Data are trimmed based on the ordered residuals of a least squares regression. The optimal number of trimmed data points is determined by the number of error order statistics whose variance exists. We then use the remaining ordered data to estimate the regression coefficients. Based on the joint distribution of these order statistics, we analytically compute the bias and variance of the introduced estimator of the regression parameters, which was previously impossible for regression with stable errors.
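
The trimming step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: the cut-off counts `n_low` and `n_high` are passed in by hand (the abstract derives them from the tail index and skewness of the errors, which is not reproduced here), and the final estimate is a plain least squares refit on the retained points. The function name `trimmed_least_squares` is hypothetical.

```python
import numpy as np


def trimmed_least_squares(x, y, n_low, n_high):
    """Refit least squares after trimming points with extreme ordered residuals.

    n_low / n_high: number of smallest / largest residuals to discard
    (assumed to be supplied by the caller; not the paper's cut-off rule).
    Returns the coefficient vector [intercept, slope(s)].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])

    # Step 1: initial least squares fit on all data.
    beta_init, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Step 2: order the residuals and drop the extremes on both sides.
    residuals = y - design @ beta_init
    order = np.argsort(residuals)
    keep = order[n_low:len(y) - n_high]

    # Step 3: refit on the retained observations.
    beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
    return beta
```

With data lying exactly on `y = 2 + 3x` except for one large positive outlier, calling the function with `n_low=0, n_high=1` discards the outlier and recovers the intercept and slope.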


Petir ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 159-169
Author(s):  
Endang Sunandar

There are various well-known data sorting methods, including Bubble Sort, Selection Sort, Insertion Sort, Quick Sort, Shell Sort, Heap Sort, and Radix Sort. Each of these methods has advantages and disadvantages, and the choice among them depends on the requirements. Each method uses a different algorithm, and the algorithm affects the execution time. One interesting algorithm to implement on two variants of data sorting is Bubble Sort, because it has a fairly long and detailed process flow for producing an ordered data sequence from a previously unordered one. The two data sorting variants implemented using the Bubble Sort algorithm are ascending sorting moving from left to right and descending sorting moving from left to right. The Bubble Sort algorithm is implemented in the Java programming language.
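
The two variants mentioned above can be sketched in a few lines. The article's implementation is in Java; the sketch below uses Python only to stay consistent with the other examples on this page, and the function name `bubble_sort` and its `descending` flag are illustrative choices rather than the author's code. Both variants scan from left to right and swap adjacent elements that are out of order.

```python
def bubble_sort(values, descending=False):
    """Return a sorted copy of values using left-to-right Bubble Sort."""
    items = list(values)  # work on a copy so the input is not modified
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if descending:
                out_of_order = items[i] < items[i + 1]
            else:
                out_of_order = items[i] > items[i + 1]
            if out_of_order:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # a full pass without swaps means the data is ordered
    return items
```

Calling `bubble_sort([5, 1, 4, 2, 8])` gives the ascending sequence, and passing `descending=True` gives the reverse ordering from the same left-to-right passes.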


2021 ◽  
Author(s):  
Jürgen Köfinger ◽  
Gerhard Hummer

2021 ◽  
pp. 107223
Author(s):  
Binbin Sang ◽  
Hongmei Chen ◽  
Lei Yang ◽  
Tianrui Li ◽  
Weihua Xu ◽  
...  

2021 ◽  
Author(s):  
Juergen Koefinger ◽  
Gerhard Hummer

The inference of models from one-dimensional ordered data subject to noise is a fundamental and ubiquitous task in the physical and life sciences. A prototypical example is the analysis of small- and wide-angle solution scattering experiments using x-rays (SAXS/WAXS) or neutrons (SANS). In such cases, it is common practice to check the quality of a fit by using Pearson's chi-square test, which ignores the order of the data. We usually plot the residuals and check visually for systematic deviations without quantifying them. To quantify these deviations, we developed test statistics based on the distributions of the lengths of the runs of the signs of the residuals. Specifically, we use the probability of run-length distributions, for which we provide analytical expressions, to rank them and to calculate their P-values. We introduce the Shannon information distribution as an elegant and versatile tool for calculating P-values. We find that these distributions follow shifted gamma distributions, such that they are summarized by three parameters only. We show for a set of six models that our test statistics are more powerful than Pearson's chi-square test and common sign-based tests. We provide an open source Python 3 implementation of our tests free of charge at https://github.com/bio-phys/hplusminus.
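
The raw ingredient of these tests, the lengths of the runs of residual signs, can be computed as in the sketch below. It is only an illustration of that first step and does not use or reproduce the hplusminus package or the authors' test statistics. The function name `sign_run_lengths` is hypothetical, and dropping residuals that are exactly zero is an assumption made here for simplicity.

```python
from itertools import groupby


def sign_run_lengths(residuals):
    """Return (sign, length) for each run of equal residual signs, in data order.

    Residuals equal to zero are skipped (an assumption of this sketch).
    """
    signs = [1 if r > 0 else -1 for r in residuals if r != 0]
    return [(sign, sum(1 for _ in run)) for sign, run in groupby(signs)]
```

For the residuals `[0.5, 1.2, -0.3, -0.1, -0.7, 0.2]` this yields a positive run of length 2, a negative run of length 3, and a positive run of length 1. Unusually long runs are the kind of systematic deviation that a chi-square test, which ignores the order of the data, cannot see.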


Computation ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 17
Author(s):  
Halima Saker ◽  
Rainer Machné ◽  
Jörg Fallmann ◽  
Douglas B. Murray ◽  
Ahmad M. Shahin ◽  
...  

The problem of segmenting linearly ordered data is frequently encountered in time-series analysis, computational biology, and natural language processing. Segmentations obtained independently from replicate data sets, or from the same data with different methods or parameter settings, pose the problem of computing an aggregate or consensus segmentation. This Segmentation Aggregation problem amounts to finding a segmentation that minimizes the sum of distances to the input segmentations. It is again a segmentation problem and can be solved by dynamic programming. The aim of this contribution is (1) to gain a better mathematical understanding of the Segmentation Aggregation problem and its solutions and (2) to demonstrate that consensus segmentations have useful applications. Extending previously known results, we show that for a large class of distance functions only breakpoints present in at least one input segmentation appear in the consensus segmentation. Furthermore, we derive a bound on the size of consensus segments. As showcase applications, we investigate a yeast transcriptome and show that consensus segments provide a robust means of identifying transcriptomic units. This approach is particularly suited for dense transcriptomes with polycistronic transcripts, operons, or a lack of separation between transcripts. As a second application, we demonstrate that consensus segmentations can be used to robustly identify growth regimes from sets of replicate growth curves.
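
A toy special case makes the idea of a consensus segmentation concrete. The sketch below is not the paper's dynamic programming algorithm and does not use its distance functions. It assumes each segmentation is represented by its set of breakpoint positions and swaps in a deliberately simple distance, the size of the symmetric difference between two breakpoint sets. Under that assumed distance the sum of distances decomposes per breakpoint, so a strict majority vote is optimal. The function names are hypothetical.

```python
from collections import Counter


def consensus_breakpoints(segmentations):
    """Majority-vote consensus for breakpoint sets.

    Minimizes the summed symmetric-difference distance to the inputs
    (a simplifying assumption, not the distances studied in the paper).
    """
    counts = Counter(b for seg in segmentations for b in set(seg))
    n_inputs = len(segmentations)
    # Keep a breakpoint only if it occurs in more than half of the inputs.
    return sorted(b for b, c in counts.items() if 2 * c > n_inputs)


def total_distance(candidate, segmentations):
    """Sum of symmetric-difference distances from candidate to each input."""
    return sum(len(set(candidate) ^ set(seg)) for seg in segmentations)
```

In this special case every consensus breakpoint necessarily occurs in at least one input segmentation, which mirrors, in a much simpler setting, the property the abstract reports for a large class of distance functions.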


2021 ◽  
Vol 212 ◽  
pp. 106583
Author(s):  
Binbin Sang ◽  
Hongmei Chen ◽  
Lei Yang ◽  
Dapeng Zhou ◽  
Tianrui Li ◽  
...  
