ordered data Latest Research Papers

Abstract: Partial orders and directed acyclic graphs are commonly recurring data structures that arise naturally in numerous domains and applications, where they represent ordered relations between entities. Examples include task dependencies in a project plan, transaction order in distributed ledgers, and execution sequences of tasks in computer programs. We study the problem of order-preserving hierarchical clustering of this kind of ordered data. That is, if we have $$a<b$$ in the original data and denote the respective clusters by $$[a]$$ and $$[b]$$, then we shall have $$[a]<[b]$$ in the produced clustering. The clustering is similarity based, uses standard linkage functions such as single and complete linkage, and is an extension of classical hierarchical clustering. To achieve this, we develop a novel theory that extends classical hierarchical clustering to strictly partially ordered sets. We define the output from running classical hierarchical clustering on strictly ordered data to be partial dendrograms: sub-trees of classical dendrograms with several connected components. We then construct an embedding of partial dendrograms over a set into the family of ultrametrics over the same set. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure, measured in the p-norm. Thus, the method is a combination of classical hierarchical clustering and ultrametric fitting. A reference implementation is employed for experiments on both synthetic random data and real-world data from a database of machine parts. When compared to existing methods, the experiments show that our method excels both in cluster quality and order preservation.
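
The order-preservation condition stated in the abstract can be illustrated with a small check. The following is a minimal sketch, not the authors' reference implementation: it assumes the order is given as pairs `(a, b)` meaning a < b, that a clustering is a plain dictionary from element to cluster label, and that "order preserving" means the relation induced on the clusters contains no cycle. The names `preserves_order`, `relations`, and `cluster_of` are hypothetical.

```python
def preserves_order(relations, cluster_of):
    """Check whether a clustering keeps a < b as [a] < [b].

    relations  -- iterable of (a, b) pairs meaning a < b (assumed input format)
    cluster_of -- dict mapping each element to its cluster label
    """
    # Build the relation induced on clusters: [a] -> [b] whenever a < b.
    edges = {}
    for a, b in relations:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            # Two comparable elements were merged, so [a] < [b] cannot hold.
            return False
        edges.setdefault(ca, set()).add(cb)
        edges.setdefault(cb, set())

    # Kahn's algorithm: the induced relation is acyclic exactly when
    # every cluster can be removed in topological order.
    indegree = {c: 0 for c in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    ready = [c for c, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        c = ready.pop()
        visited += 1
        for t in edges[c]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return visited == len(edges)


relations = [("a", "b"), ("c", "d")]
good = {"a": "X", "c": "X", "b": "Y", "d": "Y"}   # X < Y only
bad = {"a": "X", "d": "X", "b": "Y", "c": "Y"}    # X < Y and Y < X
print(preserves_order(relations, good), preserves_order(relations, bad))
```

In the second clustering the pairs a < b and c < d force both X < Y and Y < X, which is the kind of merge an order-preserving method has to rule out.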

Regression with Stable Errors Based on Order Statistics

Fluctuation and Noise Letters ◽ 10.1142/s0219477522500146 ◽ 2021

Author(s): Reza Alizadeh Noughabi ◽ Adel Mohammadpour

Keyword(s): Order Statistics ◽ Stable Distribution ◽ Optimal Number ◽ Least Square ◽ Regression Coefficients ◽ Regression Parameters ◽ The Mean ◽ Heavy Tailed ◽ Ordered Data ◽ Error Order

Classical regression approaches are not robust when errors are heavy-tailed or asymmetric. This may be due to the non-existence of the mean or variance of the error distribution. Estimation based on trimmed data, which ignores outliers or leverage points, has a long history and is frequently used. This procedure chooses fixed cut-off points. In this work, we build on this idea, which was recently applied to obtain initial estimates of regression coefficients with heavy-tailed stable errors. We propose an effective procedure to calculate the cut-off points based on the tail index and skewness parameters of the errors. We use the property that some moments of the order statistics of a stable distribution exist. Data are trimmed based on the ordered residuals of a least squares regression. The optimal number of trimmed data points is determined by the number of error order statistics whose variance exists. We then use the rest of the ordered data to estimate the regression coefficients. Based on the joint distribution of these order statistics, we analytically compute the bias and variance of the introduced estimator of the regression parameters, which was previously impossible for regression with stable errors.
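
The trimming step described above (fit by least squares, order the residuals, drop the extremes, refit) can be sketched as follows. This is only an illustration under stated assumptions: it handles a single predictor, and the cut-off counts `lower` and `upper` are supplied by the caller, whereas the paper derives them from the tail index and skewness of the errors by a rule the abstract does not spell out. The function names are hypothetical.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_fit(xs, ys, lower, upper):
    """Refit after dropping the `lower` smallest and `upper` largest residuals.

    `lower` and `upper` are assumed inputs; the paper computes its cut-off
    points from the stable error parameters instead.
    """
    intercept, slope = ols_fit(xs, ys)
    # Order observations by the residual of the initial least squares fit.
    order = sorted(range(len(xs)), key=lambda i: ys[i] - (intercept + slope * xs[i]))
    keep = order[lower:len(order) - upper]
    return ols_fit([xs[i] for i in keep], [ys[i] for i in keep])


xs = list(range(10))
ys = [2 + 3 * x for x in xs]
ys[5] += 100  # a single gross outlier
print(ols_fit(xs, ys))            # pulled away from (2, 3) by the outlier
print(trimmed_fit(xs, ys, 0, 1))  # close to (2, 3) once the outlier is trimmed
```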

Implementation Of Bubble Sort Algorithm On 2 Fruit Models Of Data Selection Using The Java Program Language

Petir ◽ 10.33322/petir.v14i2.946 ◽ 2021 ◽ Vol 14 (2) ◽ pp. 159-169

Author(s): Endang Sunandar

Keyword(s): Data Sequence ◽ Radix Sort ◽ Java Programming ◽ Advantages And Disadvantages ◽ Program Language ◽ Java Program ◽ Ordered Data ◽ Sort Algorithm ◽ Data Sorting ◽ Quick Sort

Many data sorting methods are in common use, including Bubble Sort, Selection Sort, Insertion Sort, Quick Sort, Shell Sort, Heap Sort, and Radix Sort. Each method has its own advantages and disadvantages, and the choice among them depends on the requirements at hand. Each method uses a different algorithm, and these differences affect execution time. Bubble Sort is an interesting algorithm to implement on two variant models of data sorting because it has a fairly long and detailed process flow for producing an ordered data sequence from a previously unordered one. The two data sorting variants implemented with the Bubble Sort algorithm are ascending sorting moving from left to right and descending sorting moving from left to right. The implementation uses the Java programming language.
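
The two variants named in the abstract, ascending and descending sorting with left-to-right passes, can be sketched in a few lines. The paper's implementation is in Java; the version below is a Python sketch written for illustration, and the function name `bubble_sort` and its `descending` flag are assumptions rather than anything taken from the paper.

```python
def bubble_sort(values, descending=False):
    """Return a sorted copy of `values` using left-to-right bubble sort passes."""
    items = list(values)  # work on a copy so the input stays untouched
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        # One left-to-right pass: the extreme element settles at index `end`.
        for i in range(end):
            if descending:
                out_of_order = items[i] < items[i + 1]
            else:
                out_of_order = items[i] > items[i + 1]
            if out_of_order:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the data is already ordered
    return items


data = [5, 1, 4, 2, 8]
print(bubble_sort(data))                   # ascending variant
print(bubble_sort(data, descending=True))  # descending variant
```

The early exit on a pass without swaps is a common refinement; without it the algorithm always performs every pass.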

Powerful Statistical Tests for Ordered Data

10.33774/chemrxiv-2021-mdt29-v3 ◽ 2021

Author(s): Jürgen Köfinger ◽ Gerhard Hummer

Keyword(s): Statistical Tests ◽ Ordered Data

Feature selection for dynamic interval-valued ordered data based on fuzzy dominance neighborhood rough set

Knowledge-Based Systems ◽ 10.1016/j.knosys.2021.107223 ◽ 2021 ◽ pp. 107223

Author(s): Binbin Sang ◽ Hongmei Chen ◽ Lei Yang ◽ Tianrui Li ◽ Weihua Xu ◽ ...

Keyword(s): Feature Selection ◽ Rough Set ◽ Neighborhood Rough Set ◽ Selection For ◽ Ordered Data ◽ Interval Valued

Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries

Information Systems ◽ 10.1016/j.is.2021.101791 ◽ 2021 ◽ pp. 101791

Author(s): Martin Schäler ◽ Christine Tex ◽ Veit Köppen ◽ David Broneske ◽ Gunter Saake

Keyword(s): Main Memory ◽ Memory Storage ◽ Data Sets ◽ Ordered Data ◽ Storage Structures ◽ Space Distance

Powerful Statistical Tests for Ordered Data

10.26434/chemrxiv.13373351.v2 ◽ 2021

Author(s): Juergen Koefinger ◽ Gerhard Hummer

Keyword(s): Statistical Tests ◽ X Rays ◽ Test Statistics ◽ Chi Square ◽ P Values ◽ Gamma Distributions ◽ Versatile Tool ◽ Chi Square Test ◽ Common Sign ◽ Ordered Data

The inference of models from one-dimensional ordered data subject to noise is a fundamental and ubiquitous task in the physical and life sciences. A prototypical example is the analysis of small- and wide-angle solution scattering experiments using x-rays (SAXS/WAXS) or neutrons (SANS). In such cases, it is common practice to check the quality of a fit by using Pearson's chi-square test, which ignores the order of the data. We usually plot the residuals and check visually for systematic deviations without quantifying them. To quantify these deviations, we developed test statistics based on the distributions of the lengths of the runs of the signs of the residuals. Specifically, we use the probability of run-length distributions, for which we provide analytical expressions, to rank them and to calculate their P-values. We introduce the Shannon information distribution as an elegant and versatile tool for calculating P-values. We find that these distributions follow shifted gamma distributions, such that they are summarized by three parameters only. We show for a set of six models that our test statistics are more powerful than Pearson's chi-square test and common sign-based tests. We provide an open source Python 3 implementation of our tests free of charge at https://github.com/bio-phys/hplusminus.
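
The raw ingredient of these tests, the lengths of the runs of residual signs, is easy to extract. The sketch below only shows that first step and is not the API of the hplusminus package; the function name `sign_run_lengths` is hypothetical, and dropping residuals that are exactly zero is an assumption made here for simplicity.

```python
from itertools import groupby


def sign_run_lengths(residuals):
    """Return (sign, length) for each run of equal signs, in data order.

    Residuals equal to zero are skipped (an assumption of this sketch).
    """
    signs = [1 if r > 0 else -1 for r in residuals if r != 0]
    # groupby collapses consecutive equal signs into one run each.
    return [(sign, len(list(run))) for sign, run in groupby(signs)]


residuals = [0.3, 0.1, -0.2, -0.4, -0.1, 0.5]
runs = sign_run_lengths(residuals)
print(runs)                                # [(1, 2), (-1, 3), (1, 1)]
print(max(length for _, length in runs))   # longest run: 3
```

Long runs of one sign indicate systematic deviations between model and data, which is exactly the information a chi-square statistic discards when it ignores the order of the residuals.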

Weighted Consensus Segmentations

Computation ◽ 10.3390/computation9020017 ◽ 2021 ◽ Vol 9 (2) ◽ pp. 17

Author(s): Halima Saker ◽ Rainer Machné ◽ Jörg Fallmann ◽ Douglas B. Murray ◽ Ahmad M. Shahin ◽ ...

Keyword(s): Time Series ◽ Language Processing ◽ Growth Curves ◽ Distance Functions ◽ Data Sets ◽ Aggregation Problem ◽ Ordered Data ◽ Polycistronic Transcripts ◽ Sum Of Distances ◽ Segmentation Problem

The problem of segmenting linearly ordered data is frequently encountered in time-series analysis, computational biology, and natural language processing. Segmentations obtained independently from replicate data sets or from the same data with different methods or parameter settings pose the problem of computing an aggregate or consensus segmentation. This Segmentation Aggregation problem amounts to finding a segmentation that minimizes the sum of distances to the input segmentations. It is again a segmentation problem and can be solved by dynamic programming. The aim of this contribution is (1) to gain a better mathematical understanding of the Segmentation Aggregation problem and its solutions and (2) to demonstrate that consensus segmentations have useful applications. Extending previously known results we show that for a large class of distance functions only breakpoints present in at least one input segmentation appear in the consensus segmentation. Furthermore, we derive a bound on the size of consensus segments. As show-case applications, we investigate a yeast transcriptome and show that consensus segments provide a robust means of identifying transcriptomic units. This approach is particularly suited for dense transcriptomes with polycistronic transcripts, operons, or a lack of separation between transcripts. As a second application, we demonstrate that consensus segmentations can be used to robustly identify growth regimes from sets of replicate growth curves.
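
The Segmentation Aggregation objective, minimizing the sum of distances to the input segmentations, can be made concrete with a toy example. The sketch below is a brute-force search, not the dynamic programming solution mentioned in the abstract, and it assumes one particular distance: the number of breakpoints in which two segmentations differ. It searches only breakpoints that occur in at least one input, which is safe for this distance because any other breakpoint would add one to every term of the sum. All names are hypothetical.

```python
from itertools import combinations


def consensus_breakpoints(segmentations):
    """Brute-force consensus under the symmetric-difference distance.

    segmentations -- iterable of breakpoint collections (assumed representation)
    Returns (sorted consensus breakpoints, total distance to the inputs).
    """
    inputs = [frozenset(s) for s in segmentations]
    candidates = sorted(set().union(*inputs))
    # Start from the segmentation without breakpoints.
    best, best_cost = frozenset(), sum(len(s) for s in inputs)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = frozenset(subset)
            # Sum of distances: breakpoints present in exactly one of the two.
            cost = sum(len(chosen ^ s) for s in inputs)
            if cost < best_cost:
                best, best_cost = chosen, cost
    return sorted(best), best_cost


inputs = [{3, 7}, {3, 8}, {3, 7, 10}]
print(consensus_breakpoints(inputs))  # ([3, 7], 3)
```

The search is exponential in the number of distinct breakpoints, so it is only suitable for small illustrations.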

Incremental attribute reduction approaches for ordered data with time-evolving objects

Knowledge-Based Systems ◽ 10.1016/j.knosys.2020.106583 ◽ 2021 ◽ Vol 212 ◽ pp. 106583

Author(s): Binbin Sang ◽ Hongmei Chen ◽ Lei Yang ◽ Dapeng Zhou ◽ Tianrui Li ◽ ...

Keyword(s): Attribute Reduction ◽ Ordered Data

ordered data
Recently Published Documents

Computation of quantile sets for bivariate ordered data

Order preserving hierarchical agglomerative clustering

Regression with Stable Errors Based on Order Statistics

Implementation Of Bubble Sort Algorithm On 2 Fruit Models Of Data Selection Using The Java Program Language

Powerful Statistical Tests for Ordered Data

Feature selection for dynamic interval-valued ordered data based on fuzzy dominance neighborhood rough set

Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries

Powerful Statistical Tests for Ordered Data

Weighted Consensus Segmentations

Incremental attribute reduction approaches for ordered data with time-evolving objects
