ON SIMULTANEOUS CONSTRUCTION OF VORONOI DIAGRAM AND DELAUNAY TRIANGULATION BY PHYSARUM POLYCEPHALUM

2009
Vol 19 (09)
pp. 3109-3117
Author(s):
TOMOHIRO SHIRAKAWA
ANDREW ADAMATZKY
YUKIO-PEGIO GUNJI
YOSHIHIRO MIYAKE

We experimentally demonstrate that both the Voronoi diagram and its dual graph, the Delaunay triangulation, are simultaneously constructed, under specific conditions, in cultures of plasmodium, the vegetative state of Physarum polycephalum. Every point of a given planar data set is represented by a tiny mass of plasmodium. The plasmodia spread from their initial locations but, under certain conditions, stop spreading when they encounter plasmodia originating from different locations. The loci of space not occupied by the plasmodia thus represent the edges of the Voronoi diagram of the given planar set. At the same time, plasmodia originating at neighboring locations form merging protoplasmic tubes, and the strongest tubes approximate the Delaunay triangulation of the given planar set. The plasmodium solves these problems only for limited data sets; however, the results presented lay sound groundwork for further investigations.
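
As a point of reference for the plasmodium experiments, both structures can be computed conventionally for the same planar point set. The following is a minimal sketch using scipy.spatial; it is a standard geometric computation, not part of the paper's biological method.

```python
# Reference computation of the two structures the plasmodium approximates:
# the Voronoi diagram and its dual, the Delaunay triangulation.
import numpy as np
from scipy.spatial import Voronoi, Delaunay

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [1.5, 1.2], [0.2, 1.5]])

vor = Voronoi(points)    # edges = loci left unoccupied by the plasmodia
tri = Delaunay(points)   # dual graph = strongest protoplasmic tubes

print("Voronoi vertices:\n", vor.vertices)
print("Delaunay triangles (as point indices):\n", tri.simplices)
```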

2018
Vol 2018
pp. 1-12
Author(s):  
Suleman Nasiru

The need to develop generalizations of existing statistical distributions, making them more flexible for modeling real data sets, is vital in parametric statistical modeling and inference. This study therefore develops a new class of distributions, called the extended odd Fréchet family of distributions, for modifying existing standard distributions. Two special models, the extended odd Fréchet Nadarajah-Haghighi and extended odd Fréchet Weibull distributions, are proposed using the developed family. The densities and hazard rate functions of the two special distributions exhibit various monotonic and nonmonotonic shapes. The maximum likelihood method is used to develop estimators for the parameters of the new class of distributions. The application of the special distributions is illustrated by means of a real data set. The results reveal that the special distributions developed from the new family can provide a more reasonable parametric fit to the given data set than other existing distributions.
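
The family itself is not specified in the abstract; a common form in the odd Fréchet-G literature is F(x) = exp(-((1 - G(x)^a)/G(x)^a)^t) for a baseline CDF G. Assuming that form with a Weibull baseline, a maximum likelihood fit can be sketched as follows; the CDF, the density derived from it, and all parameter names are assumptions, not the paper's exact expressions.

```python
# Hedged sketch: MLE for an assumed odd-Frechet-type family with a Weibull
# baseline G(x) = 1 - exp(-(x/lam)^k). The family CDF assumed here is
# F(x) = exp(-((1 - G^a)/G^a)^t); the density below is its derivative.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x):
    a, t, k, lam = params
    if min(a, t, k, lam) <= 0:
        return np.inf
    z = (x / lam) ** k
    G = 1.0 - np.exp(-z)                       # baseline CDF
    g = (k / lam) * (x / lam) ** (k - 1.0) * np.exp(-z)   # baseline pdf
    u = G ** a
    r = (1.0 - u) / u
    # log-density obtained by differentiating the assumed family CDF
    logf = (np.log(t) + np.log(a) + np.log(g) + (a - 1.0) * np.log(G)
            + (t - 1.0) * np.log(1.0 - u) - (t + 1.0) * np.log(u) - r ** t)
    return -np.sum(logf)

rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=200) + 0.05          # stand-in positive data
fit = minimize(neg_log_lik, x0=[1.0, 1.0, 1.0, 1.0], args=(x,),
               method="Nelder-Mead")
print("alpha, theta, k, lambda =", fit.x)
```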


Author(s):  
Hiroyasu Matsushima
Keiki Takadama

In this paper, we propose a method to improve ECS-DMR that enables appropriate output for imbalanced data sets. To control the generalization of the LCS on an imbalanced data set, we apply the imbalance ratio of the data set to a sigmoid function and then update the matching range appropriately. In comparison with our previous work (ECS-DMR), the proposed method automatically controls the generalization of the matching range to extract exemplars that cover the given problem space, which consists of an imbalanced data set. The experimental results suggest that the proposed method provides stable performance on imbalanced data sets and demonstrate the effect of using the sigmoid function to account for data balance.
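
A hypothetical sketch of the central idea, mapping the imbalance ratio through a sigmoid to rescale a classifier's matching range, might look as follows. All function and parameter names are illustrative; they are not taken from ECS-DMR.

```python
# Hypothetical sketch: scale a classifier's matching range by a sigmoid of
# the data set's imbalance ratio. Names and constants are illustrative,
# not ECS-DMR's actual update rule.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def updated_matching_range(base_range: float, n_majority: int,
                           n_minority: int, gain: float = 1.0) -> float:
    """Widen the matching range as class imbalance grows."""
    imbalance_ratio = n_majority / max(n_minority, 1)
    # the sigmoid squashes the unbounded log-ratio into (0, 1) so the
    # update stays stable even for extreme imbalance
    scale = sigmoid(gain * math.log(imbalance_ratio))
    return base_range * (0.5 + scale)   # keeps the range within [0.5, 1.5]x

print(updated_matching_range(base_range=0.2, n_majority=900, n_minority=100))
```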


2021
Author(s):
Nathanael Andrews
Martin Enge

CIM-seq is a tool for deconvoluting RNA-seq data from cell multiplets (clusters of two or more cells) in order to identify physically interacting cells in a given tissue. The method requires two RNA-seq data sets from the same tissue: one of single cells, used as a reference, and one of cell multiplets, to be deconvoluted. CIM-seq is compatible with both droplet-based sequencing methods, such as Chromium Single Cell 3′ Kits from 10x Genomics, and plate-based methods, such as Smart-seq2. The pipeline consists of three parts: (1) dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq; (2) feature selection and clustering of cell types in the single-cell data set, generating a blueprint of transcriptional profiles in the given tissue; (3) computational deconvolution of multiplets through maximum likelihood estimation (MLE) to determine the most likely cell-type constituents of each multiplet.
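
A minimal sketch of the deconvolution step (part 3) is given below. It assumes a Poisson model for counts and an exhaustive search over cell-type pairs, which are simplifying assumptions rather than CIM-seq's exact estimator.

```python
# Hedged sketch: deconvolve a doublet against a blueprint of cell-type mean
# expression profiles by maximizing a Poisson log-likelihood over pairs.
import numpy as np
from itertools import combinations_with_replacement
from scipy.stats import poisson

rng = np.random.default_rng(1)
n_genes, cell_types = 50, ["T", "B", "NK"]
blueprint = {ct: rng.gamma(2.0, 2.0, n_genes) for ct in cell_types}  # mean profiles

# synthetic doublet: a T cell plus a B cell
multiplet = rng.poisson(blueprint["T"] + blueprint["B"])

def log_lik(pair):
    mu = sum(blueprint[ct] for ct in pair)       # additive expression model
    return poisson.logpmf(multiplet, mu).sum()

pairs = list(combinations_with_replacement(cell_types, 2))
best = max(pairs, key=log_lik)
print("most likely constituents:", best)
```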


2011
Vol 2 (4)
pp. 12-23
Author(s):
Rekha Kandwal
Prerna Mahajan
Ritu Vijay

This paper revisits the problem of active learning and decision making when labeling incurs a cost and unlabeled data is available in abundance. In many real-world applications, large amounts of data are available, but the cost of labeling them correctly prohibits their use. In such cases, active learning can be employed. In this paper, the authors propose rough-set-based clustering using an active learning approach. They extend the basic notion of Hamming distance to propose a dissimilarity measure that helps in finding the approximations of clusters in the given data set. The underlying theoretical background for this decision is rough set theory. The authors investigated the algorithm on benchmark data sets from the UCI machine learning repository, with promising results.
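
A hedged sketch of the two ingredients named above, a Hamming-style dissimilarity over categorical records and rough lower/upper approximations of a cluster built from it, follows; the thresholds and helper names are illustrative, not the authors' definitions.

```python
# Hedged sketch: Hamming-style dissimilarity plus rough-set lower/upper
# approximations of a cluster. Thresholds are illustrative.
import numpy as np

def hamming_dissimilarity(a, b):
    """Fraction of attributes on which two records disagree."""
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a != b)

def rough_cluster(records, center, lower_t=0.25, upper_t=0.5):
    d = np.array([hamming_dissimilarity(r, center) for r in records])
    lower = np.where(d <= lower_t)[0]   # certainly in the cluster
    upper = np.where(d <= upper_t)[0]   # possibly in the cluster
    return lower, upper                 # boundary region = upper minus lower

records = [("a", "x", 1), ("a", "x", 2), ("a", "y", 2), ("b", "y", 3)]
lower, upper = rough_cluster(records, center=("a", "x", 1))
print("lower approximation:", lower, "upper approximation:", upper)
```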


Author(s):  
Y. SARATH KUMAR
ESWAR KODALI
P. HARINI

An individual is typically referred to by numerous name aliases on the web, and accurate identification of the aliases of a given person name is useful in various web-related tasks such as information retrieval, sentiment analysis, personal name disambiguation, and relation extraction. In this paper, we propose a lexical-pattern-based approach to extract the aliases of a given name from the web. We use a set of names and their aliases as training data to extract lexical patterns that describe the numerous ways in which information related to a name's aliases is presented on the web. Given a personal name, the proposed method first extracts a set of candidate aliases; it then ranks the extracted candidates according to the likelihood of each candidate being a correct alias of the given name. We evaluate the proposed method on three data sets: an English personal names data set, an English place names data set, and a Japanese personal names data set. The proposed method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a statistically significant mean reciprocal rank (MRR) of 0.67.
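
The reported figure uses mean reciprocal rank (MRR): for each name, the reciprocal of the rank at which the first correct alias appears in the ranked candidate list, averaged over all names. A small self-contained implementation:

```python
# Mean reciprocal rank (MRR) over ranked alias candidates.
def mean_reciprocal_rank(ranked_candidates, gold_aliases):
    total = 0.0
    for candidates, gold in zip(ranked_candidates, gold_aliases):
        rr = 0.0
        for rank, cand in enumerate(candidates, start=1):
            if cand in gold:                # first correct alias found
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_candidates)

ranked = [["hideki", "godzilla", "matsu"], ["the boss", "bruce"]]
gold = [{"godzilla"}, {"the boss"}]
print(mean_reciprocal_rank(ranked, gold))   # (1/2 + 1/1) / 2 = 0.75
```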


Geophysics
2013
Vol 78 (2)
pp. E79-E94
Author(s):
John Deceuster
Olivier Kaufmann
Michel Van Camp

Electrical resistivity tomography (ERT) monitoring experiments are being conducted more often to image spatiotemporal changes in soil properties. When conducting long-term ERT monitoring, identifying suspicious electrodes in a permanent spread is of major importance because changes in the contact properties of a single electrode may affect the quality of many measurements on each time slice. An automated methodology was developed to detect these temporal changes in electrode contact properties, based on a Bayesian approach called "weights of evidence." Contrasts and studentized contrasts are estimators of the influence of each electrode on global data quality. A consolidated studentized contrast is introduced to account for the proportion of rejected quadripoles that contain a single electrode. These estimators are computed for each time slice using repeatability-factor (coefficient of variation of repeated measurements) threshold values, from 0 to 10%, to discriminate between selected and rejected quadripoles. An automated detection strategy is proposed to identify suspicious electrodes by comparing the consolidated studentized contrasts to their maximum expected values when every electrode is good for the given data set. These maxima are computed using Monte Carlo simulations of a hundred random draws in which the distribution of repeatability-factor values follows a Weibull cumulative distribution, with shape and scale parameters fitted on a background data set filtered using a 5% threshold on absolute reciprocal errors. The efficiency of the methodology and its sensitivity to the selected reciprocal-error threshold are assessed on synthetic and field data. Our approach is suitable for detecting suspicious electrodes and slowly changing conditions affecting the galvanic contact resistances, where classical approaches are shown to be inadequate except when the faulty electrode is disconnected. A data-weighting method is finally proposed to ensure that only good data are used in the inversion of ERT monitoring data sets.
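
The contrast and studentized contrast follow standard weights-of-evidence definitions; a sketch for a single electrode, with illustrative counts, is shown below. The consolidated variant and the Monte Carlo thresholding are omitted for brevity.

```python
# Standard weights-of-evidence estimators for one electrode: the contrast
# C = W+ - W- measures how strongly the electrode's presence in a quadripole
# associates with rejection; the studentized contrast divides C by its
# standard deviation. Counts below are illustrative.
import math

def contrast(n_rej_with, n_rej_without, n_sel_with, n_sel_without):
    """n_rej_with: rejected quadripoles containing the electrode;
    n_sel_with: selected quadripoles containing it; *_without: the rest."""
    w_plus = math.log((n_rej_with / (n_rej_with + n_rej_without)) /
                      (n_sel_with / (n_sel_with + n_sel_without)))
    w_minus = math.log((n_rej_without / (n_rej_with + n_rej_without)) /
                       (n_sel_without / (n_sel_with + n_sel_without)))
    c = w_plus - w_minus
    var_c = (1/n_rej_with + 1/n_sel_with + 1/n_rej_without + 1/n_sel_without)
    return c, c / math.sqrt(var_c)   # contrast, studentized contrast

print(contrast(n_rej_with=40, n_rej_without=60,
               n_sel_with=200, n_sel_without=700))
```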


2017
Vol 26 (1)
pp. 153-168
Author(s):
Vijay Kumar
Jitender Kumar Chhabra
Dinesh Kumar

The main problem of classical clustering techniques is that they are easily trapped in local optima. This paper attempts to solve this problem by proposing a grey wolf algorithm (GWA)-based clustering technique, called GWA clustering (GWAC). The search capability of the GWA is used to find the optimal cluster centers in the given feature space, with an agent representation used to encode the cluster centers. The proposed GWAC technique is tested on both artificial and real-life data sets and compared to six well-known metaheuristic-based clustering techniques. The computational results are encouraging and demonstrate that GWAC provides better values in terms of precision, recall, G-measure, and intracluster distances. GWAC is further applied to a gene expression data set, and its performance is compared to the other techniques. Experimental results reveal the efficiency of GWAC over the other techniques.
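
A minimal sketch of the GWAC idea, a standard grey wolf optimizer searching for cluster centers that minimize the within-cluster sum of squared distances, follows; the population size, iteration count, and fitness function are illustrative choices, not the paper's exact settings.

```python
# Grey wolf optimizer over flattened cluster-center vectors; each wolf
# encodes k centers, and fitness is the intracluster distance criterion.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(120, 2)) + rng.choice([-3, 0, 3], size=(120, 1))
k, dim, n_wolves, iters = 3, data.shape[1], 20, 100

def fitness(flat_centers):
    centers = flat_centers.reshape(k, dim)
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()     # within-cluster sum of squares

wolves = rng.uniform(data.min(), data.max(), size=(n_wolves, k * dim))
for t in range(iters):
    order = np.argsort([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[order[:3]]       # three best leaders
    a = 2.0 * (1.0 - t / iters)                  # linearly decreasing coefficient
    for i in range(n_wolves):
        steps = []
        for leader in (alpha, beta, delta):      # standard GWO position update
            A = 2 * a * rng.random(k * dim) - a
            C = 2 * rng.random(k * dim)
            steps.append(leader - A * np.abs(C * leader - wolves[i]))
        wolves[i] = np.mean(steps, axis=0)

best = wolves[np.argmin([fitness(w) for w in wolves])].reshape(k, dim)
print("cluster centers:\n", best)
```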


Stats
2021
Vol 4 (2)
pp. 359-384
Author(s):
Manabu Ichino
Kadri Umbleja
Hiroyuki Yaguchi

This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal-probability bin-rectangles. In each clustering step, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster; the compactness thus plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster-quality measure. We further show that the average compactness of each feature, taken over the objects and/or clusters of several clustering steps, is useful as a feature-effectiveness criterion: features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method using an artificial data set and real histogram-valued data sets.
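
A hedged sketch of the agglomeration rule, simplified from histogram-valued to interval-valued data, is given below: the volume of a merged cluster's bounding hyper-rectangle stands in for the paper's bin-rectangle concept size, and each step merges the pair that keeps this "compactness" smallest.

```python
# Simplified agglomerative clustering for interval-valued objects: merge the
# pair whose combined bounding hyper-rectangle (a stand-in for the paper's
# concept size) has the smallest volume.
import numpy as np

# each object: per-feature [low, high] intervals, shape (n_features, 2)
objects = [np.array([[0.0, 1.0], [0.0, 1.0]]),
           np.array([[0.5, 1.5], [0.2, 1.1]]),
           np.array([[5.0, 6.0], [4.8, 6.2]])]
clusters = [[i] for i in range(len(objects))]

def hull(members):
    lows = np.min([objects[i][:, 0] for i in members], axis=0)
    highs = np.max([objects[i][:, 1] for i in members], axis=0)
    return np.prod(highs - lows)   # volume of the merged concept

while len(clusters) > 1:
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda p: hull(clusters[p[0]] + clusters[p[1]]))
    print(f"merge {clusters[i]} + {clusters[j]}, "
          f"compactness={hull(clusters[i] + clusters[j]):.2f}")
    clusters[i] += clusters.pop(j)
```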




Geophysics
2016
Vol 81 (2)
pp. F17-F26
Author(s):  
Peter Menzel

Resampling of high-resolution data sets is often required for real-time applications in geosciences, e.g., interactive modeling and 3D visualization. To support interactivity and real-time computation, it is often necessary to resample data sets to a resolution adequate to the application. Conventional resampling approaches create uniformly distributed results, which are not always the best possible solution for particular applications. I have developed a new resampling method called constrained indicator data resampling (CIDRe). The method produces irregular point distributions that adapt to the local signal wavelengths of the given data: the algorithm identifies wavelength variations by analyzing gradients in the given parameter distribution and ensures a higher point density in areas with larger gradients than in areas with smaller gradients. A synthetic data test showed that CIDRe represents a data set better than conventional resampling algorithms. In a second application, CIDRe was used to reduce the number of gravity stations for interactive 3D density modeling, where the resulting point distribution still allows accurate interactive modeling with a minimum number of data points.
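
A simplified one-dimensional sketch of gradient-adaptive resampling in this spirit follows: points survive with a probability that grows with the local gradient magnitude, so short-wavelength regions stay dense. The keep-probability floor and scaling are illustrative choices, not CIDRe's actual rule.

```python
# Gradient-adaptive 1-D resampling: higher keep-probability where the local
# gradient (short signal wavelength) is larger.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 2000)
signal = np.sin(x) + np.sin(8 * x) * (x > 6)   # short wavelengths after x=6

grad = np.abs(np.gradient(signal, x))
keep_prob = 0.05 + 0.95 * grad / grad.max()    # floor keeps sparse coverage
mask = rng.random(x.size) < keep_prob

x_resampled, signal_resampled = x[mask], signal[mask]
print(f"kept {mask.sum()} of {x.size} points; "
      f"density is higher after x=6 where gradients are larger")
```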

