Distance estimation in numerical data sets with missing values

2013 ◽  
Vol 240 ◽  
pp. 115-128 ◽  
Author(s):  
Emil Eirola ◽  
Gauthier Doquire ◽  
Michel Verleysen ◽  
Amaury Lendasse
2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical attributes are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to get the support of every row. Further, the data object having the largest support is chosen as the initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
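A minimal sketch of how such a support-based selection could look, assuming "support" denotes the relative frequency of each attribute value and that dissimilarity between categorical rows is a simple mismatch count; the paper's exact definitions are not reproduced here:

```python
# Support-based initial center selection for categorical data (illustrative sketch).
import numpy as np

def select_initial_centers(X, k):
    """X: 2-D array of categorical values (n_rows, n_attrs); returns k row indices."""
    n, m = X.shape
    # Support of each value within its attribute (relative frequency).
    support = np.zeros((n, m))
    for j in range(m):
        values, counts = np.unique(X[:, j], return_counts=True)
        freq = dict(zip(values, counts / n))
        support[:, j] = [freq[v] for v in X[:, j]]
    row_support = support.sum(axis=1)             # integrate weights along each row
    centers = [int(np.argmax(row_support))]       # first center: largest row support
    while len(centers) < k:
        # Distance of every row to its nearest already chosen center
        # (number of mismatching attributes).
        dist = np.min([(X != X[c]).sum(axis=1) for c in centers], axis=0)
        dist[centers] = -1                        # never re-pick a chosen center
        centers.append(int(np.argmax(dist)))      # farthest row becomes the next center
    return centers
```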


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 138 ◽  
Author(s):  
Lin Sun ◽  
Lanying Wang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

For continuous numerical data sets, neighborhood rough sets-based attribute reduction is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets, and yield low accuracy and high cardinality. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which is able to deal with continuous numerical data whilst maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes and thus significantly reduce computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. In order to properly analyze the uncertainty and noise of neighborhood decision systems, some neighborhood entropy-based uncertainty measures are presented based on the Lebesgue and entropy measures, and by combining the algebraic view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and the relationships among them are established, which helps to understand the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. Experimental results on an illustrative instance and several public data sets show that the proposed method is very effective at selecting the most relevant attributes with high classification accuracy.
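As a schematic illustration of the building blocks such a reduct relies on, the sketch below constructs delta-neighborhoods and a simple neighborhood entropy and uses them in a greedy forward selection; the paper's Lebesgue-measure-based and joint-entropy definitions are more elaborate, so the formulas here are illustrative assumptions only:

```python
# Neighborhood construction, a simple neighborhood entropy, and a greedy reduct (sketch).
import numpy as np

def neighborhoods(X, attrs, delta):
    """Delta-neighborhood of each sample under the attribute subset `attrs`."""
    Z = X[:, attrs]
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # pairwise distances
    return d <= delta                                          # boolean membership matrix

def neighborhood_entropy(X, attrs, delta):
    """H = -(1/n) * sum_i log(|neighborhood_i| / n): smaller neighborhoods, higher entropy."""
    nb = neighborhoods(X, attrs, delta)
    sizes = nb.sum(axis=1)
    n = len(X)
    return -np.mean(np.log(sizes / n))

def greedy_reduct(X, delta, n_attrs_to_pick):
    """Forward selection: repeatedly add the attribute that raises the entropy the most."""
    remaining, chosen = list(range(X.shape[1])), []
    while remaining and len(chosen) < n_attrs_to_pick:
        best = max(remaining, key=lambda a: neighborhood_entropy(X, chosen + [a], delta))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```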


Author(s):  
Antonia J. Jones ◽  
Dafydd Evans ◽  
Steve Margetts ◽  
Peter J. Durrant

The Gamma Test is a non-linear modelling analysis tool that allows us to quantify the extent to which a numerical input/output data set can be expressed as a smooth relationship. In essence, it allows us to efficiently calculate that part of the variance of the output that cannot be accounted for by the existence of any smooth model based on the inputs, even though this model is unknown. A key aspect of this tool is its speed: the Gamma Test has time complexity O(M log M), where M is the number of data points. For data sets consisting of a few thousand points and a reasonable number of attributes, a single run of the Gamma Test typically takes a few seconds. In this chapter we show how the Gamma Test can be used in the construction of predictive models and classifiers for numerical data. In doing so, we demonstrate the use of this technique for feature selection, and for the selection of embedding dimension when dealing with a time series.
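A compact sketch of the standard Gamma Test formulation is given below: for the first p nearest neighbours, the mean half-squared output differences are regressed against the mean squared input distances, and the intercept estimates the variance that no smooth model could explain. The neighbour count p and the use of scipy's k-d tree are illustrative choices:

```python
# Gamma Test via nearest-neighbour statistics (illustrative sketch).
import numpy as np
from scipy.spatial import cKDTree

def gamma_test(X, y, p=10):
    """Return (Gamma, gradient): intercept and slope of the delta/gamma regression."""
    y = np.asarray(y, dtype=float)
    tree = cKDTree(X)                                # O(M log M) neighbour queries
    # k = p + 1 because each point is its own nearest neighbour.
    dist, idx = tree.query(X, k=p + 1)
    deltas, gammas = [], []
    for k in range(1, p + 1):
        deltas.append(np.mean(dist[:, k] ** 2))                # mean squared input distance
        gammas.append(np.mean((y[idx[:, k]] - y) ** 2) / 2.0)  # mean half-squared output difference
    gradient, gamma = np.polyfit(deltas, gammas, 1)            # least-squares regression line
    return gamma, gradient
```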


Author(s):  
SUNG-GI LEE ◽  
DEOK-KYUN YUN

In this paper, we present a concept based on the similarity of categorical attribute values considering implicit relationships and propose a new and effective clustering procedure for mixed data. Our procedure obtains similarities between categorical values from careful analysis and maps the values in each categorical attribute into points in two-dimensional coordinate space using multidimensional scaling. These mapped values make it possible to interpret the relationships between attribute values and to directly apply categorical attributes to clustering algorithms using a Euclidean distance. After trivial modifications, our procedure for clustering mixed data uses the k-means algorithm, well known for its efficiency in clustering large data sets. We use the familiar soybean disease and adult data sets to demonstrate the performance of our clustering procedure. The satisfactory results that we have obtained demonstrate the effectiveness of our algorithm in discovering structure in data.
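A minimal sketch of this pipeline, assuming a simple co-occurrence-profile dissimilarity between categorical values (the paper derives its own similarity from implicit relationships, so this measure and the helper names are illustrative assumptions):

```python
# Map categorical values to 2-D coordinates via MDS, then cluster with k-means (sketch).
import numpy as np
from sklearn.manifold import MDS

def embed_categorical_column(col, context):
    """Map each value of a categorical attribute to 2-D MDS coordinates.

    col:     1-D array with the attribute's values
    context: 2-D array with the remaining categorical attributes
    """
    values = np.unique(col)
    # Profile of each value: how often it co-occurs with every value of the
    # other attributes, concatenated into one frequency vector.
    profiles = []
    for v in values:
        rows = context[col == v]
        profiles.append(np.concatenate(
            [(rows == u).mean(axis=0) for u in np.unique(context)]))
    profiles = np.array(profiles)
    dissim = np.linalg.norm(profiles[:, None] - profiles[None, :], axis=2)
    coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dissim)
    mapping = {v: c for v, c in zip(values, coords)}
    return np.vstack([mapping[v] for v in col])

# After replacing every categorical column by its 2-D coordinates and standardizing
# the numeric columns, plain sklearn.cluster.KMeans applies directly to the stacked matrix.
```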


Author(s):  
W. Hanif ◽  
S. Kenny

Pipelines may experience damage (e.g. dents, gouges) during handling, installation and normal operation due to external interference. Pipelines in offshore environments may also be prone to mechanical damage from events such as ice gouging, frost heave, and seismic fault movement. Damage mechanisms can be associated with deformation or metallurgical damage/metal loss and may include pipe denting, pipe ovality, ice gouging, pipe buckling, corrosion, etc. The type and severity of pipe damage may influence operational, repair and intervention strategies. For conventional pipelines, the assessment of mechanical damage plays an important role in the development of integrity management programs, and it may be of even greater significance for pipeline systems located in remote harsh environments because of logistical constraints. This study examines the effects of plain dents on pipe mechanical response using continuum finite element methods. The ABAQUS/Standard (6.10-1) environment was used to simulate damage events and pipe response. Modelling procedures were developed and calibrated against physical and numerical data sets available in the public domain. Once confidence in the numerical procedures was achieved, an analysis matrix was established to account for a range of influential parameters, including the diameter-to-wall-thickness ratio (D/t), the indenter-diameter-to-pipe-diameter ratio (ID/OD), the ratio of hoop stress due to internal pressure to yield strength (σh/σy), and kinematic boundary conditions. The results from this study provide a basis to support a broader initiative for developing an engineering tool for the assessment of damage interaction with pipeline girth welds and the development of an engineering performance criterion.
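As a small illustration of how such an analysis matrix can be enumerated, the sketch below builds a full factorial over the listed parameters; the levels shown are placeholders, not the values used in the study:

```python
# Full-factorial enumeration of an analysis matrix over influential parameters (sketch).
from itertools import product

d_over_t   = [20, 40, 60]            # diameter-to-wall-thickness ratio D/t (placeholder levels)
id_over_od = [0.1, 0.2, 0.3]         # indenter-to-pipe diameter ratio ID/OD (placeholder levels)
hoop_ratio = [0.0, 0.4, 0.8]         # hoop stress / yield strength, sigma_h / sigma_y (placeholder)
boundary   = ["fixed-ends", "free-ends"]

analysis_matrix = [
    {"D/t": dt, "ID/OD": idod, "sigma_h/sigma_y": hr, "bc": bc}
    for dt, idod, hr, bc in product(d_over_t, id_over_od, hoop_ratio, boundary)
]
print(len(analysis_matrix), "candidate FE cases")   # 3 * 3 * 3 * 2 = 54
```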


1996 ◽  
Vol 5 (2) ◽  
pp. 113 ◽  
Author(s):  
Antony Unwin ◽  
George Hawkins ◽  
Heike Hofmann ◽  
Bernd Siegl

2020 ◽  
Author(s):  
Christopher Kadow ◽  
David Hall ◽  
Uwe Ulbrich

Nowadays climate change research relies on climate information of the past. Historic climate records of temperature observations form global gridded datasets like HadCRUT4, which is investigated e.g. in the IPCC reports. However, datasets combining such records are sparse in the past, and even today they contain missing values. Here we show that machine learning technology can be applied to refill these missing climate values in observational datasets. We found that the technology of image inpainting using partial convolutions in a CUDA-accelerated deep neural network can be trained on large Earth system model experiments from the NOAA reanalysis (20CR) and the Coupled Model Intercomparison Project phase 5 (CMIP5). The derived deep neural networks are capable of independently refilling added missing values in these experiments. The analysis shows a very high degree of reconstruction quality, even when the trained networks are cross-applied to the other dataset. The network reconstruction achieves better evaluation scores than other typical methods in climate science. In the end we will show the newly reconstructed observational dataset HadCRUT4 and discuss further investigations.
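As an illustration of the core building block, the sketch below condenses a partial-convolution layer of the kind introduced for image inpainting (Liu et al., 2018), written in PyTorch; the layer sizes and the surrounding U-Net-style network are assumptions rather than the authors' exact configuration:

```python
# A single partial-convolution layer: convolve only observed grid points and update the mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=True)
        # Fixed all-ones kernel, used only to count observed pixels in each window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.window = float(kernel_size * kernel_size)
        self.pad = padding

    def forward(self, x, mask):
        # x: (B, C, H, W) gridded field with gaps; mask: (B, 1, H, W), 1 = observed, 0 = missing.
        valid = F.conv2d(mask, self.ones, padding=self.pad)   # observed pixels per window
        out = self.conv(x * mask)                             # convolve observed values only
        scale = self.window / valid.clamp(min=1.0)            # re-normalize by window coverage
        bias = self.conv.bias.view(1, -1, 1, 1)
        out = (out - bias) * scale + bias
        out = out * (valid > 0)                               # windows with no data stay zero
        new_mask = (valid > 0).float()                        # holes shrink layer by layer
        return out, new_mask
```

Stacking such layers in an encoder-decoder and training on model fields with synthetically added gaps is one way to realize the setup described above.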


2019 ◽  
Vol 16 (9) ◽  
pp. 4008-4014
Author(s):  
Savita Wadhawan ◽  
Gautam Kumar ◽  
Vivek Bhatnagar

This paper presents an analysis of different population-based algorithms for rulebase generation from numerical data sets, as fuzzy rulebase generation is one of the key issues in fuzzy modeling. The algorithms are applied to a rapid Ni–Cd battery charger data set. We compare the efficiency of the different algorithms and conclude that SCA algorithms with local search give remarkable efficiency compared to SCA algorithms alone. We also find that the efficiency of SCA with local search is comparable to that of memetic algorithms.
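A bare-bones sketch of the Sine Cosine Algorithm (SCA) with an optional hill-climbing local-search step is given below; the fitness function (for example, the rulebase error on the charger data), the population size and all other parameter values are assumptions for illustration:

```python
# Sine Cosine Algorithm with an optional local-search refinement of the best solution (sketch).
import numpy as np

def sca(fitness, dim, bounds, pop=30, iters=200, local_search=True, seed=None):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pop, dim))
    best = X[np.argmin([fitness(x) for x in X])].copy()
    for t in range(iters):
        r1 = 2.0 - t * (2.0 / iters)                        # shifts from exploration to exploitation
        for i in range(pop):
            r2 = rng.uniform(0, 2 * np.pi, dim)
            r3 = rng.uniform(0, 2, dim)
            r4 = rng.uniform(0, 1, dim)
            step = np.where(r4 < 0.5,
                            r1 * np.sin(r2) * np.abs(r3 * best - X[i]),
                            r1 * np.cos(r2) * np.abs(r3 * best - X[i]))
            X[i] = np.clip(X[i] + step, lo, hi)
        if local_search:                                    # small perturbation around the best solution
            trial = np.clip(best + rng.normal(0, 0.01 * (hi - lo), dim), lo, hi)
            if fitness(trial) < fitness(best):
                best = trial
        i_best = np.argmin([fitness(x) for x in X])
        if fitness(X[i_best]) < fitness(best):
            best = X[i_best].copy()
    return best
```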

