Evidence Theory of One-Dimensional Compression KNN Classification Method

2010 ◽  
Vol 143-144 ◽  
pp. 1337-1341
Author(s):  
Wei Feng Yan ◽  
Gen Xiu Wu ◽  
Can Ze Li ◽  
Li Zhou

Because the KNN algorithm has limitations when it relies only on Euclidean distance, many researchers replace that distance with other calculation methods to improve classification accuracy. Combining Dempster-Shafer (DS) evidence theory with the series of KNN algorithms discussed in this paper, we found that each algorithm has its own merits, but all of them neglect analysis of the data set itself. A deeper analysis shows that when two attribute values differ greatly, the actual distance is dominated by the larger value. We therefore compress the numerical attribute values of large magnitude. Experiments show that this compression clearly improves the accuracy of the KNN, VSMKNN and KERKNN algorithms; the resulting methods are named CDSKNN, CDSVSMKNN and CDSKERKNN.
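The compression idea can be sketched as follows. The paper does not give its exact transform, so the log-based `compress` helper below is an assumption chosen only to illustrate how damping large values stops one attribute from dominating the Euclidean distance:

```python
import math

def compress(value, threshold=10.0):
    """Compress large attribute values so they no longer dominate the
    Euclidean distance. Log compression is one possible choice; the
    paper's exact transform is not specified here."""
    if abs(value) <= threshold:
        return value
    sign = 1 if value > 0 else -1
    return sign * (threshold + math.log(abs(value) / threshold))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two attributes on very different scales: the second one dominates.
p, q = [1.0, 1000.0], [2.0, 3000.0]
print(euclidean(p, q))                    # ~2000: ruled entirely by attribute 2
pc = [compress(v) for v in p]
qc = [compress(v) for v in q]
print(euclidean(pc, qc))                  # both attributes now contribute
```

After compression the distance reflects differences in both attributes, which is the property the CDS variants exploit before running KNN.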

2007 ◽  
Vol 56 (12) ◽  
pp. 101-110 ◽  
Author(s):  
A.-E. Stricker ◽  
I. Takács ◽  
A. Marquot

The Vesilind settling velocity function forms the basis of flux theory used both in state point analysis (for design and capacity rating) and one-dimensional dynamic models (for dynamic process modelling). This paper proposes new methods to address known shortcomings of these methods, based on an extensive set of batch settling tests conducted at different scales. The experimental method to determine the Vesilind parameters from a series of bench scale settling tests is reviewed. It is confirmed that settling cylinders must be slowly stirred in order to represent settling performance of full scale plants for the whole range of solids concentrations. Two new methods to extract the Vesilind parameters from settling test series are proposed and tested against the traditional manual method. Finally, the same data set is used to propose an extension to one-dimensional (1-D) dynamic settler models to account for compression settling. Using the modified empirical function, the model is able to describe the batch settling interface independently of the number of layers.


2014 ◽  
Vol 598 ◽  
pp. 481-485 ◽  
Author(s):  
Bao Wen Sun ◽  
Ming Li ◽  
Wei Zhang

Nowadays, there are several methodologies for selecting recommendation systems, and each method has its own evaluation criteria for picking the best one. In this paper, a new MCDM method for recommendation system selection, based on fuzzy VIKOR with multiple distances, is introduced. It selects the best system by computing values under three different distance measures, namely Hamming distance, Euclidean distance and Hausdorff distance, and then voting via the Condorcet method. This minimizes the effect of the choice of distance, offers a more objective result than other methods, and helps enterprises select the most suitable recommendation system.
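The final voting step can be illustrated in isolation. Assuming three hypothetical VIKOR rankings (one per distance measure), a Condorcet winner is an alternative that beats every other in pairwise majority comparisons:

```python
def condorcet_winner(rankings):
    """Return the alternative that beats every other one in pairwise
    majority comparisons across the given rankings (lists ordered
    best-first), or None if no Condorcet winner exists."""
    alternatives = set(rankings[0])
    for a in alternatives:
        if all(
            sum(r.index(a) < r.index(b) for r in rankings) > len(rankings) / 2
            for b in alternatives if b != a
        ):
            return a
    return None

# Hypothetical rankings of systems A-C under Hamming, Euclidean and
# Hausdorff distance respectively (illustrative values only).
rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(condorcet_winner(rankings))  # A: beats B in 2 of 3 and C in 3 of 3
```

If the three distance measures disagree strongly, no Condorcet winner may exist, which is one reason the full method combines voting with the fuzzy VIKOR scores.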


Animals ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 50
Author(s):  
Jennifer Salau ◽  
Jan Henning Haas ◽  
Wolfgang Junge ◽  
Georg Thaller

Machine learning methods have become increasingly important in animal science, and the success of an automated application using machine learning often depends on the right choice of method for the respective problem and data set. The recognition of objects in 3D data is still a widely studied topic and is especially challenging when it comes to partitioning objects into predefined segments. In this study, two machine learning approaches were utilized for the recognition of body parts of dairy cows from 3D point clouds, i.e., sets of data points in space. The low-cost off-the-shelf depth sensor Microsoft Kinect V1 has been used in various studies related to dairy cows. The 3D data were gathered from a multi-Kinect recording unit designed to record freely walking Holstein Friesian cows from both sides at three different camera positions. To determine the body parts head, rump, back, legs and udder, five properties of the pixels in the depth maps (row index, column index, depth value, variance, mean curvature) were used as features in the training data set. For each camera position, a k-nearest-neighbour classifier and a neural network were trained and then compared. Both methods showed small Hamming losses (between 0.007 and 0.027 for k-nearest-neighbour (kNN) classification and between 0.045 and 0.079 for neural networks) and could be considered successful regarding the classification of pixels to body parts. However, the kNN classifier was superior, reaching overall accuracies of 0.888 to 0.976, varying with the camera position. Precision and recall values associated with individual body parts ranged from 0.84 to 1 and from 0.83 to 1, respectively. Once trained, however, kNN classification incurs higher runtime costs in computational time and memory than the neural networks. The cost vs. accuracy ratio of each methodology needs to be taken into account when deciding which method should be implemented in the application.
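A minimal sketch of the two quantities central to this study, a kNN vote over the five pixel features and the Hamming loss, using toy values (not the study's data):

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Plain k-nearest-neighbour majority vote; each training row is a
    feature vector such as (row, column, depth, variance, curvature)."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def hamming_loss(y_true, y_pred):
    """Fraction of misclassified pixels, the error measure reported above."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy pixel features (hypothetical values for illustration only).
train = [(0, 0, 1.0, 0.1, 0.2), (0, 1, 1.1, 0.1, 0.2),
         (5, 5, 3.0, 0.5, 0.9), (5, 6, 3.1, 0.6, 0.8)]
labels = ["head", "head", "udder", "udder"]
print(knn_predict(train, labels, (0, 0, 1.05, 0.1, 0.2)))  # "head"
```

The runtime cost the abstract mentions is visible here: every prediction scans the full training set, whereas a trained neural network performs a fixed amount of work per pixel.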


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support, and then integrates these weights along the rows to obtain the support of every row. The data object with the largest support is chosen as the initial center, and further centers are then found at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
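The selection procedure described above can be sketched as follows. Details such as the exact distance to later centers are assumptions (Hamming distance to the nearest already-chosen center is used here):

```python
from collections import Counter

def select_initial_centers(data, k):
    """Support-based initial center selection (sketch). Each row is
    scored by the summed frequency (support) of its attribute values;
    the highest-support row becomes the first center, and subsequent
    centers maximise the Hamming distance to the nearest chosen center."""
    n_attrs = len(data[0])
    supports = [Counter(row[j] for row in data) for j in range(n_attrs)]
    row_support = [sum(supports[j][row[j]] for j in range(n_attrs))
                   for row in data]
    centers = [max(range(len(data)), key=lambda i: row_support[i])]
    while len(centers) < k:
        def dist_to_centers(i):
            return min(sum(a != b for a, b in zip(data[i], data[c]))
                       for c in centers)
        centers.append(max(range(len(data)), key=dist_to_centers))
    return [data[i] for i in centers]

data = [("red", "s"), ("red", "s"), ("red", "m"), ("blue", "l")]
print(select_initial_centers(data, 2))  # most supported row, then farthest row
</n```

Note that support works directly on unordered categorical values, which is exactly why this scheme applies where numerical center selection methods do not.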


2010 ◽  
Vol 4 (1) ◽  
pp. 35-51 ◽  
Author(s):  
H.-W. Jacobi ◽  
F. Domine ◽  
W. R. Simpson ◽  
T. A. Douglas ◽  
M. Sturm

Abstract. The specific surface area (SSA) of the snow constitutes a powerful parameter to quantify the exchange of matter and energy between the snow and the atmosphere. However, currently no snow physics model can simulate the SSA. Therefore, two different types of empirical parameterizations of the specific surface area (SSA) of snow are implemented into the existing one-dimensional snow physics model CROCUS. The parameterizations are either based on diagnostic equations relating the SSA to parameters like snow type and density or on prognostic equations that describe the change of SSA depending on snow age, snowpack temperature, and the temperature gradient within the snowpack. Simulations with the upgraded CROCUS model were performed for a subarctic snowpack, for which an extensive data set including SSA measurements is available at Fairbanks, Alaska for the winter season 2003/2004. While a reasonable agreement between simulated and observed SSA values is obtained using both parameterizations, the model tends to overestimate the SSA. This overestimation is more pronounced using the diagnostic equations compared to the results of the prognostic equations. Parts of the SSA deviations using both parameterizations can be attributed to differences between simulated and observed snow heights, densities, and temperatures. Therefore, further sensitivity studies regarding the thermal budget of the snowpack were performed. They revealed that reducing the thermal conductivity of the snow or increasing the turbulent fluxes at the snow surfaces leads to a slight improvement of the simulated thermal budget of the snowpack compared to the observations. However, their impact on further simulated parameters like snow height and SSA remains small. Including additional physical processes in the snow model may have the potential to advance the simulations of the thermal budget of the snowpack and, thus, the SSA simulations.
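The prognostic approach can be illustrated with a deliberately generic update rule. The coefficients and functional form below are placeholders, not the fitted CROCUS parameterizations; they only show the qualitative behaviour described (SSA decreasing with snow age, faster for warmer snow and stronger temperature gradients):

```python
import math

def ssa_step(ssa, dt_days, temp_c, temp_gradient, rate=0.05):
    """Hypothetical prognostic SSA update: SSA decays with snow age,
    faster for warmer snow and larger temperature gradients. The real
    CROCUS parameterizations use fitted empirical coefficients; the
    numbers here are illustrative only."""
    metamorphism = rate * math.exp(temp_c / 10.0) * (1.0 + 0.01 * abs(temp_gradient))
    return max(ssa * (1.0 - metamorphism * dt_days), 0.0)

ssa = 80.0  # fresh snow, m^2/kg (typical order of magnitude)
for day in range(5):
    ssa = ssa_step(ssa, 1.0, temp_c=-10.0, temp_gradient=20.0)
print(round(ssa, 1))  # SSA has decreased monotonically over five days
```

A diagnostic parameterization, by contrast, would compute SSA directly from the current snow type and density with no dependence on the history of the layer.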


This paper proposes an improved data compression technique over the existing Lempel-Ziv-Welch (LZW) algorithm. LZW is a dictionary-update based compression technique that stores substrings of the data as codes and reuses those codes when the substrings recur. When the dictionary becomes full, every entry is removed in order to make room for new entries. The conventional method therefore discards even frequently used strings along with everything else, which makes it an ineffective compression scheme when the data to be compressed are large and contain many frequently recurring strings. This paper presents two new methods that improve on the existing LZW compression algorithm. In both, when the dictionary becomes full, only the entries that have not been used are removed, rather than every element of the dictionary as in the existing LZW algorithm. This is achieved by adding a flag to every dictionary entry; whenever an entry is used, its flag is set high. Thus, when the dictionary fills, the entries whose flag is high are kept and the others are discarded. In the first method the unused entries are discarded all at once, whereas in the second method they are removed one at a time, giving newly added dictionary entries more time to prove useful. All three techniques produce similar results on small data sets, because they differ only in how they handle a full dictionary; the improvements therefore pay off only when a relatively large data set is used. When the three techniques were compared on a best-case data set, the compression ratio of conventional LZW was smaller than that of improved LZW method 1, which in turn was smaller than that of improved LZW method 2.
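The flag-based policy of the first method can be sketched on top of a plain LZW compressor. This is an illustrative reconstruction, not the paper's implementation; a matching decoder would have to mirror the eviction and renumbering policy exactly:

```python
def lzw_compress(data, max_dict_size=512):
    """LZW compression with the flag-based eviction policy (method 1):
    when the dictionary fills, only entries whose 'used' flag was never
    set are discarded, instead of clearing the whole dictionary."""
    dictionary = {chr(i): i for i in range(256)}
    used = {k: False for k in dictionary}
    next_code = 256
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc
            used[wc] = True              # entry reused: survives eviction
            continue
        out.append(dictionary[w])
        if next_code < max_dict_size:
            dictionary[wc] = next_code
            used[wc] = False
            next_code += 1
        else:
            # Dictionary full: keep single characters plus used entries,
            # renumbering the surviving multi-character strings.
            kept = [k for k in dictionary if len(k) > 1 and used[k]]
            dictionary = {chr(i): i for i in range(256)}
            for i, k in enumerate(kept):
                dictionary[k] = 256 + i
            used = {k: k in kept for k in dictionary}
            next_code = 256 + len(kept)
        w = ch
    if w:
        out.append(dictionary[w])
    return out

print(lzw_compress("abababab"))  # [97, 98, 256, 258, 98]
```

The second method would differ only in the eviction branch, removing one unused entry per overflow instead of all of them at once.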


Author(s):  
Hao Deng ◽  
Chao Ma ◽  
Lijun Shen ◽  
Chuanwu Yang

In this paper, we present a novel semi-supervised classification method based on sparse representation (SR) and multiple one-dimensional embedding-based adaptive interpolation (M1DEI). The main idea of M1DEI is to embed the data into multiple one-dimensional (1D) manifolds such that connected samples are separated by the shortest distances. In this way, the problem of high-dimensional data classification is transformed into a 1D classification problem. By alternating interpolation and averaging over the multiple 1D manifolds, the labeled sample set can be enlarged gradually. Clearly, a proper metric facilitates more accurate embedding and further helps improve the classification performance. We develop an SR-based metric, which measures the affinity between samples more accurately than the common Euclidean distance. The experimental results on several databases show the effectiveness of the improvement.
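One way to picture a 1D embedding in which connected samples are close is a greedy nearest-neighbour chaining of the data, sketched below. This is an assumed simplification: the paper builds several such embeddings and uses an SR-based affinity, which would replace the Euclidean `math.dist` here:

```python
import math

def greedy_1d_embedding(points, start=0):
    """Order the samples into a chain by repeatedly appending the
    nearest unvisited sample, so that adjacent samples in the resulting
    1D ordering are close in feature space (Euclidean distance is used
    here as a stand-in for the paper's SR-based metric)."""
    order = [start]
    remaining = set(range(len(points))) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda i: math.dist(points[i], points[last]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

points = [(0, 0), (10, 10), (1, 0), (9, 10), (2, 1)]
print(greedy_1d_embedding(points))  # nearby points end up adjacent in the chain
```

Once the data are ordered this way, label interpolation along the chain becomes a 1D problem, which is the transformation the abstract describes.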


Author(s):  
A. Andreini ◽  
A. Bonini ◽  
G. Caciolli ◽  
B. Facchini ◽  
S. Taddei

Due to the stringent cooling requirements of novel aero-engine combustor liners, a comprehensive understanding of the phenomena concerning the interaction of hot gases with typical coolant jets plays a major role in the design of efficient cooling systems. In this work, an aerodynamic analysis of the effusion cooling system of an aero-engine combustor liner was performed; the aim was the definition of a correlation for the discharge coefficient (CD) of the single effusion hole. The data were taken from a set of CFD RANS (Reynolds-averaged Navier-Stokes) simulations, in which the behavior of the effusion cooling system was investigated over a wide range of thermo/fluid-dynamic conditions. In some of these tests, the influence of an additional air bleeding port on the effusion flow was taken into account, making it possible to analyze its effects on the CD of the effusion holes. An in-depth analysis of the numerical data set showed that the data can be efficiently correlated through the ratio of the annulus and hole Reynolds numbers: the dependence of the discharge coefficients on this parameter is roughly linear. The correlation was included in an in-house one-dimensional thermo/fluid network solver, and its results were compared with CFD data. An overall good agreement of pressure and mass flow rate distributions was observed. The main source of inaccuracy was observed in the case of relevant air bleed mass flow rates, due to the inherently three-dimensional behavior of the flow close to the bleed opening. An additional comparison with experimental data was performed in order to improve the confidence in the accuracy of the correlation: within the validity range of pressure ratios in which the correlation is defined (>1.02), this comparison showed good reliability in the prediction of discharge coefficients. An approach to model air bleeding was then proposed, with an assessment of its impact on liner wall temperature prediction.
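The roughly linear dependence on the Reynolds-number ratio implies a correlation of the general form below. The coefficients are placeholders, not the fitted values from the study:

```python
def discharge_coefficient(re_annulus, re_hole, a=0.6, b=-0.1):
    """Hypothetical linear correlation of the form suggested by the
    finding that CD depends roughly linearly on the ratio of annulus to
    hole Reynolds numbers. The coefficients a and b are illustrative
    placeholders, not the values fitted in the study."""
    return a + b * (re_annulus / re_hole)

# Usage: with these placeholder coefficients, CD drops as the annulus
# Reynolds number grows relative to the hole Reynolds number.
print(discharge_coefficient(2.0e4, 1.0e5))  # 0.6 - 0.1 * 0.2 = 0.58
```

A one-parameter correlation of this kind is cheap enough to embed in a 1D thermo/fluid network solver, which is how it was used in the work above.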


Author(s):  
Antonia J. Jones ◽  
Dafydd Evans ◽  
Steve Margetts ◽  
Peter J. Durrant

The Gamma Test is a non-linear modelling analysis tool that allows us to quantify the extent to which a numerical input/output data set can be expressed as a smooth relationship. In essence, it allows us to efficiently calculate that part of the variance of the output that cannot be accounted for by the existence of any smooth model based on the inputs, even though this model is unknown. A key aspect of this tool is its speed: the Gamma Test has time complexity O(M log M), where M is the number of data points. For data sets consisting of a few thousand points and a reasonable number of attributes, a single run of the Gamma Test typically takes a few seconds. In this chapter we show how the Gamma Test can be used in the construction of predictive models and classifiers for numerical data. In doing so, we demonstrate the use of this technique for feature selection, and for the selection of embedding dimension when dealing with a time series.
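A minimal sketch of the statistic itself (using a brute-force O(M^2) neighbour search rather than the O(M log M) scheme the chapter describes): near-neighbour distances in input space are regressed against half the squared output differences, and the intercept estimates the noise variance that no smooth model could remove.

```python
import math
import random

def gamma_test(xs, ys, p=10):
    """Gamma Test sketch: regress gamma(k) (half the mean squared output
    difference to the k-th nearest neighbour) on delta(k) (mean squared
    input distance); the intercept estimates the output noise variance."""
    m = len(xs)
    # Brute-force neighbour ordering for each point (O(M^2 log M)).
    nbrs = [sorted((j for j in range(m) if j != i),
                   key=lambda j: math.dist(xs[i], xs[j])) for i in range(m)]
    deltas, gammas = [], []
    for k in range(p):
        deltas.append(sum(math.dist(xs[i], xs[nbrs[i][k]]) ** 2
                          for i in range(m)) / m)
        gammas.append(sum((ys[nbrs[i][k]] - ys[i]) ** 2
                          for i in range(m)) / (2 * m))
    # Least-squares line gamma = intercept + slope * delta.
    md, mg = sum(deltas) / p, sum(gammas) / p
    slope = (sum((d - md) * (g - mg) for d, g in zip(deltas, gammas))
             / sum((d - md) ** 2 for d in deltas))
    return mg - slope * md  # intercept: estimated noise variance

# Smooth function plus Gaussian noise of variance 0.01.
random.seed(0)
xs = [(random.random(),) for _ in range(300)]
ys = [math.sin(6 * x[0]) + random.gauss(0, 0.1) for x in xs]
print(gamma_test(xs, ys))  # should land near the true noise variance 0.01
```

Because the smooth part of the output difference shrinks with the neighbour distance while the noise part does not, extrapolating the regression line to zero distance isolates the noise, which is exactly the quantity the chapter uses for feature and embedding-dimension selection.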

