An alternative classification method for northern Wisconsin lakes

1999 ◽  
Vol 56 (4) ◽  
pp. 661-669 ◽  
Author(s):  
Edward E Emmons ◽  
Martin J Jennings ◽  
Clayton Edwards

Wisconsin has nearly 15 000 lakes with great variation in limnology, morphometry, and origin, and classifying them into groups is a continuing goal. This study examines two alternative approaches to lake classification, one common and the other somewhat novel. Both approaches used lake morphometry and limnological variables and were compared for their ability to form groups and to assign lakes to groups with a high probability of correct classification. The first approach used nonhierarchical cluster analysis to form lake groups and discriminant analysis to assign lakes to these groups. The second approach formed lake groups by iteratively splitting the sampling space into smaller and smaller subspaces, with each binary split performed by nonhierarchical cluster analysis on a subset of the original variables. This iterative splitting resulted in a hierarchical classification tree with reduced dimensionality compared with the original data set. At each branch, multiple logistic regression was used to place lakes into nodes of the tree. Both approaches were validated with a resubstitution analysis of the model-building data set as well as with a separate validation data set. The decision tree method yielded significantly lower misclassification rates and was more easily interpreted than the discriminant analysis approach.
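As a hedged illustration of the first approach (nonhierarchical clustering to form groups, then discriminant analysis to assign lakes to them), the sketch below uses scikit-learn with placeholder data; the variable set, group count, and scaling choices are assumptions for illustration, not the study's exact configuration.

```python
# Sketch of the first approach: k-means (nonhierarchical) clustering forms
# lake groups, then discriminant analysis assigns lakes to those groups.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-ins for morphometric/limnological variables (e.g. area, depth, pH).
X_model = rng.normal(size=(200, 5))      # model-building data set
X_valid = rng.normal(size=(50, 5))       # separate validation data set

scaler = StandardScaler().fit(X_model)
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    scaler.transform(X_model))

# Discriminant analysis places lakes into the cluster-derived groups.
lda = LinearDiscriminantAnalysis().fit(scaler.transform(X_model), groups)

# Resubstitution rate on the model-building data, plus validation predictions.
resub_accuracy = lda.score(scaler.transform(X_model), groups)
valid_groups = lda.predict(scaler.transform(X_valid))
```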

2019 ◽  
Vol 7 (3) ◽  
pp. SE113-SE122 ◽  
Author(s):  
Yunzhi Shi ◽  
Xinming Wu ◽  
Sergey Fomel

Salt boundary interpretation is important for understanding salt tectonics and for velocity model building in seismic migration. Conventional methods consist of computing salt attributes and extracting salt boundaries. We formulate the problem as 3D image segmentation and evaluate an efficient approach based on deep convolutional neural networks (CNNs) with an encoder-decoder architecture. To train the model, we design a data generator that extracts randomly positioned subvolumes from a large-scale 3D training data set, applies data augmentation, and feeds a large number of subvolumes into the network, using salt/nonsalt binary labels generated by thresholding the velocity model as ground truth. We test the model on validation data sets and compare the blind-test predictions with the ground truth. Our results indicate that the method automatically captures subtle salt features from the 3D seismic image with little or no manual input. We further test the model on a field example to demonstrate that the deep CNN method generalizes across different data sets.
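The following is a minimal encoder-decoder CNN sketch for 3D binary segmentation in the spirit described above; the framework (PyTorch), layer sizes, and subvolume shape are illustrative assumptions, not the authors' exact network.

```python
# Encoder-decoder CNN sketch: a voxel-wise salt/nonsalt logit is produced
# for each randomly positioned training subvolume.
import torch
import torch.nn as nn

class EncoderDecoder3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                          # downsample
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 2, stride=2),  # upsample
            nn.ReLU(),
            nn.Conv3d(16, 1, 1),                      # salt/nonsalt logit per voxel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = EncoderDecoder3D()
subvolume = torch.randn(8, 1, 32, 32, 32)   # batch of stand-in subvolumes
# Binary labels, standing in for a thresholded velocity model.
labels = (torch.rand(8, 1, 32, 32, 32) > 0.5).float()
loss = nn.BCEWithLogitsLoss()(model(subvolume), labels)
loss.backward()
```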


2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Xiaojing Tian ◽  
Jun Wang ◽  
Zhongren Ma ◽  
Mingsheng Li ◽  
Zhenbo Wei

An E-panel, comprising an electronic nose (E-nose) and an electronic tongue (E-tongue), was used to distinguish the organoleptic characteristics of minced mutton adulterated with different proportions of pork. Normalization, stepwise linear discriminant analysis (step-LDA), and principal component analysis (PCA) were employed to merge the data matrices of the E-nose and E-tongue. The discrimination results were evaluated and compared by canonical discriminant analysis (CDA) and Bayesian discriminant analysis (BDA). The discrimination capability of the combined system (classification error 0%∼1.67%) was superior or comparable to that obtained with the two instruments separately, and the E-tongue (classification error 0%∼2.5%) was more accurate than the E-nose (classification error 0.83%∼10.83%). For the combined system, fusing the first 6 PCs of the E-nose data with the first 5 PCs of the E-tongue data proved the most effective method. To predict the pork proportion in adulterated mutton, multiple linear regression (MLR), partial least squares (PLS) regression, and backpropagation neural network (BPNN) models were built and compared. Good correlations were found between the signals of the E-tongue, the E-nose, and their fused data and the proportion of pork in minced mutton, with correlation coefficients higher than 0.90 in both the calibration and validation data sets. BPNN proved the most effective method for predicting pork proportion, with R2 higher than 0.97 for both the calibration and validation data sets. These results indicate that the integration of E-nose and E-tongue could be a useful tool for detecting mutton adulteration.
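A hedged sketch of the feature-level fusion and BPNN regression described above: PCA is applied to each instrument's signals, the leading PCs are concatenated (6 for the E-nose, 5 for the E-tongue, as in the text), and a backpropagation neural network predicts the pork proportion. Sample counts and sensor dimensions are placeholder assumptions.

```python
# PCA-based fusion of E-nose and E-tongue signals, then BPNN regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
e_nose = rng.normal(size=(120, 10))       # E-nose sensor responses (stand-in)
e_tongue = rng.normal(size=(120, 7))      # E-tongue sensor responses (stand-in)
pork_ratio = rng.uniform(0, 1, size=120)  # proportion of pork in minced mutton

fused = np.hstack([
    PCA(n_components=6).fit_transform(e_nose),    # first 6 PCs of E-nose
    PCA(n_components=5).fit_transform(e_tongue),  # first 5 PCs of E-tongue
])
X_cal, X_val, y_cal, y_val = train_test_split(fused, pork_ratio,
                                              random_state=1)
# MLPRegressor is a backpropagation-trained network, standing in for the BPNN.
bpnn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=1).fit(X_cal, y_cal)
r2_val = bpnn.score(X_val, y_val)         # R^2 on the validation set
```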


Author(s):  
Jahnavi Yeturu ◽  
Poongothai Elango ◽  
S. P. Raja ◽  
P. Nagendra Kumar

Genetics is the clinical study of congenital mutation; its principal value in humans lies in exploring, analyzing, interpreting and describing the inherited component of diseases such as cancer, diabetes and heart disease. Cancer is the most troublesome of these, as the proportion of cancer sufferers is growing rapidly. Identifying the mutations that contribute to tumor growth and distinguishing them from neutral mutations is difficult, because most cancerous tumors harbor many genetic mutations. Genetic mutations are systematized and categorized to classify the cancer on the basis of medical observations and clinical studies. At present, genetic mutations are annotated either manually or with existing rudimentary algorithms, and the evaluation and classification of each individual mutation has rested largely on evidence from text-based medical literature. Consequently, classifying genetic mutations directly from clinical evidence remains a challenging task. In this work, one-hot encoding is used to derive features from genes and their variations, and TF-IDF is used to extract features from the clinical text data. To increase classification accuracy, machine learning algorithms such as support vector machine, logistic regression and Naive Bayes are experimented with, and a stacking model classifier is developed to improve accuracy further. The proposed stacking model classifier obtains a log loss of 0.8436 on the cross-validation data set and 0.8572 on the test data set, outperforming the existing algorithms in terms of multi-class log loss; since a lower log loss indicates a more efficient model, reducing the log loss to below 1 demonstrates the effectiveness of the proposed classifier.
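An illustrative sketch of the pipeline described above: one-hot encoding for gene/variation features, TF-IDF for clinical text, and a stacking classifier evaluated with multi-class log loss. The toy data, the choice of base learners, and all dimensions are assumptions, not the authors' exact setup.

```python
# One-hot + TF-IDF features feeding a stacking classifier scored by log loss.
import numpy as np
from scipy.sparse import hstack
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Hypothetical gene names and clinical-text snippets, repeated as toy data.
genes = np.array([["BRCA1"], ["TP53"], ["EGFR"], ["BRCA1"]] * 25)
texts = ["pathogenic missense variant", "benign polymorphism",
         "activating kinase mutation", "loss of function"] * 25
y = np.array([0, 1, 2, 0] * 25)          # mutation class labels

X = hstack([OneHotEncoder(handle_unknown="ignore").fit_transform(genes),
            TfidfVectorizer().fit_transform(texts)]).tocsr()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

stack = StackingClassifier(
    estimators=[("svm", LinearSVC()),
                ("nb", MultinomialNB()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
print("multi-class log loss:", log_loss(y_te, stack.predict_proba(X_te)))
```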


2018 ◽  
Vol 49 ◽  
pp. 00017 ◽  
Author(s):  
Bernardeta Dębska

Resin mortars belong to the group of concrete-like construction composites. They are obtained by mixing a synthetic resin with a hardener and an appropriately selected aggregate. The latter component usually constitutes as much as 90% of the composite mass and can largely shape the characteristics of the finished product. The results of discriminant analysis, the exploratory data analysis method used in this article, confirm that the type of filler used can significantly differentiate the physical and mechanical parameters of epoxy mortars. Discriminant analysis examines differences between groups of objects based on a set of selected independent variables (predictors) and is used to solve a wide range of classification and prediction problems. Its core is a model in the form of a linear combination of the independent variables, which allows observations (e.g. test mortars) to be classified into one of the groups of interest to the researcher. Discriminant analysis distinguishes a learning stage (model building), in which classification rules are created from research results (the training set), and a classification stage, i.e. the use of the model, e.g. for testing its prognostic accuracy.
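A minimal sketch of the two stages named above, assuming hypothetical epoxy-mortar measurements: the learning stage fits a linear discriminant model on a training set, and the classification stage assigns new mortars to filler-type groups.

```python
# Learning stage and classification stage of discriminant analysis.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# Columns: e.g. compressive strength, flexural strength, density (illustrative).
X_train = rng.normal(size=(60, 3))
filler_group = rng.integers(0, 3, size=60)   # three hypothetical filler types

lda = LinearDiscriminantAnalysis().fit(X_train, filler_group)  # learning stage
new_mortars = rng.normal(size=(5, 3))
predicted_group = lda.predict(new_mortars)                     # classification stage
```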


Author(s):  
B. B. van der Horst ◽  
R. C. Lindenbergh ◽  
S. W. J. Puister

Road surface anomalies, such as potholes, cracks and ravelling, affect driving conditions, including driving comfort and safety. Automatic detection and localisation of these anomalies can be used for targeted road maintenance. Currently, road damage is detected by road inspectors who drive slowly along the road looking for surface anomalies, which can be dangerous. To improve safety, road inspectors can instead evaluate road images, but the results may vary because this evaluation is subjective. In this research, a method is developed for detecting road damage from mobile profile laser scan data. First, features are computed using a sliding window. K-means clustering is then used to create training data for a Random Forest algorithm. Finally, mathematical morphological operations are used to clean the data and connect the damage points. The result is an objective and detailed damage classification. The method is tested on a 120-metre-long road data set that includes different types of damage. Validation is done by comparing the results to the classification of a human road inspector, although the proposed method's classification contains more detail, which complicates validation. Nevertheless, the method achieves 79% overlap with the validation data. Although the results are already promising, developments such as pre-processing the data could lead to further improvements.
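A hedged sketch of the pipeline described above: sliding-window features, K-means to generate pseudo-labels, a Random Forest trained on those labels, and a morphological operation to clean the damage mask. The window size, cluster count, and feature choices are illustrative assumptions.

```python
# K-means pseudo-labels -> Random Forest -> morphological cleanup.
import numpy as np
from scipy.ndimage import binary_closing, uniform_filter
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
profile = rng.normal(size=(100, 100))        # stand-in road-height grid

# Sliding-window features: local mean and local roughness (std).
local_mean = uniform_filter(profile, size=5)
local_var = uniform_filter(profile**2, size=5) - local_mean**2
features = np.stack([local_mean, np.sqrt(np.abs(local_var))], axis=-1)
flat = features.reshape(-1, 2)

# K-means provides training labels (damage vs. intact) for the forest.
pseudo_labels = KMeans(n_clusters=2, n_init=10, random_state=3).fit_predict(flat)
forest = RandomForestClassifier(random_state=3).fit(flat, pseudo_labels)
damage_mask = forest.predict(flat).reshape(100, 100).astype(bool)

# Morphological closing connects nearby damage points and removes noise.
cleaned = binary_closing(damage_mask, structure=np.ones((3, 3)))
```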


2020 ◽  
Vol 4 (1) ◽  
pp. 64 ◽  
Author(s):  
Md Zannatul Arif ◽  
Rahate Ahmed ◽  
Umma Habiba Sadia ◽  
Mst Shanta Islam Tultul ◽  
Rocky Chakma

The aim of this investigation is to classify fetal state codes from the Cardiotocography data set using a decision tree method. Cardiotocography is an important and widely used tool for monitoring fetal heart rate; it is applied during pregnancy to check the condition of the fetal heart rate up to the time of delivery. Classification is needed to predict which fetal heart rate category a recording belongs to. In this paper, we use three input attributes of the training data set, denoted LB, AC and FM, to categorize records as normal, suspect or pathological, with the NSP variable used as the response variable. The resulting classification tree has 19 nodes, and every node is characterized by its statistic, criterion, weights and values. The Cardiotocography data set used in this study was obtained from the UCI Machine Learning Repository and contains 2126 observation instances with 22 attributes. In this experiment, the highest accuracy is 98.7%. Overall, the experimental results demonstrate the viability of Classification and Regression Trees and their potential for further prediction.
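A sketch of the classification-tree setup described above, using the three input attributes (LB, AC, FM) to predict the NSP class. Random values stand in for the UCI Cardiotocography data set, and the distributions chosen are assumptions for illustration only.

```python
# Decision tree on three CTG attributes predicting the NSP class.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(133, 10, 2126),   # LB: baseline heart rate
                     rng.poisson(3, 2126),        # AC: accelerations
                     rng.poisson(7, 2126)])       # FM: fetal movements
y = rng.integers(1, 4, 2126)   # NSP: 1 normal, 2 suspect, 3 pathological

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4, stratify=y)
tree = DecisionTreeClassifier(random_state=4).fit(X_tr, y_tr)
print("accuracy:", tree.score(X_te, y_te))
print("total tree nodes:", tree.tree_.node_count)
```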


2006 ◽  
Vol 29 (1) ◽  
pp. 153-162 ◽  
Author(s):  
Pratul Kumar Saraswati ◽  
Sanjeev V Sabnis

Paleontologists use statistical methods for prediction and classification of taxa. Over the years, statistical analyses of morphometric data have been carried out under the assumption of multivariate normality. In an earlier study, three closely resembling species of the biostratigraphically important genus Nummulites were discriminated by multi-group discrimination; two discriminant functions that used the diameter and thickness of the tests and the height and length of chambers in the final whorl accounted for nearly 100% discrimination. In this paper, Classification and Regression Trees (CART), a non-parametric method, is used for classification and prediction on the same data set. In all, 111 iterations of the CART methodology are performed by splitting the data set of 55 observations into training, validation and test data sets in varying proportions. Among the validation data sets, 40% of the iterations are classified without error, and only one case of misclassification is noted in 49% of the iterations. As regards the test data sets, nearly 70% contain no misclassification, whereas in about 25% only one case of misclassification is found. The results suggest that the method is highly successful in assigning an individual to a particular species. The key variables on which the tree models are built are combinations of the thickness of the test (T), the height of the chambers in the final whorl (HL) and the diameter of the test (D). Discriminant analysis and CART thus appear comparable in discriminating the three species; however, CART reduces the number of requisite variables without increasing the misclassification error. The method is very useful to professional geologists for quick identification of species.
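An illustrative sketch of the repeated-split CART procedure described above: each iteration partitions the 55 observations into training, validation and test sets, fits a tree on the training data, and counts misclassifications. The feature names follow the text (T, HL, D); the values and split proportions are random placeholders.

```python
# 111 repeated train/validation/test splits with a CART classifier.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(55, 3))             # columns: T, HL, D (stand-ins)
species = rng.integers(0, 3, size=55)    # three Nummulites species

val_errors, test_errors = [], []
for i in range(111):                     # 111 iterations, as in the text
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X, species, test_size=0.5, random_state=i)
    X_val, X_te, y_val, y_te = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=i)
    tree = DecisionTreeClassifier(random_state=i).fit(X_tr, y_tr)
    val_errors.append(int((tree.predict(X_val) != y_val).sum()))
    test_errors.append(int((tree.predict(X_te) != y_te).sum()))
```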


Author(s):  
Emilia Mendes

Building effort models or using techniques to obtain estimated effort does not guarantee that the estimates will be accurate. It is therefore also necessary to assess the estimation accuracy of the effort models or techniques under scrutiny. For this, we employ a process called cross-validation: part of the original data set is used to build an effort model, or is used by an effort estimation technique, and the remainder of the data set (data not used in the model-building process) is used to validate the model or technique. In parallel with cross-validation, prediction accuracy measures are also obtained. Examples of de facto accuracy measures are the mean magnitude of relative error (MMRE), the median magnitude of relative error (MdMRE), and prediction at 25% (Pred[25]).
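The accuracy measures named above can be computed directly from their standard definitions: MRE_i = |actual_i - estimated_i| / actual_i, MMRE is the mean MRE, MdMRE the median, and Pred(25) the fraction of projects with MRE no greater than 0.25. The effort values below are illustrative only.

```python
# MMRE, MdMRE and Pred(25) from their standard definitions.
import numpy as np

actual = np.array([120.0, 80.0, 200.0, 45.0])     # actual effort (illustrative)
estimated = np.array([100.0, 95.0, 210.0, 60.0])  # estimated effort

mre = np.abs(actual - estimated) / actual
mmre = mre.mean()                 # mean magnitude of relative error
mdmre = np.median(mre)            # median magnitude of relative error
pred25 = (mre <= 0.25).mean()     # proportion of estimates within 25%
```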


2017 ◽  
Author(s):  
Ariel Rokem ◽  
Yue Wu ◽  
Aaron Lee

Deep learning algorithms have tremendous potential utility in the classification of biomedical images. For example, images acquired with retinal optical coherence tomography (OCT) can be used to accurately classify patients with age-related macular degeneration (AMD) and distinguish them from healthy control patients. However, previous research has suggested that large amounts of data are required to train deep learning algorithms, because of the large number of parameters that need to be fit. Here, we show that a moderate amount of data (from approximately 1,800 patients) may be enough to reach close-to-maximal performance in classifying AMD patients from OCT images. These results suggest that deep learning algorithms can be trained on moderate amounts of data, provided that the images are relatively homogeneous and the effective number of parameters is sufficiently small. Furthermore, we demonstrate that in this application, cross-validation with a separate test set that is not used in any part of training does not differ substantially from cross-validation with a validation data set used to determine the optimal stopping point for training.
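A sketch of the comparison mentioned above: a model is trained with a validation split used for early stopping, then evaluated on a fully held-out test set. The classifier and toy data are stand-ins for the paper's deep network and OCT images, chosen only to make the split structure concrete.

```python
# Validation-based early stopping vs. a fully held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(1800, 64))          # stand-in for image-derived features
y = rng.integers(0, 2, size=1800)        # AMD vs. healthy control

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6)
# early_stopping=True holds out part of the training data internally to
# decide when to stop, mirroring the validation-set stopping rule.
clf = MLPClassifier(early_stopping=True, validation_fraction=0.2,
                    max_iter=500, random_state=6).fit(X_train, y_train)
print("held-out test accuracy:", clf.score(X_test, y_test))
```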

