Shape Dimensionality Metrics for Landmark Data

2020 ◽  
Author(s):  
F. Robin O’Keefe

Abstract: The study of modularity in geometric morphometric landmark data has focused attention on an underlying question, that of whole-shape modularity, or the pattern and strength of covariation among all landmarks. Measuring whole-shape modularity allows measurement of the dimensionality of the shape, but current methods used to measure this dimensionality are limited in application. This paper proposes a metric for measuring the “effective dimensionality”, De, of geometric morphometric landmark data based on the Shannon entropy of the eigenvalue vector of the covariance matrix of GPA landmark data. A permutation test to establish null rank deficiency is developed to allow standardization for comparing dimensionality metrics between data sets, and a bootstrap test is employed for measures of dispersion. These novel methods are applied to a data set of 14 landmarks taken from 119 dire wolf jaws from Rancho La Brea. Comparison with the current test based on eigenvalue dispersion demonstrates that the new metric is more sensitive to detecting population differences in whole-shape modularity. The effective dimensionality metric is extended, in the dense semilandmark case, to a measure of “latent dimensionality”, Dl. Latent dimensionality should be comparable among landmark spaces, whether they are homologous or not.
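
As a rough illustration of the entropy-based idea in this abstract, the sketch below computes an effective dimensionality from a list of covariance eigenvalues. The abstract does not state the formula, so this assumes the metric is the exponential of the Shannon entropy of the eigenvalues after scaling them to sum to one. The function name `effective_dimensionality` and the `tol` cutoff are hypothetical, and the permutation and bootstrap tests are not shown.

```python
import math


def effective_dimensionality(eigenvalues, tol=1e-12):
    """Return exp(Shannon entropy) of the eigenvalues scaled to sum to 1.

    Assumed definition for illustration; eigenvalues at or below `tol`
    are treated as zero and dropped.
    """
    positive = [v for v in eigenvalues if v > tol]
    if not positive:
        raise ValueError("need at least one positive eigenvalue")
    total = sum(positive)
    proportions = [v / total for v in positive]
    entropy = -sum(p * math.log(p) for p in proportions)
    return math.exp(entropy)


# Made-up eigenvalue vectors: variance spread evenly versus one dominant axis.
even = effective_dimensionality([1.0, 1.0, 1.0, 1.0])
dominated = effective_dimensionality([10.0, 0.1, 0.1, 0.1])
```

Under this assumed definition, evenly spread variance gives a value equal to the number of axes, while a single dominant axis pulls the value toward one.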

Author(s):  
Fred L. Bookstein

Abstract: A matrix manipulation new to the quantitative study of developmental stability reveals unexpected morphometric patterns in a classic data set of landmark-based calvarial growth. There are implications for evolutionary studies. Among organismal biology’s fundamental postulates is the assumption that most aspects of any higher animal’s growth trajectories are dynamically stable, resilient against the types of small but functionally pertinent transient perturbations that may have originated in genotype, morphogenesis, or ecophenotypy. We need an operationalization of this axiom for landmark data sets arising from longitudinal data designs. The present paper introduces a multivariate approach toward that goal: a method for identification and interpretation of patterns of dynamical stability in longitudinally collected landmark data. The new method is based in an application of eigenanalysis unfamiliar to most organismal biologists: analysis of a covariance matrix of Boas coordinates (Procrustes coordinates without the size standardization) against their changes over time. These eigenanalyses may yield complex eigenvalues and eigenvectors (terms involving $i=\sqrt{-1}$); the paper carefully explains how these are to be scattered, gridded, and interpreted by their real and imaginary canonical vectors. For the Vilmann neurocranial octagons, the classic morphometric data set used as the running example here, there result new empirical findings that offer a pattern analysis of the ways perturbations of growth are attenuated or otherwise modified over the course of developmental time. The main finding, dominance of a generalized version of dynamical stability (negative autoregressions, as announced by the negative real parts of their eigenvalues, often combined with shearing and rotation in a helpful canonical plane), is surprising in its strength and consistency. 
A closing discussion explores some implications of this novel pattern analysis of growth regulation. It differs in many respects from the usual way covariance matrices are wielded in geometric morphometrics, differences relevant to a variety of study designs for comparisons of development across species.
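
To make the role of complex eigenvalues concrete, here is a minimal sketch that computes the eigenvalues of a real, non-symmetric 2×2 matrix and checks the sign of their real parts. The matrix is a made-up toy (a contraction combined with a rotation), not the Boas-coordinate covariance matrix the abstract describes, and `eigenvalues_2x2` is a hypothetical helper.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


# Toy matrix: shrinkage toward the origin plus rotation in the plane.
lam1, lam2 = eigenvalues_2x2(-0.5, -1.0, 1.0, -0.5)

# Negative real parts correspond to perturbations that are damped;
# nonzero imaginary parts correspond to rotation in the canonical plane.
is_damped = lam1.real < 0 and lam2.real < 0
has_rotation = lam1.imag != 0
```

A symmetric covariance matrix would always give real eigenvalues; complex pairs can arise here only because the matrix relating coordinates to their changes is not symmetric.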


Author(s):  
Fred L. Bookstein

Abstract: The geometric morphometric (GMM) construction of Procrustes shape coordinates from a data set of homologous landmark configurations puts exact algebraic constraints on position, orientation, and geometric scale. While position as digitized is not ordinarily a biologically meaningful quantity, and orientation is relevant mainly when some organismal function interacts with a Cartesian positional gradient such as horizontality, size per se is a crucially important biometric concept, especially in contexts like growth, biomechanics, or bioenergetics. “Normalizing” or “standardizing” size (usually by dividing the square root of the summed squared distances from the centroid out of all the Cartesian coordinates specimen by specimen), while associated with the elegant symmetries of the Mardia–Dryden distribution in shape space, nevertheless can substantially impeach the validity of any organismal inferences that ensue. This paper adapts two variants of standard morphometric least-squares, principal components and uniform strains, to circumvent size standardization while still accommodating an analytic toolkit for studies of differential growth that supports landmark-by-landmark graphics and thin-plate splines. Standardization of position and orientation but not size yields the coordinates Franz Boas first discussed in 1905. In studies of growth, a first principal component of these coordinates often appears to involve most landmarks shifting almost directly away from their centroid, hence the proposed model’s name, “centric allometry.” There is also a joint standardization of shear and dilation resulting in a variant of standard GMM’s “nonaffine shape coordinates” where scale information is subsumed in the affine term. Studies of growth allometry should go better in the Boas system than in the Procrustes shape space that is the current conventional workbench for GMM analyses. 
I demonstrate two examples of this revised approach (one developmental, one phylogenetic) that retrieve all the findings of a conventional shape-space-based approach while focusing much more closely on the phenomenon of allometric growth per se. A three-part Appendix provides an overview of the algebra, highlighting both similarities to the Procrustes approach and contrasts with it.
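
The size standardization discussed in this abstract can be sketched for a single specimen as follows. Centroid size is taken as the square root of the summed squared distances of the landmarks from their centroid, as the abstract describes. The function names are hypothetical, and the rotation step that a full Procrustes or Boas superimposition needs is deliberately left out.

```python
import math


def centroid(landmarks):
    """Mean position of a configuration given as a list of coordinate lists."""
    n = len(landmarks)
    dims = len(landmarks[0])
    return [sum(p[k] for p in landmarks) / n for k in range(dims)]


def centroid_size(landmarks):
    """Square root of the summed squared distances from the centroid."""
    c = centroid(landmarks)
    return math.sqrt(
        sum((p[k] - c[k]) ** 2 for p in landmarks for k in range(len(c)))
    )


def center(landmarks):
    """Standardize position only; size is retained."""
    c = centroid(landmarks)
    return [[p[k] - c[k] for k in range(len(c))] for p in landmarks]


def scale_to_unit_size(landmarks):
    """Standardize position and size, discarding the size information."""
    s = centroid_size(landmarks)
    return [[x / s for x in p] for p in center(landmarks)]


# Made-up specimen: a square with side length 2.
square = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]
size_kept = centroid_size(center(square))
size_removed = centroid_size(scale_to_unit_size(square))
```

Centering alone leaves centroid size unchanged, whereas the unit-size step forces it to one for every specimen, which is the information the Boas coordinates retain.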


Paleobiology ◽  
2017 ◽  
Vol 43 (3) ◽  
pp. 508-520 ◽  
Author(s):  
Katie S. Collins ◽  
Michael F. Gazley

Abstract: Most geometric morphometric studies are underpinned by sets of photographs of specimens. The camera lens distorts the images it takes, and the extent of the distortion will depend on factors such as the make and model of the lens and camera and user-controlled variation such as the zoom of the lens. Any study that uses populations of geometric data digitized from photographs will have shape variation introduced into the data set simply by the photographic process. We illustrate the nature and magnitude of this error using a 30-specimen data set of Recent New Zealand Mactridae (Mollusca: Bivalvia), using only a single camera and camera lens with four different photographic setups. We then illustrate the use of retrodeformation in Adobe Photoshop and test the magnitude of the variation in the data set using multivariate Procrustes analysis of variance. The effect of photographic method on the variance in the data set is significant, systematic, and predictable and, if not accounted for, could lead to misleading results, suggest clustering of specimens in ordinations that has no biological basis, or induce artificial oversplitting of taxa. Recommendations to minimize and quantify distortion include: (1) that studies avoid mixing data sets from different cameras, lenses, or photographic setups; (2) that studies avoid placing specimens or scale bars near the edges of the photographs; (3) that the same camera settings are maintained (as much as practical) for every image in a data set; (4) that care is taken when using full-frame cameras; and (5) that a reference grid is used to correct for or quantify distortion.


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years of records before the 2-year cycle with damped waveform became apparent varied between 17 and 26; in some data sets the cycle was not found. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
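
The autocorrelation pattern described in point 2 can be illustrated with a short sketch. The yearly counts below are made up, not the Silwood Park or Rothamsted records, and the bound 1.96/√n is the common large-sample approximation for a 95% limit, used here as an assumption since the abstract does not say how significance was assessed.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (standard ACF estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Made-up yearly counts that alternate high and low with shrinking swings.
counts = [10, 2, 9, 3, 8, 4, 7, 5]

r1 = autocorrelation(counts, 1)   # expected to be negative
r2 = autocorrelation(counts, 2)   # expected to be positive but smaller
bound = 1.96 / math.sqrt(len(counts))

lag1_significant = abs(r1) > bound
lag2_significant = abs(r2) > bound
```

A negative lag-1 value followed by a smaller positive lag-2 value is the signature of a 2-year cycle whose amplitude decays.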


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. For each data set, a CoMFA model was first built with all CoMFA descriptors; a new CoMFA model was then developed with each variable selection method, giving 9 CoMFA models per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the models created, applying 5 of the variable selection approaches (FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife) increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. In addition, applying FFD, SRD-FFD, IVE-PLS or SRD-UVE-PLS preserves the information in the CoMFA contour maps for both fields.


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is aided by Long Short-Term Memory (LSTM), which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of this transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of LSTM and CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms or at least equals previous work. For the second data set, we devise schemes to generate training and testing data by changing the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.
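
The three parameters named for the second data set (window size, sliding size, labeling scheme) can be sketched as below. This is only an illustration of generic sliding-window segmentation: the function `sliding_windows`, the two labeling options, and the sample data are assumptions, and the paper's actual schemes and its LSTM and CNN stages are not reproduced.

```python
from collections import Counter


def sliding_windows(signal, labels, window_size, step, labeling="majority"):
    """Cut a 1D signal into overlapping windows, one label per window."""
    if len(signal) != len(labels):
        raise ValueError("signal and labels must have the same length")
    windows = []
    for start in range(0, len(signal) - window_size + 1, step):
        end = start + window_size
        window_labels = labels[start:end]
        if labeling == "majority":
            # Most frequent label inside the window.
            label = Counter(window_labels).most_common(1)[0][0]
        elif labeling == "last":
            # Label of the final sample in the window.
            label = window_labels[-1]
        else:
            raise ValueError("unknown labeling scheme: " + labeling)
        windows.append((signal[start:end], label))
    return windows


# Made-up sensor readings with per-sample activity labels.
signal = [0.1, 0.2, 0.9, 1.1, 1.0, 0.2]
labels = ["sit", "sit", "walk", "walk", "walk", "sit"]

majority = sliding_windows(signal, labels, window_size=3, step=1)
last = sliding_windows(signal, labels, window_size=3, step=1, labeling="last")
```

Changing the labeling scheme alters the targets for windows that straddle an activity boundary, which is one way these parameters can affect measured accuracy.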


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio and is applied to various data sets of Raman spectra recorded from biological cells.
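
The matching-and-replacement idea in this abstract can be sketched in a few lines. This is a simplified illustration, not the published algorithm: normalized covariance is taken to be the Pearson correlation, spikes are flagged with a fixed hypothetical `threshold` on the positive difference from the matched spectrum, and any intensity scaling between spectra is ignored. The spectra are made up.

```python
import math


def normalized_covariance(a, b):
    """Covariance of two spectra divided by the product of their spreads."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def remove_spikes(target, library, threshold):
    """Replace upward spikes in `target` using its most similar spectrum."""
    match = max(library, key=lambda s: normalized_covariance(target, s))
    # Cosmic ray spikes are unidirectional, so only positive excess is flagged.
    cleaned = [m if (t - m) > threshold else t for t, m in zip(target, match)]
    return cleaned, match


# Made-up data: one spectrum with a spike and a small library of others.
target = [1.0, 2.0, 3.0, 50.0, 3.0, 2.0, 1.0]
library = [
    [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0],
    [4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0],
]

cleaned, match = remove_spikes(target, library, threshold=5.0)
```

Only the flagged samples are replaced, so the rest of the target spectrum is returned unchanged, in line with the stated goal of making no other change to the underlying spectrum.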


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

Within the study of kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the samples, while I-SVDD improves the performance of the original SVDD by training on imbalanced samples. Experiments on two system call data sets show that both algorithms are more effective and that R-SVM has lower complexity.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


2021 ◽  
Vol 99 (Supplement_1) ◽  
pp. 218-219
Author(s):  
Andres Fernando T Russi ◽  
Mike D Tokach ◽  
Jason C Woodworth ◽  
Joel M DeRouchey ◽  
Robert D Goodband ◽  
...  

Abstract: The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 − 0.135 × (BW) + 0.00043 × (BW)², R² = 0.79; SD = 0.41 + 0.150 × (BW) − 0.00041 × (BW)², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
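
The two prediction equations can be evaluated directly, as in the sketch below. The coefficients are those quoted in the abstract; the function names are hypothetical, and the unit of BW is not stated in the abstract, so the input is assumed to be in whatever unit the source data used. The example weight of 100 is arbitrary.

```python
def predicted_cv_percent(bw):
    """Coefficient of variation (%) as a quadratic function of mean BW."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2


def predicted_sd(bw):
    """Standard deviation as a quadratic function of mean BW."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2


# Arbitrary example weights, in the (unstated) unit of the source data.
cv_light = predicted_cv_percent(20.0)
cv_heavy = predicted_cv_percent(100.0)
sd_heavy = predicted_sd(100.0)

# Rough cross-check: SD expressed as a percentage of mean BW.
sd_as_percent = 100.0 * sd_heavy / 100.0
```

At a mean BW of 100 the two equations give a CV of 10.84% and an SD of 11.31, so the separately fitted curves agree only approximately when the SD is converted to a percentage of the mean.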

