Shape Dimensionality Metrics for Landmark Data

Mapping Intimacies ◽

10.1101/2020.07.23.218289 ◽

2020 ◽

Author(s):

F. Robin O’Keefe

Keyword(s):

Permutation Test ◽

Data Sets ◽

Data Set ◽

Geometric Morphometric ◽

Current Test ◽

Landmark Data ◽

Measures Of Dispersion ◽

Effective Dimensionality ◽

Dire Wolf ◽

Rancho La Brea

Abstract: The study of modularity in geometric morphometric landmark data has focused attention on an underlying question, that of whole-shape modularity, or the pattern and strength of covariation among all landmarks. Measuring whole-shape modularity allows measurement of the dimensionality of the shape, but current methods used to measure this dimensionality are limited in application. This paper proposes a metric for measuring the “effective dimensionality”, De, of geometric morphometric landmark data based on the Shannon entropy of the eigenvalue vector of the covariance matrix of GPA landmark data. A permutation test to establish null rank deficiency is developed to allow standardization for comparing dimensionality metrics between data sets, and a bootstrap test is employed for measures of dispersion. These novel methods are applied to a data set of 14 landmarks taken from 119 dire wolf jaws from Rancho La Brea. Comparison with the current test based on eigenvalue dispersion demonstrates that the new metric is more sensitive to detecting population differences in whole-shape modularity. The effective dimensionality metric is extended, in the dense semilandmark case, to a measure of “latent dimensionality”, Dl. Latent dimensionality should be comparable among landmark spaces, whether they are homologous or not.
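The entropy-based idea in this abstract can be illustrated with a minimal sketch. It assumes that the effective dimensionality is taken as the exponential of the Shannon entropy of the normalized covariance eigenvalues, a common "effective number" convention; the abstract does not give the exact formula, so the paper's normalization may differ. The function name and the synthetic data are hypothetical, and the permutation and bootstrap steps are not shown.

```python
import numpy as np

def effective_dimensionality(coords):
    """Entropy-based effective dimensionality (illustrative assumption).

    coords: (n_specimens, n_variables) array of already aligned coordinates.
    """
    cov = np.cov(coords, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)
    # Drop numerically zero eigenvalues so that log() stays finite.
    eigvals = eigvals[eigvals > 1e-12 * eigvals.max()]
    p = eigvals / eigvals.sum()          # eigenvalues as a probability vector
    entropy = -np.sum(p * np.log(p))     # Shannon entropy (natural log)
    return float(np.exp(entropy))        # assumed conversion to a dimension count

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 6))                    # variance spread evenly
dominated = isotropic * np.array([10.0, 1, 1, 1, 1, 1])  # one dominant axis

d_iso = effective_dimensionality(isotropic)
d_dom = effective_dimensionality(dominated)
print(d_iso, d_dom)
```

With evenly spread variance the value approaches the number of variables; when one axis dominates it drops toward 1.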

A New Method for Landmark-Based Studies of the Dynamic Stability of Growth, with Implications for Evolutionary Analyses

Evolutionary Biology ◽

10.1007/s11692-021-09548-8 ◽

2021 ◽

Author(s):

Fred L. Bookstein

Keyword(s):

Growth Regulation ◽

Pattern Analysis ◽

Developmental Time ◽

Dynamical Stability ◽

New Method ◽

Data Sets ◽

Data Set ◽

Landmark Data ◽

Study Designs ◽

Changes Over Time

Abstract: A matrix manipulation new to the quantitative study of developmental stability reveals unexpected morphometric patterns in a classic data set of landmark-based calvarial growth. There are implications for evolutionary studies. Among organismal biology’s fundamental postulates is the assumption that most aspects of any higher animal’s growth trajectories are dynamically stable, resilient against the types of small but functionally pertinent transient perturbations that may have originated in genotype, morphogenesis, or ecophenotypy. We need an operationalization of this axiom for landmark data sets arising from longitudinal data designs. The present paper introduces a multivariate approach toward that goal: a method for identification and interpretation of patterns of dynamical stability in longitudinally collected landmark data. The new method is based in an application of eigenanalysis unfamiliar to most organismal biologists: analysis of a covariance matrix of Boas coordinates (Procrustes coordinates without the size standardization) against their changes over time. These eigenanalyses may yield complex eigenvalues and eigenvectors (terms involving $i=\sqrt{-1}$); the paper carefully explains how these are to be scattered, gridded, and interpreted by their real and imaginary canonical vectors. For the Vilmann neurocranial octagons, the classic morphometric data set used as the running example here, there result new empirical findings that offer a pattern analysis of the ways perturbations of growth are attenuated or otherwise modified over the course of developmental time. The main finding, dominance of a generalized version of dynamical stability (negative autoregressions, as announced by the negative real parts of their eigenvalues, often combined with shearing and rotation in a helpful canonical plane), is surprising in its strength and consistency. 
A closing discussion explores some implications of this novel pattern analysis of growth regulation. It differs in many respects from the usual way covariance matrices are wielded in geometric morphometrics, differences relevant to a variety of study designs for comparisons of development across species.
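As a small illustration of why a non-symmetric matrix can produce complex eigenvalues, and how the sign of their real parts relates to attenuation, consider the sketch below. The 2×2 matrix is hypothetical and is not derived from the Vilmann data or from the paper's cross-covariance construction; it only shows the linear-algebra mechanics.

```python
import numpy as np

# Hypothetical matrix relating a state to its change over time.
# The -0.5 diagonal terms damp perturbations; the off-diagonal pair rotates them.
A = np.array([[-0.5, -1.0],
              [ 1.0, -0.5]])

eigvals, eigvecs = np.linalg.eig(A)   # a non-symmetric A may give complex results

# Negative real parts mean perturbations shrink over time (dynamical stability);
# nonzero imaginary parts mean they also rotate while shrinking.
stable = bool(np.all(eigvals.real < 0))

# The real and imaginary parts of one complex eigenvector span the plane
# in which the damped rotation takes place.
v = eigvecs[:, 0]
plane_basis = np.column_stack([v.real, v.imag])

print(eigvals, stable)
print(plane_basis)
```

Here the eigenvalues form a complex-conjugate pair with real part -0.5, so a perturbation spirals inward rather than decaying along a straight line.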

Centric Allometry: Studying Growth Using Landmark Data

Evolutionary Biology ◽

10.1007/s11692-020-09530-w ◽

2021 ◽

Author(s):

Fred L. Bookstein

Keyword(s):

Principal Component ◽

Shape Space ◽

Allometric Growth ◽

Differential Growth ◽

Data Set ◽

Geometric Morphometric ◽

Franz Boas ◽

Landmark Data ◽

Per Se ◽

Algebraic Constraints

Abstract: The geometric morphometric (GMM) construction of Procrustes shape coordinates from a data set of homologous landmark configurations puts exact algebraic constraints on position, orientation, and geometric scale. While position as digitized is not ordinarily a biologically meaningful quantity, and orientation is relevant mainly when some organismal function interacts with a Cartesian positional gradient such as horizontality, size per se is a crucially important biometric concept, especially in contexts like growth, biomechanics, or bioenergetics. “Normalizing” or “standardizing” size (usually by dividing the square root of the summed squared distances from the centroid out of all the Cartesian coordinates specimen by specimen), while associated with the elegant symmetries of the Mardia–Dryden distribution in shape space, nevertheless can substantially impeach the validity of any organismal inferences that ensue. This paper adapts two variants of standard morphometric least-squares, principal components and uniform strains, to circumvent size standardization while still accommodating an analytic toolkit for studies of differential growth that supports landmark-by-landmark graphics and thin-plate splines. Standardization of position and orientation but not size yields the coordinates Franz Boas first discussed in 1905. In studies of growth, a first principal component of these coordinates often appears to involve most landmarks shifting almost directly away from their centroid, hence the proposed model’s name, “centric allometry.” There is also a joint standardization of shear and dilation resulting in a variant of standard GMM’s “nonaffine shape coordinates” where scale information is subsumed in the affine term. Studies of growth allometry should go better in the Boas system than in the Procrustes shape space that is the current conventional workbench for GMM analyses. 
I demonstrate two examples of this revised approach (one developmental, one phylogenetic) that retrieve all the findings of a conventional shape-space-based approach while focusing much more closely on the phenomenon of allometric growth per se. A three-part Appendix provides an overview of the algebra, highlighting both similarities to the Procrustes approach and contrasts with it.
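The size standardization described in this abstract (dividing by the square root of the summed squared distances from the centroid) can be sketched as follows. The function names and the two rectangles are hypothetical, and the rotation step of a full Procrustes or Boas fit is omitted, so this only contrasts keeping and discarding scale.

```python
import numpy as np

def center(config):
    """Remove position by subtracting the centroid of a (k, 2) configuration."""
    return config - config.mean(axis=0)

def centroid_size(config):
    """Square root of the summed squared distances of landmarks from the centroid."""
    return float(np.sqrt(np.sum(center(config) ** 2)))

small = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
large = 3.0 * small                      # same shape, three times the size

# Size-standardized coordinates: the two specimens become indistinguishable.
shape_small = center(small) / centroid_size(small)
shape_large = center(large) / centroid_size(large)

# Size-retaining coordinates (position removed, scale kept):
# the larger specimen's landmarks sit farther from the centroid.
kept_small = center(small)
kept_large = center(large)

print(centroid_size(small), centroid_size(large))
```

After size standardization the growth signal carried by scale is gone; in the size-retaining version it shows up as landmarks moving away from the centroid.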

Does my posterior look big in this? The effect of photographic distortion on morphometric analyses

Paleobiology ◽

10.1017/pab.2016.48 ◽

2017 ◽

Vol 43 (3) ◽

pp. 508-520 ◽

Cited By ~ 5

Author(s):

Katie S. Collins ◽

Michael F. Gazley

Keyword(s):

Procrustes Analysis ◽

Data Sets ◽

Single Camera ◽

Photographic Method ◽

Shape Variation ◽

Data Set ◽

Geometric Morphometric ◽

Camera Lens ◽

Morphometric Analyses ◽

Reference Grid

Abstract: Most geometric morphometric studies are underpinned by sets of photographs of specimens. The camera lens distorts the images it takes, and the extent of the distortion will depend on factors such as the make and model of the lens and camera and user-controlled variation such as the zoom of the lens. Any study that uses populations of geometric data digitized from photographs will have shape variation introduced into the data set simply by the photographic process. We illustrate the nature and magnitude of this error using a 30-specimen data set of Recent New Zealand Mactridae (Mollusca: Bivalvia), using only a single camera and camera lens with four different photographic setups. We then illustrate the use of retrodeformation in Adobe Photoshop and test the magnitude of the variation in the data set using multivariate Procrustes analysis of variance. The effect of photographic method on the variance in the data set is significant, systematic, and predictable and, if not accounted for, could lead to misleading results, suggest clustering of specimens in ordinations that has no biological basis, or induce artificial oversplitting of taxa. Recommendations to minimize and quantify distortion include: (1) that studies avoid mixing data sets from different cameras, lenses, or photographic setups; (2) that studies avoid placing specimens or scale bars near the edges of the photographs; (3) that the same camera settings are maintained (as much as practical) for every image in a data set; (4) that care is taken when using full-frame cameras; and (5) that a reference grid is used to correct for or quantify distortion.

The social wasp Vespula germanica (Fabricius) (Hymenoptera: Vespidae) population dynamics in England over 39 years.

The Entomologist's Monthly Magazine ◽

10.31184/m00138908.1542.3906 ◽

2018 ◽

Vol 154 (2) ◽

pp. 149-155

Author(s):

Michael Archer

Keyword(s):

Population Dynamics ◽

Population Dynamic ◽

Ecological Factors ◽

Social Wasp ◽

Data Sets ◽

Data Set ◽

Vespula Germanica ◽

The Social ◽

Minimum Number ◽

Suction Traps

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
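To make the autocorrelation pattern concrete, here is a minimal sketch of a sample ACF applied to a made-up series of yearly counts that alternates between high and low years. The counts are hypothetical and not taken from the suction-trap records, and the ±1.96/√n bound is the usual approximate 95% limit for white noise, which may differ from the significance test used in the paper.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Hypothetical yearly worker counts with a high/low alternation.
counts = [30, 12, 27, 14, 31, 10, 26, 15, 29, 11, 28, 13]

r = acf(counts, 2)
bound = 1.96 / math.sqrt(len(counts))   # approximate 95% white-noise limit

print([round(v, 3) for v in r], round(bound, 3))
```

A negative lag-1 value followed by a smaller positive lag-2 value is the signature of a 2-year cycle; this idealized series alternates more cleanly than field data would.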

Predictive and Descriptive CoMFA Models: The Effect of Variable Selection

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180212162028 ◽

2018 ◽

Vol 21 (2) ◽

pp. 117-124 ◽

Cited By ~ 4

Author(s):

Bakhtyar Sepehri ◽

Nematollah Omidikia ◽

Mohsen Kompany-Zareh ◽

Raouf Ghavami

Keyword(s):

Variable Selection ◽

Predictive Power ◽

Selection Method ◽

Data Sets ◽

Data Set ◽

Comfa Model ◽

Variable Selection Method

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. First, for each of the three data sets, a CoMFA model was built with all CoMFA descriptors; then a new CoMFA model was developed with each variable selection method, giving 9 CoMFA models per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the models created, applying 5 of the variable selection approaches (FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife) significantly increases the predictive power and stability of CoMFA models. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time-consuming, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Applying FFD, SRD-FFD, IVE-PLS or SRD-UVE-PLS also preserves the CoMFA contour map information for both fields.

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is helped by Long Short-Term Memory (LSTM), which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of this transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model, as a result, is the combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms, or at least equals, previous work. In the case of the second data set, we devise schemes to generate training and testing data by changing the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.
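The window size, sliding size and labeling scheme mentioned in this abstract can be illustrated with a short segmentation sketch. The function name is hypothetical, and majority-vote labeling is an assumption made for illustration; the abstract does not say which labeling schemes were compared.

```python
import numpy as np

def sliding_windows(signal, labels, window_size, step):
    """Cut a 1D signal into overlapping windows and label each window.

    Labeling here is by majority vote over the samples in the window
    (an illustrative assumption; ties go to the smallest label value).
    """
    windows, window_labels = [], []
    for start in range(0, len(signal) - window_size + 1, step):
        end = start + window_size
        windows.append(signal[start:end])
        values, counts = np.unique(labels[start:end], return_counts=True)
        window_labels.append(values[np.argmax(counts)])
    return np.array(windows), np.array(window_labels)

signal = np.arange(10, dtype=float)                  # stand-in for sensor samples
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])    # per-sample activity labels

X, y = sliding_windows(signal, labels, window_size=4, step=2)
print(X.shape, y)
```

A smaller step gives more, more strongly overlapping training windows; a larger window gives each example more context but blurs activity boundaries.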

An Algorithm for the Removal of Cosmic Ray Artifacts in Spectral Data Sets

Applied Spectroscopy ◽

10.1177/0003702819839098 ◽

2019 ◽

Vol 73 (8) ◽

pp. 893-901

Author(s):

Sinead J. Barton ◽

Bryan M. Hennelly

Keyword(s):

Cosmic Ray ◽

Data Sets ◽

Biological Cells ◽

Statistical Classification ◽

Signal To Noise ◽

Multivariate Statistical ◽

Data Set ◽

Artefact Removal ◽

Single Capture ◽

Acquisition Method

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio and is applied to various data sets of Raman spectra recorded from biological cells.
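The matching-and-replacement idea described in this abstract can be sketched roughly as below. This is a simplified illustration, not the published algorithm: the function name, the fixed spike threshold and the synthetic Gaussian spectra are all assumptions, and the normalized covariance is computed as a Pearson correlation coefficient.

```python
import numpy as np

def remove_spikes(target, library, threshold):
    """Replace positive spikes in `target` using its best-matching library spectrum.

    threshold: hypothetical intensity difference above which a point is
    treated as a cosmic ray spike.
    """
    # Normalized covariance (Pearson correlation) against every library spectrum.
    scores = [np.corrcoef(target, ref)[0, 1] for ref in library]
    match = library[int(np.argmax(scores))]
    # Cosmic ray spikes are unidirectional, so only positive excesses are flagged.
    spikes = (target - match) > threshold
    cleaned = np.where(spikes, match, target)
    return cleaned, spikes

x = np.linspace(0.0, 1.0, 50)
peak_mid = np.exp(-((x - 0.5) / 0.1) ** 2)    # synthetic spectrum, peak at 0.5
peak_left = np.exp(-((x - 0.3) / 0.1) ** 2)   # synthetic spectrum, peak at 0.3
library = np.array([peak_left, peak_mid])

target = peak_mid.copy()
target[5] += 2.0                              # inject one artificial spike

cleaned, spikes = remove_spikes(target, library, threshold=0.5)
print(np.flatnonzero(spikes))
```

Only the flagged points are overwritten, so the rest of the recorded spectrum is left untouched.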

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Building on kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the space samples, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that the two algorithms are more effective and that R-SVM has a lower complexity.

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.

PSVI-8 Meta-regression Analysis to Determine the Relationship Between Growing Pig Body Weight and Variation

Journal of Animal Science ◽

10.1093/jas/skab054.357 ◽

2021 ◽

Vol 99 (Supplement_1) ◽

pp. 218-219

Author(s):

Andres Fernando T Russi ◽

Mike D Tokach ◽

Jason C Woodworth ◽

Joel M DeRouchey ◽

Robert D Goodband ◽

...

Keyword(s):

Body Weight ◽

Regression Analysis ◽

Sample Size ◽

Polynomial Regression ◽

Data Sets ◽

Regression Equations ◽

Prediction Equations ◽

Data Set ◽

Rate Of Increase ◽

The Relationship

Abstract: The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = $20.04 - 0.135 \times \mathrm{BW} + 0.00043 \times \mathrm{BW}^2$, $R^2 = 0.79$; SD = $0.41 + 0.150 \times \mathrm{BW} - 0.00041 \times \mathrm{BW}^2$, $R^2 = 0.95$. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
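For readers who want to evaluate the two prediction equations, a minimal sketch follows. The coefficients are copied from the abstract; the function names are hypothetical, and the abstract does not state the units of BW, so inputs are assumed to be in whatever unit the source studies used.

```python
def predicted_cv(bw):
    """Predicted coefficient of variation (%) at mean body weight bw."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2

def predicted_sd(bw):
    """Predicted standard deviation at mean body weight bw."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2

# Evaluate both equations at a few illustrative mean body weights.
for bw in (25, 50, 100):
    print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))

cv_100 = predicted_cv(100)
sd_100 = predicted_sd(100)
```

At a mean BW of 100 the two fits give a CV of about 10.84% and an SD of about 11.31, which are close to each other as expected when SD is roughly CV × BW / 100.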
