Modelling complex geological angular data with the Projected Normal distribution and mixtures of von Mises distributions

2013 ◽  
Vol 5 (2) ◽  
pp. 2181-2202
Author(s):  
R. M. Lark ◽  
D. Clifford ◽  
C. N. Waters

Abstract. Angular data are commonly encountered in the earth sciences, and statistical descriptions and inferences about such data are necessary in structural geology. In this paper we compare two statistical distributions appropriate for complex angular data sets: the mixture of von Mises distributions and the projected normal distribution. We show how the number of components in a mixture of von Mises distributions may be chosen, and how one may choose between the projected normal distribution and the mixture of von Mises for a particular data set. We illustrate these methods with some structural geological data, showing how the fitted models can complement geological interpretation and permit statistical inference. One of our data sets suggests a special case of the projected normal distribution, which we discuss briefly.

Solid Earth ◽  
2014 ◽  
Vol 5 (2) ◽  
pp. 631-639 ◽  
Author(s):  
R. M. Lark ◽  
D. Clifford ◽  
C. N. Waters

Abstract. Circular data are commonly encountered in the earth sciences, and statistical descriptions and inferences about such data are necessary in structural geology. In this paper we compare two statistical distributions appropriate for complex circular data sets: the mixture of von Mises distributions and the projected normal distribution. We show how the number of components in a mixture of von Mises distributions may be chosen, and how one may choose between the projected normal distribution and the mixture of von Mises for a particular data set. We illustrate these methods with a few structural geological data sets, showing how the fitted models can complement geological interpretation and permit statistical inference. One of our data sets suggests a special case of the projected normal distribution, which we discuss briefly.
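The model-selection step described in this abstract can be illustrated with a short, self-contained sketch: fit mixtures of von Mises distributions by expectation-maximisation for several candidate numbers of components and pick the one with the smallest BIC. This is a generic illustration under stated assumptions (the BIC criterion, the Best-Fisher approximation for the concentration parameter, and the simulated angles are all choices made here), not the authors' procedure.

```python
# Minimal sketch: EM for a k-component von Mises mixture plus BIC model choice.
import numpy as np
from scipy.stats import vonmises

def fit_vonmises_mixture(theta, k, n_iter=200, seed=0):
    """EM for a k-component von Mises mixture on angles theta (radians)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, k)          # random initial mean directions
    kappa = np.ones(k)                          # initial concentrations
    w = np.full(k, 1.0 / k)                     # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each angle
        dens = np.array([w[j] * vonmises.pdf(theta, kappa[j], loc=mu[j]) for j in range(k)])
        r = dens / dens.sum(axis=0)
        # M-step: update weights, mean directions and concentrations
        w = r.mean(axis=1)
        for j in range(k):
            C, S = np.sum(r[j] * np.cos(theta)), np.sum(r[j] * np.sin(theta))
            mu[j] = np.arctan2(S, C)
            Rbar = np.sqrt(C**2 + S**2) / r[j].sum()
            # Best-Fisher approximation to the maximum likelihood concentration
            if Rbar < 0.53:
                kappa[j] = 2*Rbar + Rbar**3 + 5*Rbar**5/6
            elif Rbar < 0.85:
                kappa[j] = -0.4 + 1.39*Rbar + 0.43/(1 - Rbar)
            else:
                kappa[j] = 1.0 / (Rbar**3 - 4*Rbar**2 + 3*Rbar)
    loglik = np.sum(np.log(np.sum(
        [w[j] * vonmises.pdf(theta, kappa[j], loc=mu[j]) for j in range(k)], axis=0)))
    bic = -2*loglik + (3*k - 1)*np.log(len(theta))   # mean, concentration and weight per component
    return mu, kappa, w, bic

# choose the number of components by the smallest BIC (simulated angles as a placeholder)
theta = np.random.default_rng(1).vonmises(0.5, 4.0, 300)
best = min((fit_vonmises_mixture(theta, k) for k in (1, 2, 3)), key=lambda fit: fit[-1])
```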


2017 ◽  
Vol 5 (4) ◽  
pp. 1
Author(s):  
I. E. Okorie ◽  
A. C. Akpanta ◽  
J. Ohakwe ◽  
D. C. Chikezie ◽  
C. U. Onyemachi ◽  
...  

This paper introduces a new generator of probability distributions, the adjusted log-logistic generalized (ALLoG) distribution, and a new extension of the standard one-parameter exponential distribution called the adjusted log-logistic generalized exponential (ALLoGExp) distribution. The ALLoGExp distribution is a special case of the ALLoG distribution, and we have provided some of its statistical and reliability properties. Notably, the failure rate can be monotonically decreasing, increasing or upside-down bathtub shaped depending on the values of the parameters $\delta$ and $\theta$. The method of maximum likelihood estimation is proposed to estimate the model parameters. The importance and flexibility of the ALLoGExp distribution are demonstrated with a real, uncensored lifetime data set, and its fit is compared with that of five other exponential-related distributions. The results obtained from the model fittings show that the ALLoGExp distribution provides a better fit than the other fitted distributions. The ALLoGExp distribution is therefore recommended for effective modelling of lifetime data sets.
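The fitting-and-comparison workflow described above can be sketched generically: estimate each candidate's parameters by maximum likelihood and compare the fits on the same lifetime data. Because the ALLoGExp density itself is not reproduced in this abstract, standard scipy distributions stand in for the candidates, and the data file name is hypothetical.

```python
# Generic sketch: maximum likelihood fits of candidate lifetime distributions
# compared by log-likelihood and AIC (the ALLoGExp density is not included here).
import numpy as np
from scipy import stats

data = np.loadtxt("lifetimes.txt")           # hypothetical uncensored lifetime data

candidates = {
    "exponential": stats.expon,
    "Weibull": stats.weibull_min,
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)                   # maximum likelihood estimates
    loglik = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglik        # smaller AIC indicates a better trade-off
    results[name] = (loglik, aic)

for name, (loglik, aic) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name:12s}  log-lik = {loglik:9.2f}   AIC = {aic:9.2f}")
```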


2020 ◽  
Author(s):  
Michał Ciach ◽  
Błażej Miasojedow ◽  
Grzegorz Skoraczyński ◽  
Szymon Majewski ◽  
Michał Startek ◽  
...  

Abstract. A common theme in many applications of computational mass spectrometry is fitting a linear combination of reference spectra to an experimental one in order to estimate the quantities of different ions, potentially with overlapping isotopic envelopes. In this work, we study this procedure in an abstract setting in order to develop new approaches applicable to a diverse range of experiments. We introduce an application of a new spectral dissimilarity measure, known in other fields as the Wasserstein or the Earth Mover’s distance, in order to overcome the sensitivity of ordinary linear regression to measurement inaccuracies. Using a data set of 200 mass spectra, we demonstrate that our approach is capable of accurate estimation of ion proportions without the extensive pre-processing required for state-of-the-art methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. We have implemented our methods in a Python 3 package, freely available at https://github.com/mciach/masserstein.
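The idea of replacing a least-squares criterion with the Wasserstein distance can be illustrated with a toy example using scipy (this is not the masserstein API; the spectra, the shared m/z grid, and the optimiser are assumptions made for illustration):

```python
# Toy sketch: estimate ion proportions by minimising the Wasserstein distance
# between an experimental spectrum and a weighted combination of reference spectra.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.optimize import minimize

mz = np.array([100.0, 101.0, 102.0, 103.0])       # shared m/z grid (toy example)
ref1 = np.array([0.7, 0.2, 0.1, 0.0])             # isotopic envelope of ion 1
ref2 = np.array([0.0, 0.1, 0.3, 0.6])             # isotopic envelope of ion 2
experimental = 0.6 * ref1 + 0.4 * ref2 + 0.01     # mixture plus a small baseline offset

def objective(p):
    """Wasserstein distance between the experimental spectrum and the model mixture."""
    p = np.clip(p, 1e-9, None)                    # keep the mixing weights positive
    model = p[0] * ref1 + p[1] * ref2
    # intensities act as weights on the m/z axis; they are normalised internally,
    # so only the relative proportions are identified by this criterion
    return wasserstein_distance(mz, mz, u_weights=experimental, v_weights=model)

res = minimize(objective, x0=[0.5, 0.5], method="Nelder-Mead")
proportions = res.x / res.x.sum()
print(proportions)                                # roughly recovers the simulated 0.6 / 0.4 split
```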


2021 ◽  
Vol 23 (5) ◽  
Author(s):  
Gregor Jordan ◽  
Roland F. Staack

Abstract. The testing of protein drug candidates for inducing the generation of anti-drug antibodies (ADA) plays a fundamental role in drug development. The basis of the testing strategy includes a screening assay followed by a confirmatory test. Screening assay cut points (CP) are calculated mainly based on two approaches, either non-parametric, when the data set does not appear normally distributed, or parametric, in the case of a normal distribution. A normal distribution of data is preferred and may be achieved after outlier exclusion and, if necessary, transformation of the data. The authors present a Weibull transformation and a comparison with a decision tree-based approach that was tested on 10 data sets (healthy human volunteer matrix, different projects). Emphasis is placed on a transformation calculation that can be easily reproduced to make it accessible to non-mathematicians. The cut point value and the effect on the false positive rate as well as the number of excluded samples of both methods are compared.
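For readers unfamiliar with the two standard approaches mentioned above, the sketch below computes a parametric and a non-parametric screening cut point (5% target false-positive rate) after a basic outlier exclusion step. The normality test, the Tukey-fence outlier rule, and the file name are assumptions for illustration, and the Weibull transformation proposed by the authors is not reproduced here.

```python
# Simplified sketch: parametric vs. non-parametric ADA screening cut point.
import numpy as np
from scipy import stats

signals = np.loadtxt("drug_naive_signals.txt")    # hypothetical screening signals

# basic outlier exclusion with Tukey fences (1.5 * IQR)
q1, q3 = np.percentile(signals, [25, 75])
iqr = q3 - q1
kept = signals[(signals >= q1 - 1.5 * iqr) & (signals <= q3 + 1.5 * iqr)]

# decide between the two approaches with a normality test
_, p_normal = stats.shapiro(kept)
if p_normal > 0.05:
    # parametric cut point: mean + 1.645 * SD targets ~5% false positives
    cut_point = kept.mean() + 1.645 * kept.std(ddof=1)
else:
    # non-parametric cut point: the empirical 95th percentile
    cut_point = np.percentile(kept, 95)

false_positive_rate = np.mean(signals > cut_point)
print(f"cut point = {cut_point:.3f}, observed false-positive rate = {false_positive_rate:.1%}")
```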


2010 ◽  
Vol 3 (1) ◽  
pp. 293-307 ◽  
Author(s):  
P. J. Applegate ◽  
N. M. Urban ◽  
B. J. C. Laabs ◽  
K. Keller ◽  
R. B. Alley

Abstract. Geomorphic process modeling allows us to evaluate different methods for estimating moraine ages from cosmogenic exposure dates, and may provide a means to identify the processes responsible for the excess scatter among exposure dates on individual moraines. Cosmogenic exposure dating is an elegant method for estimating the ages of moraines, but individual exposure dates are sometimes biased by geomorphic processes. Because exposure dates may be either "too young" or "too old," there are a variety of methods for estimating the ages of moraines from exposure dates. In this paper, we present Monte Carlo-based models of moraine degradation and inheritance of cosmogenic nuclides, and we use the models to examine the effectiveness of these methods. The models estimate the statistical distributions of exposure dates that we would expect to obtain from single moraines, given reasonable geomorphic assumptions. The model of moraine degradation is based on prior examples, but the inheritance model is novel. The statistical distributions of exposure dates from the moraine degradation model are skewed toward young values; in contrast, the statistical distributions of exposure dates from the inheritance model are skewed toward old values. Sensitivity analysis shows that this difference is robust for reasonable parameter choices. Thus, the skewness can help indicate whether a particular data set has problems with inheritance or moraine degradation. Given representative distributions from these two models, we can determine which methods of estimating moraine ages are most successful in recovering the correct age for test cases where this value is known. The mean is a poor estimator of moraine age for data sets drawn from skewed parent distributions, and excluding outliers before calculating the mean does not improve this mismatch. The extreme estimators (youngest date and oldest date) perform well under specific circumstances, but fail in other cases. We suggest a simple estimator that uses the skewnesses of individual data sets to determine whether the youngest date, mean, or oldest date will provide the best estimate of moraine age. Although this method is perhaps the most globally robust of the estimators we tested, it sometimes fails spectacularly. The failure of simple methods to provide accurate estimates of moraine age points toward a need for more sophisticated statistical treatments.
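The skewness-based choice between the youngest date, the mean, and the oldest date can be written down compactly; the sketch below is an illustration of that logic with an arbitrary skewness threshold and made-up exposure dates, not the authors' calibrated procedure.

```python
# Sketch: pick the moraine-age estimator from the sign of the sample skewness.
import numpy as np
from scipy import stats

def moraine_age_estimate(exposure_dates_ka, threshold=0.5):
    dates = np.asarray(exposure_dates_ka, dtype=float)
    skew = stats.skew(dates)
    if skew > threshold:
        # scatter skewed toward old values suggests inheritance: youngest date is safest
        return dates.min(), "youngest"
    if skew < -threshold:
        # scatter skewed toward young values suggests moraine degradation: oldest date is safest
        return dates.max(), "oldest"
    # roughly symmetric scatter: the mean is a reasonable estimate
    return dates.mean(), "mean"

# hypothetical exposure dates (in ka) with one anomalously old boulder
print(moraine_age_estimate([18.2, 18.9, 19.1, 19.4, 24.7]))   # -> youngest date
```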


1977 ◽  
Vol 7 (3) ◽  
pp. 481-487 ◽  
Author(s):  
W. L. Hafley ◽  
H. T. Schreuder

The beta, Johnson's SB, Weibull, lognormal, gamma, and normal distributions are discussed in terms of their flexibility in the skewness squared (β1) − kurtosis (β2) plane. The SB and the beta are clearly the most flexible distributions since they represent surfaces in the plane, whereas the Weibull, lognormal, and gamma are represented by lines, and the normal is represented by a single point. The six distributions are fit to 21 data sets for which both diameters and heights are available. The log likelihood criterion is used to rank the six distributions in regard to their fit to each data set. Overall, Johnson's SB distribution gave the best performance in terms of quality of fit to the variety of sample distributions.
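The ranking exercise can be reproduced in outline with scipy, which provides all six of these distributions (Johnson's SB as johnsonsb); the data file is a hypothetical stand-in for one of the 21 diameter data sets.

```python
# Sketch: fit the six distributions by maximum likelihood and rank by log-likelihood.
import numpy as np
from scipy import stats

diameters = np.loadtxt("stand_diameters.txt")     # hypothetical tree-diameter data

candidates = {
    "beta": stats.beta,
    "Johnson SB": stats.johnsonsb,
    "Weibull": stats.weibull_min,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "normal": stats.norm,
}

ranking = []
for name, dist in candidates.items():
    params = dist.fit(diameters)                  # maximum likelihood estimates
    loglik = np.sum(dist.logpdf(diameters, *params))
    ranking.append((loglik, name))

for loglik, name in sorted(ranking, reverse=True):   # larger log-likelihood = better fit
    print(f"{name:12s}  log-likelihood = {loglik:.2f}")
```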


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
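The autocorrelation check described in point 2 is straightforward to carry out with statsmodels; the sketch below computes the ACF of a yearly count series and compares the low-lag coefficients with an approximate 95% significance bound (the file name and lag range are assumptions for illustration).

```python
# Sketch: ACF of a yearly count series with an approximate significance bound.
import numpy as np
from statsmodels.tsa.stattools import acf

counts = np.loadtxt("yearly_worker_counts.txt")   # hypothetical yearly trap counts
nlags = 5
acf_vals = acf(counts, nlags=nlags)

bound = 1.96 / np.sqrt(len(counts))               # approximate 95% bound for white noise
for lag in range(1, nlags + 1):
    verdict = "significant" if abs(acf_vals[lag]) > bound else "not significant"
    print(f"lag {lag}: r = {acf_vals[lag]:+.2f} ({verdict})")
# a significant negative lag-1 followed by a positive lag-2 is the damped 2-year-cycle signature
```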


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. First, for each of the three data sets a CoMFA model with all CoMFA descriptors was created; a new CoMFA model was then developed by applying each variable selection method, so that 9 CoMFA models were built for each data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables, while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, whereas SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS also preserves the CoMFA contour map information for both fields.
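None of the named field-based procedures can be reproduced meaningfully in a few lines, but the general jackknife idea behind coefficient-stability variable selection for a PLS model can; the sketch below is a generic illustration with simulated descriptors, not any of the FFD, SRD, IVE or SPA procedures.

```python
# Generic sketch: leave-one-out jackknife of PLS coefficients for variable selection.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(36, 50))                                  # toy descriptor matrix (36 compounds)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=36)      # only the first 5 columns are informative

n, p = X.shape
coefs = np.empty((n, p))
for i in range(n):                                             # refit with each compound left out
    mask = np.arange(n) != i
    coefs[i] = PLSRegression(n_components=3).fit(X[mask], y[mask]).coef_.ravel()

# keep variables whose coefficient stays consistently different from zero across refits
t_like = np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0, ddof=1) + 1e-12)
selected = np.where(t_like > 2.0)[0]                           # illustrative threshold
print("variables kept:", selected)
```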


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier-transform-inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing the 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is aided by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of an LSTM and a CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms, or at least equals, previous works. For the second data set, we devise schemes to generate training and testing data by varying the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of these parameters on the performance.
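A schematic sketch (not the authors' implementation) of the pipeline summarised above: an LSTM encodes the 1D sensor signal into a sequence of hidden states, the stacked states are treated as a single-channel 2D fingerprint, and a small CNN classifies it. The layer sizes, window length, and number of classes are placeholder assumptions.

```python
# Schematic PyTorch sketch of the LSTM -> 2D pattern -> CNN classifier.
import torch
import torch.nn as nn

class LstmCnnClassifier(nn.Module):
    def __init__(self, hidden_size=64, n_classes=6):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                       # x: (batch, time_steps) raw sensor windows
        h, _ = self.encoder(x.unsqueeze(-1))    # encoded sequence: (batch, time_steps, hidden)
        img = h.unsqueeze(1)                    # stack the states into a 1-channel 2D pattern
        features = self.cnn(img).flatten(1)
        return self.head(features)

model = LstmCnnClassifier()
windows = torch.randn(8, 128)                   # 8 windows of 128 sensor readings
print(model(windows).shape)                     # torch.Size([8, 6])
```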


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio, and it is applied to various data sets of Raman spectra recorded from biological cells.
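A bare-bones numpy sketch of the single-capture idea described above: the most similar spectrum in the data set is located by normalised covariance, large positive residuals against the scaled match flag the spikes, and the flagged channels are replaced with values from the match. The spike threshold is an assumption for illustration, and the target spectrum is assumed not to be included in the library it is compared against.

```python
# Sketch: single-capture cosmic ray removal using a matched spectrum from a library.
import numpy as np

def remove_cosmic_rays(target, library, n_sigma=5.0):
    """target: 1D spectrum; library: 2D array of similar spectra (one per row)."""
    # normalised covariance (correlation) of the target with every library spectrum
    t = (target - target.mean()) / target.std()
    L = (library - library.mean(axis=1, keepdims=True)) / library.std(axis=1, keepdims=True)
    match = library[np.argmax(L @ t / len(target))]

    # scale the matching spectrum to the target and look for large positive residuals
    scale = np.dot(target, match) / np.dot(match, match)
    residual = target - scale * match
    spikes = residual > n_sigma * residual.std()

    cleaned = target.copy()
    cleaned[spikes] = scale * match[spikes]      # replace only the spiked channels
    return cleaned, spikes
```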

