Modelling complex geological angular data with the Projected Normal distribution and mixtures of von Mises distributions

2013 ◽  
Vol 5 (2) ◽  
pp. 2181-2202
Author(s):  
R. M. Lark ◽  
D. Clifford ◽  
C. N. Waters

Abstract. Angular data are commonly encountered in the earth sciences, and statistical descriptions and inferences about such data are necessary in structural geology. In this paper we compare two statistical distributions appropriate for complex angular data sets: the mixture of von Mises distributions and the projected normal distribution. We show how the number of components in a mixture of von Mises distributions may be chosen, and how one may choose between the projected normal distribution and the mixture of von Mises for a particular data set. We illustrate these methods with some structural geological data, showing how the fitted models can complement geological interpretation and permit statistical inference. One of our data sets suggests a special case of the projected normal distribution, which we discuss briefly.

Solid Earth ◽  
2014 ◽  
Vol 5 (2) ◽  
pp. 631-639 ◽  
Author(s):  
R. M. Lark ◽  
D. Clifford ◽  
C. N. Waters

Abstract. Circular data are commonly encountered in the earth sciences, and statistical descriptions and inferences about such data are necessary in structural geology. In this paper we compare two statistical distributions appropriate for complex circular data sets: the mixture of von Mises distributions and the projected normal distribution. We show how the number of components in a mixture of von Mises distributions may be chosen, and how one may choose between the projected normal distribution and the mixture of von Mises for a particular data set. We illustrate these methods with a few structural geological data sets, showing how the fitted models can complement geological interpretation and permit statistical inference. One of our data sets suggests a special case of the projected normal distribution, which we discuss briefly.
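The model-selection step described in this abstract can be illustrated with a short, self-contained sketch: fit mixtures of von Mises distributions by expectation-maximisation for several candidate numbers of components and pick the one with the smallest BIC. This is a generic illustration under stated assumptions (the BIC criterion, the Best-Fisher approximation for the concentration parameter, and the simulated angles are all choices made here), not the authors' procedure.

```python
# Minimal sketch: EM for a k-component von Mises mixture plus BIC model choice.
import numpy as np
from scipy.stats import vonmises

def fit_vonmises_mixture(theta, k, n_iter=200, seed=0):
    """EM for a k-component von Mises mixture on angles theta (radians)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-np.pi, np.pi, k)          # random initial mean directions
    kappa = np.ones(k)                          # initial concentrations
    w = np.full(k, 1.0 / k)                     # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each angle
        dens = np.array([w[j] * vonmises.pdf(theta, kappa[j], loc=mu[j]) for j in range(k)])
        r = dens / dens.sum(axis=0)
        # M-step: update weights, mean directions and concentrations
        w = r.mean(axis=1)
        for j in range(k):
            C, S = np.sum(r[j] * np.cos(theta)), np.sum(r[j] * np.sin(theta))
            mu[j] = np.arctan2(S, C)
            Rbar = np.sqrt(C**2 + S**2) / r[j].sum()
            # Best-Fisher approximation to the maximum likelihood concentration
            if Rbar < 0.53:
                kappa[j] = 2*Rbar + Rbar**3 + 5*Rbar**5/6
            elif Rbar < 0.85:
                kappa[j] = -0.4 + 1.39*Rbar + 0.43/(1 - Rbar)
            else:
                kappa[j] = 1.0 / (Rbar**3 - 4*Rbar**2 + 3*Rbar)
    loglik = np.sum(np.log(np.sum(
        [w[j] * vonmises.pdf(theta, kappa[j], loc=mu[j]) for j in range(k)], axis=0)))
    bic = -2*loglik + (3*k - 1)*np.log(len(theta))   # mean, concentration and weight per component
    return mu, kappa, w, bic

# choose the number of components by the smallest BIC (simulated angles as a placeholder)
theta = np.random.default_rng(1).vonmises(0.5, 4.0, 300)
best = min((fit_vonmises_mixture(theta, k) for k in (1, 2, 3)), key=lambda fit: fit[-1])
```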


2017 ◽  
Vol 5 (4) ◽  
pp. 1
Author(s):  
I. E. Okorie ◽  
A. C. Akpanta ◽  
J. Ohakwe ◽  
D. C. Chikezie ◽  
C. U. Onyemachi ◽  
...  

This paper introduces a new generator of probability distributions, the adjusted log-logistic generalized (ALLoG) distribution, and a new extension of the standard one-parameter exponential distribution called the adjusted log-logistic generalized exponential (ALLoGExp) distribution. The ALLoGExp distribution is a special case of the ALLoG distribution, and we have provided some of its statistical and reliability properties. Notably, the failure rate can be monotonically decreasing, increasing or upside-down bathtub shaped depending on the values of the parameters $\delta$ and $\theta$. The method of maximum likelihood estimation is proposed to estimate the model parameters. The importance and flexibility of the ALLoGExp distribution are demonstrated with a real, uncensored lifetime data set, and its fit is compared with that of five other exponential-related distributions. The results obtained from the model fittings show that the ALLoGExp distribution provides a better fit than the other fitted distributions. The ALLoGExp distribution is therefore recommended for effective modelling of lifetime data sets.
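The fitting-and-comparison workflow described above can be sketched generically: estimate each candidate's parameters by maximum likelihood and compare the fits on the same lifetime data. Because the ALLoGExp density itself is not reproduced in this abstract, standard scipy distributions stand in for the candidates, and the data file name is hypothetical.

```python
# Generic sketch: maximum likelihood fits of candidate lifetime distributions
# compared by log-likelihood and AIC (the ALLoGExp density is not included here).
import numpy as np
from scipy import stats

data = np.loadtxt("lifetimes.txt")           # hypothetical uncensored lifetime data

candidates = {
    "exponential": stats.expon,
    "Weibull": stats.weibull_min,
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)                   # maximum likelihood estimates
    loglik = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglik        # smaller AIC indicates a better trade-off
    results[name] = (loglik, aic)

for name, (loglik, aic) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name:12s}  log-lik = {loglik:9.2f}   AIC = {aic:9.2f}")
```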


2020 ◽  
Author(s):  
Michał Ciach ◽  
Błażej Miasojedow ◽  
Grzegorz Skoraczyński ◽  
Szymon Majewski ◽  
Michał Startek ◽  
...  

Abstract. A common theme in many applications of computational mass spectrometry is fitting a linear combination of reference spectra to an experimental one in order to estimate the quantities of different ions, potentially with overlapping isotopic envelopes. In this work, we study this procedure in an abstract setting in order to develop new approaches applicable to a diverse range of experiments. We introduce an application of a new spectral dissimilarity measure, known in other fields as the Wasserstein or the Earth Mover’s distance, in order to overcome the sensitivity of ordinary linear regression to measurement inaccuracies. Using a data set of 200 mass spectra, we demonstrate that our approach is capable of accurate estimation of ion proportions without the extensive pre-processing required for state-of-the-art methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. We have implemented our methods in a Python 3 package, freely available at https://github.com/mciach/masserstein.
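The idea of replacing a least-squares criterion with the Wasserstein distance can be illustrated with a toy example using scipy (this is not the masserstein API; the spectra, the shared m/z grid, and the optimiser are assumptions made for illustration):

```python
# Toy sketch: estimate ion proportions by minimising the Wasserstein distance
# between an experimental spectrum and a weighted combination of reference spectra.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.optimize import minimize

mz = np.array([100.0, 101.0, 102.0, 103.0])       # shared m/z grid (toy example)
ref1 = np.array([0.7, 0.2, 0.1, 0.0])             # isotopic envelope of ion 1
ref2 = np.array([0.0, 0.1, 0.3, 0.6])             # isotopic envelope of ion 2
experimental = 0.6 * ref1 + 0.4 * ref2 + 0.01     # mixture plus a small baseline offset

def objective(p):
    """Wasserstein distance between the experimental spectrum and the model mixture."""
    p = np.clip(p, 1e-9, None)                    # keep the mixing weights positive
    model = p[0] * ref1 + p[1] * ref2
    # intensities act as weights on the m/z axis; they are normalised internally,
    # so only the relative proportions are identified by this criterion
    return wasserstein_distance(mz, mz, u_weights=experimental, v_weights=model)

res = minimize(objective, x0=[0.5, 0.5], method="Nelder-Mead")
proportions = res.x / res.x.sum()
print(proportions)                                # roughly recovers the simulated 0.6 / 0.4 split
```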


2021 ◽  
Vol 23 (5) ◽  
Author(s):  
Gregor Jordan ◽  
Roland F. Staack

Abstract. The testing of protein drug candidates for inducing the generation of anti-drug antibodies (ADA) plays a fundamental role in drug development. The basis of the testing strategy includes a screening assay followed by a confirmatory test. Screening assay cut points (CP) are calculated mainly based on two approaches, either non-parametric, when the data set does not appear normally distributed, or parametric, in the case of a normal distribution. A normal distribution of data is preferred and may be achieved after outlier exclusion and, if necessary, transformation of the data. The authors present a Weibull transformation and a comparison with a decision tree-based approach that was tested on 10 data sets (healthy human volunteer matrix, different projects). Emphasis is placed on a transformation calculation that can be easily reproduced to make it accessible to non-mathematicians. The cut point value and the effect on the false positive rate as well as the number of excluded samples of both methods are compared.
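For readers unfamiliar with the two standard approaches mentioned above, the sketch below computes a parametric and a non-parametric screening cut point (5% target false-positive rate) after a basic outlier exclusion step. The normality test, the Tukey-fence outlier rule, and the file name are assumptions for illustration, and the Weibull transformation proposed by the authors is not reproduced here.

```python
# Simplified sketch: parametric vs. non-parametric ADA screening cut point.
import numpy as np
from scipy import stats

signals = np.loadtxt("drug_naive_signals.txt")    # hypothetical screening signals

# basic outlier exclusion with Tukey fences (1.5 * IQR)
q1, q3 = np.percentile(signals, [25, 75])
iqr = q3 - q1
kept = signals[(signals >= q1 - 1.5 * iqr) & (signals <= q3 + 1.5 * iqr)]

# decide between the two approaches with a normality test
_, p_normal = stats.shapiro(kept)
if p_normal > 0.05:
    # parametric cut point: mean + 1.645 * SD targets ~5% false positives
    cut_point = kept.mean() + 1.645 * kept.std(ddof=1)
else:
    # non-parametric cut point: the empirical 95th percentile
    cut_point = np.percentile(kept, 95)

false_positive_rate = np.mean(signals > cut_point)
print(f"cut point = {cut_point:.3f}, observed false-positive rate = {false_positive_rate:.1%}")
```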


2010 ◽  
Vol 3 (1) ◽  
pp. 293-307 ◽  
Author(s):  
P. J. Applegate ◽  
N. M. Urban ◽  
B. J. C. Laabs ◽  
K. Keller ◽  
R. B. Alley

Abstract. Geomorphic process modeling allows us to evaluate different methods for estimating moraine ages from cosmogenic exposure dates, and may provide a means to identify the processes responsible for the excess scatter among exposure dates on individual moraines. Cosmogenic exposure dating is an elegant method for estimating the ages of moraines, but individual exposure dates are sometimes biased by geomorphic processes. Because exposure dates may be either "too young" or "too old," there are a variety of methods for estimating the ages of moraines from exposure dates. In this paper, we present Monte Carlo-based models of moraine degradation and inheritance of cosmogenic nuclides, and we use the models to examine the effectiveness of these methods. The models estimate the statistical distributions of exposure dates that we would expect to obtain from single moraines, given reasonable geomorphic assumptions. The model of moraine degradation is based on prior examples, but the inheritance model is novel. The statistical distributions of exposure dates from the moraine degradation model are skewed toward young values; in contrast, the statistical distributions of exposure dates from the inheritance model are skewed toward old values. Sensitivity analysis shows that this difference is robust for reasonable parameter choices. Thus, the skewness can help indicate whether a particular data set has problems with inheritance or moraine degradation. Given representative distributions from these two models, we can determine which methods of estimating moraine ages are most successful in recovering the correct age for test cases where this value is known. The mean is a poor estimator of moraine age for data sets drawn from skewed parent distributions, and excluding outliers before calculating the mean does not improve this mismatch. The extreme estimators (youngest date and oldest date) perform well under specific circumstances, but fail in other cases. We suggest a simple estimator that uses the skewnesses of individual data sets to determine whether the youngest date, mean, or oldest date will provide the best estimate of moraine age. Although this method is perhaps the most globally robust of the estimators we tested, it sometimes fails spectacularly. The failure of simple methods to provide accurate estimates of moraine age points toward a need for more sophisticated statistical treatments.
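The skewness-based choice between the youngest date, the mean, and the oldest date can be written down compactly; the sketch below is an illustration of that logic with an arbitrary skewness threshold and made-up exposure dates, not the authors' calibrated procedure.

```python
# Sketch: pick the moraine-age estimator from the sign of the sample skewness.
import numpy as np
from scipy import stats

def moraine_age_estimate(exposure_dates_ka, threshold=0.5):
    dates = np.asarray(exposure_dates_ka, dtype=float)
    skew = stats.skew(dates)
    if skew > threshold:
        # scatter skewed toward old values suggests inheritance: youngest date is safest
        return dates.min(), "youngest"
    if skew < -threshold:
        # scatter skewed toward young values suggests moraine degradation: oldest date is safest
        return dates.max(), "oldest"
    # roughly symmetric scatter: the mean is a reasonable estimate
    return dates.mean(), "mean"

# hypothetical exposure dates (in ka) with one anomalously old boulder
print(moraine_age_estimate([18.2, 18.9, 19.1, 19.4, 24.7]))   # -> youngest date
```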


1977 ◽  
Vol 7 (3) ◽  
pp. 481-487 ◽  
Author(s):  
W. L. Hafley ◽  
H. T. Schreuder

The beta, Johnson's SB, Weibull, lognormal, gamma, and normal distributions are discussed in terms of their flexibility in the skewness squared (β1) − kurtosis (β2) plane. The SB and the beta are clearly the most flexible distributions since they represent surfaces in the plane, whereas the Weibull, lognormal, and gamma are represented by lines, and the normal is represented by a single point. The six distributions are fit to 21 data sets for which both diameters and heights are available. The log likelihood criterion is used to rank the six distributions in regard to their fit to each data set. Overall, Johnson's SB distribution gave the best performance in terms of quality of fit to the variety of sample distributions.
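The ranking exercise can be reproduced in outline with scipy, which provides all six of these distributions (Johnson's SB as johnsonsb); the data file is a hypothetical stand-in for one of the 21 diameter data sets.

```python
# Sketch: fit the six distributions by maximum likelihood and rank by log-likelihood.
import numpy as np
from scipy import stats

diameters = np.loadtxt("stand_diameters.txt")     # hypothetical tree-diameter data

candidates = {
    "beta": stats.beta,
    "Johnson SB": stats.johnsonsb,
    "Weibull": stats.weibull_min,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "normal": stats.norm,
}

ranking = []
for name, dist in candidates.items():
    params = dist.fit(diameters)                  # maximum likelihood estimates
    loglik = np.sum(dist.logpdf(diameters, *params))
    ranking.append((loglik, name))

for loglik, name in sorted(ranking, reverse=True):   # larger log-likelihood = better fit
    print(f"{name:12s}  log-likelihood = {loglik:.2f}")
```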


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
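The autocorrelation check described in point 2 is straightforward to carry out with statsmodels; the sketch below computes the ACF of a yearly count series and compares the low-lag coefficients with an approximate 95% significance bound (the file name and lag range are assumptions for illustration).

```python
# Sketch: ACF of a yearly count series with an approximate significance bound.
import numpy as np
from statsmodels.tsa.stattools import acf

counts = np.loadtxt("yearly_worker_counts.txt")   # hypothetical yearly trap counts
nlags = 5
acf_vals = acf(counts, nlags=nlags)

bound = 1.96 / np.sqrt(len(counts))               # approximate 95% bound for white noise
for lag in range(1, nlags + 1):
    verdict = "significant" if abs(acf_vals[lag]) > bound else "not significant"
    print(f"lag {lag}: r = {acf_vals[lag]:+.2f} ({verdict})")
# a significant negative lag-1 followed by a positive lag-2 is the damped 2-year-cycle signature
```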


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. First, for each of the three data sets a CoMFA model with all CoMFA descriptors was created; a new CoMFA model was then developed by applying each variable selection method, so that 9 CoMFA models were built for each data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables, while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, whereas SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS also preserves the CoMFA contour map information for both fields.
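None of the named field-based procedures can be reproduced meaningfully in a few lines, but the general jackknife idea behind coefficient-stability variable selection for a PLS model can; the sketch below is a generic illustration with simulated descriptors, not any of the FFD, SRD, IVE or SPA procedures.

```python
# Generic sketch: leave-one-out jackknife of PLS coefficients for variable selection.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(36, 50))                                  # toy descriptor matrix (36 compounds)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=36)      # only the first 5 columns are informative

n, p = X.shape
coefs = np.empty((n, p))
for i in range(n):                                             # refit with each compound left out
    mask = np.arange(n) != i
    coefs[i] = PLSRegression(n_components=3).fit(X[mask], y[mask]).coef_.ravel()

# keep variables whose coefficient stays consistently different from zero across refits
t_like = np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0, ddof=1) + 1e-12)
selected = np.where(t_like > 2.0)[0]                           # illustrative threshold
print("variables kept:", selected)
```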


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier-transform-inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing the 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is aided by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of an LSTM and a CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms, or at least equals, previous works. For the second data set, we devise schemes to generate training and testing data by varying the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of these parameters on the performance.
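A schematic sketch (not the authors' implementation) of the pipeline summarised above: an LSTM encodes the 1D sensor signal into a sequence of hidden states, the stacked states are treated as a single-channel 2D fingerprint, and a small CNN classifies it. The layer sizes, window length, and number of classes are placeholder assumptions.

```python
# Schematic PyTorch sketch of the LSTM -> 2D pattern -> CNN classifier.
import torch
import torch.nn as nn

class LstmCnnClassifier(nn.Module):
    def __init__(self, hidden_size=64, n_classes=6):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                       # x: (batch, time_steps) raw sensor windows
        h, _ = self.encoder(x.unsqueeze(-1))    # encoded sequence: (batch, time_steps, hidden)
        img = h.unsqueeze(1)                    # stack the states into a 1-channel 2D pattern
        features = self.cnn(img).flatten(1)
        return self.head(features)

model = LstmCnnClassifier()
windows = torch.randn(8, 128)                   # 8 windows of 128 sensor readings
print(model(windows).shape)                     # torch.Size([8, 6])
```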


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio, and it is applied to various data sets of Raman spectra recorded from biological cells.
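A bare-bones numpy sketch of the single-capture idea described above: the most similar spectrum in the data set is located by normalised covariance, large positive residuals against the scaled match flag the spikes, and the flagged channels are replaced with values from the match. The spike threshold is an assumption for illustration, and the target spectrum is assumed not to be included in the library it is compared against.

```python
# Sketch: single-capture cosmic ray removal using a matched spectrum from a library.
import numpy as np

def remove_cosmic_rays(target, library, n_sigma=5.0):
    """target: 1D spectrum; library: 2D array of similar spectra (one per row)."""
    # normalised covariance (correlation) of the target with every library spectrum
    t = (target - target.mean()) / target.std()
    L = (library - library.mean(axis=1, keepdims=True)) / library.std(axis=1, keepdims=True)
    match = library[np.argmax(L @ t / len(target))]

    # scale the matching spectrum to the target and look for large positive residuals
    scale = np.dot(target, match) / np.dot(match, match)
    residual = target - scale * match
    spikes = residual > n_sigma * residual.std()

    cleaned = target.copy()
    cleaned[spikes] = scale * match[spikes]      # replace only the spiked channels
    return cleaned, spikes
```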

