A Novel Approach to Increase the Goodness of Fits with an Application to Real and Simulated Data Sets

In practice, data sets with extreme values arise in many fields, such as engineering, lifetime analysis, business, and economics. Many probability distributions have been derived to increase model flexibility in the presence of such values. The current study derives a new probability model, the New Flexible Family (NFF) of distributions. The significance of NFF is demonstrated using the Weibull distribution, giving the New Flexible Weibull distribution, or NFW for short. Various mathematical properties of NFW are discussed, including the estimation of parameters and entropy measures. Two real data sets with extreme values and a simulation study are used to delineate the importance of NFW. Furthermore, NFW is compared with other existing probability distributions; numerically, it is observed that the new mechanism for producing lifetime probability distributions performs better than the others in making predictions about the population when the data sets contain extreme values.

Alpha-Power Exponentiated Inverse Rayleigh distribution and its applications to real and simulated data

PLoS ONE ◽ 10.1371/journal.pone.0245253 ◽ 2021 ◽ Vol 16 (1) ◽ pp. e0245253

Author(s): Muhammad Ali ◽ Alamgir Khalil ◽ Muhammad Ijaz ◽ Noor Saeed

Keyword(s): Residual Life ◽ Probability Distributions ◽ Simulated Data ◽ Real Data ◽ Rayleigh Distribution ◽ Likelihood Method ◽ Strength Parameter ◽ Data Sets ◽ Alpha Power ◽ Mean Waiting Time

The main goal of the current paper is to contribute to the existing literature on probability distributions. In this paper, a new probability distribution is generated by using the Alpha Power Family of distributions, with the aim of modeling data with non-monotonic failure rates and providing a better fit. The proposed distribution is called the Alpha Power Exponentiated Inverse Rayleigh distribution, or APEIR for short. Various statistical properties are investigated, including the order statistics, moments, residual life function, mean waiting time, quantiles, entropy, and stress-strength parameter. To estimate the parameters of the proposed distribution, the maximum likelihood method is employed. It has been proved theoretically that the proposed distribution provides a better fit to data with monotonic as well as non-monotonic hazard rate shapes. Moreover, two real data sets are used to evaluate the significance and flexibility of the proposed distribution compared with other probability distributions.
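As a rough illustration of the Alpha Power Family construction mentioned in this abstract, the sketch below applies the alpha power transformation as it is commonly defined, F_APT(x) = (α^G(x) − 1)/(α − 1) for α ≠ 1 and F_APT(x) = G(x) for α = 1, to a baseline CDF G. The abstract does not give the authors' parameterisation, so the baseline used here (a plain inverse Rayleigh CDF, G(x) = exp(−θ/x²)) and the names `alpha_power_cdf`, `inverse_rayleigh_cdf`, and `theta` are assumptions chosen for illustration only, not the APEIR model itself.

```python
import math


def alpha_power_cdf(baseline_cdf, alpha):
    """Wrap a baseline CDF G with the alpha power transformation.

    Assumed form: F(x) = (alpha**G(x) - 1) / (alpha - 1) for alpha != 1,
    and F(x) = G(x) when alpha == 1.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    def cdf(x):
        g = baseline_cdf(x)
        if alpha == 1.0:
            return g
        return (alpha ** g - 1.0) / (alpha - 1.0)

    return cdf


def inverse_rayleigh_cdf(theta):
    """Illustrative baseline (an assumption): G(x) = exp(-theta / x^2), x > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")

    def cdf(x):
        if x <= 0:
            return 0.0
        # Divide twice instead of squaring x, so very small x cannot
        # underflow to a zero denominator.
        return math.exp(-(theta / x) / x)

    return cdf


# Transformed CDF with hypothetical parameter values.
F = alpha_power_cdf(inverse_rayleigh_cdf(theta=1.0), alpha=2.0)
values = [F(x) for x in (0.5, 1.0, 2.0, 4.0)]
```

Because α^g is increasing in g when α > 1 and decreasing when α < 1, and the sign of the denominator flips accordingly, the wrapped function remains a valid non-decreasing CDF for any positive α.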

THE POWER MUTH DISTRIBUTION

Mathematical Modelling and Analysis ◽ 10.3846/13926292.2017.1289481 ◽ 2017 ◽ Vol 22 (2) ◽ pp. 186-201 ◽ Cited By ~ 4

Author(s): Pedro Jodra ◽ Hector Wladimir Gomez ◽ Maria Dolores Jimenez-Gamero ◽ Maria Virtudes Alba-Fernandez

Keyword(s): Probability Distributions ◽ Real Data ◽ Rate Function ◽ Data Sets ◽ Estimation Of Parameters ◽ Failure Rate Function ◽ Monte Carlo Simulation Study ◽ New Model ◽ Practical Usefulness ◽ Method Of Maximum Likelihood

Muth introduced a probability distribution with application in reliability theory. We propose a new model from the Muth law. This paper studies its statistical properties, such as the computation of the moments, computer generation of pseudo-random data and the behavior of the failure rate function, among others. The estimation of parameters is carried out by the method of maximum likelihood and a Monte Carlo simulation study assesses the performance of this method. The practical usefulness of the new model is illustrated by means of two real data sets, showing that it may provide a better fit than other probability distributions.

A new grade-capping approach based on coarse duplicate data correlation

Journal of the Southern African Institute of Mining and Metallurgy ◽ 10.17159/2411-9717/1379/2021 ◽ 2021 ◽ Vol 121 (5)

Author(s): R.V. Dutaut ◽ D. Marcotte

Keyword(s): Extreme Values ◽ Simulated Data ◽ Real Data ◽ Data Sets ◽ Data Correlation ◽ New Approach ◽ Assay Procedure ◽ Nickel Deposits ◽ Simulated Data Sets

In most exploration or mining grade data sets, the presence of outliers or extreme values represents a significant challenge to mineral resource estimators. The most common practice is to cap the extreme values at a predefined level. A new capping approach is presented that uses QA/QC coarse duplicate data correlation to predict the real data coefficient of variation (i.e., error-free CV). The cap grade is determined such that the capped data has a CV equal to the predicted CV. The robustness of the approach with regard to original core assay length decisions, departure from lognormality, and capping before or after compositing is assessed using simulated data sets. Real case studies of gold and nickel deposits are used to compare the proposed approach to the methods most widely used in industry. The new approach is simple and objective. It provides a cap grade that is determined automatically, based on predicted CV, and takes into account the quality of the assay procedure as determined by coarse duplicates correlation.

Keywords: geostatistics, outliers, capping, duplicates, QA/QC, lognormal distribution.
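The capping rule described in this abstract (choose the cap so that the capped data has a CV equal to a predicted CV) can be sketched as a one-dimensional root search. The sketch below is not the authors' implementation: it takes the predicted CV as a given input `target_cv` (the abstract does not state how it is computed from the duplicate correlation), uses the population standard deviation, and relies on the fact that, for non-negative grades, the CV of the capped data does not decrease as the cap rises, so plain bisection applies. The function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("mean must be positive to define the CV")
    return statistics.pstdev(values) / mean


def cap_grade_for_target_cv(grades, target_cv, tol=1e-9, max_iter=200):
    """Find a cap c such that CV(min(g, c) for g in grades) is close to target_cv.

    Assumes non-negative grades; returns max(grades) if no capping is needed.
    """
    if min(grades) < 0:
        raise ValueError("grades must be non-negative")
    if target_cv <= 0:
        raise ValueError("target_cv must be positive")

    lo, hi = min(grades), max(grades)
    if coefficient_of_variation(grades) <= target_cv:
        return hi  # the uncapped data already meets the target

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        cv = coefficient_of_variation([min(g, mid) for g in grades])
        if cv > target_cv:
            hi = mid  # cap is too high: still too much variability
        else:
            lo = mid  # cap is too low (or exactly right): raise it
        if hi - lo <= tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical assay grades with one extreme value.
grades = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 40.0]
cap = cap_grade_for_target_cv(grades, target_cv=0.8)
capped = [min(g, cap) for g in grades]
```

Bisection is used here only because it needs no derivative and the monotonicity argument guarantees a bracket; any scalar root finder would serve equally well.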

A Generalized Rayleigh Family of Distributions Based on the Modified Slash Model

Symmetry ◽ 10.3390/sym13071226 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 1226

Author(s): Inmaculada Barranco-Chamorro ◽ Yuri A. Iriarte ◽ Yolanda M. Gómez ◽ Juan M. Astorga ◽ Héctor W. Gómez

Keyword(s): Heavy Tails ◽ Real Data ◽ Rate Function ◽ Data Sets ◽ Estimation Of Parameters ◽ Stochastic Representation ◽ Likelihood Methods ◽ Chi Square ◽ Maximum Likelihood Methods ◽ Family Of Distributions

Specifying a proper statistical model to represent asymmetric lifetime data with high kurtosis is an open problem. In this paper, the three-parameter, modified, slashed, generalized Rayleigh family of distributions is proposed. Its structural properties are studied: stochastic representation, probability density function, hazard rate function, moments and estimation of parameters via maximum likelihood methods. As merits of our proposal, we highlight as particular cases a plethora of lifetime models, such as Rayleigh, Maxwell, half-normal and chi-square, among others, which are able to accommodate heavy tails. A simulation study and applications to real data sets are included to illustrate the use of our results.

Utilizarea teoriei valorilor extreme în climatologie (The use of extreme value theory in climatology)

Starea actuală a componentelor de mediu ◽ 10.53380/9789975315593.17 ◽ 2019

Author(s): Valentin Raileanu

Keyword(s): Maximum Likelihood ◽ Extreme Values ◽ Probability Distributions ◽ Simulated Data ◽ Likelihood Estimation ◽ R Software ◽ Data Set ◽ Data Format ◽ Generalized Pareto ◽ Distribution Parameters

The article briefly describes the history and fields of application of the theory of extreme values, including climatology. The data format, the Generalized Extreme Value (GEV) probability distributions with Block Maxima, the Generalized Pareto (GP) distributions with Peaks Over Threshold (POT), and the analysis methods are presented. The distribution parameters are estimated using the Maximum Likelihood Estimation (MLE) method. Installation of the free R software, the minimum set of required commands, and the in2extRemes graphical user interface package are described. As an example, the results of the GEV analysis of a simulated data set in in2extRemes are presented.
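To make the block-maxima and GEV terminology in this abstract concrete, the following sketch extracts block maxima from a series and evaluates the GEV distribution function and its quantile function in the usual (μ, σ, ξ) parameterisation, G(x) = exp(−[1 + ξ(x − μ)/σ]^(−1/ξ)), with the Gumbel limit as ξ → 0. The article works in R with in2extRemes; the Python code here, its function names, and the parameter values are assumptions for illustration, and the maximum likelihood fitting step itself is not shown.

```python
import math


def block_maxima(series, block_size):
    """Maximum of each complete block; a trailing partial block is dropped."""
    last_start = len(series) - block_size
    return [max(series[i:i + block_size])
            for i in range(0, last_start + 1, block_size)]


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma, shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(p)
    if abs(xi) < 1e-12:  # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# With one block per year, the level exceeded on average once every
# 100 blocks is the 0.99 quantile (hypothetical parameter values).
return_level_100 = gev_quantile(1.0 - 1.0 / 100.0, mu=30.0, sigma=5.0, xi=0.1)
```

In a block-maxima analysis, the three parameters would be estimated from the output of `block_maxima` by maximising the GEV likelihood, after which `gev_quantile` gives return levels.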

A Novel Approach for Crawling the Opinions from World Wide Web

International Journal of Information Retrieval Research ◽ 10.4018/ijirr.2016040101 ◽ 2016 ◽ Vol 6 (2) ◽ pp. 1-23 ◽ Cited By ~ 4

Author(s): Surbhi Bhatia ◽ Manisha Sharma ◽ Komal Kumar Bhatia

Keyword(s): World Wide ◽ Opinion Mining ◽ Real Data ◽ User Generated Content ◽ Decision Making Process ◽ Web Pages ◽ Data Sets ◽ Web Technologies ◽ Design And Implementation ◽ Novel Approach

Owing to the sudden and explosive growth of web technologies, a huge quantity of user-generated content is available online. People's experiences and opinions play an important role in the decision-making process. Although facts make it easy to search for information on a topic, retrieving opinions remains a crucial task. Many studies on opinion mining have to be undertaken efficiently in order to extract constructive opinionated information from these reviews. The present work focuses on the design and implementation of an Opinion Crawler that downloads opinions from various sites while ignoring the rest of the web. It also detects web pages that are updated frequently by calculating the timestamp for their revisit in order to extract relevant opinions. The performance of the Opinion Crawler is evaluated on real data sets, on which it proves to be much more accurate in terms of the precision and recall quality attributes.

An Unsupervised Approach for Determining Link Specifications

International Journal of Information Technology and Web Engineering ◽ 10.4018/ijitwe.2018100106 ◽ 2018 ◽ Vol 13 (4) ◽ pp. 104-123

Author(s): Khayra Bencherif ◽ Mimoun Malki ◽ Djamel Amar Bensaber

Keyword(s): Linked Data ◽ Open Data ◽ Real Data ◽ Knowledge Bases ◽ Structured Data ◽ Data Sets ◽ Novel Approach ◽ Link Discovery ◽ Unsupervised Approach

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. To achieve high effectiveness in the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and more difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against other link specification approaches to show that it outperforms them in most experiments.

A Growth Model for Multilevel Ordinal Data

Journal of Educational and Behavioral Statistics ◽ 10.3102/10769986030004369 ◽ 2005 ◽ Vol 30 (4) ◽ pp. 369-396 ◽ Cited By ~ 8

Author(s): Eisuke Segawa

Keyword(s): Latent Variable ◽ Ordinal Data ◽ Linear Models ◽ Growth Models ◽ Simulated Data ◽ Real Data ◽ Analytic Structure ◽ Data Sets ◽ Data Set ◽ Time Points

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze growth of a trait latent variable measured by ordinal items. Items are nested within time points, and time points are nested within subjects. These models are special because they include factor analytic structure. They can analyze not only data with item- and time-level missing observations, but also data with time points freely specified over subjects. Furthermore, features useful for longitudinal analyses, namely an “autoregressive error degree one” structure for the trait residuals and estimated time-scores, were included. The approach is Bayesian, using Markov chain Monte Carlo, and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.

An Inferential Aptness of a Weibull Generated Distribution and Application

Journal of Reliability and Statistical Studies ◽ 10.13052/10.13052/jrss0974-8024.14214 ◽ 2021

Author(s): Brijesh P. Singh ◽ Utpal Dhar Das

Keyword(s): Weibull Distribution ◽ Continuous Distribution ◽ Selection Criterion ◽ Real Data ◽ Rayleigh Distribution ◽ Single Parameter ◽ Parameter Distribution ◽ Data Sets ◽ Lifetime Distributions ◽ Limiting Behavior

In this article, an attempt has been made to develop a flexible single-parameter continuous distribution using the Weibull distribution. The Weibull distribution is one of the most widely used lifetime distributions in both the medical and engineering sectors. The exponential and Rayleigh distributions are particular cases of the Weibull distribution, and in this study these two distributions are used to develop a new distribution. Important statistical properties of the proposed distribution are discussed, such as the moments and the moment generating and characteristic functions. Various entropy measures, such as the Rényi, Shannon and cumulative entropies, are also derived. The pdf and cdf of the kth order statistic are also obtained. The properties of the hazard function and its limiting behavior are discussed. The maximum likelihood estimate of the parameter is not available in closed form, so an iterative procedure is used to obtain it. A simulation study has been carried out for different sample sizes, and the MLE, MSE and bias for the parameter λ have been observed. Some real data sets from medical and engineering science are used to check the suitability of the model against some other competing distributions. In the tail area, the proposed model works better. Various model selection criteria, such as -2LL, AIC, AICc, BIC, and the K-S and A-D tests, suggest that the proposed distribution performs better than the other competing distributions, and it is thus considered an alternative distribution. The proposed single-parameter distribution is found to be more flexible than some more complicated two-parameter distributions for the data sets considered in the present study.
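The likelihood-based selection criteria named in this abstract follow standard textbook definitions, which the sketch below computes from a maximised log-likelihood: -2LL = −2 ln L, AIC = −2 ln L + 2k, AICc = AIC + 2k(k + 1)/(n − k − 1), and BIC = −2 ln L + k ln n, where k is the number of estimated parameters and n the sample size. The function name and the numbers in the usage lines are hypothetical and are not taken from the paper; the K-S and A-D tests are not covered.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return -2LL, AIC, AICc and BIC for a fitted model.

    log_likelihood: maximised log-likelihood ln L
    n_params:       number of estimated parameters k
    n_obs:          sample size n
    """
    k, n = n_params, n_obs
    if k < 1 or n < 1:
        raise ValueError("n_params and n_obs must be positive")
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")

    neg2ll = -2.0 * log_likelihood
    aic = neg2ll + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = neg2ll + k * math.log(n)
    return {"-2LL": neg2ll, "AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical comparison: a one-parameter model against a two-parameter
# model fitted to the same 50 observations.
one_param = information_criteria(log_likelihood=-100.0, n_params=1, n_obs=50)
two_param = information_criteria(log_likelihood=-99.5, n_params=2, n_obs=50)
```

Lower values indicate the preferred model under each criterion. With these made-up numbers the two-parameter model has the smaller -2LL, but its penalty terms leave the one-parameter model with the smaller AIC, AICc and BIC.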

MSIGNET: a Metropolis sampling-based method for global optimal significant network identification

10.1101/260844 ◽ 2018 ◽ Cited By ~ 2

Author(s): Xi Chen ◽ Jianhua Xuan

Keyword(s): Cancer Recurrence ◽ Simulated Data ◽ Superior Performance ◽ Biological Knowledge ◽ Specific Gene ◽ Data Sets ◽ Data Set ◽ Novel Approach ◽ Network Identification ◽ Global Optimal

In this paper, we propose a novel approach, namely MSIGNET, to identify subnetworks with significantly expressed genes by integrating context-specific gene expression and protein-protein interaction (PPI) data. Specifically, we integrate the differential expression of each gene and the mutual information of gene pairs in a Bayesian framework and use Metropolis sampling to identify functional interactions. During the sampling process, a conditional probability is calculated given a randomly selected gene to control the network state transition. Our method provides global statistics of all genes and their interactions, and finally achieves a globally optimal subnetwork. We apply MSIGNET to simulated data and demonstrate its superior performance over comparable network identification tools. Using a validated Parkinson data set, we show that the network identified using MSIGNET is consistent with previously reported results but provides a more biologically meaningful interpretation of Parkinson’s disease. Finally, to study networks related to ovarian cancer recurrence, we investigate two patient data sets. The networks identified from the independent data sets show functional consistency, and the common genes and interactions are well supported by current biological knowledge.
