E-Bayesian Estimation of Reliability Characteristics of a Weibull Distribution with Applications

Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1261
Author(s):  
Hassan M. Okasha ◽  
Heba S. Mohammed ◽  
Yuhlong Lio

Given a progressively type-II censored sample, the E-Bayesian estimates, which are the expected Bayesian estimates over the joint prior distributions of the hyper-parameters in the gamma prior distribution of the unknown Weibull rate parameter, are developed for any given function of the unknown rate parameter under the squared error loss function. In order to study the impact of the selection of hyper-parameters for the prior, three different joint priors of the hyper-parameters are utilized to establish the theoretical properties of the E-Bayesian estimators for four functions of the rate parameter, which include an identity function (that is, the rate parameter itself) as well as the survival, hazard rate and quantile functions. A simulation study is also conducted to compare the three E-Bayesian estimates, a Bayesian estimate and the maximum likelihood estimate for each of the four functions considered. Moreover, two real data sets, from a medical study and an industrial life test, respectively, are used for illustration. Finally, concluding remarks are addressed.
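For orientation, here is a minimal numerical sketch of the E-Bayes idea. It assumes, as is common in this literature but not stated in the abstract, that the posterior of the Weibull rate is Gamma(a + m, b + W) for some censored-data summaries m and W, and that the hyper-prior on (a, b) is uniform on (0, 1) x (0, c); the Bayes estimate under squared error loss is the posterior mean, and the E-Bayes estimate averages it over the hyper-prior.

```python
# Hedged sketch: E-Bayesian estimate of a gamma-posterior rate parameter.
# Assumptions (not from the abstract): with a Gamma(a, b) prior the posterior
# of the rate is Gamma(a + m, b + W), where m and W are censored-data
# summaries, so the Bayes estimate under squared error loss is the posterior
# mean.  The hyper-prior pi(a, b) is taken uniform on (0, 1) x (0, c).
import numpy as np
from scipy import integrate

def bayes_estimate(a, b, m, W):
    """Posterior mean of the rate under a Gamma(a, b) prior (squared error loss)."""
    return (a + m) / (b + W)

def e_bayes_estimate(m, W, c=2.0):
    """Average the Bayes estimate over a uniform hyper-prior on (0, 1) x (0, c)."""
    integrand = lambda b, a: bayes_estimate(a, b, m, W) / c  # joint density = 1/c
    value, _ = integrate.dblquad(integrand, 0.0, 1.0, 0.0, c)
    return value

# Toy usage with made-up censored-data summaries m and W.
print(e_bayes_estimate(m=15, W=42.7))
```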

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the cluster number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: initialization by the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm thus combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in both accuracy and efficiency, under both sequential and parallel conditions.
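The two-phase structure can be illustrated with a deliberately simplified sketch: a radius-based covering pass stands in for the CA (which is more elaborate in the paper) to pick k and the initial centers, followed by standard Lloyd iterations. The radius threshold and the toy data are illustrative choices.

```python
# Hedged sketch of the two-phase idea behind C-K-means: a covering-style pass
# chooses k and the initial centers, then standard Lloyd iterations refine
# them.  The radius rule below is a simplification for illustration only; the
# paper's covering algorithm (CA) is more involved.
import numpy as np

def covering_init(X, radius):
    """Phase 1: create a new center whenever a point is not covered by any center."""
    centers = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > radius:
            centers.append(x)
    return np.array(centers)

def lloyd(X, centers, n_iter=100):
    """Phase 2: classic Lloyd iteration starting from the covering initialization."""
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.random.default_rng(0).normal(size=(300, 2))
centers = covering_init(X, radius=1.5)       # k is not pre-specified
centers, labels = lloyd(X, centers)
print(len(centers), "clusters found")
```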


Author(s):  
Nicolas Rodrigue ◽  
Thibault Latrille ◽  
Nicolas Lartillot

Abstract In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have either focused on detecting global signals of adaptive regimes—across the entire gene—or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.
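As background, the generic building block of mutation–selection codon models is a Halpern–Bruno-style rate in which the mutation rate is scaled by a fixation factor driven by amino acid fitness differences. The sketch below shows that generic form only; it is not necessarily the authors' exact parameterization, and the fitness values used are toy numbers.

```python
# Hedged sketch of the generic core of a mutation-selection codon model
# (Halpern-Bruno-style rates): the rate from codon a to codon b scales the
# mutation rate by a fixation factor driven by the amino acid fitness
# difference.  Illustrative form, not the authors' exact parameterization.
from scipy.special import exprel

def fixation_factor(S):
    """S / (1 - exp(-S)); equals 1 in the neutral limit S -> 0."""
    return 1.0 / exprel(-S)

def substitution_rate(mu_ab, fitness_a, fitness_b):
    """q_ab = mu_ab * fixation factor of the scaled fitness difference S = F_b - F_a."""
    return mu_ab * fixation_factor(fitness_b - fitness_a)

# Toy example: a deleterious change is fixed less often than a neutral one.
print(substitution_rate(1e-3, fitness_a=0.0, fitness_b=-2.0))  # suppressed rate
print(substitution_rate(1e-3, fitness_a=0.0, fitness_b=0.0))   # neutral rate
```

Site-heterogeneous versions let the fitness profile vary across sites, and departures from this null form are what the framework interprets as signatures of adaptive regimes.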


2010 ◽  
Vol 09 (04) ◽  
pp. 547-573 ◽  
Author(s):  
JOSÉ BORGES ◽  
MARK LEVENE

The problem of predicting the next request during a user's navigation session has been extensively studied. In this context, higher-order Markov models have been widely used to model navigation sessions and to predict the next navigation step, while prediction accuracy has been mainly evaluated with the hit and miss score. We claim that this score, although useful, is not sufficient for evaluating next-link prediction models when the aim is to find a sufficient order of the model and the size of a recommendation set, and to assess the impact of unexpected events on prediction accuracy. Herein, we make use of a variable-length Markov model to compare the usefulness of three alternatives to the hit and miss score: the Mean Absolute Error, the Ignorance Score, and the Brier score. We present an extensive evaluation of the methods on real data sets and a comprehensive comparison of the scoring methods.
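The alternative scores have standard forecasting definitions. The sketch below computes a hit-and-miss indicator, the multi-category Brier score, and the Ignorance Score for a single next-link forecast, using conventional formulations that may differ in detail from those used in the paper.

```python
# Hedged illustration of scores for one next-link forecast.  Definitions follow
# common forecasting conventions and may differ from the paper's exact setup.
import numpy as np

def hit_and_miss(probs, observed, top_n=1):
    """1 if the observed link is in the top-n recommendation set, else 0."""
    ranked = np.argsort(probs)[::-1][:top_n]
    return int(observed in ranked)

def brier(probs, observed):
    """Multi-category Brier score: squared distance to the outcome indicator."""
    outcome = np.zeros_like(probs)
    outcome[observed] = 1.0
    return float(np.sum((probs - outcome) ** 2))

def ignorance(probs, observed, eps=1e-12):
    """Ignorance score: negative log2 of the probability assigned to the observed link."""
    return float(-np.log2(probs[observed] + eps))

probs = np.array([0.5, 0.3, 0.15, 0.05])   # model's next-link distribution
print(hit_and_miss(probs, observed=1, top_n=2), brier(probs, 1), ignorance(probs, 1))
```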


2021 ◽  
Author(s):  
E. J. Thompson ◽  
Melodina Fabillo

The taxonomy of Neurachninae has been unstable, with its member genera, now comprising Ancistrachne, Calyptochloa, Cleistochloa, Dimorphochloa, Neurachne, Paraneurachne and Thyridolepis, having changed since its original circumscription, which comprised only the latter three genera. Recent studies on the phylogeny of Neurachninae have focused primarily on molecular data. We analysed the phylogeny of Neurachninae on the basis of data from seven molecular loci (plastid markers: matK, ndhF, rbcL, rpl16, rpoC2 and trnLF, and the ribosomal internal transcribed spacer, ITS) and 104 morphological characters, including new taxonomically informative micromorphology of upper paleas. We devised an impact assessment scoring (IAS) protocol to aid selection of a tree for inferring the phylogeny of Neurachninae. Combining morphological and molecular data resulted in a well-resolved phylogeny with the highest IAS value. Our findings support reinstatement of subtribe Neurachninae in its original sense, of Neurachne muelleri and of Dimorphochloa rigida. We show that Ancistrachne, Cleistochloa and Dimorphochloa are not monophyletic and that Ancistrachne maidenii, Calyptochloa, Cleistochloa and Dimorphochloa form a new group, the cleistogamy group, united by unique morphology associated with reproductive dimorphism.


2015 ◽  
Vol 61 (2) ◽  
pp. 379-388 ◽  
Author(s):  
Andrej-Nikolai Spiess ◽  
Claudia Deutschmann ◽  
Michał Burdukiewicz ◽  
Ralf Himmelreich ◽  
Katharina Klat ◽  
...  

Abstract BACKGROUND Quantification cycle (Cq) and amplification efficiency (AE) are parameters mathematically extracted from raw data to characterize quantitative PCR (qPCR) reactions and quantify the copy number in a sample. Little attention has been paid to the effects of preprocessing and the use of smoothing or filtering approaches to compensate for noisy data. Existing algorithms are largely taken for granted, and it is unclear which of the various methods is most informative. We investigated the effect of smoothing and filtering algorithms on amplification curve data. METHODS We obtained published high-replicate qPCR data sets from standard block thermocyclers and other cycler platforms and statistically evaluated the impact of smoothing on Cq and AE. RESULTS Our results indicate that selected smoothing algorithms affect estimates of Cq and AE considerably. The commonly used moving average filter performed worst in all qPCR scenarios. The Savitzky–Golay smoother, cubic splines, and the Whittaker smoother resulted overall in the least bias in our setting and exhibited low sensitivity to differences in qPCR AE, whereas other smoothers, such as the running mean, introduced an AE-dependent bias. CONCLUSIONS The selection of a smoothing algorithm is an important step in developing data analysis pipelines for real-time PCR experiments. We offer guidelines for the selection of an appropriate smoothing algorithm in diagnostic qPCR applications. The findings of our study were implemented in the R packages chipPCR and qpcR as a basis for the implementation of an analytical strategy.
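To illustrate the kind of comparison involved, the sketch below smooths a simulated amplification curve with a Savitzky–Golay filter and a moving average and reads off Cq as the maximum of the second derivative. The sigmoid model, noise level, window settings, and Cq definition are illustrative choices, not the pipeline of the chipPCR or qpcR packages referenced in the paper.

```python
# Hedged sketch: smoothing a simulated qPCR amplification curve and taking the
# quantification cycle (Cq) as the cycle where the second derivative peaks.
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
cycles = np.arange(1, 41, dtype=float)
fluor = 1.0 / (1.0 + np.exp(-(cycles - 24.0) / 1.8))    # idealized sigmoid curve
noisy = fluor + rng.normal(scale=0.01, size=fluor.size)  # add measurement noise

def cq_second_derivative(cycles, signal):
    """Cq as the cycle where the curvature of the amplification curve is largest."""
    d2 = np.gradient(np.gradient(signal, cycles), cycles)
    return cycles[np.argmax(d2)]

smoothed_sg = savgol_filter(noisy, window_length=7, polyorder=3)
smoothed_ma = np.convolve(noisy, np.ones(7) / 7, mode="same")   # moving average

for name, sig in [("raw", noisy), ("savgol", smoothed_sg), ("moving avg", smoothed_ma)]:
    print(name, round(cq_second_derivative(cycles, sig), 2))
```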


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
J. Zyprych-Walczak ◽  
A. Szabelska ◽  
L. Handschuh ◽  
K. Górczak ◽  
K. Klamecka ◽  
...  

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of the data. Data normalization is one of the most crucial steps of data processing, and it must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
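For orientation, the sketch below computes three generic depth-related scaling factors (total count, upper quartile, and a DESeq-style median of ratios) for a toy count matrix. These are textbook examples of sequencing-depth normalization rather than the five specific methods and selection workflow evaluated in the paper.

```python
# Hedged illustration of sequencing-depth normalization factors for an RNA-seq
# count matrix (genes x samples).  Generic textbook scalings for orientation
# only; the paper evaluates five specific methods and a full workflow.
import numpy as np

def total_count_factors(counts):
    """Scale each sample by its library size relative to the mean library size."""
    libsize = counts.sum(axis=0)
    return libsize / libsize.mean()

def upper_quartile_factors(counts):
    """Scale by the 75th percentile of each sample's nonzero counts."""
    uq = np.array([np.percentile(c[c > 0], 75) for c in counts.T])
    return uq / uq.mean()

def median_of_ratios_factors(counts):
    """DESeq-style: median ratio of each sample to the gene-wise geometric mean
    (a +1 pseudocount is used here as a simple stabilizer)."""
    log_geo_mean = np.log(counts + 1).mean(axis=1)
    ratios = np.log(counts + 1) - log_geo_mean[:, None]
    return np.exp(np.median(ratios, axis=0))

counts = np.random.default_rng(2).poisson(lam=20, size=(1000, 4)).astype(float)
for f in (total_count_factors, upper_quartile_factors, median_of_ratios_factors):
    print(f.__name__, np.round(f(counts), 3))
```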


2014 ◽  
Vol 14 (20) ◽  
pp. 11367-11392 ◽  
Author(s):  
E. Fontaine ◽  
A. Schwarzenboeck ◽  
J. Delanoë ◽  
W. Wobrock ◽  
D. Leroy ◽  
...  

Abstract. In this study the density of ice hydrometeors in tropical clouds is derived from a combined analysis of particle images from 2-D-array probes and associated reflectivities measured with a Doppler cloud radar on the same research aircraft. Usually, the mass–diameter m(D) relationship is formulated as a power law with two unknown coefficients (pre-factor, exponent) that need to be constrained from complementary information on hydrometeors, since absolute ice density measurement methods do not apply. Here, an extended theoretical study of numerous hydrometeor shapes simulated in 3-D and arbitrarily projected onto a 2-D plane first allowed us to constrain the exponent β of the m(D) relationship from the exponent σ of the surface–diameter S(D) relationship, which is likewise written as a power law. Since S(D) can always be determined for real data from 2-D optical array probes or other particle imagers, the evolution of the m(D) exponent can be calculated. After that, the pre-factor α of m(D) is constrained from theoretical simulations of the radar reflectivities matching the measured reflectivities along the aircraft trajectory. The study was performed as part of the Megha-Tropiques satellite project, where two types of mesoscale convective systems (MCS) were investigated: (i) above the African continent and (ii) above the Indian Ocean. For the two data sets, two parameterizations are derived to calculate the vertical variability of the m(D) coefficients α and β as a function of temperature. The originally calculated (with T-matrix) and subsequently parameterized m(D) relationships from this study are compared to other methods (from the literature) of calculating m(D) in tropical convection. The significant benefit of using variable m(D) relations instead of a single m(D) relationship is demonstrated from the impact of all these m(D) relations on Z-CWC (Condensed Water Content) and Z-CWC-T fitted parameterizations.
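The directly measurable half of this chain can be sketched as a log-log fit of the surface-diameter power law S(D) = γ D^σ; the simulated σ to β mapping and the T-matrix reflectivity matching that fixes α are not reproduced in this illustration, and the particle sizes and areas below are synthetic.

```python
# Hedged sketch of the measurable part of the retrieval: fit the projected
# surface power law S(D) = gamma * D**sigma from 2-D imager data by log-log
# regression.  In the paper, 3-D shape simulations then map sigma to the mass
# exponent beta of m(D) = alpha * D**beta, and alpha is retrieved by matching
# T-matrix-simulated to measured radar reflectivities; not reproduced here.
import numpy as np

def fit_power_law(D, S):
    """Least-squares fit of log S = log(gamma) + sigma * log(D)."""
    sigma, log_gamma = np.polyfit(np.log(D), np.log(S), 1)
    return np.exp(log_gamma), sigma

# Synthetic particle maximum dimensions (m) and projected areas (m^2), true exponent 1.8.
rng = np.random.default_rng(3)
D = rng.uniform(1e-4, 5e-3, size=500)
S = 0.2 * D**1.8 * rng.lognormal(sigma=0.1, size=D.size)

gamma, sigma = fit_power_law(D, S)
print(f"fitted sigma = {sigma:.2f} (true 1.8), gamma = {gamma:.3f}")
```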


Author(s):  
Amal S Hassan ◽  
Rokaya E Mohamed

A four-parameter lifetime model, named the Weibull inverse Lomax (WIL) distribution, is presented and studied. Some structural properties are derived. The estimation of the model parameters is performed based on a Type II censored sample. Maximum likelihood estimators, along with asymptotic confidence intervals for the population parameters and the reliability function, are constructed. The consistency of the maximum likelihood estimators is verified on the basis of simulated samples. Further, the results are applied to two real data sets.
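As a generic illustration of estimation under Type II censoring, the sketch below maximizes a censored likelihood built from the r smallest failure times plus the survival term for the remaining units, with a plain two-parameter Weibull standing in for the four-parameter WIL density, which the abstract does not spell out.

```python
# Hedged sketch of maximum likelihood under Type II censoring: only the r
# smallest failure times out of n are observed; the remaining n - r units
# contribute the survival probability at the last observed failure.  A plain
# two-parameter Weibull stands in for the WIL model here.
import numpy as np
from scipy import optimize, stats

def neg_loglik_type2(params, x_obs, n):
    """Negative log-likelihood (up to a constant) for the r observed order statistics."""
    shape, scale = np.exp(params)            # optimize on the log scale for positivity
    r = len(x_obs)
    dist = stats.weibull_min(shape, scale=scale)
    return -(np.sum(dist.logpdf(x_obs)) + (n - r) * dist.logsf(x_obs.max()))

# Simulate n lifetimes and keep only the r earliest failures (Type II censoring).
n, r = 100, 60
lifetimes = np.sort(stats.weibull_min(1.5, scale=2.0).rvs(n, random_state=4))
x_obs = lifetimes[:r]

res = optimize.minimize(neg_loglik_type2, x0=np.zeros(2), args=(x_obs, n))
print("MLE (shape, scale):", np.round(np.exp(res.x), 3))
```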


1993 ◽  
Vol 18 (1) ◽  
pp. 41-68 ◽  
Author(s):  
Ratna Nandakumar ◽  
William Stout

This article provides a detailed investigation of Stout's statistical procedure (the computer program DIMTEST) for testing the hypothesis that an essentially unidimensional latent trait model fits observed binary item response data from a psychological test. One finding was that DIMTEST may fail to perform as desired in the presence of guessing when coupled with many highly discriminating items. A revision of DIMTEST is proposed to overcome this limitation. Also, an automatic approach is devised to determine the size of the assessment subtests. Further, an adjustment is made to the estimated standard error of the statistic on which DIMTEST depends. These three refinements have led to an improved procedure that is shown in simulation studies to adhere closely to the nominal level of significance while achieving considerably greater power. Finally, DIMTEST is validated on a selection of real data sets.

