Simple and Fast Generalized-M (GM) Estimator and Its Application to Real Data Set

2021 ◽  
Vol 50 (3) ◽  
pp. 859-867
Author(s):  
HABSHAH MIDI ◽  
SHELAN SAIED ISMAEEL ◽  
JAYANTHI ARASAN ◽  
MOHAMMED A MOHAMMED

It is now evident that some robust methods, such as the MM-estimator, do not possess a bounded influence function, which means that their estimates can still be affected by outliers in the X direction, or high leverage points (HLPs), even though they have high efficiency and a high breakdown point (BDP). Generalized M (GM) estimators, such as the GM6 estimator, were put forward with the main aim of bounding the influence of HLPs through a weight function. The limitation of GM6 is that it assigns lower weights to both bad leverage points (BLPs) and good leverage points (GLPs), which decreases its efficiency when more GLPs are present in a data set. Moreover, GM6 requires a longer computational time. In this paper, we develop a new version of the GM-estimator based on a simple and fast algorithm. The attractive feature of this method is that it downweights only BLPs and vertical outliers (VOs), which increases its efficiency. The merit of our proposed GM estimator is studied through a simulation study and the well-known aircraft data set.
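The idea of bounding the influence of HLPs through a weight function can be sketched as iteratively reweighted least squares with Huber residual weights multiplied by leverage-based weights. This is a generic GM-type sketch, not the authors' GM6 or their proposed fast algorithm; the function name `gm_irls`, the 2(p+1)/n leverage cut-off, and the Huber constant c = 1.345 are illustrative choices.

```python
import numpy as np

def gm_irls(X, y, c=1.345, n_iter=50):
    """GM-type estimator sketch: IRLS with Huber residual weights times
    hat-value-based leverage weights that bound the influence of HLPs."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])
    # Leverage weights from hat-matrix diagonals: downweight points whose
    # leverage exceeds the usual 2*(p+1)/n cut-off.
    H = Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T
    h = np.diag(H)
    cut = 2 * (p + 1) / n
    w_x = np.minimum(1.0, cut / h)
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - Xc @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # MAD scale
        u = r / s
        w_r = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        W = np.diag(w_r * w_x)
        beta = np.linalg.solve(Xc.T @ W @ Xc, Xc.T @ W @ y)
    return beta
```

On data contaminated with a few bad leverage points, the combined weights drive the contribution of those points toward zero, so the fit stays close to the clean-data line.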

Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1259 ◽  
Author(s):  
Henry Velasco ◽  
Henry Laniado ◽  
Mauricio Toro ◽  
Víctor Leiva ◽  
Yuhlong Lio

Both cell-wise and case-wise outliers may appear in a real data set at the same time. Few methods have been developed in order to deal with both types of outliers when formulating a regression model. In this work, a robust estimator is proposed based on a three-step method named 3S-regression, which uses the comedian as a highly robust scatter estimate. An intensive simulation study is conducted in order to evaluate the performance of the proposed comedian 3S-regression estimator in the presence of cell-wise and case-wise outliers. In addition, a comparison of this estimator with recently developed robust methods is carried out. The proposed method is also extended to the model with continuous and dummy covariates. Finally, a real data set is analyzed for illustration in order to show potential applications.
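The comedian mentioned in the abstract has a simple closed form, med{(x_i - med x)(y_i - med y)} (Falk's robust covariance analogue). A minimal sketch of the pairwise comedian scatter matrix, the kind of robust scatter plugged into a 3S-type pipeline, might look as follows; the paper's actual 3S-regression steps are not reproduced here.

```python
import numpy as np

def comedian(x, y):
    """Comedian: a highly robust, symmetric analogue of covariance,
    med{(x_i - med x)(y_i - med y)}.  For x == y (odd sample size) it
    reduces to the squared MAD (without the consistency constant)."""
    return np.median((x - np.median(x)) * (y - np.median(y)))

def comedian_matrix(X):
    """Pairwise comedian scatter estimate for the columns of X."""
    p = X.shape[1]
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            S[i, j] = comedian(X[:, i], X[:, j])
    return S
```

Because only medians are involved, a single wild cell (such as the 100 below) leaves the estimate unchanged, which is why comedian-based scatter resists cell-wise contamination.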


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Shokrya Saleh

The Akaike Information Criterion (AIC) based on least squares (LS) regression minimizes the sum of squared residuals, but LS is sensitive to outlying observations. Alternative criteria that are less sensitive to outliers have been proposed; examples are the robust AIC (RAIC), robust Mallows Cp (RCp), and robust Bayesian information criterion (RBIC). In this paper, we propose a robust AIC obtained by replacing the scale estimate with a high-breakdown-point estimate of scale. The robustness of the proposed method is studied through its influence function. We show, through simulated and real data examples, that the proposed robust AIC is effective in selecting accurate models in the presence of outliers and high leverage points.
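The substitution the abstract describes can be illustrated in a few lines: the usual ML variance estimate RSS/n in the AIC is replaced by a high-breakdown scale, here the MAD of the residuals. This is only a sketch of the idea; the paper's exact criterion may differ.

```python
import numpy as np

def robust_aic(y, yhat, k):
    """AIC-type criterion n*log(s^2) + 2k where s is a high-breakdown
    scale (MAD of residuals) instead of the outlier-sensitive RSS/n."""
    r = y - yhat
    n = len(y)
    s = 1.4826 * np.median(np.abs(r - np.median(r)))  # high-BDP scale
    return n * np.log(s ** 2) + 2 * k
```

A single gross outlier barely moves the MAD, so the criterion value stays nearly unchanged, whereas the classical RSS-based AIC would jump.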


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model were investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters were estimated using the maximum likelihood criterion, and the behaviour of these estimates was examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
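The maximum-likelihood-plus-simulation workflow described above can be sketched with the base Lomax (Pareto Type II) distribution as a stand-in, since the paper's exponentiated half-logistic Lomax density is not reproduced here; `scipy.stats.lomax` provides both sampling and ML fitting.

```python
from scipy.stats import lomax

# Simulate from a Lomax with known shape c = 3, then recover c by
# maximum likelihood with location and scale held fixed, mimicking
# one replication of a simulation study for the ML estimates.
data = lomax.rvs(c=3.0, size=2000, random_state=0)
c_hat, loc_hat, scale_hat = lomax.fit(data, floc=0, fscale=1)
```

Repeating this over many seeds and sample sizes, and tracking the bias and spread of `c_hat`, is the usual way such simulation studies assess estimator behaviour.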


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution has been introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different estimation methods: the uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared through numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS, and PC estimators. Finally, the results of analyzing a real data set are presented.
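For reference, the density and distribution function of the generalized inverted exponential model, in the form usually stated in the literature (assumed here, not quoted from this paper), can be coded directly; the pdf below is the derivative of the cdf, which a numerical integration check confirms.

```python
import numpy as np

def gie_pdf(x, alpha, lam):
    """GIE density: f(x) = (a*l/x^2) e^{-l/x} (1 - e^{-l/x})^{a-1}."""
    return (alpha * lam / x ** 2) * np.exp(-lam / x) \
        * (1 - np.exp(-lam / x)) ** (alpha - 1)

def gie_cdf(x, alpha, lam):
    """GIE distribution function: F(x) = 1 - (1 - e^{-l/x})^alpha."""
    return 1 - (1 - np.exp(-lam / x)) ** alpha
```

With these two functions in hand, the paper's five estimators could each be implemented and compared by MSE over repeated simulated samples.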


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of histone modifications (HMs), contributing to the histone code hypothesis, lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions, which have been classified as chromatin states. Mostly, Hidden Markov Model (HMM)-based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. The numbers of states predicted so far by HMM tools have been justified biologically. Objective: The present study aims to provide a computational scheme for identifying the number of underlying hidden states in the data under consideration. Methods: We propose a computational scheme, HCVS, based on a hierarchical clustering and visualization strategy. Results: We tested the proposed scheme on a real data set of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results were also compared with one of the existing models and showed quite good correlation. Conclusion: The HCVS model not only helps in deciding the optimal number of states for particular data, but also justifies the results biologically, thereby correlating the computational and biological aspects.
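One common hierarchical-clustering heuristic for choosing the number of hidden states is to cut the dendrogram at the largest jump in merge distance. The sketch below applies it to a toy stand-in for a genome-bin-by-mark signal matrix; this is only one generic heuristic, not necessarily the paper's HCVS rule, and the three "states" and noise level are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in for a bin x histone-mark matrix: 3 hidden "states",
# 40 genomic bins each, 3 marks, small Gaussian noise.
rng = np.random.default_rng(0)
states = np.array([[5.0, 0, 0], [0, 5.0, 0], [0, 0, 5.0]])
data = np.vstack([s + rng.normal(0, 0.3, (40, 3)) for s in states])

Z = linkage(data, method="ward")
# The largest gap between successive merge distances suggests where
# the dendrogram should be cut, i.e. the number of underlying states.
gaps = np.diff(Z[:, 2])
n_states = len(data) - (np.argmax(gaps) + 1)
labels = fcluster(Z, t=n_states, criterion="maxclust")
```

Visualizing `Z` as a dendrogram (the "visualization strategy" part) lets the analyst confirm that the chosen cut is biologically sensible.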


2021 ◽  
Vol 13 (9) ◽  
pp. 1703
Author(s):  
He Yan ◽  
Chao Chen ◽  
Guodong Jin ◽  
Jindong Zhang ◽  
Xudong Wang ◽  
...  

The traditional method of constant false-alarm rate detection is based on the assumption of an echo statistical model; against a background of sea clutter and other interference, its target recognition accuracy is low and its false-alarm rate is high. Therefore, computer vision techniques are widely discussed as a way to improve detection performance. However, the majority of studies have focused on synthetic aperture radar because of its high resolution; for defense radar, detection performance is not satisfactory because of its low resolution. To this end, we propose a novel target detection method for coastal defense radar based on the faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) Faster R-CNN is selected as the sea-surface target detector because of its high target detection accuracy; (2) a modified Faster R-CNN, adapted to the sparsity and small target sizes in the data set, is employed; and (3) soft non-maximum suppression is exploited to eliminate possibly overlapping detection boxes. Furthermore, detailed comparative experiments based on a real coastal defense radar data set are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.
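Step (3), soft non-maximum suppression, is a published post-processing algorithm (Bodla et al., 2017): instead of deleting boxes that overlap the current top-scoring box, their scores are decayed by a Gaussian of the overlap. A minimal sketch, with the sigma and threshold values as illustrative defaults:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, thresh=0.001):
    """Gaussian soft-NMS: decay scores of boxes overlapping the current
    best box by exp(-IoU^2 / sigma) instead of discarding them."""
    scores = list(scores)
    keep, idx = [], list(range(len(boxes)))
    while idx:
        m = max(idx, key=lambda i: scores[i])
        keep.append(m)
        idx.remove(m)
        for i in idx:
            scores[i] *= np.exp(-iou(boxes[m], boxes[i]) ** 2 / sigma)
        idx = [i for i in idx if scores[i] >= thresh]
    return keep, scores
```

Because overlapping boxes are merely demoted rather than removed, nearby small targets, common in low-resolution radar imagery, are less likely to be suppressed outright.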


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zahra Arefinia ◽  
Dip Prakash Samajdar

Numerical simulations of plasmonic polymer solar cells (PSCs) incorporating a disordered array of non-uniformly sized plasmonic nanoparticles (NPs) impose prohibitively long and complex computational demands. To surmount this limitation, we present a novel semi-analytical model that dramatically reduces computation time and resource consumption while remaining acceptably accurate. For this purpose, the optical model of the active layer incorporating plasmonic metal NPs, described by a homogenization theory based on a modified Maxwell–Garnett–Mie theory, is fed into an electrical model based on the coupled Poisson, continuity, and drift–diffusion equations. In addition, our model accounts for absorption in the non-active layers, interference induced by the electrodes, and scattered light escaping from the PSC. The modeling results satisfactorily reproduce a series of experimental photovoltaic parameters of plasmonic PSCs, demonstrating the validity of our modeling approach. On this basis, we apply the semi-analytical model to propose a new high-efficiency plasmonic PSC based on the PM6:Y6 PSC, which has the highest power conversion efficiency (PCE) reported to date. The results show that incorporating plasmonic NPs into the PM6:Y6 active layer leads to a PCE of over 18%.
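The homogenization step rests on effective-medium theory. As a baseline (the paper uses a modified Maxwell–Garnett–Mie theory for non-uniform NP sizes, which is more involved), the classical Maxwell–Garnett mixing rule for spherical inclusions is a one-liner; the permittivity values in the usage check are illustrative, not taken from the paper.

```python
def maxwell_garnett(eps_p, eps_m, f):
    """Classical Maxwell-Garnett effective permittivity for a volume
    fraction f of spherical particles (eps_p) embedded in a host
    matrix (eps_m); solves (e-em)/(e+2em) = f*(ep-em)/(ep+2em)."""
    beta = (eps_p - eps_m) / (eps_p + 2 * eps_m)
    return eps_m * (1 + 2 * f * beta) / (1 - f * beta)
```

Feeding the resulting complex effective permittivity of the NP-loaded active layer into a transfer-matrix optical calculation is the usual way such a homogenized layer enters the optical part of a drift–diffusion device model.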


2021 ◽  
Vol 1978 (1) ◽  
pp. 012047
Author(s):  
Xiaona Sheng ◽  
Yuqiu Ma ◽  
Jiabin Zhou ◽  
Jingjing Zhou

2021 ◽  
pp. 1-11
Author(s):  
Velichka Traneva ◽  
Stoyan Tranev

Analysis of variance (ANOVA), developed by Fisher, is an important method in data analysis. There are situations, however, in which the data are imprecise. In order to analyze such data, the aim of this paper is to introduce, for the first time, an intuitionistic fuzzy two-factor ANOVA (2-D IFANOVA) without replication, as an extension of classical ANOVA and of one-way IFANOVA, for the case where the data are intuitionistic fuzzy rather than real numbers. The proposed approach employs the apparatus of intuitionistic fuzzy sets (IFSs) and index matrices (IMs). The paper also analyzes a unique data set of daily ticket sales over a year in a multiplex of Cinema City Bulgaria, part of the Cineworld PLC Group, applying both two-factor ANOVA and the proposed 2-D IFANOVA to study the influence of the “season” and “ticket price” factors. A comparative analysis of the results obtained after applying ANOVA and 2-D IFANOVA to the real data set is also presented.
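The crisp procedure that 2-D IFANOVA extends is the classical two-factor ANOVA without replication: on an a-by-b table, the total sum of squares is split into row, column, and residual parts, giving one F statistic per factor. A minimal sketch (the intuitionistic fuzzy extension with IMs is not attempted here):

```python
import numpy as np

def two_way_anova_no_rep(table):
    """Two-factor ANOVA without replication on an a x b table
    (rows = levels of factor A, columns = levels of factor B).
    Returns the F statistics for factors A and B."""
    a, b = table.shape
    grand = table.mean()
    ss_a = b * ((table.mean(axis=1) - grand) ** 2).sum()   # row effect
    ss_b = a * ((table.mean(axis=0) - grand) ** 2).sum()   # column effect
    ss_err = ((table - grand) ** 2).sum() - ss_a - ss_b    # residual
    ms_err = ss_err / ((a - 1) * (b - 1))
    return (ss_a / (a - 1)) / ms_err, (ss_b / (b - 1)) / ms_err
```

In the paper's setting each F statistic would be compared with the appropriate F quantile; the IF version replaces the real-valued cell entries with intuitionistic fuzzy pairs.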


Author(s):  
A Salman Avestimehr ◽  
Seyed Mohammadreza Mousavi Kalan ◽  
Mahdi Soltanolkotabi

Dealing with the sheer size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slowly. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computation time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop a novel mathematical understanding of this framework, demonstrating its effectiveness in much broader settings than previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of the data set, accuracy, computational load (or data redundancy), and straggler toleration in this framework.
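The redundancy idea can be sketched in its simplest form: replicate data partitions across workers so the exact least-squares gradient is still recoverable when a straggler never responds. This is plain replication for illustration, not the specific encoding scheme analyzed in the paper; the partition layout and worker assignment are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(9, 3))   # data matrix, split into 3 partitions
b = rng.normal(size=9)
x = rng.normal(size=3)        # current iterate

parts = np.split(np.arange(9), 3)
workers = [(0, 1), (1, 2), (2, 0)]   # each worker stores 2 partitions

def partial_grad(p):
    """Gradient of 0.5*||Ax - b||^2 restricted to partition p."""
    idx = parts[p]
    return A[idx].T @ (A[idx] @ x - b[idx])

# Worker 1 straggles; workers 0 and 2 still jointly cover all partitions,
# so the master sums each partition's gradient exactly once.
responding = [0, 2]
seen, g = set(), np.zeros(3)
for w in responding:
    for p in workers[w]:
        if p not in seen:
            seen.add(p)
            g += partial_grad(p)

full_grad = A.T @ (A @ x - b)   # what a single machine would compute
```

The 2x storage overhead here is the "computational load (or data redundancy)" side of the trade-off the paper characterizes against straggler toleration and convergence rate.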

