robust statistics
Recently Published Documents


TOTAL DOCUMENTS

370
(FIVE YEARS 77)

H-INDEX

33
(FIVE YEARS 5)

2021 ◽  
Vol 2131 (2) ◽  
pp. 022110
Author(s):  
V Misyura ◽  
M Bogacheva ◽  
E Misyura

Abstract In the traditional approach to time series forecasting based on a selected model, the model parameters are first estimated, a point forecast is then made using the obtained estimates, and finally an interval forecast is produced with a given probability. In this article the authors propose a nonparametric method for obtaining a single-stage interval forecast of a time series, based on constructing sets of predictive and target variables using robust statistics and obtaining the forecast boundaries by fitting linear regression models. The forecasting algorithm rests on estimating the parameters of a linear multiple regression with model regularization methods. The forecasting results demonstrate the expediency and effectiveness of the proposed method.
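The single-stage idea above can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple ridge (regularized) regression on lagged values for the point forecast and, purely for illustration, derives the interval boundaries from a robust spread estimate (the median absolute deviation) of the in-sample residuals.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def interval_forecast(series, p=3, lam=1.0, k=1.5):
    """One-step-ahead forecast from p lagged values.

    Point forecast from a regularized linear regression; the interval
    half-width is k times the median absolute deviation (MAD) of the
    in-sample residuals, a robust spread estimate.
    """
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - p + i] for i in range(p)])
    y = s[p:]
    w = ridge_fit(X, y, lam)
    resid = y - X @ w
    mad = np.median(np.abs(resid - np.median(resid)))
    point = float(s[-p:] @ w)
    return point - k * mad, point, point + k * mad

lo, mid, hi = interval_forecast(np.sin(np.linspace(0.0, 20.0, 200)))
```

The boundaries come directly from the fitted model and a robust residual spread, so no separate distributional step is required.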


2021 ◽  
Author(s):  
Bart Van Puyvelde ◽  
Simon Daled ◽  
Sander Willems ◽  
Ralf Gabriels ◽  
Anne Gonzalez de Peredo ◽  
...  

In the last decade, a revolution in liquid chromatography-mass spectrometry (LC-MS) based proteomics unfolded with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine these data and assessing their performance across different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and with all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired on several of the most commonly used present-day instrument platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates to allow robust statistics, and covers nearly 10 different data formats, including scanning-quadrupole and ion-mobility-enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1529
Author(s):  
Benjamin Guedj ◽  
Louis Pujol

“No free lunch” results state the impossibility of obtaining meaningful bounds on the error of a learning algorithm without prior assumptions and modelling, which are more or less realistic for a given problem. Some models are “expensive” (strong assumptions, such as sub-Gaussian tails), others are “cheap” (merely finite variance). As is well known, the more you pay, the more you get: in other words, the most expensive models yield the most interesting bounds. Recent advances in robust statistics have investigated procedures that obtain tight bounds while keeping the cost of the assumptions minimal. The present paper explores and exhibits the limits on obtaining tight probably approximately correct (PAC)-Bayes bounds in a robust setting for cheap models.
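A classic example of a robust procedure for such a "cheap" model is the median-of-means estimator, which attains sub-Gaussian-style deviation bounds assuming only finite variance. The sketch below is a generic illustration of that estimator, not the paper's construction.

```python
import numpy as np

def median_of_means(x, k=10, seed=0):
    """Median-of-means: shuffle the sample, split it into k blocks,
    average each block, and return the median of the block means.
    Robust to heavy tails, assuming only finite variance."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    return float(np.median([b.mean() for b in np.array_split(x, k)]))

rng = np.random.default_rng(42)
sample = rng.standard_t(df=2.5, size=10_000)  # heavy tails, finite variance
est = median_of_means(sample)  # close to the true mean, 0
```

The block size trades robustness against efficiency: more blocks tolerate more corruption, fewer blocks give each mean a smaller variance.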


Atmosphere ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1501
Author(s):  
Chung-Chieh Wang ◽  
Chih-Sheng Chang ◽  
Yi-Wen Wang ◽  
Chien-Chang Huang ◽  
Shih-Chieh Wang ◽  
...  

In this study, 24 h quantitative precipitation forecasts (QPFs) by a cloud-resolving model (with a grid spacing of 2.5 km) on days 1–3 for 29 typhoons in six seasons of 2010–2015 in Taiwan were examined using categorical scores and rain gauge data. The study represents an update of a previous study for 2010–2012, intended to produce more stable and robust statistics toward the high thresholds (typically with fewer sample points), which are our main focus of interest. This is important for better understanding the model’s ability to predict such high-impact typhoon rainfall events. The overall threat scores (TS, defined as the fraction of verification points correctly predicted to reach a given threshold among all points either observed or predicted to reach that threshold, or both) were 0.28 and 0.18 for day 1 (0–24 h) QPFs, 0.25 and 0.16 for day 2 (24–48 h) QPFs, and 0.15 and 0.08 for day 3 (48–72 h) QPFs at 350 mm and 500 mm, respectively, showing improvements over 5 km models. Moreover, as found previously, a strong dependence toward higher TSs for larger rainfall events also existed, and the corresponding TSs at 350 and 500 mm for the top 5% of events were 0.39 and 0.25 on day 1, 0.38 and 0.21 on day 2, and 0.25 and 0.12 on day 3. Thus, for the top typhoon rainfall events, which have the highest potential for hazards, the model exhibits an even higher ability for QPFs based on categorical scores. Furthermore, it is shown that the model has little tendency to overpredict or underpredict rainfall for all groups of events with different rainfall magnitudes across all thresholds, except for some tendency to under-forecast for the largest event group on day 3. Some issues associated with categorical statistics to be aware of are also demonstrated and discussed.
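The threat score defined in the abstract can be computed directly from a contingency count over verification points. The sketch below uses hypothetical gauge and forecast values, not the study's data.

```python
import numpy as np

def threat_score(obs, fcst, threshold):
    """Threat score (critical success index):
    hits / (hits + misses + false alarms)."""
    o = np.asarray(obs, dtype=float) >= threshold
    f = np.asarray(fcst, dtype=float) >= threshold
    hits = int(np.sum(o & f))
    misses = int(np.sum(o & ~f))
    false_alarms = int(np.sum(~o & f))
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

obs  = [400.0, 520.0, 100.0, 360.0, 80.0]   # gauge totals, mm (illustrative)
fcst = [390.0, 300.0, 120.0, 500.0, 60.0]   # model QPF, mm (illustrative)
ts = threat_score(obs, fcst, 350.0)  # 2 hits, 1 miss, 0 false alarms -> 2/3
```

Correct negatives (points where neither observation nor forecast reaches the threshold) do not enter the score, which is why TS is informative at rare, high thresholds.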


2021 ◽  
Vol 13 (19) ◽  
pp. 3866
Author(s):  
Xin Zhang ◽  
Yan Yin ◽  
Julia Kukulies ◽  
Yang Li ◽  
Xiang Kuang ◽  
...  

The Geostationary Lightning Mapper (GLM) on the Geostationary Operational Environmental Satellite 16 (GOES-16) detects total lightning continuously, with a high spatial resolution and detection efficiency. Coincident data from the GLM and the Advanced Baseline Imager (ABI) are used to explore the correlation between cloud top properties and flash activity across the continental United States (CONUS) sector from May to September 2020. The large number of collocated infrared (IR) brightness temperature (TBB), cloud top height (CTH) and lightning data provides robust statistics. Overall, the likelihood of lightning occurrence and high flash density is higher if the TBB is colder than 225 K. A higher CTH is observed to correlate with a larger flash rate, a smaller flash size, a stronger updraft, and larger optical energy. Furthermore, the cloud top updraft velocity (w) is estimated from the decreasing rate of TBB, but it is smaller than the updraft velocity of the convective core. As a result, the relationship between CTH and lightning flash rate is investigated independently of w over the continental, oceanic and coastal regimes in the tropics and mid-latitudes. When the CTH is higher than 12 km, the flash rates of oceanic lightning are 38% smaller than those of both coastal and continental lightning. In addition, further studies are needed to examine why oceanic lightning with low clouds (CTH < 8 km) has higher flash rates than lightning over land and coast. Finally, the exponents of the derived power relationship between CTH and lightning flash rate are smaller than four; this is an underestimate caused by the GLM detection efficiency and the difference between the IR CTH and the 20 dBZ CTH. The results from combining the ABI and GLM products suggest that merging multiple satellite datasets could benefit both lightning-activity and parameterization studies, although parallax corrections should be taken into account.
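The exponent of a power relationship such as the one between CTH and flash rate is typically estimated as a slope in log-log space. The sketch below fits synthetic data; the variable names and the synthetic exponent of 5 are illustrative only, not values from the study.

```python
import numpy as np

def power_law_exponent(cth_km, flash_rate):
    """Fit F = c * CTH**a by ordinary least squares in log-log space
    and return the exponent a."""
    x = np.log(np.asarray(cth_km, dtype=float))
    y = np.log(np.asarray(flash_rate, dtype=float))
    a, _logc = np.polyfit(x, y, 1)
    return float(a)

cth = [8.0, 10.0, 12.0, 14.0]               # cloud top heights, km
flash = [0.001 * c**5 for c in cth]         # synthetic data built with exponent 5
a = power_law_exponent(cth, flash)
```

Any bias in the height estimate (for example, IR CTH versus radar 20 dBZ CTH) or in the flash counts propagates directly into the fitted exponent, which is the underestimation mechanism the abstract describes.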


2021 ◽  
Vol 54 (5) ◽  
Author(s):  
Marjan Hadian-Jazi ◽  
Alireza Sadri ◽  
Anton Barty ◽  
Oleksandr Yefanov ◽  
Marina Galchenkova ◽  
...  

A peak-finding algorithm for serial crystallography (SX) data analysis based on the principle of `robust statistics' has been developed. Methods which are statistically robust are generally more insensitive to any departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, these methods enable the discretization of data into a group comprising inliers (i.e. the background noise) and another group comprising outliers (i.e. Bragg peaks). Our robust statistics algorithm has two key advantages, which are demonstrated through testing using multiple SX data sets. First, it is relatively insensitive to the exact value of the input parameters and hence requires minimal optimization. This is critical for the algorithm to be able to run unsupervised, allowing for automated selection or `vetoing' of SX diffraction data. Secondly, the processing of individual diffraction patterns can be easily parallelized. This means that it can analyse data from multiple detector modules simultaneously, making it ideally suited to real-time data processing. These characteristics mean that the robust peak finder (RPF) algorithm will be particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time.
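A common robust-statistics recipe for the inlier/outlier split described above is to threshold on the median plus a multiple of the scaled median absolute deviation, statistics that the bright outlier pixels barely influence. This is a generic sketch of that idea, not the published RPF algorithm.

```python
import numpy as np

def peak_candidates(pixels, k=6.0):
    """Flag pixels far above the background as candidate Bragg peaks.

    The background level and spread are estimated with the median and
    the median absolute deviation (MAD), which remain stable even when
    a small fraction of pixels are extreme outliers."""
    p = np.asarray(pixels, dtype=float)
    med = np.median(p)
    sigma = 1.4826 * np.median(np.abs(p - med))  # MAD -> normal-equivalent sigma
    return p > med + k * sigma

rng = np.random.default_rng(1)
frame = rng.normal(100.0, 5.0, size=10_000)  # flat background noise
frame[:20] += 500.0                          # inject 20 bright "peaks"
mask = peak_candidates(frame)
```

Because each pattern is thresholded independently, frames (or detector modules) can be processed in parallel, which matches the real-time motivation stated in the abstract.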


2021 ◽  
pp. 247255522110383
Author(s):  
Jason Haelewyn ◽  
Philip W. Iversen ◽  
Jeffrey R. Weidner

Well-behaved, in vitro bioassays generally produce normally distributed values in their primary (efficacy) data. Accordingly, the best practices for statistical analysis are well documented and understood. However, assays may occasionally display unusually high variability and fall outside the assumptions inherent in these standard analyses. These assays may still be in the optimization phase, in which the source of variation could be identified and addressed. They might also represent the best available option to address the biological process being examined. In these cases, the use of robust statistical methods may provide a more appropriate set of tools for both data analysis and assay optimization. This article provides guidance on best practices for the use of robust statistical methods for the analysis of bioassay data as an alternative to standard methods. Impacts on experimental design and interpretation will be discussed.
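As one concrete example of such a robust alternative, the familiar Z'-factor assay-quality metric can be recomputed with medians and scaled MADs in place of means and standard deviations, so that a few aberrant wells do not dominate. This is a hedged sketch, not the article's prescribed procedure.

```python
import numpy as np

def robust_z_prime(pos, neg):
    """Z'-factor with median/MAD in place of mean/SD:
    Z' = 1 - 3*(s_pos + s_neg) / |m_pos - m_neg|,
    where m is the median and s the MAD scaled by 1.4826."""
    def mad_sd(x):
        x = np.asarray(x, dtype=float)
        return 1.4826 * np.median(np.abs(x - np.median(x)))
    m_pos, m_neg = np.median(pos), np.median(neg)
    return 1.0 - 3.0 * (mad_sd(pos) + mad_sd(neg)) / abs(m_pos - m_neg)

pos = [100.0, 101.0, 99.0, 100.0, 250.0]  # positive-control wells, one outlier
neg = [10.0, 11.0, 9.0, 10.0, 12.0]       # negative-control wells
zp = robust_z_prime(pos, neg)  # stays near 0.9 despite the outlier well
```

The classical Z' computed with the mean and standard deviation of the same data would be dragged down sharply by the single 250 well, which is the failure mode robust methods are meant to avoid.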


Toxins ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 591
Author(s):  
Emmanuel K. Tangni ◽  
Bart Huybrechts ◽  
Julien Masquelier ◽  
Els Van Hoeck

In accordance with ISO 17043 of the International Organization for Standardization, two proficiency tests (PTs) for the simultaneous determination of aflatoxins (AFB1, AFB2, AFG1, AFG2); deoxynivalenol; fumonisins FB1, FB2 and FB3; ochratoxin A; the T-2 toxin; and the HT-2 toxin were conducted in 2019 and 2020 using cornflakes and rusk flours that were prepared in house. The homogeneity and stability of these materials were verified according to the criteria laid down in ISO 13528 using randomly selected samples. Most of the targeted toxins were found to be homogeneously distributed in both materials, with no significant changes over the timescale of the PTs. Next, the materials were distributed to approximately 25 participating laboratories from Europe, Canada and the United States. The resulting datasets were computed using robust statistics. Outliers were checked and removed, and the toxin concentrations were assigned as the consensus values of the participants' results at Horwitz ratios <1.2. The z scores were generated for all mycotoxins, and the results were pooled to calculate the relative sum of squared z scores (SZ2) indexes and were clustered according to the triple-A rating. Overall, at least 80% of the participating laboratories achieved good or acceptable performances. The categories most frequently assigned to good performances (SZ2 ≤ 2) were AAA (51%) and BAA (13%). The BBA + CBA clusters (6%) comprised laboratories reporting acceptable z scores for less than 90% or 50% of the mycotoxins targeted in the two matrices. The triple-A rating appears to be appropriate for evaluating the performance of laboratories involved in multi-mycotoxin analyses. Both accredited and non-accredited analytical methods achieved good and acceptable performances.
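The z score used in such PTs follows the standard ISO 13528 form, z = (x − x_assigned)/σ_pt. The SZ2 index is less standardized; the sketch below assumes one plausible reading, the mean of squared z scores over the reported mycotoxins, purely for illustration.

```python
import numpy as np

def z_scores(results, assigned, sigma_pt):
    """PT z scores: z = (x - assigned) / sigma_pt; |z| <= 2 is the
    usual satisfactory range under ISO 13528."""
    return (np.asarray(results, dtype=float)
            - np.asarray(assigned, dtype=float)) / np.asarray(sigma_pt, dtype=float)

def sz2_index(z):
    """Assumed reading of the SZ2 index: mean of squared z scores
    over the mycotoxins reported by a laboratory."""
    z = np.asarray(z, dtype=float)
    return float(np.sum(z**2) / z.size)

# Hypothetical lab: two toxins, assigned value 10 with sigma_pt 0.5
z = z_scores([10.5, 9.0], [10.0, 10.0], [0.5, 0.5])  # -> [1.0, -2.0]
sz2 = sz2_index(z)  # -> 2.5
```

Squaring the z scores penalizes large deviations on any single toxin, so a laboratory must perform consistently across the panel to stay under an SZ2 threshold such as 2.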


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Wei Vivian Li ◽  
Dinghai Zheng ◽  
Ruijia Wang ◽  
Bin Tian

Abstract Most eukaryotic genes express alternative polyadenylation (APA) isoforms. A growing number of RNA sequencing methods, especially those used for single-cell transcriptome analysis, generate reads close to the polyadenylation site (PAS), termed nearSite reads, hence inherently containing information about APA isoform abundance. Here, we present a probabilistic model-based method named MAAPER to utilize nearSite reads for APA analysis. MAAPER predicts PASs with high accuracy and sensitivity and examines different types of APA events with robust statistics. We show MAAPER’s performance with both bulk and single-cell data and its applicability in unpaired or paired experimental designs.

