Dealing With Sparse Data Bias in Medical Sciences: Comprehensive Review of Methods and Applications

Author(s):  
Mohammad Hossein Panahi ◽  
Kazem Mohammad ◽  
Razieh Bidhendi Yarandi ◽  
Fahimeh Ramezani Tehrani

This study aims to illustrate the problem of (quasi-)complete separation arising from sparse data patterns in medical data. We present the failure of traditional methods and then provide an overview of popular remedial approaches for reducing bias, using worked examples. Penalized maximum likelihood estimation and Bayesian methods are among the remedial tools introduced. Data from the Tehran Thyroid and Pregnancy Study, a two-phase cohort study conducted from September 2013 through February 2016, were used for illustration. The reduction in bias of the estimates showed how well these methods perform compared with the traditional method. Extremely large measures of association, such as risk ratios, together with extraordinarily wide confidence intervals showed that traditional estimation methods are futile for sparse data, even though they are still widely applied and reported. In this review paper, we introduce some advanced methods, such as data augmentation, to provide unbiased estimates.
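As a rough illustration of the penalized maximum likelihood idea mentioned in the abstract, the following Python sketch fits a logistic regression with Firth's penalty (a Jeffreys-prior penalty) to a small made-up data set containing a zero cell. The data, the function name `firth_logistic`, and the choice of Firth's penalty specifically are assumptions for illustration; the abstract does not state which penalty or software the authors used.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression via Fisher scoring (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info = X.T @ (w[:, None] * X)            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Leverages h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: X'(y - p + h * (1/2 - p))
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical sparse data: every exposed subject has the event (a zero cell),
# so the ordinary maximum likelihood slope would diverge.
x = np.array([0] * 10 + [1] * 10)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1] * 10)
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print("intercept, log odds ratio:", beta)
```

With an exposed group in which every subject has the event, ordinary maximum likelihood pushes the slope toward infinity, whereas the penalized fit stays finite. For this saturated two-group model, the penalized estimate coincides with adding 0.5 to each cell of the 2x2 table.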

Nanomaterials ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 1346
Author(s):  
Andreas Breitwieser ◽  
Uwe B. Sleytr ◽  
Dietmar Pum

Homogeneous and stable dispersions of functionalized carbon nanotubes (CNTs) in aqueous solutions are imperative for a wide range of applications, especially in life and medical sciences. Various covalent and non-covalent approaches have been published to separate the bundles into individual tubes. In this context, this work demonstrates the non-covalent modification and dispersion of pristine multi-walled carbon nanotubes (MWNTs) using two S-layer proteins, namely, SbpA from Lysinibacillus sphaericus CCM2177 and SbsB from Geobacillus stearothermophilus PV72/p2. Both the S-layer proteins coated the MWNTs completely. Furthermore, it was shown that SbpA can form caps at the ends of MWNTs. Reassembly experiments involving a mixture of both S-layer proteins in the same solution showed that the MWNTs were primarily coated with SbsB, whereas SbpA formed self-assembled layers. The dispersibility of the pristine nanotubes coated with SbpA was determined by zeta potential measurements (−24.4 ± 0.6 mV, pH = 7). Finally, the SbpA-coated MWNTs were silicified with tetramethoxysilane (TMOS) using a mild biogenic approach. As expected, the thickness of the silica layer could be controlled by the reaction time and was 6.3 ± 1.25 nm after 5 min and 25.0 ± 5.9 nm after 15 min. Since S-layer proteins have already demonstrated their capability to bind (bio)molecules in dense packing or to act as catalytic sites in biomineralization processes, the successful coating of pristine MWNTs has great potential in the development of new materials, such as biosensor architectures.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Malte Seemann ◽  
Lennart Bargsten ◽  
Alexander Schlaefer

Abstract Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of artery lumen in computed tomography angiography (CTA) data. However, to perform adequately, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only quite scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% for Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of artery lumen in CTA images.
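The two evaluation metrics named in the abstract can be sketched for binary masks as follows. This is a minimal NumPy illustration on made-up 2x3 masks, not the authors' evaluation code; the function names and the brute-force distance computation are assumptions chosen for brevity.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, truth).sum() / total

def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between foreground pixels (brute force).

    Assumes both masks contain at least one foreground pixel.
    """
    a = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    b = np.argwhere(np.asarray(truth, dtype=bool)).astype(float)
    # Pairwise Euclidean distances between all foreground coordinates
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Hypothetical predicted and reference lumen masks
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])

dice = dice_coefficient(pred, truth)
hd = hausdorff_distance(pred, truth)
print(dice, hd)
```

Dice rewards overlap (higher is better), while the Hausdorff distance measures the worst boundary disagreement in pixels (lower is better), so the two metrics capture different failure modes of a segmentation.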


Author(s):  
Anne Krogh Nøhr ◽  
Kristian Hanghøj ◽  
Genis Garcia Erill ◽  
Zilong Li ◽  
Ida Moltke ◽  
...  

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
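To see why called genotypes are unreliable at low depth, the following Python sketch computes genotype likelihoods for a single biallelic site under a simple binomial sequencing-error model. The error model, the error rate of 0.01, and the function names are assumptions for illustration only; this is not the model or the code used by NGSremix.

```python
from math import comb

def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """P(reads | genotype) for genotypes carrying 0, 1, or 2 alternate alleles.

    Assumes independent reads and a symmetric per-read error rate (hypothetical model).
    """
    depth = n_ref + n_alt
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        liks.append(comb(depth, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref)
    return liks

def normalize(liks):
    """Rescale likelihoods to sum to 1 (posterior under a flat genotype prior)."""
    total = sum(liks)
    return [lik / total for lik in liks]

# Hypothetical site at depth 4: three reference reads, one alternate read
liks = genotype_likelihoods(n_ref=3, n_alt=1)
post = normalize(liks)
print(post)
```

At this depth the heterozygous genotype is the most likely, but the homozygous-reference genotype still keeps more than a tenth of the normalized weight, so a hard genotype call would discard real uncertainty that a likelihood-based method can carry forward.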


Axioms ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 25 ◽  
Author(s):  
Ehab Almetwally ◽  
Randa Alharbi ◽  
Dalia Alnagar ◽  
Eslam Hafez

This paper aims to find a statistical model for the spread of COVID-19 in the United Kingdom and Canada. We used an efficient model for fitting the COVID-19 mortality rates in these countries by specifying an optimal statistical model. A new two-parameter lifetime distribution is introduced by combining the inverted Topp-Leone distribution and the modified Kies family to produce the modified Kies inverted Topp-Leone (MKITL) distribution, which covers many applications for which both the traditional inverted Topp-Leone and the modified Kies distributions provide a poor fit. This new distribution has many valuable properties, such as a simple linear representation, hazard rate function, and moment function. Several estimation methods, namely maximum likelihood estimation, least squares estimators, weighted least-squares estimators, maximum product spacing, Cramér-von Mises estimators, and Anderson-Darling estimators, are applied to estimate the unknown parameters of the MKITL distribution. A Monte Carlo simulation is carried out to assess the performance of the estimation methods. Also, we applied the new distribution to different data sets to assess its performance in modeling data.


2020 ◽  
Vol 70 (1) ◽  
pp. 181-189
Author(s):  
Guy Baele ◽  
Mandev S Gill ◽  
Paul Bastide ◽  
Philippe Lemey ◽  
Marc A Suchard

Abstract Markov models of character substitution on phylogenies form the foundation of phylogenetic inference frameworks. Early models made the simplifying assumption that the substitution process is homogeneous over time and across sites in the molecular sequence alignment. While standard practice adopts extensions that accommodate heterogeneity of substitution rates across sites, heterogeneity in the process over time in a site-specific manner remains frequently overlooked. This is problematic, as evolutionary processes that act at the molecular level are highly variable, subjecting different sites to different selective constraints over time, impacting their substitution behavior. We propose incorporating time variability through Markov-modulated models (MMMs), which extend covarion-like models and allow the substitution process (including relative character exchange rates as well as the overall substitution rate) at individual sites to vary across lineages. We implement a general MMM framework in BEAST, a popular Bayesian phylogenetic inference software package, allowing researchers to compose a wide range of MMMs through flexible XML specification. Using examples from bacterial, viral, and plastid genome evolution, we show that MMMs impact phylogenetic tree estimation and can substantially improve model fit compared to standard substitution models. Through simulations, we show that marginal likelihood estimation accurately identifies the generative model and does not systematically prefer the more parameter-rich MMMs. To mitigate the increased computational demands associated with MMMs, our implementation exploits recent developments in BEAGLE, a high-performance computational library for phylogenetic inference. [Bayesian inference; BEAGLE; BEAST; covarion, heterotachy; Markov-modulated models; phylogenetics.]


Stats ◽  
2019 ◽  
Vol 2 (2) ◽  
pp. 247-258 ◽  
Author(s):  
Pedro L. Ramos ◽  
Francisco Louzada

A new one-parameter distribution is proposed in this paper. The new distribution allows for the occurrence of instantaneous failures (inliers), which arise naturally in many areas. Closed-form expressions are obtained for the moments, mean, variance, coefficient of variation, skewness, kurtosis, and mean residual life. The relationship of the new distribution to the exponential and Lindley distributions is presented. The new distribution can be viewed as a combination of a reparametrized version of the Zakerzadeh and Dolati distribution with a particular case of the gamma model and the occurrence of zero values. Parameter estimation is discussed under the method of moments and maximum likelihood estimation. A simulation study is performed to verify the efficiency of both estimation methods by computing the bias, mean squared errors, and coverage probabilities. The superiority of the proposed distribution over some of its competing distributions is tested by analyzing four real lifetime datasets.

