sample sizes
Recently Published Documents


TOTAL DOCUMENTS: 2104 (FIVE YEARS: 622)
H-INDEX: 76 (FIVE YEARS: 9)

Symmetry ◽ 2022 ◽ Vol 14 (1) ◽ pp. 149
Author(s): Waqar Khan, Lingfu Kong, Brekhna Brekhna, Ling Wang, Huigui Yan

Streaming feature selection is an effective way to select a relevant subset of features from high-dimensional data and to reduce learning complexity. However, little attention has been paid to online feature selection through the Markov Blanket (MB). Several studies based on traditional MB learning reported low prediction accuracy and were evaluated on few datasets, because they require a large number of time-consuming conditional independence tests. This paper presents a novel algorithm, Online Feature Selection Via Markov Blanket (OFSVMB), based on a statistical conditional independence test, which offers high accuracy and low computation time. It reduces the number of conditional independence tests and incorporates online relevance and redundancy analysis: it checks the relevance of each incoming feature to the target variable T, discards redundant features from the Parents-Child (PC) and Spouses (SP) sets online, and finds PC and SP simultaneously. The performance of OFSVMB is compared with traditional MB learning algorithms (IAMB, STMB, HITON-MB, BAMB, and EEMB) and streaming feature selection algorithms (OSFS, Alpha-investing, and SAOLA) on 9 benchmark Bayesian Network (BN) datasets and 14 real-world datasets. Performance is evaluated with F1, precision, and recall at significance levels of 0.01 and 0.05 on the benchmark BN and real-world datasets, and with 12 classifiers at a significance level of 0.01. On benchmark BN datasets with 500 and 5000 samples, OFSVMB achieved significantly higher F1, precision, and recall than IAMB, STMB, HITON-MB, BAMB, and EEMB, and ran faster. It finds a more accurate MB regardless of the size of the feature set.
On real-world datasets, OFSVMB offers substantial improvements in mean prediction accuracy across the 12 classifiers, with both small and large sample sizes, over OSFS, Alpha-investing, and SAOLA. However, it is slower than these three algorithms, because they find only the PC set and not SP. Furthermore, the sensitivity analysis shows that OFSVMB is more accurate in selecting the optimal features.
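As a rough illustration of online relevance and redundancy analysis, the following Python sketch keeps a candidate PC set for a target T as features arrive one at a time. It is a simplified, OSFS-style procedure, not OFSVMB: it assumes roughly Gaussian data, uses a Fisher z partial-correlation test as the conditional independence test, caps conditioning sets at `max_cond` features, and does not search for spouses. All function names and the toy data are hypothetical.

```python
import itertools
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def partial_corr(x, y, zs):
    """Partial correlation of x and y given the columns in zs, via least-squares residuals."""
    x, y = _center(x), _center(y)
    basis = []  # orthonormal basis (modified Gram-Schmidt) of the centered conditioning columns
    for z in zs:
        z = _center(z)
        for q in basis:
            c = _dot(z, q)
            z = [a - c * b for a, b in zip(z, q)]
        norm = math.sqrt(_dot(z, z))
        if norm > 1e-12:  # skip numerically collinear columns
            basis.append([a / norm for a in z])
    for q in basis:  # remove the part of x and y explained by the conditioning set
        cx, cy = _dot(x, q), _dot(y, q)
        x = [a - cx * b for a, b in zip(x, q)]
        y = [a - cy * b for a, b in zip(y, q)]
    denom = math.sqrt(_dot(x, x) * _dot(y, y))
    return _dot(x, y) / denom if denom > 0 else 0.0


def independent(x, y, zs, alpha):
    """Fisher z test: True if x and y look conditionally independent given zs at level alpha."""
    r = max(min(partial_corr(x, y, zs), 0.999999), -0.999999)
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(x) - len(zs) - 3)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return p_value > alpha


def _redundant(name, pc, target, alpha, max_cond):
    """True if some subset (size 1..max_cond) of the other PC members separates name from T."""
    others = [k for k in pc if k != name]
    return any(
        independent(pc[name], target, [pc[j] for j in subset], alpha)
        for size in range(1, min(max_cond, len(others)) + 1)
        for subset in itertools.combinations(others, size)
    )


def online_pc(stream, target, alpha=0.01, max_cond=2):
    """Process (name, column) pairs one at a time and keep a candidate PC set of the target."""
    pc = {}
    for name, column in stream:
        if independent(column, target, [], alpha):  # online relevance analysis
            continue
        pc[name] = column
        for member in list(pc):  # online redundancy analysis
            if _redundant(member, pc, target, alpha, max_cond):
                del pc[member]
    return sorted(pc)


# Hypothetical toy stream: X1, X2 -> T -> X3; X5 is a noisy copy of X1; X4 is irrelevant.
rng = random.Random(0)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
t = [a + b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
x3 = [v + rng.gauss(0, 1) for v in t]
x4 = [rng.gauss(0, 1) for _ in range(n)]
x5 = [a + rng.gauss(0, 0.7) for a in x1]
result = online_pc([("X4", x4), ("X5", x5), ("X1", x1), ("X2", x2), ("X3", x3)], t)
print(result)
```

Capping the conditioning-set size is one simple way to keep the number of conditional independence tests manageable; a full MB method would also need the spouse search that the abstract identifies as the reason OFSVMB runs slower than PC-only algorithms.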


Author(s): Damian JJ Farnell

3D facial surface imaging is a useful tool in dentistry for diagnostics and treatment planning. Between-groups PCA (bgPCA) is a method that has been used to analyse shapes in biological morphometrics, although various “pathologies” of bgPCA have recently been reported. Monte Carlo (MC) simulated datasets were created here in order to explore “pathologies” of multilevel PCA (mPCA), where mPCA with two levels is equivalent to bgPCA. The first set of MC experiments involved 300 uncorrelated normally distributed variables, whereas the second set used correlated multivariate MC data describing 3D facial shape. We confirmed earlier findings by other researchers that bgPCA (and so also mPCA) can give a false impression of strong differences in component scores between groups when none exist in reality. These spurious differences in component scores via mPCA decreased strongly as the sample sizes per group were increased. Eigenvalues via mPCA were also strongly affected by imbalances in sample sizes per group, although this problem was removed by using the weighted forms of the covariance matrices suggested by the maximum likelihood solution of the two-level model. However, this did not remove the spurious differences between groups in these simulations, which were driven by a very small sample size in one group. As a “rule of thumb” only, all of our experiments indicate that reasonable results are obtained when the sample size in every group is at least equal to the number of variables. Interestingly, the sum of all eigenvalues over both levels via mPCA scaled approximately linearly with the inverse of the sample size per group in all experiments. Finally, between-group variation was added explicitly to the MC data generation model in two of the experiments considered here.
In this case, the sum of all eigenvalues via mPCA correctly predicted the asymptotic total variance, whereas standard “single-level” PCA underestimated this quantity.
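The link between per-group sample size and spurious between-group variation can be checked with a small Monte Carlo sketch in Python. It assumes uncorrelated N(0, 1) variables with no real group differences and uses the plain, unweighted covariance of group means (not the maximum-likelihood weighted form mentioned above). Because the trace of a matrix equals the sum of its eigenvalues, no eigendecomposition is needed; under these assumptions each variable's group means have variance 1/n, so the expected trace is n_vars / n. The function name and parameter values are illustrative assumptions, and only the between-group level is covered.

```python
import random


def between_group_trace(n_per_group, n_groups=3, n_vars=20, reps=100, seed=0):
    """Average trace (= sum of eigenvalues) of the unweighted between-group covariance
    matrix of group means, for pure-noise data with no real group differences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # The trace needs only the diagonal: each variable's variance of group means.
        for _ in range(n_vars):
            means = [
                sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
                for _ in range(n_groups)
            ]
            grand = sum(means) / n_groups
            total += sum((m - grand) ** 2 for m in means) / (n_groups - 1)
    return total / reps


traces = {n: between_group_trace(n) for n in (5, 20, 80)}
for n, tr in traces.items():
    # Simulated trace next to its expected value under the null, n_vars / n_per_group
    print(n, round(tr, 3), round(20 / n, 3))
```

The simulated traces should fall roughly in proportion to 1/n, which mirrors the inverse-sample-size scaling described above and shows why very small groups inflate apparent between-group structure.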


2022 ◽ Vol 8
Author(s): Xiangyu Long, Rong Wan, Zengguang Li, Dong Wang, Pengbo Song, ...

A fishery-independent survey can provide detailed information for fishery assessment and management. However, sampling designs for ichthyoplankton surveys in estuarine areas remain poorly understood. In this study, we developed six stratified schemes with various sample sizes, aiming to find cost-efficient sampling designs for monitoring Coilia mystus ichthyoplankton in the Yangtze Estuary. A generalized additive model (GAM) with the Tweedie distribution was used to quantify the “true” distribution of C. mystus eggs and larvae, based on data from the fishery-independent survey in 2019–2020. The performance of the different sampling designs was evaluated by relative estimation error (REE), relative bias (RB), and coefficient of variation (CV). The results indicated that appropriate stratification, with homogeneity within strata and heterogeneity between strata, could improve precision. Strata should separate not only the North Branch from the South Branch but also the river from the sea. Using at least two strata in the South Branch could also improve performance. Sample sizes of 45–55 were identified as the cost-efficient range. Compared with other monitoring programs, monitoring ichthyoplankton in the estuary required more complex stratification and higher-resolution sampling. The design rationale and optimization methodology of this study can serve as a reference for sampling designs for ichthyoplankton in estuarine areas.
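To make the evaluation criteria concrete, the following Python sketch compares stratified and simple random sampling on a hypothetical two-stratum population (a low-density “river” stratum and a higher-density “sea” stratum) that stands in for a GAM-predicted “true” distribution. It assumes proportional allocation and commonly used definitions of RB, REE, and CV computed over repeated simulated surveys; the paper's exact formulas, strata, and data are not reproduced here, and all names are illustrative.

```python
import math
import random
import statistics


def make_population(seed=1):
    """Hypothetical 'true' density field: two internally homogeneous, mutually different strata."""
    rng = random.Random(seed)
    return {
        "river": [max(0.0, rng.gauss(5.0, 1.0)) for _ in range(120)],
        "sea": [max(0.0, rng.gauss(20.0, 3.0)) for _ in range(80)],
    }


def stratified_estimate(pop, n, rng):
    """Stratified mean with proportional allocation, sampling without replacement in each stratum."""
    total_sites = sum(len(v) for v in pop.values())
    est = 0.0
    for values in pop.values():
        n_h = max(2, round(n * len(values) / total_sites))
        est += len(values) / total_sites * statistics.fmean(rng.sample(values, n_h))
    return est


def srs_estimate(pop, n, rng):
    """Simple random sample mean over all sites, ignoring strata."""
    all_values = [v for values in pop.values() for v in values]
    return statistics.fmean(rng.sample(all_values, n))


def evaluate(estimator, pop, n, reps=2000, seed=2):
    """RB, REE, and CV (all in %) of an estimator over repeated simulated surveys."""
    rng = random.Random(seed)
    true_mean = statistics.fmean([v for values in pop.values() for v in values])
    ests = [estimator(pop, n, rng) for _ in range(reps)]
    mean_est = statistics.fmean(ests)
    rb = 100 * (mean_est - true_mean) / true_mean
    ree = 100 * math.sqrt(statistics.fmean([(e - true_mean) ** 2 for e in ests])) / true_mean
    cv = 100 * statistics.stdev(ests) / mean_est
    return {"RB": rb, "REE": ree, "CV": cv}


pop = make_population()
strat = evaluate(stratified_estimate, pop, n=50)
srs = evaluate(srs_estimate, pop, n=50)
print("stratified", strat)
print("simple random", srs)
```

Because the strata are homogeneous internally and different from each other, the stratified design should show a clearly smaller REE than simple random sampling at the same sample size, which is the pattern the abstract describes; both estimators are unbiased here, so RB stays near zero.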


2022
Author(s): Loic Yengo, Sailaja Vedantam, Eirini Marouli, Julia Sidorenko, Eric Bartell, ...

Common SNPs are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here, using GWAS data from 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a median size of ~90 kb, covering ~21% of the genome. The density of independent associations varies across the genome, and the regions of elevated density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs account for 40% of phenotypic variance in European-ancestry populations but only ~10–20% in other ancestries. Effect sizes, associated regions, and gene prioritization are similar across ancestries, indicating that the reduced prediction accuracy is likely explained by linkage disequilibrium and allele frequency differences within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study, the largest GWAS to date, provides an unprecedented saturated map of specific genomic regions containing the vast majority of common height-associated variants.
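The allele-frequency part of that explanation can be illustrated with the standard additive-model result that a biallelic SNP with allele frequency p and per-allele effect β contributes 2p(1 − p)β² to phenotypic variance. The Python sketch below assumes Hardy–Weinberg equilibrium, no linkage disequilibrium between SNPs, and a phenotype in standard-deviation units; the effect sizes and frequencies are hypothetical, and linkage disequilibrium, the other factor named above, is not modeled.

```python
def variance_explained(effects, freqs, phenotypic_var=1.0):
    """Fraction of phenotypic variance explained by independent additive SNPs (HWE, no LD)."""
    if len(effects) != len(freqs):
        raise ValueError("effects and freqs must have the same length")
    genetic_var = sum(2 * p * (1 - p) * b ** 2 for b, p in zip(effects, freqs))
    return genetic_var / phenotypic_var


effects = [0.05, -0.03, 0.02]  # hypothetical per-allele effects in phenotypic SD units
pop_a = variance_explained(effects, [0.5, 0.3, 0.1])   # hypothetical allele frequencies, population A
pop_b = variance_explained(effects, [0.1, 0.05, 0.5])  # same effects, different frequencies, population B
print(pop_a, pop_b)
```

With identical effect sizes, the second set of frequencies yields less than half the explained variance of the first, showing how frequency differences alone can lower the variance captured in another population.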


2022 ◽ Vol 12
Author(s): Ying Wang, Mandong Liu, Youyou Tan, Zhixiao Dong, Jing Wu, ...

Background: There is a growing need to offer appropriate services, beyond traditional pharmacological treatment, to persons with mild cognitive impairment (MCI) and dementia who experience depression and anxiety. As multidimensional interventions, dance-based interventions address the physical, emotional, social, and spiritual aspects of well-being. However, no meta-analysis of randomized controlled trials (RCTs) has examined the effectiveness of dance-based interventions on depression and anxiety among persons with MCI and dementia, and the results of individual RCTs are inconsistent. This study aimed to examine the effectiveness of dance-based interventions on depression (primary outcome) and anxiety (secondary outcome) among persons with MCI and dementia. Methods: A systematic review with meta-analysis was conducted. The inclusion criteria were: population: people of all ages with MCI or dementia; intervention: dance-based interventions; control group: no treatment, usual care, or waiting list; outcomes: depression and anxiety; study design: published or unpublished RCTs. Seven electronic databases (Cochrane, PsycINFO, Web of Science, PubMed, EBSCO, CNKI, WanFang) were searched from 1970 to March 2021. Grey literature and reference lists of relevant articles were also searched and reviewed. The Cochrane “Risk of Bias” tool was used to assess study quality. RevMan 5.4 was used for the meta-analysis, and heterogeneity was investigated by subgroup and sensitivity analyses. GRADE was applied to assess the quality of evidence for the depression and anxiety outcomes. Results: Five randomized controlled trials were identified, with sample sizes ranging from 21 to 204. The risk of bias was low overall, except that most included studies were rated as high or unclear risk in two domains: allocation concealment and blinding of participants and personnel.
Meta-analysis of the depression outcome showed no heterogeneity (I² = 0%), indicating that variation among study results did not exceed what would be expected by chance. Dance-based interventions significantly reduced depression compared with controls [SMD = −0.42, 95% CI (−0.60, −0.23), p < 0.0001], with a small effect size (Cohen's d = 0.3669). Compared with the post-intervention data, the follow-up data indicated diminishing effects (Cohen's d = 0.1355). Dance-based interventions were more effective in reducing depression for persons with dementia than for those with MCI, and were more effective when delivered for 1 h twice a week than for 35 min two to three times a week. One included RCT showed no significant benefit on anxiety rating scores, with small effect sizes at 6 weeks and 12 weeks (Cohen's d = 0.1378 and 0.1675, respectively). GRADE analysis rated the quality of evidence as moderate for depression and low for anxiety. Conclusions: Dance-based interventions are beneficial for alleviating depression among persons with MCI and dementia. More high-quality trials with large sample sizes are needed to gain deeper insight into dance-based interventions, such as their effects on anxiety and the best approaches to delivering them.
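Pooled SMDs and I² values of this kind come from standard meta-analytic formulas. The Python sketch below shows one common version: Hedges' bias-corrected g per study, inverse-variance fixed-effect pooling, and Cochran's Q with I² = max(0, (Q − df)/Q). The study data are hypothetical, the review may have used a different model or variance formula in RevMan, and a negative SMD favors the intervention only when lower scores mean less depression.

```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Bias-corrected standardized mean difference (treatment minus control) and its variance."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (m_t - m_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    var = (n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c))
    return g, var


def pool_fixed(gs, vs, z=1.96):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    ws = [1 / v for v in vs]
    sw = sum(ws)
    est = sum(w * g for w, g in zip(ws, gs)) / sw
    se = 1 / math.sqrt(sw)
    q = sum(w * (g - est) ** 2 for w, g in zip(ws, gs))
    dfree = len(gs) - 1
    i2 = 100 * max(0.0, (q - dfree) / q) if q > 0 else 0.0
    return {"smd": est, "ci": (est - z * se, est + z * se), "Q": q, "I2": i2}


studies = [  # hypothetical (mean, SD, n) for dance vs control on a depression scale
    ((8.1, 3.0, 30), (9.5, 3.2, 30)),
    ((12.0, 4.5, 60), (13.4, 4.1, 58)),
    ((6.3, 2.2, 12), (7.0, 2.5, 11)),
]
effects = [hedges_g(*t, *c) for t, c in studies]
result = pool_fixed([g for g, _ in effects], [v for _, v in effects])
print(result)
```

When Q does not exceed its degrees of freedom, I² is truncated to 0%, which is how a result such as I² = 0% arises.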


PLoS ONE ◽ 2022 ◽ Vol 17 (1) ◽ pp. e0259994
Author(s): Ahmet Faruk Aysan, Ibrahim Guney, Nicoleta Isac, Asad ul Islam Khan

This paper evaluates the performance of eight tests with the null hypothesis of cointegration on the basis of their probabilities of type I and type II errors, using Monte Carlo simulations. The study uses 132 different data-generating processes covering three cases of the deterministic part and four sample sizes. The three cases of the deterministic part are: no intercept and no linear time trend, an intercept only, and both an intercept and a linear time trend. When asymptotic critical values are used, all of the tests have type I error probabilities that are either larger or smaller than the nominal level, i.e., they over-reject or under-reject. Using simulated critical values keeps the probability of type I error under control. Asymptotic critical values should therefore be avoided, and simulated critical values are highly recommended. The simple LM test based on the KPSS statistic performs better than the rest for all specifications of the deterministic part and all sample sizes.
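The recommendation to use simulated rather than asymptotic critical values can be illustrated with the KPSS statistic itself. The Python sketch below computes the level-stationarity KPSS statistic for a single series, simulates its finite-sample null distribution for a given sample size, and checks the empirical type I error rate. Residual-based cointegration tests built on the KPSS statistic apply a statistic of this form to cointegrating-regression residuals; this sketch is a simplification and is not any of the eight tests evaluated in the paper. It assumes an i.i.d. Gaussian null and a Bartlett long-run variance with no lags by default; 0.463 is the standard asymptotic 5% critical value for the level case. Function names are illustrative.

```python
import random


def kpss_level(y, lags=0):
    """KPSS statistic for level stationarity with a Bartlett-kernel long-run variance."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]
    partial, sum_sq_partials = 0.0, 0.0
    for v in e:  # partial sums S_t of the demeaned series
        partial += v
        sum_sq_partials += partial ** 2
    lrv = sum(v * v for v in e) / n
    for s in range(1, lags + 1):  # Bartlett-weighted autocovariances
        gamma = sum(e[t] * e[t - s] for t in range(s, n)) / n
        lrv += 2 * (1 - s / (lags + 1)) * gamma
    return sum_sq_partials / (n ** 2 * lrv)


def simulate_stats(n, reps, seed, lags=0):
    """KPSS statistics for reps i.i.d. N(0, 1) series of length n (the stationarity null holds)."""
    rng = random.Random(seed)
    return [kpss_level([rng.gauss(0.0, 1.0) for _ in range(n)], lags) for _ in range(reps)]


def simulated_critical_value(n, level=0.05, reps=4000, seed=0, lags=0):
    """Empirical (1 - level) quantile of the simulated null distribution."""
    stats = sorted(simulate_stats(n, reps, seed, lags))
    k = round((1 - level) * reps)
    return stats[k - 1]


T = 100
ASYMPTOTIC_5PCT = 0.463  # standard asymptotic 5% critical value, level-stationarity case
cv_sim = simulated_critical_value(T)
check = simulate_stats(T, reps=4000, seed=1)  # independent draws for checking the rejection rate
size_sim = sum(s > cv_sim for s in check) / len(check)
size_asym = sum(s > ASYMPTOTIC_5PCT for s in check) / len(check)
print(round(cv_sim, 3), size_sim, size_asym)
```

Repeating the comparison of `size_asym` and `size_sim` for other sample sizes, lag choices, and deterministic terms is the kind of experiment that reveals over- or under-rejection with asymptotic critical values.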


2022 ◽ pp. 089443932110549
Author(s): Nils Witte, Ines Schaurer, Jette Schröder, Jean Philippe Décieux, Andreas Ette

This article investigates how incentives can facilitate mail-based online panel recruitment. The analysis relies on two incentive experiments and their effects on panel recruitment and on intermediate participation in the recruitment survey. The experiments were implemented in the context of the German Emigration and Remigration Panel Study and comprise two samples of randomly sampled persons. The tested incentives include a conditional lottery, conditional monetary incentives, and the combination of unconditional money-in-hand with conditional monetary incentives. For a comprehensive evaluation of the link between incentives and panel recruitment, the article further assesses the incentives’ implications for demographic composition and panel recruitment unit costs. Multivariate analysis indicates that low combined incentives (€5/€5) or, where unconditional disbursement is unfeasible, high conditional incentives (€20) are most effective in enhancing panel participation. In terms of demographic bias, low combined incentives (€5/€5) and €10 conditional incentives are the favored options. From the perspective of panel recruitment costs, the budget options include the lottery and the €10 conditional incentive, which break even at a net sample size of 1000.
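The break-even comparison between a lottery and a per-recruit incentive is simple arithmetic: a lottery is a fixed cost F spread over all recruits, whereas a conditional incentive of a euros is paid per recruit, so with equal recruitment rates and mailing costs the unit costs match at n = F / a. The Python sketch below generalizes this slightly by letting the two arms have different recruitment rates and a mailing cost per invitation. All prize pools, rates, and costs are hypothetical, not figures from the study, and the function names are illustrative.

```python
def unit_cost(net_n, recruit_rate, mail_cost, per_recruit_incentive=0.0, fixed_incentive=0.0):
    """Cost per recruited panel member when net_n recruits are needed (hypothetical cost model)."""
    invited = net_n / recruit_rate  # invitations needed to obtain net_n recruits
    total = invited * mail_cost + net_n * per_recruit_incentive + fixed_incentive
    return total / net_n


def break_even_n(prize_pool, per_recruit_incentive, mail_cost, rate_lottery, rate_conditional):
    """Net sample size at which a fixed lottery and a per-recruit incentive have equal unit costs."""
    # Solve mail_cost/rate_lottery + prize_pool/n == mail_cost/rate_conditional + per_recruit_incentive
    denom = per_recruit_incentive + mail_cost / rate_conditional - mail_cost / rate_lottery
    if denom <= 0:
        raise ValueError("the conditional incentive is never more expensive per recruit")
    return prize_pool / denom


# Hypothetical inputs: EUR 2,000 prize pool, EUR 10 conditional incentive, EUR 2 per mailing,
# 10% recruitment with the lottery and 12.5% with the conditional incentive.
n_star = break_even_n(2000, 10, 2.0, 0.10, 0.125)
print(round(n_star, 1))
for n in (100, 1000):
    print(n, unit_cost(n, 0.10, 2.0, fixed_incentive=2000),
          unit_cost(n, 0.125, 2.0, per_recruit_incentive=10))
```

In this simple model, a higher recruitment rate in the conditional arm moves the break-even point to a larger net sample size, because fewer invitations are needed per recruit; below the break-even size the lottery is more expensive per recruit, and above it the lottery is cheaper.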


2022
Author(s): Shirlee Wohl, Elizabeth C Lee, Bethany L DiPrete, Justin Lessler

As demonstrated during the SARS-CoV-2 pandemic, detecting and tracking the emergence and spread of pathogen variants is an important component of monitoring infectious disease outbreaks. Pathogen genome sequencing has emerged as the primary tool for variant characterization, so it is important to consider the number of sequences needed when designing surveillance programs or studies, both to ensure accurate conclusions and to optimize use of limited resources. However, current approaches to calculating sample size for variant monitoring often do not account for the biological and logistical processes that can bias which infections are detected and which samples are ultimately selected for sequencing. In this manuscript, we introduce a framework that models the full process from infection detection to variant characterization and demonstrate how to use this framework to calculate appropriate sample sizes for sequencing-based surveillance studies. We consider both cross-sectional and continuous sampling, and we have implemented our method in a publicly available tool that allows users to estimate necessary sample sizes given a specific aim (e.g., variant detection or measuring variant prevalence) and sampling method. Our framework is designed to be easy to use, while also flexible enough to be adapted to other pathogens and surveillance scenarios.
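For comparison, the naive calculations that such a framework refines can be written in a few lines of Python: the smallest number of sequences giving a chosen probability of observing at least one variant sequence, n = ⌈ln(1 − confidence) / ln(1 − p)⌉, and the normal-approximation sample size for estimating prevalence within a margin e. As a one-parameter stand-in for the biases described above, `detection_ratio` (a hypothetical parameter) makes variant infections more or less likely to be detected and sequenced than other infections. This is not the authors' framework or tool, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def observed_prevalence(true_prev, detection_ratio=1.0):
    """Variant share among sequenced samples when variant infections are detection_ratio
    times as likely as other infections to be detected and sequenced."""
    num = true_prev * detection_ratio
    return num / (num + (1 - true_prev))


def n_for_detection(prev, confidence=0.95):
    """Smallest n with P(at least one variant sequence) >= confidence, assuming independent draws."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - prev))


def n_for_prevalence(prev, margin, confidence=0.95):
    """Normal-approximation sample size to estimate prevalence within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * prev * (1 - prev) / margin ** 2)


p_obs = observed_prevalence(0.01, detection_ratio=0.5)  # hypothetical under-ascertainment of the variant
print(n_for_detection(0.01), n_for_detection(p_obs), n_for_prevalence(0.5, 0.05))
```

Halving the relative chance that variant infections are sequenced roughly doubles the number of sequences needed for detection, illustrating why ignoring detection and selection processes can leave a surveillance study underpowered.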


PEDIATRICS ◽ 2022 ◽ Vol 149 (Supplement_1) ◽ pp. S48-S52
Author(s): Nadir Yehya, Robinder G. Khemani, Simon Erickson, Lincoln S. Smith, Courtney M. Rowan, ...

CONTEXT Respiratory dysfunction is a component of every organ failure scoring system developed, reflecting the significance of the lung in multiple organ dysfunction syndrome. However, existing systems do not reflect current practice and are not consistently evidence based. OBJECTIVE We aimed to review the literature to identify the components of respiratory failure associated with outcomes in children, with the purpose of developing an operational and evidence-based definition of respiratory dysfunction. DATA SOURCES Electronic searches of PubMed and Embase were conducted from 1992 to January 2020 by using a combination of medical subject heading terms and text words to define respiratory dysfunction, critical illness, and outcomes. STUDY SELECTION We included studies of critically ill children with respiratory dysfunction that evaluated the performance of metrics of respiratory dysfunction and their association with patient-centered outcomes. Studies in adults, studies in premature infants (≤36 weeks’ gestational age), animal studies, reviews and commentaries, case series with sample sizes ≤10, and studies not published in English in which we were unable to determine eligibility criteria were excluded. DATA EXTRACTION Data were abstracted into a standard data extraction form. RESULTS We provided binary (no or yes) and graded (no, nonsevere, or severe) definitions of respiratory dysfunction, prioritizing oxygenation and respiratory support. The proposed criteria were approved by 82% of members in the first round, with a score of 8 of 9 (interquartile range 7–8). LIMITATIONS Exclusion of non-English publications, heterogeneity across the pediatric age range, small sample sizes, and incomplete handling of confounders are limitations. CONCLUSIONS We propose definitions for respiratory dysfunction in critically ill children after an exhaustive literature review.

