Usefulness of the DETECT program for assessing the internal structure of dimensionality in simulated data and results of the Korean nursing licensing examination

Author(s):  
Dong Gi Seo ◽  
Younyoung Choi ◽  
Sun Huh

Purpose: The dimensionality of an examination provides empirical evidence of the internal test structure underlying the responses to a set of items. In turn, the internal structure is an important piece of evidence of the validity of an examination. Thus, the aim of this study was to investigate the performance of the DETECT program and to use it to examine the internal structure of the Korean nursing licensing examination. Methods: Non-parametric methods of dimensionality testing, such as the DETECT program, have been proposed as ways of overcoming the limitations of traditional parametric methods. A non-parametric method (the DETECT program) was investigated using simulated data under several conditions and applied to the Korean nursing licensing examination. Results: The DETECT program performed well in terms of determining the number of underlying dimensions under several different conditions in the simulated data. Further, the DETECT program correctly revealed the internal structure of the Korean nursing licensing examination, meaning that it detected the proper number of dimensions and appropriately clustered the items within each dimension. Conclusion: The DETECT program performed well in detecting the number of dimensions and in assigning items to each dimension. This result implies that the DETECT method can be useful for examining the internal structure of assessments, such as licensing examinations, that possess relatively many domains and content areas.
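
To make the conditional-covariance idea behind DETECT concrete, here is a minimal Python sketch, assuming a binary (persons x items) response matrix and a supplied candidate item clustering; the actual DETECT program additionally searches over partitions to maximize this index.

```python
import numpy as np

def detect_index(responses, clusters):
    """Illustrative DETECT-style index: average signed conditional covariance
    over item pairs, conditioning on the rest score as a proxy for the trait.
    responses: (persons, items) 0/1 matrix; clusters: item index -> cluster id."""
    n_persons, n_items = responses.shape
    total, n_pairs = 0.0, 0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            # Rest score: total score excluding the current item pair.
            rest = responses.sum(axis=1) - responses[:, i] - responses[:, j]
            cov_sum, weight = 0.0, 0
            for s in np.unique(rest):
                mask = rest == s
                if mask.sum() < 2:
                    continue
                c = np.cov(responses[mask, i], responses[mask, j])[0, 1]
                cov_sum += mask.sum() * c
                weight += mask.sum()
            if weight == 0:
                continue
            # Same-cluster pairs count positively, cross-cluster pairs negatively.
            sign = 1.0 if clusters[i] == clusters[j] else -1.0
            total += sign * (cov_sum / weight)
            n_pairs += 1
    return 100.0 * total / n_pairs  # scaled by 100, as in the DETECT literature
```

Under a correct multidimensional clustering, within-cluster pairs show positive conditional covariance and between-cluster pairs negative, so a large positive index supports the candidate dimensional structure.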

2021 ◽  
Vol 31 (1) ◽  
pp. e40128
Author(s):  
Jimmie Leppink

Aims: Outcomes of research in education and training are partly a function of the context in which a study takes place, the questions we ask, and what is feasible. Many questions are about learning, which involves repeated measurements within a particular time window, and the practical context is usually such that offering an intervention to some but not all learners makes no sense or is unethical. For quality assurance and other purposes, education and training centers may have very locally oriented questions that they seek to answer, such as whether an intervention can be considered effective in their context of small numbers of learners. While the rationale behind the design and outcomes of such studies may be of interest to a much wider community, for example for studying the transferability of findings to other contexts, people are often discouraged from reporting the outcomes of such studies at conferences or in educational research journals. The aim of this paper is to counter that discouragement and instead encourage people to see small numbers as an opportunity rather than a problem. Method: A worked example of a parametric and a non-parametric method for this type of situation, using simulated data in the zero-cost open-source statistical program R version 4.0.5. Results: Contrary to the non-parametric method, the parametric method can provide estimates of intervention effectiveness for the individual participant and can account for trends in different phases of a study. However, the non-parametric method provides a solution in several situations where the parametric method cannot be used. Conclusion: Given the costs of research, the lessons to be learned from research, and the statistical methods available, small numbers should be considered an opportunity, not a problem.
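
The paper's worked example uses R 4.0.5; purely as an illustration of the contrast it draws, the following Python sketch applies a parametric regression and a non-parametric randomization test to a simulated small-numbers AB design (all values below are invented for the example).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated AB single-case data: 8 baseline (A) and 8 intervention (B) points.
baseline = rng.normal(50, 5, size=8)
intervention = rng.normal(58, 5, size=8)
y = np.concatenate([baseline, intervention])
phase = np.array([0] * 8 + [1] * 8)
time = np.arange(16)

# Parametric route: OLS with phase and time terms gives an individual-level
# estimate of the intervention effect and accounts for a trend over time.
X = np.column_stack([np.ones(16), phase, time])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated intervention effect: {beta[1]:.2f}")

# Non-parametric route: a randomization test on the phase-mean difference,
# shuffling observations across phases, makes no distributional assumptions.
observed = intervention.mean() - baseline.mean()
null = []
for _ in range(5000):
    shuffled = rng.permutation(y)
    null.append(shuffled[8:].mean() - shuffled[:8].mean())
p_value = np.mean(np.abs(null) >= abs(observed))
print(f"randomization p-value: {p_value:.3f}")
```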


2018 ◽  
Vol 15 (1) ◽  
pp. 98-107
Author(s):  
R Lestawati ◽  
Rais Rais ◽  
I T Utami

Classification is a statistical method for grouping data systematically. An object can be classified by two approaches: parametric and non-parametric methods. The non-parametric method used in this study is CART, whose classification results are compared with those of logistic regression as a parametric method. From the classification accuracy table, the CART method classified the status of DHF patients into the categories severe and non-severe with 76.3% accuracy, whereas logistic regression achieved 76.7%. The CART method yielded four significant variables (hepatomegaly, epistaxis, melena, and diarrhea) and divided the classification into several more accurate segments, whereas logistic regression produced only one significant variable (hepatomegaly).
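
A hedged sketch of this kind of comparison, using scikit-learn with synthetic stand-in data rather than the study's DHF patient records:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: in the study, X would hold symptom indicators such as
# hepatomegaly, epistaxis, melena, and diarrhea; y the severe/non-severe label.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Non-parametric: CART builds a tree of binary splits, segmenting patients
# into increasingly homogeneous groups without distributional assumptions.
cart = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Parametric: logistic regression assumes a linear logit in the predictors.
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(f"CART accuracy:     {cart.score(X_te, y_te):.3f}")
print(f"logistic accuracy: {logit.score(X_te, y_te):.3f}")
```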


Author(s):  
Ellen M. Manning ◽  
Barbara R. Holland ◽  
Simon P. Ellingsen ◽  
Shari L. Breen ◽  
Xi Chen ◽  
...  

Abstract: We applied three statistical classification techniques—linear discriminant analysis (LDA), logistic regression, and random forests—to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best; for the largest dataset, the non-parametric method of random forests performed with accuracy comparable to the parametric techniques, rather than offering any significant improvement. This suggests that, at least for the specific examples investigated here, the accuracy of the predictions obtained is not limited by the use of parametric models. We also found that for LDA, transformation of the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
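
The three-way comparison can be sketched with scikit-learn on synthetic stand-in data; the Yeo-Johnson PowerTransformer below stands in for the normality transformation the abstract reports helped LDA, which is an assumption about the specific transform used.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Stand-in for a maser-search catalogue: features would be mid-infrared or
# millimetre continuum properties; y flags sources with an associated maser.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

models = {
    "LDA (raw)": LinearDiscriminantAnalysis(),
    # Transforming skewed features toward normality can noticeably help LDA.
    "LDA (transformed)": make_pipeline(PowerTransformer(),
                                       LinearDiscriminantAnalysis()),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} accuracy: {scores.mean():.3f}")
```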


2012 ◽  
Vol 9 (73) ◽  
pp. 1797-1808 ◽  
Author(s):  
Eric de Silva ◽  
Neil M. Ferguson ◽  
Christophe Fraser

Using sequence data to infer population dynamics is playing an increasing role in the analysis of outbreaks. The most common methods in use, based on coalescent inference, have been widely applied but not extensively tested against simulated epidemics. Here, we use simulated data to test the ability of both parametric and non-parametric methods for inference of effective population size (coded in the popular BEAST package) to reconstruct epidemic dynamics. We consider a range of simulations centred on scenarios considered plausible for pandemic influenza, but our conclusions are generic for any exponentially growing epidemic. We highlight systematic biases in non-parametric effective population size estimation. The most prominent such bias leads to the false inference of slowing epidemic spread in the recent past even when the real epidemic is growing exponentially. We suggest some sampling strategies that could reduce (but not eliminate) some of these biases. Parametric methods can correct for them if the infected population size is large. We also explore how poor sampling strategies (e.g. those that over-represent epidemiologically linked clusters of cases) can dramatically exacerbate bias in an uncontrolled manner. Finally, we present a simple diagnostic indicator, based on coalescent density, which can easily be applied to reconstructed phylogenies and which identifies time periods for which effective population size estimates are less likely to be biased. We illustrate this with an application to the 2009 H1N1 pandemic.
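
For intuition about what non-parametric effective population size estimation does, here is a minimal sketch of the classic skyline moment estimator (BEAST's machinery is considerably more sophisticated): with k lineages, the coalescent waiting time is exponential with mean 2Ne/(k(k-1)), so each inter-coalescent interval yields the point estimate Ne_hat = T_k * k(k-1)/2.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_coalescent_times(n_lineages, ne):
    """Simulate inter-coalescent waiting times under a constant-size
    coalescent: with k lineages, T_k ~ Exp(rate = k(k-1) / (2*Ne))."""
    times = []
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / (2.0 * ne)
        times.append((k, rng.exponential(1.0 / rate)))
    return times

def classic_skyline(times):
    """Classic skyline estimate: each interval with k lineages gives the
    moment estimator Ne_hat = T_k * k*(k-1)/2."""
    return [(k, t * k * (k - 1) / 2.0) for k, t in times]

times = simulate_coalescent_times(n_lineages=20, ne=1000.0)
for k, ne_hat in classic_skyline(times)[:5]:
    print(f"k={k:2d} lineages: Ne_hat = {ne_hat:8.1f}")
```

Because each estimate rests on a single exponential draw, interval-by-interval estimates are extremely noisy, which is one route by which the biases discussed in the abstract can enter reconstructed dynamics.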


Author(s):  
Nico Borgsmüller ◽  
Jose Bonet ◽  
Francesco Marass ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas ◽  
...  

Abstract: The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified split-merge move and a novel posterior estimator to predict clones and genotypes. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. Its inferred genotypes were the most accurate, and it was the only method able to run and produce results on data sets with 10,000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever growing scDNA-seq data sets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve intra-tumor heterogeneity but also as a pre-processing step to reduce data size. BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC.
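
BnpC itself adds error-rate modeling, a modified split-merge move, and a dedicated posterior estimator; as a toy illustration of the Dirichlet process mixture idea it builds on, here is a collapsed Gibbs sampler over binary mutation profiles with a Bernoulli-Beta likelihood (all priors and data below are invented for the example).

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_gibbs(X, alpha=1.0, a=1.0, b=1.0, n_iters=50):
    """Toy collapsed Gibbs sampler for a Dirichlet process mixture of
    Bernoulli profiles (Beta(a, b) prior per site).
    X: (cells, sites) 0/1 matrix; returns clone assignments per cell."""
    n, m = X.shape
    z = np.zeros(n, dtype=int)          # start with every cell in one clone
    for _ in range(n_iters):
        for i in range(n):
            z[i] = -1                   # remove cell i from its clone
            labels = np.unique(z[z >= 0])
            log_p = []
            for c in labels:
                members = X[z == c]
                n_c = len(members)
                # Collapsed Bernoulli-Beta predictive for cell i, per site.
                p1 = (a + members.sum(axis=0)) / (a + b + n_c)
                ll = np.sum(X[i] * np.log(p1) + (1 - X[i]) * np.log(1 - p1))
                log_p.append(np.log(n_c) + ll)
            # New clone: predictive under the prior mean a / (a + b).
            p1_new = a / (a + b)
            ll_new = np.sum(X[i] * np.log(p1_new)
                            + (1 - X[i]) * np.log(1 - p1_new))
            log_p.append(np.log(alpha) + ll_new)
            log_p = np.array(log_p) - max(log_p)
            probs = np.exp(log_p) / np.exp(log_p).sum()
            choice = rng.choice(len(probs), p=probs)
            z[i] = labels[choice] if choice < len(labels) else (z.max() + 1)
        _, z = np.unique(z, return_inverse=True)   # relabel clones compactly
    return z

# Two planted clones with noisy 20-site mutation profiles.
profile_a = rng.random(20) < 0.1
profile_b = rng.random(20) < 0.6
X = np.array([profile_a if i < 15 else profile_b for i in range(30)], dtype=int)
X ^= (rng.random(X.shape) < 0.05)       # 5% measurement noise
print(crp_gibbs(X))
```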


2020 ◽  
Vol 36 (19) ◽  
pp. 4854-4859
Author(s):  
Nico Borgsmüller ◽  
Jose Bonet ◽  
Francesco Marass ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas ◽  
...  

Abstract: Motivation: The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intratumor heterogeneity (ITH) by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq datasets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Results: Here, we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq datasets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime and scalability. Its inferred genotypes were the most accurate, especially on highly heterogeneous data, and it was the only method able to run and produce results on datasets with 5000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever growing scDNA-seq datasets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve ITH but also as a preprocessing step to reduce data size. Availability and implementation: BnpC is freely available under MIT license at https://github.com/cbg-ethz/BnpC. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 79 (12) ◽  
Author(s):  
A. M. Velasquez-Toribio ◽  
M. M. Machado ◽  
Julio C. Fabris

Abstract: We investigate the possibility of reconstructing the cosmic equation of state (EoS) at high redshift. In order to obtain general results, we use two model-independent approaches. The first reconstructs the EoS using the comoving distance, and the second makes use of Hubble parameter data. To implement the first method, we use a recent set of Gamma-Ray Burst (GRB) measurements. To implement the second method, we generate simulated data using the Sandage–Loeb (SL) effect, with $\Lambda$CDM as the fiducial model. In both cases, the statistical analysis is conducted through Gaussian processes (non-parametric). In general, we demonstrate that this methodology for reconstructing the EoS using a non-parametric method plus a model-independent approach works well, owing to the feasibility of the calculation and the ease of introducing a priori information ($H_0$ and $\Omega_{m0}$). In the near future, applying this methodology to a larger number of high-quality data will help obtain strong restrictions on the EoS.
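
A minimal sketch of the non-parametric reconstruction step, assuming simulated H(z) data from a fiducial flat $\Lambda$CDM model and using scikit-learn's Gaussian process regressor; the paper goes further and reconstructs the EoS itself, which additionally requires derivatives of the reconstructed function (which Gaussian processes provide analytically).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Simulated Hubble-parameter data from a fiducial flat LambdaCDM model,
# standing in for the SL-effect measurements used in the paper.
H0, Om0 = 70.0, 0.3
z = np.linspace(0.1, 2.0, 25)
H_true = H0 * np.sqrt(Om0 * (1 + z) ** 3 + (1 - Om0))
H_obs = H_true + rng.normal(0, 3.0, size=z.size)

# Non-parametric reconstruction: Gaussian process regression over H(z)
# assumes no functional form for the expansion history.
kernel = ConstantKernel(100.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=3.0 ** 2)
gp.fit(z.reshape(-1, 1), H_obs)

z_grid = np.linspace(0.1, 2.0, 100).reshape(-1, 1)
H_rec, H_std = gp.predict(z_grid, return_std=True)
idx = int(np.argmin(np.abs(z_grid.ravel() - 1.0)))
print(f"reconstructed H(z~1): {H_rec[idx]:.1f} +/- {H_std[idx]:.1f} km/s/Mpc")
```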


2001 ◽  
Vol 78 (3) ◽  
pp. 303-316 ◽  
Author(s):  
P. TILQUIN ◽  
W. COPPIETERS ◽  
J. M. ELSEN ◽  
F. LANTIER ◽  
C. MORENO ◽  
...  

Most QTL mapping methods assume that phenotypes follow a normal distribution, but many phenotypes of interest are not normally distributed, e.g. bacteria counts (or colony-forming units, CFU). Such data are extremely skewed to the right and can present a large number of zero values, which are ties from a statistical point of view. Our objective is therefore to assess the efficiency of four QTL mapping methods applied to bacteria counts: (1) least-squares (LS) analysis, (2) maximum-likelihood (ML) analysis, (3) non-parametric (NP) mapping and (4) nested ANOVA (AN). A transformation based on quantiles is used to mimic observed distributions of bacteria counts. Single positions (1 marker, 1 QTL) as well as chromosome scans (11 markers, 1 QTL) are simulated. When compared with the analysis of a normally distributed phenotype, the analysis of raw bacteria counts leads to a strong decrease in power for parametric methods, but no decrease is observed for NP. However, when a mathematical transformation (MT) is applied to bacteria counts prior to analysis, parametric methods have the same power as NP. Furthermore, parametric methods, when coupled with MT, outperform NP when bacteria counts have a very high proportion of zeros (70.8%). Our results show that the loss of power is mainly explained by the asymmetry of the phenotypic distribution for parametric methods, and by the existence of ties for the non-parametric method. Therefore, mapping of QTL for bacterial diseases, as well as for other diseases assessed by a counting process, should focus on the occurrence of ties in phenotypes before choosing the appropriate QTL mapping method.
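
The power loss and its remedy can be illustrated with a small simulation: zero-inflated, right-skewed counts at a single marker, analyzed raw, after a log-type transformation, and with a rank-based non-parametric test (the distributions and effect sizes below are invented for the example).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated bacteria counts (CFU) at a single marker: right-skewed with a
# large point mass at zero, mimicking the distributions in the paper.
n = 200
genotype = rng.integers(0, 2, size=n)           # marker genotype, one QTL effect
cfu = rng.lognormal(mean=2.0 + 0.8 * genotype, sigma=1.5, size=n)
cfu[rng.random(n) < 0.4] = 0.0                  # 40% zeros (ties)

# Parametric test on raw counts: power suffers from the skewness.
f_raw, p_raw = stats.f_oneway(cfu[genotype == 0], cfu[genotype == 1])

# Parametric test after a log-type transformation restores power.
log_cfu = np.log1p(cfu)
f_log, p_log = stats.f_oneway(log_cfu[genotype == 0], log_cfu[genotype == 1])

# Non-parametric test: rank-based, robust to skewness but sensitive to ties.
h, p_np = stats.kruskal(cfu[genotype == 0], cfu[genotype == 1])

print(f"ANOVA raw p={p_raw:.4f}, ANOVA log p={p_log:.4f}, Kruskal p={p_np:.4f}")
```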


2021 ◽  
Vol 13 (19) ◽  
pp. 3872
Author(s):  
Jianlai Chen ◽  
Hanwen Yu ◽  
Gang Xu ◽  
Junchao Zhang ◽  
Buge Liang ◽  
...  

Existing airborne SAR autofocus methods can be classified as parametric or non-parametric. Generally, non-parametric methods, such as the widely used phase gradient autofocus (PGA) algorithm, are only suitable for scenes containing many dominant point targets, while parametric methods are in theory suitable for all types of scenes, but their efficiency is generally low. In practice, whether many dominant point targets are present in a scene is usually unknown, so it is not straightforward to determine which kind of algorithm to select. To solve this issue, this article proposes an airborne SAR autofocus approach combined with blurry imagery classification to improve autofocus efficiency while ensuring autofocus precision. In this approach, we embed blurry imagery classification based on a typical VGGNet from the deep learning community into the traditional autofocus framework as a preprocessing step that analyzes whether dominant point targets are present in the scene. If many dominant point targets are present, the non-parametric method is used for autofocus processing; otherwise, the parametric one is adopted. The advantage of the proposed approach is therefore the automatic batch processing of all kinds of airborne measured data.
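
A rough sketch of the dispatch logic, with a VGG16 two-class head standing in for the abstract's "typical VGGNet"; pga_autofocus and parametric_autofocus are hypothetical placeholders for the two processing chains, and no training code is shown.

```python
import torch
from torchvision import models

def build_blur_classifier():
    """Two-class VGG-based classifier ('dominant point targets present' vs
    'absent'), along the lines the abstract describes; training is omitted."""
    net = models.vgg16(weights=None)
    net.classifier[6] = torch.nn.Linear(4096, 2)
    return net

def pga_autofocus(image):
    """Hypothetical placeholder for a phase gradient autofocus chain."""
    raise NotImplementedError

def parametric_autofocus(image):
    """Hypothetical placeholder for a parametric autofocus chain."""
    raise NotImplementedError

def autofocus(blurry_image, classifier):
    """Dispatch on the classifier's verdict: non-parametric PGA for scenes
    with dominant point targets, the parametric chain otherwise.
    blurry_image: float tensor of shape (3, 224, 224)."""
    classifier.eval()
    with torch.no_grad():
        logits = classifier(blurry_image.unsqueeze(0))
    has_point_targets = logits.argmax(dim=1).item() == 1
    if has_point_targets:
        return pga_autofocus(blurry_image)        # efficient, scene-dependent
    return parametric_autofocus(blurry_image)     # general, slower
```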


2015 ◽  
Author(s):  
Konstantinos Koutroumpas ◽  
François Képès

Identification of protein complexes from proteomic experiments is crucial to understanding not only their function but also the principles of cellular organization. Advances in experimental techniques have enabled the construction of large-scale protein-protein interaction networks, and computational methods have been developed to analyze high-throughput data. In most cases, several parameters are introduced that have to be trained before application. But how do we select parameter values when no training data are available? How many data points do we need to properly train a method? How is the performance of a method affected when parameter values are incorrectly selected? These questions, although important for determining the applicability of a method, are usually overlooked. We highlight the importance of such an analysis by investigating how limited knowledge, in the form of incomplete training data, affects the performance of parametric protein-complex prediction algorithms. Furthermore, we develop a simple non-parametric method that does not rely on the existence of training data, and we compare it with the parametric alternatives. Using datasets from yeast and fly, we demonstrate that parametric methods trained with limited data provide sub-optimal predictions, while our non-parametric method performs better than or on par with the parametric alternatives. Overall, our analysis questions, at least for this specific problem, whether parametric methods provide sufficiently better results than non-parametric ones to justify the additional effort of applying them.
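
As one illustration of a parameter-free baseline (not the method developed in the paper), the following sketch groups proteins into complexes via mutual best-neighbor interactions, which requires no trained parameters at all.

```python
import networkx as nx

def mutual_best_neighbor_complexes(g):
    """Illustrative parameter-free baseline: keep, for each protein, only its
    strongest interaction, then link mutually-best pairs and return connected
    components of size > 1 as predicted complexes."""
    best = {n: max(g[n], key=lambda m: g[n][m].get("weight", 1.0))
            for n in g if g.degree(n) > 0}
    core = nx.Graph()
    core.add_nodes_from(g)
    for n, m in best.items():
        if best.get(m) == n:            # mutual best neighbors
            core.add_edge(n, m)
    return [c for c in nx.connected_components(core) if len(c) > 1]

# Tiny toy PPI network with two obvious complexes.
g = nx.Graph()
g.add_weighted_edges_from([("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.7),
                           ("D", "E", 0.95), ("E", "F", 0.6), ("C", "D", 0.1)])
print(mutual_best_neighbor_complexes(g))
```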

