Sample size, number of categories and sampling assumptions: Exploring some differences between categorization and generalization

The Mantel-Haenszel chi-square (χ2MH) is widely used to detect differential item functioning (item bias) between ethnic and gender-based subgroups on educational and psychological tests. The empirical behavior of χ2MH has been incompletely understood; previous research is inconclusive. The present simulation study explored the effects of sample size, number of items, and trait distributions on the power of χ2MH to detect modeled differential item functioning. A significant effect was obtained for sample size with unacceptably low power for 250 subjects each in the focal and reference groups. The discussion supports the 1990 recommendations of Swaminathan and Rogers, opposes the 1993 view of Zieky that a sample size of 250 for each group is adequate.

Download Full-text

Measurement Invariance Re-Visited: Relations with Sample Size, Number of Groups, Model Size, and Fit Criteria

PsycEXTRA Dataset ◽

10.1037/e500122015-054 ◽

2014 ◽

Author(s):

D. L. Putnick ◽

M. H. Bornstein

Keyword(s):

Sample Size ◽

Measurement Invariance ◽

Size Number ◽

Model Size

Download Full-text

Aggregation of Consumer Ratings of Online Products: Applying Social Choice Theory to Investigate the Cordorcet Efficiency of Mean Rule

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/154193120705101304 ◽

2007 ◽

Vol 51 (13) ◽

pp. 837-840

Author(s):

Xianjun Sam Zheng

Keyword(s):

Sample Size ◽

Social Choice ◽

Majority Rule ◽

Choice Theory ◽

Social Choice Theory ◽

Sample Mean ◽

Condorcet Efficiency ◽

Size Number ◽

Consumer Ratings ◽

The Mean

Mean rule has been popularly used to aggregate consumer ratings of online products. This study applied social choice theory to evaluate the Condorcet efficiency of the mean rule, and to investigate the effect of sample size (number of voters) on the agreement or disagreement between the mean and majority rules. The American National Election Survey data (1968) were used, where three candidates competed for the presidency, and the numerical thermometer scores were provided for each candidate. Random sampling data with varied sample sizes were drew from the survey, and then were aggregated according to the majority rule, the mean rule, and other social choice rules. The results show that the sample winner of the mean rule agrees with the sample majority winner very well; as sample size increases, the sample mean rule even converges faster to the correct population majority winner and ordering than does the sample majority rule. The implications for using aggregation rules for online product rating were also discussed.

Download Full-text

Effects of sample size, number of markers, and allelic richness on the detection of spatial genetic pattern

Molecular Ecology Resources ◽

10.1111/j.1755-0998.2011.03077.x ◽

2011 ◽

Vol 12 (2) ◽

pp. 276-284 ◽

Cited By ~ 103

Author(s):

ERIN L. LANDGUTH ◽

BRADLEY C. FEDY ◽

SARA J. OYLER‐McCANCE ◽

ANDREW L. GAREY ◽

SARAH L. EMEL ◽

...

Keyword(s):

Sample Size ◽

Allelic Richness ◽

Size Number ◽

Genetic Pattern ◽

Spatial Genetic Pattern

Download Full-text

Sample size, number of categories and sampling assumptions: Exploring some differences between categorization and generalization

10.31234/osf.io/9vc2n ◽

2019 ◽

Author(s):

Andrew T Hendrickson ◽

Amy Perfors ◽

Danielle Navarro ◽

Keith Ransom

Keyword(s):

Sample Size ◽

Computational Models ◽

Inference Problems ◽

Size Number ◽

New Item ◽

Category Frequency

Categorization and generalization are fundamentally related inference problems. Yet leading computational models of categorization (as exemplified by, e.g., Nosofsky, 1986) and generalization (as exemplified by, e.g., Tenenbaum & Griffiths, 2001) make qualitatively different predictions about how inference should change as a function of the number of items. Assuming all else is equal, categorization models predict that increasing the number of items in a category increases the chance of assigning a new item to that category; generalization models predict a decrease, or category tightening with additional exemplars. This paper investigates this discrepancy, showing that people do indeed perform qualitatively differently in categorization and generalization tasks even when all superficial elements of the task are kept constant. Furthermore, the effect of category frequency on generalization is moderated by assumptions about how the items are sampled. We show that neither model naturally accounts for the pattern of behavior across both categorization and generalization tasks, and discuss theoretical extensions of these frameworks to account for the importance of category frequency and sampling assumptions.

Download Full-text

Evaluating performance and determining optimum sample size for regression tree and automatic linear modeling

Arquivo Brasileiro de Medicina Veterinária e Zootecnia ◽

10.1590/1678-4162-12413 ◽

2021 ◽

Vol 73 (6) ◽

pp. 1391-1402

Author(s):

S. Genç ◽

M. Mendeş

Keyword(s):

Sample Size ◽

Simulation Study ◽

Regression Tree ◽

Linear Modeling ◽

Experimental Conditions ◽

Monte Carlo Simulation Study ◽

Optimum Sample Size ◽

Size Number ◽

Explained Variation ◽

Optimum Sample

ABSTRACT This study was carried out for two purposes: comparing performances of Regression Tree and Automatic Linear Modeling and determining optimum sample size for these methods under different experimental conditions. A comprehensive Monte Carlo Simulation Study was designed for these purposes. Results of simulation study showed that percentage of explained variation estimates of both Regression Tree and Automatic Linear Modeling was influenced by sample size, number of variables, and structure of variance-covariance matrix. Automatic Linear Modeling had higher performance than Regression Tree under all experimental conditions. It was concluded that the Regression Tree required much larger samples to make stable estimates when comparing to Automatic Linear Modeling.

Download Full-text

Reference sample size for multiple regression in corn

Pesquisa Agropecuária Brasileira ◽

10.1590/s1678-3921.pab2020.v55.01400 ◽

2020 ◽

Vol 55 ◽

Author(s):

Alberto Cargnelutti Filho ◽

Marcos Toebe

Keyword(s):

Grain Yield ◽

Sample Size ◽

Multiple Regression ◽

Reference Sample ◽

Multiple Regression Model ◽

Coefficient Of Determination ◽

Size Number ◽

Corn Grain ◽

Double Cross ◽

Corn Grain Yield

Abstract: The objective of this work was to determine the number of plants required to model corn grain yield (Y) as a function of ear length (X1) and ear diameter (X2), using the multiple regression model Y = β0 + β1X1 + β2X2. The Y, X1, and X2 traits were measured in 361, 373, and 416 plants, respectively, of single-, three-way, and double-cross hybrids in the 2008/2009 crop year; and in 1,777, 1,693, and 1,720 plants, respectively, of single-, three-way, and double-cross hybrids in the 2009/2010 crop year, totaling 6,340 plants. Descriptive statistics were calculated, and frequency histograms and scatterplots were created. The sample size (number of plants) for the estimate of the β0, β1, and β2 parameters, of the residual standard error, the coefficient of determination, the variance inflation factor, and the condition number between the explanatory traits of the model (X1 and X2) were determined by resampling with replacement. Measuring 260 plants is sufficient to adjust precise multiple regression models of corn grain yield as a function of ear length and ear diameter. The Y = -229.76 + 0.54X1 + 6.16X2 model is a reference for estimating corn grain yield.

Download Full-text

Impact of different conditions on accuracy of five rules for principal components retention

Psihologija ◽

10.2298/psi130801008z ◽

2013 ◽

Vol 46 (3) ◽

pp. 331-347

Author(s):

Aleksandar Zoric ◽

Goran Opacic

Keyword(s):

Monte Carlo ◽

Sample Size ◽

Principal Components ◽

Poor Performance ◽

Error Variance ◽

Parallel Analysis ◽

Monte Carlo Experiment ◽

Size Number ◽

The Impact ◽

New Criteria

Polemics about criteria for nontrivial principal components are still present in the literature. Finding of a lot of papers, is that the most frequently used Guttman Kaiser?s criterion has very poor performance. In the last three years some new criteria were proposed. In this Monte Carlo experiment we aimed to investigate the impact that sample size, number of analyzed variables, number of supposed factors and proportion of error variance have on the accuracy of analyzed criteria for principal components retention. We compared the following criteria: Bartlett?s ?2 test, Horn?s Parallel Analysis, Guttman-Kaiser?s eigenvalue over one, Velicer?s MAP and CHull originally proposed by Ceulemans & Kiers. Factors were systematically combined resulting in 690 different combinations. A total of 138,000 simulations were performed. Novelty in this research is systematic variation of the error variance. Performed simulations showed that, in favorable research conditions, all analyzed criteria work properly. Bartlett?s and Horns criterion expressed the robustness in most of analyzed situations. Velicer?s MAP had the best accuracy in situations with small number of subjects and high number of variables. Results confirm earlier findings of Guttman-Kaiser?s criterion having the worse performance.

Download Full-text

Evaluating the estimation of genetic correlation and heritability using summary statistics

Molecular Genetics and Genomics ◽

10.1007/s00438-021-01817-7 ◽

2021 ◽

Author(s):

Ju Zhang ◽

Fredrick R. Schumacher

Keyword(s):

Sample Size ◽

Genetic Correlation ◽

Reference Panel ◽

Summary Statistics ◽

External Reference ◽

Size Number ◽

Admixed Population ◽

Statistical Approaches ◽

Panel Summary ◽

Number Of Individuals

AbstractWhile novel statistical methods quantifying the shared heritability of traits and diseases between ancestral distinct populations have been recently proposed, a thorough evaluation of these approaches under differing circumstances remain elusive. Brown et al.2016 proposed the method Popcorn to estimate the shared heritability, i.e. genetic correlation, using only summary statistics. Here, we evaluate Popcorn under several parameters and circumstances: sample size, number of SNPs, sample size of external reference panel, various population pairs, inappropriate external reference panel, and admixed population involved. Our results determined the minimum sample size of the external reference panel, summary statistics, and number of SNPs required to accurately estimate both the genetic correlation and heritability. Moreover, the number of individuals and SNPs required to produce accurate and stable estimates was directly proportional with heritability in Popcorn. Misrepresentation of the reference panel overestimated the genetic correlation by 20% and heritability by 60%. Lastly, applying Popcorn to homogeneous (EUR) and admixed (ASW) populations underestimated the genetic correlation by 15%. Although statistical approaches estimating the shared heritability between ancestral populations will provide novel etiologic insight, caution is required ensuring results are based on the appropriate sample size, number of SNPs, and the generalizability of the reference panel to the discovery populations.

Download Full-text

Peptide sequencing in an electrolytic cell with two nanopores in tandem and exopeptidase

10.1101/015297 ◽

2015 ◽

Cited By ~ 1

Author(s):

G Sampath

Keyword(s):

Sample Size ◽

Gel Electrophoresis ◽

Peptide Sequencing ◽

Electrolytic Cell ◽

Amino Acid Type ◽

Tandem Cell ◽

Confidence Levels ◽

Size Number ◽

Acid Type ◽

Blockade Level

A nanopore-based approach to peptide sequencing without labels or immobilization is considered. It is based on a tandem cell (RSC Adv., 2015, 5, 167-171) with the structure [cis1, upstream pore (UNP), trans1/cis2, downstream pore (DNP), trans2]. An amino or carboxyl exopeptidase attached to the downstream side of UNP cleaves successive leading residues in a peptide threading from cis1 through UNP. A cleaved residue translocates to and through DNP where it is identified. A Fokker-Planck model is used to compute translocation statistics for each amino acid type. Multiple discriminators, including a variant of the current blockade level and translocation times through trans1/cis2 and DNP, identify a residue. Calculations show the 20 amino acids to be grouped by charge (+, -, neutral) and ordered within each group (which makes error correction easier). The minimum cleaving interval required of the exopeptidase, the sample size (number of copies of the peptide to sequence or runs with one copy) to identify a residue with a given confidence level, and confidence levels for a given sample size are calculated. The results suggest that if the exopeptidase cleaves each and every residue and does so in a reasonable time, peptide sequencing with acceptable (and correctable) errors may be feasible. If validated experimentally the proposed device could be an alternative to mass spectrometry and gel electrophoresis. Implementation-related issues are discussed.

Download Full-text