HARMONIZING CIGAR SURVEY DATA ACROSS TCORS, CTP, AND PATH STUDIES: THE CIGAR COLLABORATIVE RESEARCH (CCR) GROUP

Abstract Introduction: Cigars are a popular tobacco product among youth and young adults. Despite growing interest in cigar research, gaps in the available literature limit the ability to set evidence-based policies. Small research samples, and the heterogeneity of cigar types covered when a single question asks about use, make analyzing data difficult. Given the authority granted to the Food and Drug Administration (FDA) in 2016 to regulate cigars, and the popularity of cigars, data that improve understanding of cigar use and preferences will help FDA set appropriate regulatory policies. Methods: We harmonized cigar survey data previously collected by five independent tobacco regulatory science survey research projects. Data came from 3 TCORS, 1 CTP grantee, and PATH's public use data set. Results: Analyzing 92 data variables from across the five studies, and applying a rigorous data harmonization protocol, we report findings on 24 key cigar use variables. The step-by-step harmonization protocol is presented. Selected findings show strict reproducibility across all 5 studies in identifying youth 17-19 years as at highest risk for cigar initiation; relative reproducibility in showing males more likely to try cigars than females, but with significant differences in magnitude across studies; and inconsistent reproducibility in brand preferences. Conclusion: Harmonizing data from multiple sources fosters a broader view of the robustness and generalizability of survey data than a single source can provide. These observations underscore the need to look for the highest degree of reproducibility among and across data sources to inform policy. Implications: Harmonizing data from discrete data sets provides insights into cigar initiation and use, and is presented here with its opportunities, challenges, and solutions. Comparing observational data from PATH and four independent research studies provides a best-practices approach and an example of data synthesis for the tobacco research community. The data set of 5 studies offers a look at the degree of confidence that can be placed in harmonized survey results. Variable conclusions highlight the need to strive for the highest degree of reproducibility, to best understand the behaviors of cigar users and to allow for future development of the most effective interventions to alter tobacco use patterns.
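
As a rough illustration of the kind of recoding that harmonizing survey variables involves, the Python sketch below maps study-specific variable names and response codes onto one shared variable. The study labels, variable names, and codes (`study_a`, `cig_ever`, `EVERCIGAR`, `ever_used_cigar`) are hypothetical; they are not taken from the CCR protocol, which the abstract does not detail.

```python
# Hypothetical crosswalk: for each contributing study, the name of its
# "ever used a cigar" item and how its raw codes map to shared labels.
CROSSWALK = {
    "study_a": {"variable": "cig_ever", "codes": {1: "yes", 2: "no"}},
    "study_b": {"variable": "EVERCIGAR", "codes": {"Y": "yes", "N": "no"}},
}


def harmonize_record(study, record):
    """Recode one raw survey record into the shared variable.

    Missing or unmapped raw values become None rather than being guessed.
    """
    rule = CROSSWALK[study]
    raw_value = record.get(rule["variable"])
    return {"study": study, "ever_used_cigar": rule["codes"].get(raw_value)}


def harmonize_all(records_by_study):
    """Pool the recoded records from every study into one list."""
    pooled = []
    for study, records in records_by_study.items():
        pooled.extend(harmonize_record(study, record) for record in records)
    return pooled
```

Keeping the study label on every pooled record makes it possible to compare an estimate study by study, which is what a reproducibility check across sources requires.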


Disparities Across Time: Exploring Absenteeism Patterns between Cohorts of Students with Disabilities

Teachers College Record ◽

10.1177/016146812012201114 ◽

2020 ◽

Vol 122 (11) ◽

pp. 1-32

Author(s):

Michael A. Gottfried ◽

Vi-Nhuan Le ◽

J. Jacob Kirksey

Keyword(s):

Students With Disabilities ◽

Social Needs ◽

Data Sets ◽

Chronic Absenteeism ◽

Data Set ◽

Full Day Kindergarten ◽

Effective Interventions ◽

Nationally Representative ◽

Single Data ◽

Over Time

Background: It is of grave concern that kindergartners are missing more school than students in any other year of elementary school; therefore, documenting which students are absent and for how long is of utmost importance. Yet, doing so for students with disabilities (SWDs) has received little attention. This study addresses this gap by examining two cohorts of SWDs, separated by more than a decade, to document changes in attendance patterns. Research Questions: First, for SWDs, has the number of school days missed or the chronic absenteeism rate changed over time? Second, how are changes in the number of school days missed and chronic absenteeism rates related to changes in academic emphasis, presence of teacher aides, SWD-specific teacher training, and preschool participation? Subjects: This study uses data from the Early Childhood Longitudinal Study (ECLS), a nationally representative data set of children in kindergarten. We rely on both ECLS data sets: the kindergarten classes of 1998–1999 and 2010–2011. Measures were identical in both data sets, making it feasible to compare children across the two cohorts. Given identical measures, we combined the data sets into a single data set with an indicator for being in the older cohort. Research Design: This study examined two sets of outcomes: The first was number of days absent, and the second was likelihood of being chronically absent. These outcomes were regressed on a measure for being in the older cohort (our key measure for changes over time) and numerous control variables. The error term was clustered by classroom. Findings: We found that SWDs are absent more often now than they were a decade earlier, and this growth in absenteeism was larger than what students without disabilities experienced. Absenteeism among SWDs was higher for those enrolled in full-day kindergarten, although having attended center-based care mitigates this disparity over time. Implications are discussed. Conclusions: Our study calls for additional attention and supports to combat the increasing rates of absenteeism for SWDs over time. Understanding contextual shifts and trends in rates of absenteeism for SWDs in kindergarten is pertinent to crafting effective interventions and research geared toward supporting the academic and social needs of these students.
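
The pooled-cohort design described above can be sketched in Python as follows. This is a deliberately reduced version, assuming no control variables and no classroom-clustered errors (both of which the study uses): with a single binary regressor, the ordinary least squares slope on the older-cohort indicator equals the difference in mean days absent between the two cohorts. The function names and the example figures are hypothetical.

```python
def pool_cohorts(older_days_absent, newer_days_absent):
    """Stack two cohorts into (older_cohort_indicator, days_absent) rows."""
    rows = [(1, days) for days in older_days_absent]
    rows += [(0, days) for days in newer_days_absent]
    return rows


def ols_slope(rows):
    """OLS slope of days absent on the older-cohort indicator.

    Simplification: no covariates and no clustering of the error term.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    if variance == 0:
        raise ValueError("both cohorts must be present in the pooled data")
    return covariance / variance


# Hypothetical days-absent values, for illustration only.
pooled_rows = pool_cohorts([4, 6], [8, 10, 12])
older_cohort_effect = ols_slope(pooled_rows)
```

A negative slope here means the older cohort missed fewer days on average than the newer one, which is the direction of change the abstract reports for SWDs.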


Doppler radar rainfall prediction and gauge data

BMC Research Notes ◽

10.1186/s13104-020-05311-y ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Jesse W. Lansford ◽

Tyson H. Walsh ◽

T. V. Hromadka ◽

P. Rao

Keyword(s):

Doppler Radar ◽

Weather Forecasting ◽

Rain Gauge ◽

Data Sets ◽

Multiple Sources ◽

Data Set ◽

Radar Rainfall ◽

Rainfall Prediction ◽

Doppler Data ◽

Precipitation Estimates

Abstract Objective: The data herein represents multiple gauge sets and multiple radar sites of like-type Doppler data sets combined to produce populations of ordered pairs. Publications spanning decades yet specific to Doppler radar sites contain graphs of data pairs of Doppler radar precipitation estimates versus rain gauge precipitation readings. Data description: Taken from multiple sources, the data set represents several radar sites and rain gauge sites combined for 8830 data points. The data is relevant in various applications of hydrometeorology and engineering as well as weather forecasting. Further, the importance of accuracy in radar and precipitation estimates continues to increase, necessitating the incorporation of as much data as possible.


Working with Missing Data: Imputation of Nonresponse Items in Categorical Survey Data with a Non-Monotone Missing Pattern

Journal of Applied Mathematics ◽

10.1155/2014/368791 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Machelle D. Wilson ◽

Kerstin Lueck

Keyword(s):

Missing Data ◽

Survey Data ◽

Asian American ◽

Categorical Variables ◽

Data Sets ◽

Data Set ◽

Missing Data Imputation ◽

Combined Test ◽

Socioeconomic Success ◽

Socioeconomic Data

The imputation of missing data is often a crucial step in the analysis of survey data. This study reviews typical problems with missing data and discusses a method for the imputation of missing survey data with a large number of categorical variables which do not have a monotone missing pattern. We develop a method for constructing a monotone missing pattern that allows for imputation of categorical data in data sets with a large number of variables using a model-based MCMC approach. We report the results of imputing the missing data from a case study, using educational, sociopsychological, and socioeconomic data from the National Latino and Asian American Study (NLAAS). We report the results of multiply imputed data on a substantive logistic regression analysis predicting socioeconomic success from several educational, sociopsychological, and familial variables. We compare the results of conducting inference using a single imputed data set to those using a combined test over several imputations. Findings indicate that, for all variables in the model, all of the single tests were consistent with the combined test.
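
The abstract does not say how the combined test over several imputations is formed. A common choice is Rubin's rules, sketched below in Python on the assumption that each imputed data set yields one coefficient estimate and its squared standard error. The function name `pool_imputations` is hypothetical and is not taken from the paper.

```python
import statistics


def pool_imputations(estimates, variances):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    estimates: the coefficient estimated on each imputed data set.
    variances: the squared standard error of that coefficient in each set.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled_estimate = statistics.mean(estimates)
    within_variance = statistics.mean(variances)
    # Sample variance of the estimates across imputations.
    between_variance = statistics.variance(estimates)
    total_variance = within_variance + (1 + 1 / m) * between_variance
    return pooled_estimate, total_variance
```

A single-imputation test uses only the within-imputation variance, so it tends to understate uncertainty; the between-imputation term is what the combined test adds.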


Tailoring data source distributions for fairness-aware data integration

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476299 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2519-2532

Author(s):

Fatemeh Nargesian ◽

Abolfazl Asudeh ◽

H. V. Jagadish

Keyword(s):

Optimal Solution ◽

Cost Effective ◽

Data Sources ◽

Data Sets ◽

Multiple Sources ◽

Data Set ◽

Demographic Groups ◽

Reward Function ◽

Effective Manner ◽

Data Source

Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation based strategy with a reward function that captures the cost and approximations of group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.


An Investigation of the Convergence of Average Peak Accelerations for High-Speed Planing Craft

10.5957/smc-2021-063 ◽

2021 ◽

Author(s):

Michael R. Riley ◽

Heidi P. Murphy ◽

Brock W. Aron

Keyword(s):

High Speed ◽

Cumulative Distribution ◽

Data Sets ◽

Peak Acceleration ◽

Multiple Sources ◽

Data Set ◽

Distribution Shape ◽

Acceleration Data ◽

Rough Water ◽

The Stability

This paper summarizes the results of an investigation of the convergence of average peak accelerations as more and more peaks are recorded during rough-water trials of small high-speed craft. Existing guidance from multiple sources suggests that more peaks are better, but how many more, and what engineering rationale should substantiate the answer? To address the question, simplified equations and numerous examples of peak acceleration data sets are presented. The results demonstrate that convergence of the average of the highest 10 percent of peaks (A1/10), the average of the highest 1 percent of peaks (A1/100), and the ratio means that the shape of the cumulative distribution of the data set becomes more stable as the number of peak acceleration data points increases. A simple percent difference criterion is presented for quantifying the stability of the cumulative distribution shape.
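
The two statistics named above can be computed as in the Python sketch below. The definitions of A1/10 and A1/100 follow the abstract. The percent difference function is an assumed form, since the abstract does not give the paper's exact criterion or threshold, and all function names are hypothetical.

```python
def average_of_highest(peaks, divisor):
    """Mean of the highest 1/divisor of the peaks.

    divisor=10 gives A1/10 and divisor=100 gives A1/100.
    At least one peak is always included in the average.
    """
    if not peaks:
        raise ValueError("peaks must be non-empty")
    ordered = sorted(peaks, reverse=True)
    count = max(1, len(ordered) // divisor)
    return sum(ordered[:count]) / count


def percent_difference(previous, current):
    """Assumed criterion: absolute change relative to the previous value, in percent."""
    return abs(current - previous) / abs(previous) * 100.0


def running_average_of_highest(peaks, divisor, step):
    """Recompute the statistic as peaks accumulate in recording order.

    Returns a list of (number_of_peaks, statistic) pairs, one every `step` peaks.
    """
    return [
        (n, average_of_highest(peaks[:n], divisor))
        for n in range(step, len(peaks) + 1, step)
    ]
```

Applying `percent_difference` to successive values from `running_average_of_highest` gives one way to judge whether recording more peaks still changes the statistic appreciably.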


Using Integrative Data Analysis to Investigate School Climate Across Multiple Informants

Educational and Psychological Measurement ◽

10.1177/0013164419885999 ◽

2019 ◽

Vol 80 (4) ◽

pp. 617-637 ◽

Cited By ~ 3

Author(s):

Kathleen V. McGrath ◽

Elizabeth A. Leighton ◽

Mihaela Ene ◽

Christine DiStefano ◽

Diane M. Monrad

Keyword(s):

Data Analysis ◽

School Climate ◽

Complex Model ◽

Multiple Informants ◽

Data Sets ◽

Multiple Perspectives ◽

Multiple Sources ◽

Data Set ◽

Integrative Data Analysis ◽

Practical Applications

Survey research frequently involves the collection of data from multiple informants. Results, however, are usually analyzed by informant group, potentially ignoring important relationships across groups. When the same construct(s) are measured, integrative data analysis (IDA) allows pooling of data from multiple sources into one data set to examine information from multiple perspectives within the same analysis. Here, the IDA procedure is demonstrated via the examination of pooled data from student and teacher school climate surveys. This study contributes to the sparse literature regarding IDA applications in the social sciences, specifically in education. It also lays the groundwork for future educational researchers interested in the practical applications of the IDA framework to empirical data sets with complex model structures.


Estimating Pay Gaps for Workers With Disabilities: Implications From Broadening Definitions and Data Sets

Rehabilitation Research Policy and Education ◽

10.1891/2168-6653.28.4.264 ◽

2014 ◽

Vol 28 (4) ◽

pp. 264-290 ◽

Cited By ~ 5

Author(s):

Kevin F. Hallock ◽

Xin Jin ◽

Linda Barrington

Keyword(s):

Survey Data ◽

Data Sets ◽

Full Time ◽

Data Set ◽

Pay Gap ◽

Persons With Disabilities ◽

Total Compensation ◽

Individual Level ◽

Compensation Gap ◽

Definition Of

Purpose: To compare pay gap estimates across 3 different national survey data sets for people with disabilities relative to those without disabilities when pay is measured as wage and salary alone versus a (total compensation) definition that includes an estimate of the value of benefits. Method: Estimates of the cost to the employers of employee benefits at the occupational level from an employer survey data set are matched to individual-level data in each of the 3 data sets. Multiple regression techniques are applied to estimate wage and salary and total compensation gaps between full-time men with and without disabilities. Results: For full-time working men with disabilities (relative to those without disabilities), there is a consistently larger percentage wage and salary gap than percentage total compensation gap, and the breadth of the definition of pay affects the size of any estimated pay gap. In addition, there are differences in the estimated pay gaps depending on data source and disability measure. Conclusions: Results obtained from a single data set or definition of key variables may not be broadly generalizable. Studies containing such limitations should be interpreted cautiously. Our research further suggests employers looking to hire persons with disabilities or those offering employment placement services should put substantial weight on the non–base pay component of the total compensation package.


Global Entrepreneurship Monitor [GEM] Adult Population Survey Data Sets: 1998-2003: Codebook and Data Set Description

SSRN Electronic Journal ◽

10.2139/ssrn.1022325 ◽

2007 ◽

Author(s):

Paul D. Reynolds ◽

Diana Hechavarria

Keyword(s):

Survey Data ◽

Population Survey ◽

Adult Population ◽

Data Sets ◽

Global Entrepreneurship Monitor ◽

Data Set ◽

Global Entrepreneurship


Comparing and Integrating Fish Surveys in the San Francisco Estuary: Why Diverse Long-Term Monitoring Programs are Important

San Francisco Estuary and Watershed Science ◽

10.15447/sfews.2020v18iss2art4 ◽

2020 ◽

Vol 18 (2) ◽

Author(s):

Dylan Stompe ◽

Peter Moyle ◽

Avery Kruger ◽

John Durand

Keyword(s):

Survey Data ◽

San Francisco ◽

Monitoring Program ◽

San Francisco Estuary ◽

Data Sets ◽

Data Set ◽

Monitoring Programs ◽

Long Term Monitoring ◽

Term Monitoring

Many fishes in the San Francisco Estuary have suffered declines in recent decades, as shown by numerous long-term monitoring programs. A long-term monitoring program, such as the Interagency Ecological Program, comprises a suite of surveys, each conducted by a state or federal agency or academic institution. These types of programs have produced rich data sets that are useful for tracking species trends over time. Problems arise from drawing conclusions based on one or few surveys because each survey samples a different subset of species or reflects different spatial or temporal trends in abundance. The challenges in using data sets from these surveys for comparative purposes stem from methodological differences, magnitude of data, incompatible data formats, and end-user preference for familiar surveys. To improve the utility of these data sets and encourage multi-survey analyses, we quantitatively rate these surveys based on their ability to represent species trends, present a methodology for integrating long-term data sets, and provide examples that highlight the importance of expanded analyses. We identify areas and species that are under-sampled, and compare fish salvage data from large water export facilities with survey data. Our analysis indicates that while surveys are redundant for some species, no two surveys are completely duplicative. Differing trends become evident when considering individual and aggregate survey data, because they imply spatial, seasonal, or gear-dependent catch. Our quantitative ratings and integrated data set allow for improved and better-informed comparisons of species trends across surveys, while highlighting the importance of the current array of sampling methodologies.


Characterizing Tobacco and Marijuana Use Among Youth Experiencing Homelessness in a Midwestern City

10.21203/rs.3.rs-824818/v1 ◽

2021 ◽

Author(s):

Allison M. Glasser ◽

Alice Hinton ◽

Amy Wermert ◽

Joseph Macisco ◽

Julianna Nemeth

Keyword(s):

Tobacco Use ◽

Marijuana Use ◽

Tobacco Product ◽

Housing Situation ◽

Use Patterns ◽

The Past ◽

Product Use ◽

Youth And Young Adults ◽

Selection Of ◽

Midwestern City

Abstract Background: Cigarette smoking is three times more prevalent among youth experiencing homelessness compared with the general population. Co-use of tobacco and marijuana is also common. The aim of this study is to characterize tobacco and marijuana use among homeless young people in a Midwestern city. Methods: This study included 96 youth and young adults (52% male, 39% female, 5% transgender/non-binary) attending a homeless drop-in center who had used at least one combustible tobacco product in the past week. We assessed past-month use of tobacco products and marijuana and other product use characteristics (e.g., frequency, brand and flavor). Results: Most youth experiencing homelessness with past-week combustible tobacco use had used cigarettes (88.5%), cigars (92.7%), and marijuana (85.4%) in the past month. One-third used electronic vapor products, 19.8% smoked hookah, and 11.5% used smokeless tobacco. Most marijuana users co-administered with tobacco (69.8%). Daily combustible tobacco smoking was associated with having a child and smoking out of boredom/habit. Daily marijuana use was associated with using substances to cope with one's housing situation. Newport and Black & Mild were the most popular brands of cigarettes and cigars. Most non-combustible tobacco users reported not having a usual brand. Cigar smokers reported the most varied selection of flavors. Conclusions: Young combustible tobacco users experiencing homelessness engage in high-risk use patterns, including poly-tobacco use, co-use of tobacco with marijuana, and frequent combustible product use. Interventions that consider the full context of tobacco and marijuana use are needed to support cessation in this population.
