Know your big data: De-biasing subsamples of large datasets for personality research using importance sampling and kNN matching

2019 ◽  
Author(s):  
Kai Fricke ◽  
Philipp Herzberg

Large behavioral datasets collect ecologically valid data on user behavior but rarely provide additional information about the users. To gain further insight into the participants, small subsamples can be collected that assess variables of interest, such as personality characteristics. However, such samples are often biased through self-selection of participation. Removing this bias is often difficult because no demographic data are available and only the distributions of (multiple) continuous target variables in the population are known from the large dataset. In two studies, we examined de-biasing methods with respect to arbitrary continuous variables. In the first study, we examined an artificial dataset with full information on the population and found that sequential importance sampling and single-nearest-neighbour matching successfully removed bias from subsamples. In the second study, we put this method into practice and examined the relationship between personality characteristics and music preferences in a sample of Spotify users, relative to the musical preference distributions of one million users in the Million Song Dataset. Our results show that sequential importance sampling is a promising way to remove bias from samples when the distribution of relevant variables in the population is known.
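The two de-biasing ideas named above can be illustrated with a minimal Python sketch. It is a simplified stand-in and not the authors' sequential procedure. Importance weights for a single variable are estimated as the ratio of equal-width histogram densities (population vs. subsample), and subsample members are then resampled in proportion to those weights. Single-nearest-neighbour matching picks, for each reference point drawn from the large dataset, the closest subsample member. The function names, the histogram density estimate, and the bin settings are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def histogram_weights(sample: Sequence[float], population: Sequence[float],
                      bins: int = 10, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Importance weight p_pop(x) / p_sample(x) for each subsample value,
    with both densities estimated by equal-width histograms on [lo, hi].
    (Histogram density ratio is an illustrative assumption.)"""
    width = (hi - lo) / bins

    def bin_of(x: float) -> int:
        # Clamp so values at (or slightly outside) the edges land in a valid bin.
        return min(max(int((x - lo) / width), 0), bins - 1)

    pop_counts = [0] * bins
    for x in population:
        pop_counts[bin_of(x)] += 1
    smp_counts = [0] * bins
    for x in sample:
        smp_counts[bin_of(x)] += 1

    weights = []
    for x in sample:
        b = bin_of(x)
        p_pop = pop_counts[b] / len(population)
        p_smp = smp_counts[b] / len(sample)  # > 0, since x itself lies in bin b
        weights.append(p_pop / p_smp)
    return weights


def importance_resample(sample: Sequence[float], population: Sequence[float],
                        k: int, rng: random.Random, **hist_kwargs) -> List[float]:
    """Draw k subsample members with replacement, proportionally to their weights."""
    weights = histogram_weights(sample, population, **hist_kwargs)
    return rng.choices(list(sample), weights=weights, k=k)


def nearest_neighbour_match(sample: Sequence[Tuple[float, ...]],
                            references: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Single-nearest-neighbour matching: for each reference point (e.g. drawn from
    the large dataset), return the closest subsample member by Euclidean distance.
    Members may be matched more than once."""
    return [min(sample, key=lambda s: math.dist(s, r)) for r in references]


if __name__ == "__main__":
    rng = random.Random(0)
    population = [rng.random() for _ in range(10_000)]         # large dataset, ~uniform on [0, 1)
    biased = [x for x in population if rng.random() < x][:500]  # self-selection favours high values
    resampled = importance_resample(biased, population, k=2000, rng=rng)
    print("subsample mean:", sum(biased) / len(biased))
    print("resampled mean:", sum(resampled) / len(resampled))
```

Because resampling is done with replacement, heavily weighted respondents can appear many times, so the effective sample size shrinks as the bias grows. If a region of the population contains no subsample members at all, no weighting scheme can recover it.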

Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 16-17
Author(s):  
Miguel Gonzalez Velez ◽  
Carolyn Mead-Harvey ◽  
Heidi E. Kosiorek ◽  
Yael Kusne ◽  
Leyla Bojanini ◽  
...  

Introduction: Serum folate (SF), vitamin B12 (B12), and iron deficiency (def) are common causes of nutritional anemias (NA). These deficiencies are usually multifactorial, with both nutritional and non-nutritional causes playing a role. SF, B12, and iron levels are usually ordered in the setting of anemia and malnutrition, with or without neurologic symptoms. Clinical evidence suggests that these deficiencies are strongly associated with diet and socioeconomic status (SES). The relationship between NA and area-based SES in the US has not been studied. We aimed to determine the relationship of SES with the prevalence of NA. Methods: We performed a cross-sectional analysis of adult patients with SF, B12, and iron levels measured at Mayo Clinic Arizona and Florida between 2010 and 2018. Race was classified using the NIH criteria. Normal laboratory values were determined according to our laboratory reference ranges and the US NHANES III. SF levels (mcg/L) were defined as deficient <4.0, normal ≥4.0 to <20, and excess ≥20. B12 levels (ng/L) were defined as deficient <150, borderline 150–400, normal >400 to <900, and excess ≥900. Iron def was determined by ferritin levels (mcg/L): low <24, normal 24–336, and elevated >336 for men; low <11, normal 11–307, and elevated >307 for women. Area-level SES indicators (median household income [MHI], unemployment rate [UR], median monthly gross rent [MGRM], % uninsured, median house value [MHV], and % high school) were geocoded by zip code using the 2014 American Community Survey. Demographic and clinical variables were compared between groups using the chi-square test for frequency data or the Kruskal-Wallis rank-sum test for continuous variables. Results: 202,046 samples from 128,084 patients were analyzed. In the sample-level analysis, SF def was significantly associated with all SES indicators and B12 def with all SES indicators except UR, whereas iron def showed no differences except for % uninsured (Table 1). There was no statistically significant interaction between race and SES for SF def or iron def. Race significantly modified the association of B12 def with MHI (p<0.001), % uninsured (p=0.002), and MHV (p=0.007). Patients of Asian and Other race had increasing odds of B12 def with increasing MHI (Asian OR=1.11, Other OR=1.18), whereas White patients had decreasing odds of B12 def with increasing MHI (OR=0.95 for a $10,000 increase in MHI).
Conclusions: We show significant relationships between SES and NA in the US. SF def differed across all SES indicators, without race interactions. There were significant interactions between B12 def, race, and SES for patients of White, Asian, and Other race. No interaction between SES and race was observed for iron def. These relationships confirm that NA are related to area-level SES and other social determinants of health. Research on the causes of these disparities at the population level is needed. Disclosures No relevant conflicts of interest to declare.
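The laboratory cut-offs in the Methods can be written as a small classification helper, which makes the category boundaries explicit. This is a sketch under assumptions. The abstract's ranges overlap or leave some boundaries implicit, so here a folate value of exactly 20 mcg/L or a B12 value of exactly 900 ng/L is treated as excess, and 400 ng/L B12 as borderline. The function names and the `"male"`/`"female"` encoding are hypothetical.

```python
# (low, high) ferritin cut-offs in mcg/L, as listed in the Methods.
FERRITIN_CUTOFFS = {"male": (24.0, 336.0), "female": (11.0, 307.0)}


def classify_folate(sf: float) -> str:
    """Serum folate in mcg/L: deficient <4.0, normal 4.0 to <20, excess >=20 (assumed boundaries)."""
    if sf < 4.0:
        return "deficient"
    if sf < 20.0:
        return "normal"
    return "excess"


def classify_b12(b12: float) -> str:
    """Vitamin B12 in ng/L: deficient <150, borderline 150-400, normal >400 to <900, excess >=900."""
    if b12 < 150.0:
        return "deficient"
    if b12 <= 400.0:
        return "borderline"
    if b12 < 900.0:
        return "normal"
    return "excess"


def classify_ferritin(ferritin: float, sex: str) -> str:
    """Ferritin in mcg/L with sex-specific cut-offs; 'low' flags iron deficiency."""
    try:
        low, high = FERRITIN_CUTOFFS[sex]
    except KeyError:
        raise ValueError(f"unsupported sex value: {sex!r}") from None
    if ferritin < low:
        return "low"
    if ferritin <= high:
        return "normal"
    return "elevated"


if __name__ == "__main__":
    print(classify_b12(320), classify_ferritin(18, "male"))  # borderline low
```

Note that the same ferritin value can fall into different categories by sex: 18 mcg/L is low for men but normal for women.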


2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S794-S794
Author(s):  
Angela Branche ◽  
Lisa Saiman ◽  
Edward E Walsh ◽  
Ann R Falsey ◽  
William Sieling ◽  
...  

Abstract Background Respiratory syncytial virus (RSV) infection is a common cause of acute respiratory infection (ARI) in adults. Prospective surveillance enables the collection of representative data on demographic and clinical characteristics, but few such data are available for adults hospitalized with RSV infection. We used active population-based surveillance to identify patients with laboratory-confirmed RSV infection and evaluated their demographic characteristics and clinical outcomes. Methods Hospitalized adults ≥18 years old residing in a predefined catchment area with ≥2 ARI symptoms or an exacerbation of underlying cardiopulmonary disease were screened for eligibility during October 2017–April 2018 and October 2018–April 2019 at 3 hospitals in Rochester, NY, and New York City. Respiratory specimens were tested for RSV using PCR assays. Clinical and demographic data were abstracted from the medical record. Multivariate analysis was used to evaluate the relationship of patient characteristics with clinical outcomes. Results A total of 8,217 hospitalized adults were screened, of whom 9.4% were positive for RSV infection. Preliminary clinical and demographic data were available for 348 patients: 14% were aged 18–49 years, 28% 50–64 years, and 58% >65 years. Mean age was 68 years and 60% were female (Figure 1). Patients had a mean of 3 comorbidities, the most common being diabetes (40%), chronic obstructive pulmonary disease (30%), chronic kidney disease (28%), congestive heart failure (28%), coronary artery disease (25%), and asthma (24%) (Figure 2). Median hospital length of stay was 6 days (IQR 4–10); 13% of patients were admitted to the ICU, 5% were mechanically ventilated, 5% died during admission, and 12% died within 6 months. In multivariate analysis, having >3 comorbidities, cardiac disease, or a lower baseline functional status (measured by activities of daily living scores) was significantly associated with 6-month mortality.
Conclusion The majority of hospitalized patients with RSV infection were older adults with ≥3 chronic comorbid conditions. Baseline functional status may predict worse clinical outcomes in patients with RSV infection. These insights into patient characteristics and clinical outcomes can inform prevention programs. Disclosures All authors: No reported disclosures.


Neurology ◽  
2019 ◽  
Vol 93 (24) ◽  
pp. e2224-e2236 ◽  
Author(s):  
Richard B. Lipton ◽  
Kristina M. Fanning ◽  
Dawn C. Buse ◽  
Vincent T. Martin ◽  
Lee B. Hohaia ◽  
...  

Objective To test the hypothesis that statistically defined subgroups of migraine (based on constellations of comorbidities and concomitant conditions; henceforth comorbidities), previously identified using Chronic Migraine Epidemiology and Outcomes (CaMEO) Study data, differ in prognosis, as measured by rates of progression from episodic migraine (EM) to chronic migraine (CM). Methods The onset of CM was assessed up to 4 times over 12 months in individuals with EM and ≥1 comorbidity at baseline, who were grouped by constellations of comorbidities (comorbidity classes). The “fewest comorbidities” class served as the reference. Individuals from the web-based CaMEO Study who completed ≥1 follow-up survey were included. Covariates included sociodemographic variables and headache characteristics. Sex, income, cutaneous allodynia, and medication overuse were modeled as binary variables; age, body mass index, headache-related disability (Migraine Disability Assessment [MIDAS]), and Migraine Symptom Severity Scale score were modeled as continuous variables. CM onset was assessed using discrete-time analysis. Results In the final sociodemographic model, all comorbidity classes had significantly elevated hazard ratios (HRs) for progression from EM to CM relative to the fewest-comorbidities class. HRs for CM onset ranged from 5.34 (95% confidence interval [CI] 3.89–7.33; p ≤ 0.001) for the most-comorbidities class to 1.53 (95% CI 1.17–2.01; p < 0.05) for the respiratory class. After adjusting independently for headache covariates, each comorbidity class still significantly predicted CM onset, although HRs were attenuated. Conclusions Subgroups of migraine identified by comorbidity classes at cross-section predicted progression from EM (with ≥1 comorbidity at baseline) to CM. The relationship of comorbidity group to CM onset remained after adjusting for indicators of migraine severity, such as MIDAS. ClinicalTrials.gov identifier NCT01648530.
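Discrete-time survival analysis works on person-period data. Each participant contributes one row per follow-up wave at which they are still at risk, with an event indicator set in the wave where CM onset is first reported. The sketch below builds that layout and computes the unadjusted hazard per wave (events divided by participants at risk). It assumes a hypothetical `(id, last_wave_observed, onset_wave)` input and does not reproduce the study's covariate-adjusted model. In practice, the person-period rows would be fed to a binary regression (for example, logistic or complementary log-log) with comorbidity class and covariates as predictors.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: (participant id, last wave observed, wave of CM onset or None)
Subject = Tuple[str, int, Optional[int]]
Row = Tuple[str, int, int]  # (participant id, wave, event indicator)


def person_period(subjects: List[Subject]) -> List[Row]:
    """One row per participant per wave at risk; event=1 only in the onset wave.
    Participants leave the risk set after onset or after their last observed wave."""
    rows: List[Row] = []
    for sid, last_wave, onset in subjects:
        end = onset if onset is not None else last_wave
        for wave in range(1, end + 1):
            rows.append((sid, wave, int(onset is not None and wave == onset)))
    return rows


def discrete_hazards(rows: List[Row]) -> Dict[int, float]:
    """Empirical discrete-time hazard per wave: events / participants at risk."""
    at_risk: Dict[int, int] = {}
    events: Dict[int, int] = {}
    for _, wave, event in rows:
        at_risk[wave] = at_risk.get(wave, 0) + 1
        events[wave] = events.get(wave, 0) + event
    return {wave: events[wave] / at_risk[wave] for wave in sorted(at_risk)}


if __name__ == "__main__":
    # Hypothetical cohort with up to 4 follow-up waves over 12 months.
    cohort = [("a", 4, None), ("b", 4, 2), ("c", 2, None), ("d", 3, 1)]
    print(discrete_hazards(person_period(cohort)))
```

Participants who drop out early (like `c`) simply stop contributing rows, which is how this layout handles right-censoring without extra machinery.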

