Revealing multi-scale population structure in large cohorts

2018 ◽  
Author(s):  
Alex Diaz-Papkovich ◽  
Luke Anderson-Trocmé ◽  
Simon Gravel

Abstract Genetic structure in large cohorts results from technical, sampling and demographic variation. Visualisation is therefore a first step in most genomic analyses. However, existing data exploration methods struggle with unbalanced sampling and the many scales of population structure. We investigate an approach to dimension reduction of genomic data that combines principal components analysis (PCA) with uniform manifold approximation and projection (UMAP) to succinctly illustrate population structure in large cohorts and capture their relationships on local and global scales. Using data from large-scale genomic datasets, we demonstrate that PCA-UMAP effectively clusters closely related individuals while placing them in a global continuum of genetic variation. This approach reveals previously overlooked subpopulations within the American Hispanic population and fine-scale relationships between geography, genotypes, and phenotypes in the UK population. This opens new lines of investigation for demographic research and statistical genetics. Given its small computational cost, PCA-UMAP also provides a general-purpose approach to exploratory analysis in population-scale datasets.
Author summary Because of geographic isolation, individuals tend to be more genetically related to people living nearby than to people living far away. This is an example of population structure, a situation where a large population contains subgroups that share more than the average amount of DNA. This structure can tell us about human history, and it can also have a large effect on medical studies. We use a newly developed method (UMAP) to visualize population structure from three genomic datasets. Using genotype data alone, we reveal numerous subgroups related to ancestry and correlated with traits such as white blood cell count, height, and FEV1, a measure used to detect airway obstruction. We demonstrate that UMAP reveals previously unobserved patterns and fine-scale structure.
We show that visualizations work especially well in large datasets containing populations with diverse backgrounds, which are rapidly becoming more common, and that unlike other visualization methods, we can preserve intuitive connections between populations that reflect their shared ancestries. The combination of these results and the effectiveness of the strategy on large and diverse datasets make this an important approach for exploratory analysis for geneticists studying ancestral events and phenotype distributions.
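As a rough illustration of the first stage of a PCA-UMAP style workflow, the sketch below computes top principal-component scores from a samples-by-SNPs genotype matrix with NumPy. The function name `top_pcs`, the toy allele frequencies, and the group sizes are all hypothetical and are not taken from the paper; this is a minimal sketch, not the authors' pipeline.

```python
import numpy as np

def top_pcs(genotypes, n_pcs=10):
    """Return top principal-component scores for a samples x SNPs matrix."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # one row of PC scores per individual

# Hypothetical toy data: two groups drawn from different allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=200), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(30, 200))  # genotypes coded 0/1/2
group_b = rng.binomial(2, freq_b, size=(30, 200))
pcs = top_pcs(np.vstack([group_a, group_b]), n_pcs=5)

# Second stage (not run here): pass `pcs` to a UMAP implementation,
# e.g. umap.UMAP(n_components=2).fit_transform(pcs) from umap-learn.
```

Reducing to a modest number of PCs first keeps the input to the embedding step small, which is one plausible reason a combined approach can stay computationally cheap on large cohorts.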

2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S54-S54
Author(s):  
Ron Dagan ◽  
Shalom Ben-Shimol ◽  
Rachel Benisty ◽  
Gili Regev-Yochay ◽  
Merav Ron ◽  
...  

Abstract Background: IPD caused by Sp2 (non-PCV13 serotype) is relatively rare. However, Sp2 has a high potential for causing IPD, including meningitis. Large-scale outbreaks of Sp2 IPD are rare and were not reported post-PCV implementation. We describe an Sp2 IPD outbreak in Israel, in the PCV13 era, caused by a novel clone. Additionally, we analyzed the population structure and evolutionary dynamics of Sp2 during 2006–2018. Methods: Ongoing, population-based, nationwide active surveillance has been conducted since July 2009. PCV7/PCV13 were implemented in Israel in July 2009 and November 2010, respectively. All isolates were tested for antimicrobial susceptibility, PFGE, MLST and whole-genome sequencing (WGS). Results: Overall, 173 Sp2 IPD cases were identified; all isolates were analyzed by MLST (Figure 1). During 2016–2017, Sp2 caused 7.6% of all IPD, a 7-fold increase compared with 2006–2015, and ranked second (after serotype 12F, which caused 12%) among IPD isolates. During 2006–2015, 98% (40/41) of Sp2 IPD cases were caused by the previously reported global ST-1504 clone. The outbreak was caused by a novel clone, ST-13578, not previously reported (Figure 2). WGS analysis confirmed that ST-13578 was related to, but genetically distinct from, ST-1504, which was observed exclusively before the outbreak. A single strain of clone ST-74, previously reported globally, was identified in 2017–2018. An additional case was identified in an adult in the UK, following a family visit from Israel. The ST-13578 clone was identified only in the Jewish population and was mainly distributed in 3 of the 7 Israeli districts. All tested strains were penicillin-susceptible (MIC < 0.06 μg/mL). Conclusion: To the best of our knowledge, this is the first widespread Sp2 outbreak since PCV13 introduction worldwide, caused by a novel clone, ST-13578. The outbreak is still ongoing, although a declining trend has been noted since 2017. Disclosures: All Authors: No reported Disclosures.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Julie A. Fitzpatrick ◽  
Nicolas Basty ◽  
Madeleine Cule ◽  
Yi Liu ◽  
Jimmy D. Bell ◽  
...  

Abstract Psoas muscle measurements are frequently used as markers of sarcopenia and predictors of health. Manually measured cross-sectional areas are most commonly used, but there is a lack of consistency regarding the position of the measurement and manual annotations are not practical for large population studies. We have developed a fully automated method to measure iliopsoas muscle volume (comprised of the psoas and iliacus muscles) using a convolutional neural network. Magnetic resonance images were obtained from the UK Biobank for 5000 participants, balanced for age, gender and BMI. Ninety manual annotations were available for model training and validation. The model showed excellent performance against out-of-sample data (average dice score coefficient of 0.9046 ± 0.0058 for six-fold cross-validation). Iliopsoas muscle volumes were successfully measured in all 5000 participants. Iliopsoas volume was greater in male compared with female subjects. There was a small but significant asymmetry between left and right iliopsoas muscle volumes. We also found that iliopsoas volume was significantly related to height, BMI and age, and that there was an acceleration in muscle volume decrease in men with age. Our method provides a robust technique for measuring iliopsoas muscle volume that can be applied to large cohorts.
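For readers unfamiliar with the Dice score reported above, the sketch below shows the standard definition, 2|A∩B| / (|A| + |B|), applied to two flattened binary masks. The function name and the toy masks are hypothetical; the abstract does not describe how the authors computed the score in practice.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences."""
    pred_set = {i for i, v in enumerate(pred) if v}
    truth_set = {i for i, v in enumerate(truth) if v}
    if not pred_set and not truth_set:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    overlap = len(pred_set & truth_set)
    return 2 * overlap / (len(pred_set) + len(truth_set))

# Hypothetical toy masks (1 = voxel labelled as muscle).
predicted = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(predicted, manual)  # 2 * 2 / (3 + 3)
```

A score of 1 means the automated and manual segmentations coincide exactly, and 0 means they share no voxels.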


2019 ◽  
Vol 28 (3) ◽  
pp. 218-231 ◽  
Author(s):  
James T Walker ◽  
Ammon Salter ◽  
Rita Fontinha ◽  
Rossella Salandra

Abstract The marked increase in the use of metrics, such as journal lists, to assess research has had a profound effect on academics’ working lives. While some view the diffusion of rankings as beneficial, others consider it a malicious development that further exacerbates a tendency towards managerialism in academia and undermines the integrity and diversity of academic research. Using data from a large-scale survey and a re-grading of journals in a ranking used by Business and Management UK scholars—the Academic Journal Guide—as a pseudo-experiment, we examine what determines negative and positive perceptions of rankings. We find that the individuals who published in outlets that were upgraded were less hostile to the ranking than those who did not benefit from these changes, and that individuals were also less hostile to the ranking if outlets in their field had benefited from re-grading in the new list. We also find that the individuals who published in outlets that were upgraded were more positive to the ranking than those who did not benefit from these changes, and that individuals were also more positive to the ranking if outlets in their field had benefited from re-grading in the new list.


2020 ◽  
Vol 29 (16) ◽  
pp. 2803-2811
Author(s):  
James P Cook ◽  
Anubha Mahajan ◽  
Andrew P Morris

Abstract The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors that are confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.


2021 ◽  
Author(s):  
Thomas G. Brooks ◽  
Nicholas F. Lahens ◽  
Gregory R. Grant ◽  
Yvette I. Sheline ◽  
Garret A. FitzGerald ◽  
...  

Abstract Wrist-worn accelerometer actigraphy devices present the opportunity for large-scale data collection from people during their daily lives. Using data from approximately 100,000 participants in the UK Biobank, actigraphy-derived measures of physical activity, sleep, and diurnal rhythms were associated in exploration and validation cohorts with a full phenome-wide set of diagnoses, biomarkers and metadata. Rhythmicity was captured by two independent models based on accelerometer and skin temperature harnessing behavioral (diurnal) and molecular (circadian) components. We found that robust rhythms were significantly associated with biomarkers, survival, and phenotypes including diabetes, hypertension, mood disorders, and chronic airway obstruction; these associations were comparable to those with physical activity and sleep. Surprisingly, associations were mostly consistent between the sexes, while modulation by age was significant. More importantly, rhythms were found to be powerful predictors of future diseases: a two standard deviation difference in wrist temperature rhythms corresponded to increases in rate of diagnosis of 61% in diabetes, 38% in chronic airway obstruction, 27% in anxiety disorders, and 22% in hypertension. Our PheWAS of actigraphy data in the UK Biobank establishes that rhythmicity is fundamental to modeling disease trajectories, as are physical activity and sleep. Integration of long-term remote biosensing into patient care could thus afford an individualized approach to risk management.


2020 ◽  
Author(s):  
Rounak Dey ◽  
Wei Zhou ◽  
Tuomo Kiiskinen ◽  
Aki Havulinna ◽  
Amanda Elliott ◽  
...  

Abstract With decades of electronic health records linked to genetic data, large biobanks provide unprecedented opportunities for systematically understanding the genetics of the natural history of complex diseases. Genome-wide survival association analysis can identify genetic variants associated with ages of onset, disease progression and lifespan. We developed an efficient and accurate frailty (random effects) model approach for genome-wide survival association analysis of censored time-to-event (TTE) phenotypes in large biobanks by accounting for both population structure and relatedness. Our method utilizes state-of-the-art optimization strategies to reduce the computational cost. The saddlepoint approximation is used to allow for analysis of heavily censored phenotypes (>90%) and low frequency variants (down to minor allele count 20). We demonstrated the performance of our method through extensive simulation studies and analysis of five TTE phenotypes, including lifespan, with heavy censoring rates (90.9% to 99.8%) on ~400,000 UK Biobank participants with white British ancestry and ~180,000 samples in FinnGen, respectively. We further performed genome-wide association analysis for 871 TTE phenotypes in UK Biobank and presented the genome-wide scale phenome-wide association (PheWAS) results with the PheWeb browser.


2019 ◽  
Author(s):  
Aman Agrawal ◽  
Alec M. Chiu ◽  
Minh Le ◽  
Eran Halperin ◽  
Sriram Sankararaman

Abstract Principal component analysis (PCA) is a key tool for understanding population structure and controlling for population stratification in genome-wide association studies (GWAS). With the advent of large-scale datasets of genetic variation, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. We present ProPCA, a highly scalable method based on a probabilistic generative model, which computes the top PCs on genetic variation data efficiently. We applied ProPCA to compute the top five PCs on genotype data from the UK Biobank, consisting of 488,363 individuals and 146,671 SNPs, in less than thirty minutes. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we scanned for SNPs that are not well-explained by the PCs to identify several novel genome-wide signals of recent putative selection including missense mutations in RPGRIP1L and TLR4.
Author Summary Principal component analysis is a commonly used technique for understanding population structure and genetic variation. With the advent of large-scale datasets that contain the genetic information of hundreds of thousands of individuals, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. In this study, we present ProPCA, a highly scalable statistical method to compute genetic PCs efficiently. We systematically evaluate the accuracy and robustness of our method on large-scale simulated data and apply it to the UK Biobank. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we identify several novel signals of putative recent selection.


2020 ◽  
Author(s):  
Lana Ruck ◽  
P. Thomas Schoenemann

Abstract Open data initiatives such as the UK Biobank and Human Connectome Project provide researchers with access to neuroimaging, genetic, and other data for large samples of left- and right-handed participants, allowing for more robust investigations of handedness than ever before. Handedness inventories are universal tools for assessing participant handedness in these large-scale neuroimaging contexts. These self-report measures are typically used to screen and recruit subjects, but they are also widely used as variables in statistical analyses of fMRI and other data. Recent investigations into the validity of handedness inventories, however, suggest that self-report data from these inventories might not reflect hand preference/performance as faithfully as previously thought. Using data from the Human Connectome Project, we assessed correspondence between three handedness measures – the Edinburgh Handedness Inventory (EHI), the Rolyan 9-hole pegboard, and grip strength – in 1179 healthy subjects. We show poor association between the different handedness measures, with roughly 10% of the sample having at least one behavioral measure which indicates hand-performance bias opposite to the EHI score, and over 65% of left-handers having one or more mismatched handedness scores. We discuss implications for future work, urging researchers to critically consider direction, degree, and consistency of handedness in their data.


PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1458 ◽  
Author(s):  
Mohd Z.H. Haniza ◽  
Sally Adams ◽  
Eleanor P. Jones ◽  
Alan MacNicoll ◽  
Eamonn B. Mallon ◽  
...  

The brown rat (Rattus norvegicus) is a relatively recent (<300 years) addition to the British fauna, but by association with negative impacts on public health, animal health and agriculture, it is regarded as one of the most important vertebrate pest species. Anticoagulant rodenticides were introduced for brown rat control in the 1950s and are widely used for rat control in the UK, but long-standing resistance has been linked to control failures in some regions. One thus far ignored aspect of resistance biology is the population structure of the brown rat. This paper investigates the role population structure has on the development of anticoagulant resistance. Using mitochondrial and microsatellite DNA, we examined 186 individuals (from 15 counties in England and one location in Wales near the Wales–England border) to investigate the population structure of rural brown rat populations. We also examined individual rats for variations of the VKORC1 gene previously associated with resistance to anticoagulant rodenticides. We show that the populations were structured to some degree, but that this was only apparent in the microsatellite data and not the mtDNA data. We discuss various reasons why this is the case. We show that the population as a whole appears not to be at equilibrium. The relative lack of diversity in the mtDNA sequences examined can be explained by founder effects and a subsequent spatial expansion of a species introduced to the UK relatively recently. We found there was a geographical distribution of resistance mutations, and relatively low rate of gene flow between populations, which has implications for the development and management of anticoagulant resistance.


2021 ◽  
Author(s):  
Karina-Doris Vihta ◽  
Koen B. Pouwels ◽  
Tim Peto ◽  
Emma Pritchard ◽  
David W. Eyre ◽  
...  

Background: Several community-based studies have assessed the ability of different symptoms to identify COVID-19 infections, but few have compared symptoms over time (reflecting SARS-CoV-2 variants) and by vaccination status. Methods: Using data and samples collected by the COVID-19 Infection Survey at regular visits to representative households across the UK, we compared symptoms in new PCR-positives and comparator test-negative controls. Results: From 26/4/2020-7/8/2021, 27,869 SARS-CoV-2 PCR-positive episodes occurred in 27,692 participants (median 42 years (IQR 22-58)); 13,427 (48%) self-reported symptoms ("symptomatic positive episodes"). The comparator group comprised 3,806,692 test-negative visits (457,215 participants); 130,612 (3%) self-reported symptoms ("symptomatic negative visits"). Reporting of any symptoms in positive episodes varied over calendar time, reflecting changes in prevalence of variants, incidental changes (e.g. seasonal pathogens, schools re-opening) and vaccination roll-out. There was a small increase in sore throat reporting in symptomatic positive episodes and negative visits from April-2021. After May-2021 when Delta emerged there were substantial increases in headache and fever in positives, but not in negatives. Although specific symptom reporting in symptomatic positive episodes vs. negative visits varied by age, sex, and ethnicity, only small improvements in symptom-based infection detection were obtained; e.g. adding fatigue/weakness or all eight symptoms to the classic four symptoms (cough, fever, loss of taste/smell) increased sensitivity from 74% to 81% to 90% but tests per positive from 4.6 to 5.3 to 8.7. Conclusions: Whilst SARS-CoV-2-associated symptoms vary by variant, vaccination status and demographics, differences are modest and do not warrant large-scale changes to targeted testing approaches given resource implications.
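The trade-off between sensitivity and tests per positive described above can be made concrete with a small calculation. The sketch below assumes that "tests per positive" means the number of people flagged for testing divided by the number of true positives found; the abstract does not state its exact definition. The counts are hypothetical and are not the study's data.

```python
def screening_metrics(true_pos, false_neg, false_pos):
    """Sensitivity and tests-per-positive for a symptom-based testing rule."""
    # Share of infected people whose symptoms trigger a test.
    sensitivity = true_pos / (true_pos + false_neg)
    # Assumed definition: everyone flagged for testing per true positive found.
    tests_per_positive = (true_pos + false_pos) / true_pos
    return sensitivity, tests_per_positive

# Hypothetical counts for one symptom rule.
sens, tpp = screening_metrics(true_pos=74, false_neg=26, false_pos=266)
print(f"sensitivity={sens:.2f}, tests per positive={tpp:.1f}")
```

Broadening the symptom list raises `true_pos` but usually raises `false_pos` faster, which is why sensitivity and tests per positive tend to increase together.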

