Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease

Cell Genomics ◽  
2022 ◽  
Vol 2 (1) ◽  
pp. 100086
Author(s):  
Yu Xu ◽  
Dragana Vuckovic ◽  
Scott C. Ritchie ◽  
Parsa Akbari ◽  
Tao Jiang ◽  
...  
2020 ◽  
Author(s):  
Yu Xu ◽  
Dragana Vuckovic ◽  
Scott C Ritchie ◽  
Parsa Akbari ◽  
Tao Jiang ◽  
...  

Abstract: Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because univariate analysis may limit both the selection of variants and the modelling of their interactions in PGSs, this conventional method may yield sub-optimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in the construction of PGSs for 26 blood cell traits, using data from UK Biobank (n=~400,000) and INTERVAL (n=~40,000). Our results showed that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. This study also demonstrated that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. Because our study found genetic correlations between the PGSs for blood cell traits and PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), it suggests that blood cell traits may be indicators and/or mediators for a variety of common disorders via shared genetic variants and functional pathways.
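As a rough illustration of the conventional univariate construction the abstract refers to, a PGS is commonly computed as a weighted sum of effect-allele dosages, with the weights taken from GWAS effect sizes. The sketch below assumes that additive form. The function name, weights, and genotypes are hypothetical and are not taken from the study, whose machine learning methods are not shown here.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PGS: sum over variants of (effect-allele dosage * effect size).

    dosages: copies of the effect allele per variant (0, 1, or 2; imputed
             dosages may be fractional).
    weights: per-variant effect sizes (assumed here to come from GWAS
             summary statistics).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative values only).
example_weights = [0.5, -0.25, 1.0]
# Hypothetical genotypes for one individual at those three variants.
example_dosages = [2, 1, 0]
example_score = polygenic_score(example_dosages, example_weights)
```

Because this form is purely additive, it cannot represent interactions among variants, which is the limitation the abstract attributes to the univariate method.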


Sensors ◽  
2016 ◽  
Vol 16 (11) ◽  
pp. 1836 ◽  
Author(s):  
Xiwei Huang ◽  
Yu Jiang ◽  
Xu Liu ◽  
Hang Xu ◽  
Zhi Han ◽  
...  

2021 ◽  
pp. 247553032110007
Author(s):  
Eric Munger ◽  
Amit K. Dey ◽  
Justin Rodante ◽  
Martin P. Playford ◽  
Alexander V. Sorokin ◽  
...  

Background: Psoriasis is associated with accelerated non-calcified coronary plaque burden (NCB) by coronary computed tomography angiography (CCTA). Machine learning (ML) algorithms have been shown to effectively identify cardiometabolic variables associated with NCB in cross-sectional analysis. Objective: To use ML methods to characterize important predictors of change in NCB by CCTA in psoriasis over 1 year of observation. Methods: The analysis applied the random forest regression algorithm to 182 consecutive patients with 80 available variables, assessed at baseline and 1 year, from the Psoriasis Atherosclerosis Cardiometabolic Initiative, a prospective, observational cohort study. NCB was assessed at baseline and 1 year from CCTA. Results: Using ML, we identified variables of high importance in the context of predicting changes in NCB. For the cohort that worsened NCB (n = 102), top baseline variables were cholesterol (total and HDL), white blood cell count, psoriasis area severity index score, and diastolic blood pressure. Top predictors of 1-year change were change in visceral adiposity, white blood cell count, total cholesterol, C-reactive protein, and absolute lymphocyte count. For the cohort that improved NCB (n = 80), the top baseline variables were HDL cholesterol-related variables including apolipoprotein A1, basophil count, and psoriasis area severity index score, and top predictors of 1-year change were change in apoA, apoB, and systolic blood pressure. Conclusion: ML methods ranked predictors of progression and regression of NCB in psoriasis over 1 year, providing strong evidence to focus on treating LDL, blood pressure, and obesity, as well as the importance of controlling cutaneous disease in psoriasis.


2018 ◽  
Vol 14 (6) ◽  
pp. e1006278 ◽  
Author(s):  
Alexander Kihm ◽  
Lars Kaestner ◽  
Christian Wagner ◽  
Stephan Quint

Author(s):  
Sai Pavan Kamma ◽  
Guru Sai Sharma Chilukuri ◽  
Guru Sree Ram Tholeti ◽  
Rudra Kalyan Nayak ◽  
Tapaswi Maradani

2020 ◽  
Vol 46 (1) ◽  
pp. 553-581 ◽  
Author(s):  
Melinda C. Mills ◽  
Felix C. Tropf

Recent years have seen the birth of sociogenomics via the infusion of molecular genetic data. We chronicle the history of genetics, focusing particularly on post-2005 genome-wide association studies, the post-2015 big data era, and the emergence of polygenic scores. We argue that understanding polygenic scores, including their genetic correlations with each other, causation, and underlying biological architecture, is vital. We show how genetics can be introduced to understand a myriad of topics such as fertility, educational attainment, intergenerational social mobility, well-being, addiction, risky behavior, and longevity. Although models of gene-environment interaction and correlation mirror agency and structure models in sociology, genetics is yet to be fully discovered by this discipline. We conclude with a critical reflection on the lack of diversity, nonrepresentative samples, precision policy applications, ethics, and genetic determinism. We argue that sociogenomics can speak to long-standing sociological questions and that sociologists can offer innovative theoretical, measurement, and methodological innovations to genetic research.


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 37-37
Author(s):  
Simon Mantha ◽  
Andrew Dunbar ◽  
Kelly L. Bolton ◽  
Sean Devlin ◽  
Dmitriy Gorenshteyn ◽  
...  

Background: Several clinical prediction scores have been designed to assess the risk of cancer-associated thrombosis (CAT). The most commonly used in current clinical practice is the Khorana score; however, it is applicable only to patients prior to initiation of chemotherapy. We now apply machine learning with clinical, demographic, and genomics parameters to predict CAT events. Methods: The random survival forest (RSF) ensemble learning method was selected to illustrate a machine learning approach to CAT prediction. The cohort consisted of 14,223 individuals with a solid tumor malignancy and MSK IMPACT somatic genomic data collected during the years 2014 to 2016. CAT was defined as the diagnosis of lower extremity deep vein thrombosis (proximal or distal) or pulmonary embolism, incidental or symptomatic. Covariates considered for inclusion in the model consisted of tumor type, metastatic status, age, exposure to cytotoxic chemotherapy in the month before cohort entry, time elapsed since cancer diagnosis, time elapsed since tumor sampling, normalized mean blood cell counts (white cell count, hemoglobin, platelet count) in the prior 3 months, normalized mean prothrombin time (PT) and activated partial thromboplastin time (aPTT) in the prior 3 months, body mass index (BMI), and presence or absence of a somatic genetic alteration for oncogenes/tumor suppressor genes with an alteration frequency ≥ 1.5% (n = 56). The primary endpoint consisted of time to CAT episode. The C-index for models including different covariates was derived from the test holdout sample using repeated 10-fold cross-validation. The C-index, measuring the relative agreement between the RSF predicted risk and the CAT times of patients, typically has values between 0.5 (agreement no better than chance) and 1.0 (perfect agreement). Results: 12,040 patients were included in the final analysis. There were 855 CAT events during the observation period. 
The most common tumor types were lung (17%), breast (15%) and colorectal cancer (9%). Blood cell count data and coagulation parameters were missing for 8% and 51% of patients respectively. Using cross-validation, the baseline model with cancer type and metastatic status had a C-index of 0.62 (95% CI = 0.61-0.64), which increased to 0.65 (95% CI = 0.63-0.66) with the addition of chemotherapy, age, time from tissue sampling, time from cancer diagnosis and BMI. Further adding genetic data increased the C-index to 0.68 (95% CI = 0.66-0.69). Replacing genetic data in this model with cell counts and coagulation parameters resulted in a C-index of 0.69 (95% CI = 0.68-0.70). The model with all available covariates had a C-index of 0.70 (95% CI = 0.69-0.71). The cumulative incidence of CAT at 6 months for 5 categories of predicted risk using the model with all available covariates is plotted in Figure A. Scaled Ishwaran-Kogalur Variable Importance (VIMP) values, presented in Figure B, indicate that cancer type and prior chemotherapy are the two top factors for model performance. Conclusions: Machine learning is a promising approach in the search for more accurate and generalizable models for prediction of CAT. In the application described here, random survival forests performed well without information about future chemotherapy administration. Additional work is needed to identify the optimal algorithm and covariates, including better delineation of which cancer genomic information should be retained. Future models will have to be validated independently before being used for patient care. Disclosures: Mantha: Physicians Education Resource: Honoraria; MJH Associates: Honoraria. Bolton: GRAIL: Research Funding. Soff: Bristol-Myers Squibb, Pfizer: Honoraria; Dova Pharmaceuticals: Honoraria; Janssen Scientific Affairs: Honoraria; Amgen: Research Funding; Janssen Scientific Affairs: Research Funding; Amgen: Honoraria; Dova Pharmaceuticals: Research Funding.
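The C-index reported in this abstract can be illustrated with a small sketch of Harrell's concordance index for right-censored data. This is an assumption about the general form of the statistic, not the authors' implementation, and the function name and toy inputs are hypothetical. A pair of patients is treated as comparable when the one with the shorter follow-up time had an observed event. The pair counts as concordant when that patient also has the higher predicted risk, and tied risks count as half.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data.

    times:  follow-up time per patient (time to event or to censoring).
    events: True if the event was observed, False if censored.
    risks:  predicted risk per patient (higher means earlier expected event).
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event, higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third patient is censored at time 3.
toy_times = [1.0, 2.0, 3.0]
toy_events = [True, True, False]
toy_c = concordance_index(toy_times, toy_events, [0.9, 0.5, 0.1])
```

A value of 0.5 corresponds to predictions that order patients no better than chance. Values below 0.5 are possible and indicate predictions that are systematically reversed.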


2016 ◽  
Vol 19 (5) ◽  
pp. 407-417 ◽  
Author(s):  
Alexander Weiss ◽  
Bart M. L. Baselmans ◽  
Edith Hofer ◽  
Jingyun Yang ◽  
Aysu Okbay ◽  
...  

Approximately half of the variation in wellbeing measures overlaps with variation in personality traits. Studies of non-human primate pedigrees and human twins suggest that this is due to common genetic influences. We tested whether personality polygenic scores for the NEO Five-Factor Inventory (NEO-FFI) domains and for item response theory (IRT) derived extraversion and neuroticism scores predict variance in wellbeing measures. Polygenic scores were based on published genome-wide association (GWA) results in over 17,000 individuals for the NEO-FFI and in over 63,000 for the IRT extraversion and neuroticism traits. The NEO-FFI polygenic scores were used to predict life satisfaction in 7 cohorts, positive affect in 12 cohorts, and general wellbeing in 1 cohort (maximal N = 46,508). Meta-analysis of these results showed no significant association between NEO-FFI personality polygenic scores and the wellbeing measures. IRT extraversion and neuroticism polygenic scores were used to predict life satisfaction and positive affect in almost 37,000 individuals from UK Biobank. Significant positive associations (effect sizes <0.05%) were observed between the extraversion polygenic score and wellbeing measures, and a negative association was observed between the polygenic neuroticism score and life satisfaction. Furthermore, using GWA data, genetic correlations of -0.49 and -0.55 were estimated for neuroticism with life satisfaction and positive affect, respectively. The moderate genetic correlation between neuroticism and wellbeing is in line with twin research showing that genetic influences on wellbeing are also shared with other independent personality domains.


2021 ◽  
Author(s):  
Taylor R Thomas ◽  
Tanner Koomar ◽  
Lucas Casten ◽  
Ashton Tener ◽  
Ethan Bahl ◽  
...  

The complexity of autism's phenotypic spectra is well-known, yet most genetic research uses case-control status as the target trait. It is unclear whether clinical autism instruments such as the Social Communication Questionnaire (SCQ), Repetitive Behaviors Scale-Revised (RBS-R), and Developmental Coordination Disorder Questionnaire (DCDQ) are more genetically informative than case-control status. We employed the SPARK autism cohort (N = 6,449) to illuminate the genetic etiology of these twelve subscales. In comparison to the heritability of autism case-control status at 0.12, the heritability of the RBS-R subscales was higher, ranging from 0.18 to 0.30 (all p < 0.05). Heritability of the DCDQ subscales ranged from 0.07 to 0.09 and the SCQ subscales from 0 to 0.09 (all p > 0.05). We also found evidence for genetic correlations among the RBS-R, SCQ, and DCDQ. GWAS followed by projection of polygenic scores (PGS) into ABCD revealed significant associations with CBCL social and thought problems, while the autism case-control PGS did not significantly associate. In phenotypic correlation analyses, the autism case-control PGS did not predict the subscales in SPARK, and sex-stratified correlations showed no effect in males and a surprising negative effect in females. Notably, other PGS did predict the subscales, with the strongest being educational attainment negatively correlated, while ADHD and major depression were positively correlated. Overall, our analyses suggest that clinical subscales are more genetically powerful than case-control status, and that of the three instruments investigated, the RBS-R shows the greatest evidence of common genetic signal in both autistic and general population samples.


2021 ◽  
Author(s):  
Peter B Barr ◽  
Travis T Mallard ◽  
Sandra Sanchez-Roige ◽  
Holly E Poore ◽  
Richard Karlsson Linner ◽  
...  

Importance: Characterizing whether genetic variants for psychiatric outcomes operate via specific versus general pathways provides more informative measures of genetic risk, and, potentially, allows us to design more targeted prevention and interventions. Objective: Employ multivariate methods to tease apart variants associated with problematic alcohol use through either general or specific pathways and compare results to standard univariate genetic analysis of problematic alcohol use. Design: We compared results from a univariate genome wide association study (GWAS) of problematic alcohol use to those from a previous multivariate GWAS of externalizing phenotypes. We identified genetic variants associated with problematic alcohol use through a broad liability to externalizing, and those that remain after removing shared variance with externalizing. We compared these results across SNP overlap, bioannotations, genetic correlations, and polygenic scores. Setting: We included GWAS summary statistics from existing GWAS, and two US based hold out samples: The National Longitudinal Study of Adolescent to Adult Health (Add Health) and the Collaborative Study on the Genetics of Alcoholism (COGA). Participants: Publicly available GWAS of externalizing behaviors and participants in Add Health (N=5,107) and COGA (N=7,483), limited to individuals of European ancestries. Exposure(s): N/A Main Outcome(s) and Measure(s): Outcomes included problematic alcohol use (ALCP-O), shared risk for externalizing (EXT), and problematic alcohol use-specific risk (ALCP-S) for the GWASs; a preregistered list of 99 available phenotypes for genetic correlations; and substance use, substance use disorder criteria, and alcohol misuse in the polygenic score analyses. Results: The analysis differentiated SNPs operating through common versus specific risk pathways. 
While ALCP-O was associated with multiple phenotypes, ALCP-S was predominantly associated with alcohol use and other forms of psychopathology. Polygenic scores for ALCP-O were associated with a variety of other forms of substance use and substance use disorders, whereas polygenic scores for ALCP-S were only associated with alcohol phenotypes. Polygenic scores for both ALCP-S and EXT show differential patterns of associations with alcohol misuse across development. Conclusions and Relevance: Focusing on the differential impacts of shared and specific risk can better characterize pathways of risk for alcohol use disorders. Multivariate methods can be a useful tool for studying many psychiatric conditions. Parsing risk pathways will become increasingly relevant as genetic information is incorporated into clinical practice for psychiatric outcomes.

