Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Author(s):  
Dmitry I. Ignatov ◽  
Gennady V. Khvorykh ◽  
Andrey V. Khrunin ◽  
Stefan Nikolić ◽  
Makhmud Shaban ◽  
...  
2020 ◽  

Abstract: Missing genotypes can reduce the efficacy of machine learning approaches for identifying genetic risk variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes. This can prevent a machine learning classifier from assigning classes correctly. To tackle this issue, we used the well-developed notions of object-attribute biclusters and formal concepts, which correspond to dense subrelations in the binary relation patients × SNPs. The paper contains experimental results on applying a biclustering algorithm to a large real-world dataset collected for studying the genetic bases of ischemic stroke. The algorithm could identify large dense biclusters in the genotypic matrix for further processing, which in turn significantly improved the quality of machine learning classifiers. The proposed algorithm was also able to generate biclusters for the whole dataset without size constraints, in contrast to the In-Close4 algorithm for generating formal concepts.
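The abstract does not spell out the construction, so the following is only a minimal sketch of the standard object-attribute (OA) bicluster definition, not the authors' implementation. It assumes the binary relation marks which genotypes were called (non-missing); the function name `oa_bicluster` and the toy matrix are hypothetical. For a pair (g, m) in the relation, the bicluster takes all patients for which SNP m is called and all SNPs called for patient g, and its density is the fraction of called cells in that block.

```python
import numpy as np

def oa_bicluster(called, g, m):
    """OA-bicluster generated by the pair (patient g, SNP m).

    `called` is a boolean patients x SNPs matrix (assumption: True means
    the genotype was called, False means it is missing).
    """
    if not called[g, m]:
        raise ValueError("the pair (g, m) must belong to the relation")
    rows = np.flatnonzero(called[:, m])   # patients for which SNP m is called
    cols = np.flatnonzero(called[g, :])   # SNPs called for patient g
    block = called[np.ix_(rows, cols)]    # the rows x cols submatrix
    density = float(block.mean())         # share of called genotypes in the block
    return rows, cols, density

# Toy 4 patients x 4 SNPs relation, purely illustrative.
called = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=bool)

rows, cols, density = oa_bicluster(called, g=0, m=0)
print(rows, cols, round(density, 3))
```

In this toy case the block covers patients 0 to 2 and SNPs 0 to 2, with 8 of its 9 cells called. A density threshold (a tunable assumption) would decide whether such a block is dense enough to keep for downstream classification.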


Stroke ◽  
2013 ◽  
Vol 44 (10) ◽  
pp. 2694-2702 ◽  
Author(s):  
James F. Meschia ◽  
Donna K. Arnett ◽  
Hakan Ay ◽  
Robert D. Brown ◽  
Oscar R. Benavente ◽  
...  

Background and Purpose— Meta-analyses of extant genome-wide data illustrate the need to focus on subtypes of ischemic stroke for gene discovery. The National Institute of Neurological Disorders and Stroke SiGN (Stroke Genetics Network) contributes substantially to meta-analyses that focus on specific subtypes of stroke. Methods— The National Institute of Neurological Disorders and Stroke SiGN includes ischemic stroke cases from 24 genetic research centers: 13 from the United States and 11 from Europe. Investigators harmonize ischemic stroke phenotyping using the Web-based causative classification of stroke system, with data entered by trained and certified adjudicators at participating genetic research centers. Through the Center for Inherited Diseases Research, the Network plans to genotype 10 296 carefully phenotyped stroke cases using genome-wide single nucleotide polymorphism arrays and adds to these another 4253 previously genotyped cases, for a total of 14 549 cases. To maximize power for subtype analyses, the study allocates genotyping resources almost exclusively to cases. Publicly available studies provide most of the control genotypes. Center for Inherited Diseases Research–generated genotypes and corresponding phenotypes will be shared with the scientific community through the US National Center for Biotechnology Information database of Genotypes and Phenotypes, and brain MRI studies will be centrally archived. Conclusions— The Stroke Genetics Network, with its emphasis on careful and standardized phenotyping of ischemic stroke and stroke subtypes, provides an unprecedented opportunity to uncover genetic determinants of ischemic stroke.


2021 ◽  
Vol 7 (3) ◽  
pp. eabd9036
Author(s):  
Sara Saez-Atienzar ◽  
Sara Bandres-Ciga ◽  
Rebekah G. Langston ◽  
Jonggeol J. Kim ◽  
Shing Wan Choi ◽  
...  

Despite the considerable progress in unraveling the genetic causes of amyotrophic lateral sclerosis (ALS), we do not fully understand the molecular mechanisms underlying the disease. We analyzed genome-wide data involving 78,500 individuals using a polygenic risk score approach to identify the biological pathways and cell types involved in ALS. This data-driven approach identified multiple aspects of the biology underlying the disease that resolved into broader themes, namely, neuron projection morphogenesis, membrane trafficking, and signal transduction mediated by ribonucleotides. We also found that genomic risk in ALS maps consistently to GABAergic interneurons and oligodendrocytes, as confirmed in human single-nucleus RNA-seq data. Using two-sample Mendelian randomization, we nominated six differentially expressed genes (ATG16L2, ACSL5, MAP1LC3A, MAPKAPK3, PLXNB2, and SCFD1) within the significant pathways as relevant to ALS. We conclude that the disparate genetic etiologies of this fatal neurological disease converge on a smaller number of final common pathways and cell types.


2021 ◽  
Vol 7 (13) ◽  
pp. eabe4414
Author(s):  
Guido Alberto Gnecchi-Ruscone ◽  
Elmira Khussainova ◽  
Nurzhibek Kahbatkyzy ◽  
Lyazzat Musralina ◽  
Maria A. Spyrou ◽  
...  

The Scythians were a multitude of horse-warrior nomad cultures dwelling in the Eurasian steppe during the first millennium BCE. Because of the lack of first-hand written records, little is known about the origins and relations among the different cultures. To address these questions, we produced genome-wide data for 111 ancient individuals retrieved from 39 archaeological sites from the first millennia BCE and CE across the Central Asian Steppe. We uncovered major admixture events in the Late Bronze Age forming the genetic substratum for two main Iron Age gene-pools emerging around the Altai and the Urals respectively. Their demise was mirrored by new genetic turnovers, linked to the spread of the eastern nomad empires in the first centuries CE. Compared to the high genetic heterogeneity of the past, the homogenization of the present-day Kazakhs gene pool is notable, likely a result of 400 years of strict exogamous social rules.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract. Background: The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results: The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions: Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth many novel, endemic and medically related alleles.


Nature ◽  
2021 ◽  
Vol 592 (7853) ◽  
pp. 253-257 ◽  
Author(s):  
Mateja Hajdinjak ◽  
Fabrizio Mafessoni ◽  
Laurits Skov ◽  
Benjamin Vernot ◽  
Alexander Hübner ◽  
...  

Abstract: Modern humans appeared in Europe by at least 45,000 years ago [1–5], but the extent of their interactions with Neanderthals, who disappeared by about 40,000 years ago [6], and their relationship to the broader expansion of modern humans outside Africa are poorly understood. Here we present genome-wide data from three individuals dated to between 45,930 and 42,580 years ago from Bacho Kiro Cave, Bulgaria [1,2]. They are the earliest Late Pleistocene modern humans known to have been recovered in Europe so far, and were found in association with an Initial Upper Palaeolithic artefact assemblage. Unlike two previously studied individuals of similar ages from Romania [7] and Siberia [8] who did not contribute detectably to later populations, these individuals are more closely related to present-day and ancient populations in East Asia and the Americas than to later west Eurasian populations. This indicates that they belonged to a modern human migration into Europe that was not previously known from the genetic record, and provides evidence that there was at least some continuity between the earliest modern humans in Europe and later people in Eurasia. Moreover, we find that all three individuals had Neanderthal ancestors a few generations back in their family history, confirming that the first European modern humans mixed with Neanderthals and suggesting that such mixing could have been common.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Pierpaolo Maisano Delser ◽  
Eppie R. Jones ◽  
Anahit Hovhannisyan ◽  
Lara Cassidy ◽  
Ron Pinhasi ◽  
...  

Abstract: Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 35 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 72,041,355 sites called across 19 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.


Stroke ◽  
2016 ◽  
Vol 47 (suppl_1) ◽  
Author(s):  
Tai-Ming Ko ◽  
Tsong-Hai Lee ◽  
Chien-Hsiun Chen ◽  
Yuan-Tsong Chen ◽  
Jer-Yuarn Wu

Introduction: Although family history studies in ischemic stroke suggest that genetic factors may be involved in the pathogenesis of two major subtypes of ischemic stroke, large-artery atherosclerosis (LAA) and small-vessel occlusion (SVO), it is still unclear which particular genetic factors contribute to LAA or SVO. Hypothesis: Because the etiology of ischemic stroke is heterogeneous, we hypothesize that genetic factors may vary by etiologic subtype or ethnicity. Thus, we aim to identify genetic factors that contribute to LAA or SVO based on two independent Han Chinese populations. Methods: Novel genetic variants that predispose individuals to LAA and SVO were identified by a genome-wide association study comprising 824 individuals (including 444 LAA cases and 380 SVO cases) and 1,727 controls in a Han Chinese population residing in Taiwan. The LAA study was replicated in an independent Han Chinese population comprising an additional 319 LAA cases and 1,802 controls. Results: In LAA cases, from two independent populations, we identified five single-nucleotide polymorphisms (SNPs), including SNP-1 (P = 3.10 × 10⁻⁸), SNP-2 (P = 4.00 × 10⁻⁹), SNP-3 (P = 3.57 × 10⁻⁸), SNP-4 (P = 1.76 × 10⁻⁸), and SNP-5 (P = 2.92 × 10⁻⁸), at one novel locus on chromosome 14q13.3 within PTCSC3 (encoding papillary thyroid carcinoma susceptibility candidate 3). In SVO cases, from the discovery stage, we identified two novel candidate susceptibility loci on chromosome 3p25.3 (SNP-6, P = 3.24 × 10⁻⁵) and chromosome 14q31.1 (SNP-7, P = 2.58 × 10⁻⁴). Conclusions: For LAA, the newly identified SNPs within the PTCSC3 gene were found to have genome-wide statistical significance (P < 5 × 10⁻⁸) and were shown to be located in a risk locus correlated with papillary thyroid carcinoma. Moreover, no genetic association between the PTCSC3 gene and SVO was identified, which suggests that PTCSC3 is a specific susceptibility locus for LAA. For SVO, we identified two novel candidate genetic loci that warrant replication in an independent population with SVO. In conclusion, our findings provide insights into the genetic basis of LAA and SVO, which may be applicable in the study of the pathogenesis of ischemic stroke and in the development of alternative therapeutic interventions.
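As a small worked check of the significance claims in this abstract, the sketch below compares the reported P values with the conventional genome-wide threshold of 5 × 10⁻⁸, which is commonly motivated as a Bonferroni correction of 0.05 for roughly one million independent tests. The SNP labels and P values are taken from the abstract; the variable names are illustrative only.

```python
import math

# Conventional genome-wide significance threshold.
GW_THRESHOLD = 5e-8

# The usual motivation: 0.05 corrected for about one million independent tests.
assert math.isclose(0.05 / 1_000_000, GW_THRESHOLD)

# P values as reported in the abstract.
p_values = {
    "SNP-1": 3.10e-8,  # LAA, 14q13.3 (PTCSC3)
    "SNP-2": 4.00e-9,
    "SNP-3": 3.57e-8,
    "SNP-4": 1.76e-8,
    "SNP-5": 2.92e-8,
    "SNP-6": 3.24e-5,  # SVO, 3p25.3 (discovery stage only)
    "SNP-7": 2.58e-4,  # SVO, 14q31.1 (discovery stage only)
}

significant = {name: p < GW_THRESHOLD for name, p in p_values.items()}
for name, is_sig in significant.items():
    label = "genome-wide significant" if is_sig else "suggestive only"
    print(f"{name}: P = {p_values[name]:.2e} -> {label}")
```

The five LAA SNPs fall below the threshold, while the two SVO candidates do not, which matches the abstract's description of them as candidate loci awaiting replication.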


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Stacy C Brown ◽  
Cameron Both ◽  
Julian N Acosta ◽  
Natalia Szejko ◽  
Victor Torres ◽  
...  

Background: Several genetic susceptibility risk loci for ischemic stroke have been identified. However, the relative dearth of genetic data from populations of non-European ancestry has the potential to create disparities in access to genomics-based precision medicine strategies. Individuals of Native Hawaiian ancestry represent a particularly understudied group in stroke genomics research despite facing high rates of cerebrovascular disease. Hypothesis: Genetic variants associated with stroke differ between Native Hawaiians and previously studied groups of predominantly European ancestry. Methods: We conducted a genome-wide (GW) association study of stroke and myocardial infarction (MI) in an adult population of Native Hawaiian ancestry, using data from the Multiethnic Cohort study (MEC). Genetic information was ascertained via genome-wide array genotyping using the AB OpenArray and TaqMan platforms followed by imputation to 1000 Genomes reference panels. We pursued replication of variants that were GW significant (p<5×10⁻⁸) or yielded suggestive associations (p<5×10⁻⁷) in the prior stroke GW association study MEGASTROKE. Results: We identified 2,104 individuals (1,089 [51.8%] female) of Native Hawaiian ancestry, including 173 cases and 1,931 controls. We identified one novel susceptibility risk locus at a narrow intronic region located at chromosome 3q26.2 (top associated SNP 3:169096251, OR 2.48, 95%CI 1.81-3.41; p=1.93×10⁻⁸), overlying the MECOM gene. We also identified 9 other suggestive risk loci at p<5×10⁻⁷. When replicating in MEGASTROKE, q26.2 did not have available counterpart variants to analyze, and 3 out of 9 suggestive signals were associated with ischemic stroke subtypes at p<0.05. Conclusions: We report the first GW association study of ischemic stroke and myocardial infarction in a Native Hawaiian population. We identified one susceptibility risk locus at q26.2, located in a narrow intronic region of MECOM, a gene that codes for a histone-lysine N-methyltransferase that has transcriptional regulation and oncoprotein functions. The lack of available replication data for this locus in the large MEGASTROKE collaboration emphasizes the importance of developing genomic resources across ancestral groups.


Stroke ◽  
2013 ◽  
Vol 44 (suppl_1) ◽  
Author(s):  
May M Luke ◽  
Carmen H Tong ◽  
Joseph J Catanese ◽  
James J Devlin ◽  
Christine Mannhalter ◽  
...  

Introduction: The International Stroke Genetics Consortium (ISGC) and the Wellcome Trust Case Control Consortium 2 (WTCCC2) performed a large genome-wide association study of ischemic stroke and its subtypes (large vessel stroke (LVD), small vessel stroke (SVD), cardioembolic stroke (CE)), and identified a polymorphism in HDAC9 (rs11984041) associated with the LVD subtype of ischemic stroke. Hypothesis: We assessed the hypothesis that rs11984041 is associated with LVD in two additional studies. Methods: The genotype of rs11984041 was determined for participants of the Vienna Study (815 controls, 122 LVD, 165 SVD, 202 CE) and of the German Study (1040 controls, 495 LVD, 230 SVD, 462 CE). The association of rs11984041 with LVD was assessed by logistic regression. Heterogeneity of the effect of rs11984041 on LVD, CE or SVD was assessed by testing the equality of the corresponding regression coefficients from a multinomial logistic regression model. Results: Carriers of the minor (T) allele of rs11984041 (23.3% of LVD cases and 17.4% of controls), compared with noncarriers, had increased risk for LVD: the odds ratios (OR) were 1.92 (95%CI 1.25-2.96) for the Vienna Study and 1.33 (95%CI 1.02-1.74) for the German Study. Adjusting for covariates including sex, age, diabetes, and hypertension did not materially change the ORs. Heterogeneity of the effects of rs11984041 on LVD vs CE was significant in the Vienna Study (p = 0.009) and in the German Study (p = 0.005). Heterogeneity of the effects of rs11984041 on LVD vs SVD trended toward significance in the Vienna Study (p = 0.088) and was significant in the German Study (p = 0.047). Adjusting for covariates did not materially change the heterogeneity test p values. Conclusions: The HDAC9 polymorphism rs11984041 was associated with the LVD stroke subtype in the Vienna Study and the German Study. These results replicated the ISGC/WTCCC2 findings.
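To illustrate how carrier frequencies relate to an odds ratio, here is a rough sketch that computes the crude (unadjusted) odds ratio implied by the carrier frequencies quoted above. It assumes the 23.3% and 17.4% figures are pooled across both studies, which the abstract does not state explicitly. This is a plain two-by-two calculation, not the logistic regression used in the study, so it should not be expected to reproduce the study-specific ORs. The helper name `crude_odds_ratio` is hypothetical.

```python
def crude_odds_ratio(p_case: float, p_control: float) -> float:
    """Unadjusted odds ratio from exposure proportions in cases and controls."""
    if not (0.0 < p_case < 1.0 and 0.0 < p_control < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    odds_case = p_case / (1.0 - p_case)           # odds of carrying T among cases
    odds_control = p_control / (1.0 - p_control)  # odds of carrying T among controls
    return odds_case / odds_control

# Carrier frequencies of the minor (T) allele quoted in the abstract.
or_lvd = crude_odds_ratio(p_case=0.233, p_control=0.174)
print(f"crude OR for LVD: {or_lvd:.2f}")
```

The result, about 1.44, lies between the reported study-specific estimates of 1.33 (German Study) and 1.92 (Vienna Study), as would be expected if the quoted frequencies are pooled.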

