whole genome sequencing data Latest Research Papers

High-Throughput Sequencing Haplotype Analysis Indicates in LRRK2 Gene a Potential Risk Factor for Endemic Parkinsonism in Southeastern Moravia, Czech Republic

Life ◽

10.3390/life12010121 ◽

2022 ◽

Vol 12 (1) ◽

pp. 121

Author(s):

Kristyna Kolarikova ◽

Radek Vodicka ◽

Radek Vrtel ◽

Julia Stellmachova ◽

Martin Prochazka ◽

...

Keyword(s):

Risk Factor ◽

High Throughput ◽

Potential Risk ◽

High Throughput Sequencing ◽

Genome Project ◽

Potential Risk Factor ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Lrrk2 Gene ◽

Project Data

Parkinson’s disease and parkinsonism are relatively common neurodegenerative disorders. This study aimed to assess potential genetic risk factors of haplotypes in genes associated with parkinsonism in a population in which endemic parkinsonism and atypical parkinsonism have recently been found. The genes ADH1C, EIF4G1, FBXO7, GBA, GIGYF2, HTRA2, LRRK2, MAPT, PARK2, PARK7, PINK1, PLA2G6, SNCA, UCHL1, and VPS35 were analyzed in 62 patients (P) and 69 age-matched controls from the researched area (C1). Variants were acquired by high-throughput sequencing using the Ion Torrent workflow. As another set of controls, the whole genome sequencing data from 100 healthy non-related individuals from the Czech population were used (C2); the results were also compared with the Genome Project data (C3). We observed shared findings of four intronic variants (rs11564187, rs36220738, rs200829235, and rs3789329) and one exonic variant (rs33995883) in the LRRK2 gene in six patients. A comparison of the C1–C3 groups revealed significant differences in haplotype frequencies between ratio of 2.09 for C1, 1.65 for C2, and 6.3 for C3, and odds ratios of 13.15 for C1, 2.58 for C2, and 7.6 for C3 were estimated. The co-occurrence of five variants in the LRRK2 gene (very probably in haplotype) could be an important potential risk factor for the development of parkinsonism, even outside the recently described pedigrees in the researched area where endemic parkinsonism is present.
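
As a rough illustration of how an odds ratio such as those quoted above can be estimated from carrier counts, the following sketch computes the ratio and a 95% Wald confidence interval from a 2x2 table. The function name, the Haldane-Anscombe zero-cell correction, and the control carrier count are assumptions made for illustration; the abstract does not state the counts or the exact procedure the authors used.

```python
import math


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table.

    Hypothetical helper: adds 0.5 to every cell (Haldane-Anscombe correction)
    when any cell is zero, so the ratio stays finite.
    """
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    # Standard error of log(OR), used for the Wald interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


# 6 carriers among 62 patients comes from the abstract; the single carrier
# among 69 controls is an invented count for illustration only.
ratio, lower, upper = odds_ratio(6, 62, 1, 69)
print(f"OR = {ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With so few carriers the interval is wide, which is one reason a comparison against several control groups, as in the study, is informative.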

EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool

10.1101/2022.01.11.475810 ◽

2022 ◽

Author(s):

Lars Wienbrandt ◽

David Ellinghaus

Keyword(s):

Memory Management ◽

Imputation Accuracy ◽

Simulated Data ◽

Genotype Imputation ◽

Whole Genome Sequencing Data ◽

Common Variants ◽

Sequencing Data ◽

1000 Genomes ◽

Genome Wide ◽

Reference Genomes

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, software with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.

Divergence and introgression among the virilis group of Drosophila

10.1101/2022.01.11.475832 ◽

2022 ◽

Author(s):

Leeban Yusuf ◽

Venera Tyukmaeva ◽

Anneli Hoikkala ◽

Michael G Ritchie

Keyword(s):

Gene Flow ◽

Related Species ◽

De Novo ◽

Phylogenetic Reconstruction ◽

Sequence Divergence ◽

Sexual Isolation ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Closely Related Species ◽

Virilis Group

Speciation with gene flow is now widely regarded as common. However, the frequency of introgression between recently diverged species and the evolutionary consequences of gene flow are still poorly understood. The virilis group of Drosophila contains around a dozen species that are geographically widespread and show varying levels of pre-zygotic and post-zygotic isolation. Here, we utilize de novo genome assemblies and whole-genome sequencing data to resolve phylogenetic relationships and describe patterns of introgression and divergence across the group. We suggest that the virilis group consists of three, rather than the traditional two, subgroups. We found evidence of pervasive phylogenetic discordance caused by ancient introgression events between distant lineages within the group, and much more recent gene flow between closely-related species. When assessing patterns of genome-wide divergence in species pairs across the group, we found no consistent genomic evidence of a disproportionate role for the X chromosome. Some genes undergoing rapid sequence divergence across the group were involved in chemical communication and may be related to the evolution of sexual isolation. We suggest that gene flow between closely-related species has potentially had an impact on lineage-specific adaptation and the evolution of reproductive barriers. Our results show how ancient and recent introgression confuse phylogenetic reconstruction, and suggest that shared variation can facilitate adaptation and speciation.

Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species

PLoS Genetics ◽

10.1371/journal.pgen.1009604 ◽

2022 ◽

Vol 18 (1) ◽

pp. e1009604

Author(s):

Jiru Han ◽

Jacob E. Munro ◽

Anthony Kocoski ◽

Alyssa E. Barry ◽

Melanie Bahlo

Keyword(s):

Genetic Diversity ◽

Tandem Repeats ◽

Plasmodium Species ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide ◽

Diversity Assessment ◽

Field Samples

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).

Application of deep learning algorithm on whole genome sequencing data uncovers structural variants associated with multiple mental disorders in African American patients

Molecular Psychiatry ◽

10.1038/s41380-021-01418-1 ◽

2022 ◽

Author(s):

Yichuan Liu ◽

Hui-Qi Qu ◽

Frank D. Mentch ◽

Jingchun Qu ◽

Xiao Chang ◽

...

Keyword(s):

Deep Learning ◽

Mental Disorders ◽

Mental Disorder ◽

Genome Sequencing ◽

Learning Algorithm ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Coding Regions ◽

Deep Learning Algorithm

Mental disorders present a global health concern, while the diagnosis of mental disorders can be challenging. The diagnosis is even harder for patients who have more than one type of mental disorder, especially for young toddlers who are not able to complete questionnaires or standardized rating scales for diagnosis. In the past decade, multiple genomic association signals have been reported for mental disorders, some of which present attractive drug targets. Concurrently, machine learning algorithms, especially deep learning algorithms, have been successful in the diagnosis and/or labeling of complex diseases, such as attention deficit hyperactivity disorder (ADHD) or cancer. In this study, we focused on eight common mental disorders, including ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, delays in developments, and oppositional defiant disorder in the ethnic minority of African Americans. Blood-derived whole genome sequencing data from 4179 individuals were generated, including 1384 patients with the diagnosis of at least one mental disorder. The burden of genomic variants in coding/non-coding regions was applied as feature vectors in the deep learning algorithm. Our model showed ~65% accuracy in differentiating patients from controls. Ability to label patients with multiple disorders was similarly successful, with a Hamming loss score less than 0.3, while exact diagnostic matches are around 10%. Genes in genomic regions with the highest weights showed enrichment of biological pathways involved in immune responses, antigen/nucleic acid binding, chemokine signaling pathway, and G-protein receptor activities. A noticeable fact is that variants in non-coding regions (e.g., ncRNA, intronic, and intergenic) performed equally well as variants in coding regions; however, unlike coding region variants, variants in non-coding regions do not express genomic hotspots whereas they carry much more narrow standard deviations, indicating they probably serve as alternative markers.
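
The two multi-label metrics quoted above, Hamming loss and the exact-match rate, can differ sharply for the same predictions, which may explain how a loss below 0.3 coexists with roughly 10% exact matches. The sketch below is a minimal pure-Python illustration; the function names and the three example patients are invented and are not taken from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual labels that are predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of patients whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Invented example: 3 patients, 8 binary disorder labels each.
y_true = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # all 8 labels correct
    [1, 0, 1, 0, 0, 0, 0, 0],  # 2 labels wrong
    [0, 0, 0, 0, 0, 0, 0, 0],  # 1 label wrong
]
print(hamming_loss(y_true, y_pred))       # 3 wrong labels out of 24
print(exact_match_ratio(y_true, y_pred))  # 1 fully correct patient out of 3
```

A single wrong label makes a patient count as a miss for exact match while barely moving the Hamming loss, so the exact-match rate is the stricter of the two.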

Clin.iobio: A Collaborative Diagnostic Workflow to Enable Team-Based Precision Genomics

Journal of Personalized Medicine ◽

10.3390/jpm12010073 ◽

2022 ◽

Vol 12 (1) ◽

pp. 73

Author(s):

Alistair Ward ◽

Matt Velinder ◽

Tonya Di Sera ◽

Aditya Ekawade ◽

Sabrina Malone Jenkins ◽

...

Keyword(s):

Clinical Practice ◽

Genomic Analysis ◽

Knowledge Bases ◽

Diagnostic Process ◽

Whole Genome Sequencing Data ◽

Specific Gene ◽

Sequencing Data ◽

Team Members ◽

Variant Information ◽

Allele Segregation

The primary goal of precision genomics is the identification of causative genetic variants in targeted or whole-genome sequencing data. The ultimate clinical hope is that these findings lead to an efficacious change in treatment for the patient. In current clinical practice, these findings are typically returned by expert analysts as static, text-based reports. Ideally, these reports summarize the quality of the data obtained, integrate known gene–phenotype associations, follow allele segregation and affected status within the sequenced samples, and weigh computational evidence of pathogenicity. These findings are used to prioritize the variant(s) most likely to cause the given patient’s phenotypes. In most diagnostic settings, a team of experts contributes to these reports, including bioinformaticians, clinicians, and genetic counselors, among others. However, these experts often do not have the necessary tools to review genomic findings, test genetic hypotheses, or query specific gene and variant information. Additionally, team members often rely on different tools and methods based on their given expertise, resulting in further difficulties in communicating and discussing genomic findings. Here, we present clin.iobio, a web-based solution to collaborative genomic analysis that enables diagnostic team members to focus on their area of expertise within the diagnostic process, while allowing them to easily review and contribute to all steps of the diagnostic process. Clin.iobio integrates tools from the popular iobio genomic visualization suite into a comprehensive diagnostic workflow, encompassing (1) genomic data quality review, (2) dynamic phenotype-driven gene prioritization, (3) variant prioritization using a comprehensive set of knowledge bases and annotations, and (4) an exportable findings summary. In conclusion, clin.iobio is a comprehensive solution to team-based precision genomics, the findings of which stand to inform genomic considerations in clinical practice.

A Bioinformatics Pipeline for Estimating Mitochondria DNA Copy Number and Heteroplasmy Levels from Whole Genome Sequencing Data

10.1101/2021.12.28.21268452 ◽

2021 ◽

Author(s):

Stephanie L Battle ◽

Daniela Puiu ◽

Eric Boerwinkle ◽

Kent Taylor ◽

Jerome Rotter ◽

...

Keyword(s):

Mitochondrial Genome ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Dna Molecules ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Accurate Identification

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have a variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline more accurately captures features of mitochondrial genetics than existing pipelines that are important in understanding how mitochondrial dysfunction contributes to disease.
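
To make the two quantities in this abstract concrete, the sketch below shows one common way to estimate them from sequencing depth. It assumes mtDNA-CN is approximated as twice the ratio of mean chrM depth to mean autosomal depth (the autosomes being diploid) and that heteroplasmy is called from the alternate-allele read fraction. The function names and the 3%/97% thresholds are hypothetical and are not stated in the abstract; the pipeline itself may compute these differently.

```python
def mtdna_copy_number(mt_mean_depth, autosomal_mean_depth):
    """Approximate mtDNA copies per cell from mean sequencing depths.

    Assumes a diploid nuclear genome, hence the factor of 2.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return 2.0 * mt_mean_depth / autosomal_mean_depth


def classify_variant(alt_reads, total_reads, min_fraction=0.03, max_fraction=0.97):
    """Return (alt allele fraction, call) for one chrM site.

    The thresholds are illustrative defaults, not values from the source.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = alt_reads / total_reads
    if fraction < min_fraction:
        return fraction, "absent"
    if fraction > max_fraction:
        return fraction, "homoplasmic"
    return fraction, "heteroplasmic"


# Example: chrM covered at 3000x in a 30x genome; a site with 150 of 1000 alt reads.
print(mtdna_copy_number(3000, 30))
print(classify_variant(150, 1000))
```

A lower threshold of this kind guards against calling sequencing errors as low-level heteroplasmy; the appropriate value depends on depth and error rate.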

hapCon: Estimating contamination of ancient genomes by copying from reference haplotypes

10.1101/2021.12.20.473429 ◽

2021 ◽

Author(s):

Yilei Huang ◽

Harald Ringbauer

Keyword(s):

Estimation Methods ◽

Whole Genome Sequencing Data ◽

Genotyping Error ◽

Sequencing Data ◽

X Chromosomes ◽

New Approach ◽

Human Dna ◽

Downstream Analysis ◽

Low Coverage ◽

Rule Out

Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (<1x average depth), computational methods that can robustly estimate contamination in the low coverage regime are needed. However, the ultra low-coverage regime (0.1x and below) remains challenging for existing approaches. We present a new method to estimate contamination in aDNA for male individuals. It utilizes a Li and Stephens haplotype copying model for haploid X chromosomes, with mismatches modelled as genotyping error or contamination. We assessed an implementation of this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our results demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1x for SNP capture data (1240k) and 0.02x for whole genome sequencing data (WGS), substantially extending the coverage limit of previous male X chromosome based contamination estimation methods.

Validation of HER2 Status in Whole Genome Sequencing Data of Breast Cancers with the Ploidy-Corrected Copy Number Approach

Molecular Diagnosis & Therapy ◽

10.1007/s40291-021-00571-1 ◽

2021 ◽

Author(s):

Marzena Wojtaszewska ◽

Rafał Stępień ◽

Alicja Woźna ◽

Maciej Piernik ◽

Pawel Sztromwasser ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Her2 Status ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Breast Cancers ◽

Sequencing Data

seGMM: a new tool to infer sex from massively parallel sequencing data

10.1101/2021.12.16.472877 ◽

2021 ◽

Author(s):

Sihan Liu ◽

Yuanyuan Zeng ◽

Meilin Chen ◽

Qian Zhang ◽

Lanchen Wang ◽

...

Keyword(s):

Sex Chromosome ◽

Massively Parallel Sequencing ◽

Control Measure ◽

Gaussian Mixture ◽

Massively Parallel ◽

Whole Genome Sequencing Data ◽

Great Promise ◽

Clinical Genetics ◽

Sequencing Data ◽

Parallel Sequencing

Inspecting concordance between self-reported sex and genotype-inferred sex from genomic data is a significant quality control measure in clinical genetic testing. Numerous tools have been developed to infer sex for genotyping array, whole-exome sequencing, and whole-genome sequencing data. However, improvements in sex inference from targeted gene sequencing panels are warranted. Here, we propose a new tool, seGMM, which applies unsupervised clustering (Gaussian Mixture Model) to determine the sex of a sample from called genotype data integrated with aligned reads. seGMM consistently demonstrated >99% sex inference accuracy in publicly available (1000 Genomes) data and our in-house panel dataset, achieving clearly better sex classification than existing popular tools. Compared to including features only from the X chromosome, our results show that adding features from the Y chromosome (e.g. reads mapped to the Y chromosome) can increase sex classification accuracy. Notably, for WES and WGS data, seGMM also has an extremely high degree of accuracy. Finally, we demonstrated the ability of seGMM to infer sex in single patient or trio samples by combining with reference data and pinpointing potential sex chromosome abnormality samples. In general, seGMM provides a reproducible framework to infer sex from massively parallel sequencing data and has great promise in clinical genetics.
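
The clustering idea described here can be sketched with a one-dimensional, two-component Gaussian mixture fitted by expectation-maximization. This is a deliberately simplified illustration and not the seGMM implementation: it uses a single hypothetical feature (a Y-chromosome depth ratio per sample), whereas the abstract describes combining several X and Y features, and all function names and data values below are invented. It also assumes both sexes are present in the batch.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(xs, iterations=100, var_floor=1e-6):
    """Fit a 1-D, 2-component Gaussian mixture by EM; return (means, responsibilities)."""
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    means = [min(xs), max(xs)]          # start the components at the extremes
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            dens = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            if total == 0.0:
                # Both densities underflowed; fall back to the nearest mean.
                nearest = 0 if abs(x - means[0]) <= abs(x - means[1]) else 1
                resp.append([1.0 if k == nearest else 0.0 for k in range(2)])
            else:
                resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, var_floor)
            weights[k] = nk / n
    return means, resp


def infer_sex(y_ratios):
    """Label each sample by the component it most likely belongs to.

    The component with the higher mean Y-depth ratio is taken to be male.
    """
    means, resp = fit_two_component_gmm(y_ratios)
    male = 0 if means[0] > means[1] else 1
    return ["male" if r[male] >= 0.5 else "female" for r in resp]


# Invented Y-depth ratios: four samples near 0 and four near 0.5.
samples = [0.01, 0.02, 0.03, 0.02, 0.48, 0.50, 0.52, 0.49]
print(infer_sex(samples))
```

Because the mixture is unsupervised, no labelled training set is needed, but samples falling between the two clusters (for example, possible sex chromosome abnormalities) would need separate review.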
