A Flow Procedure for the Linearization of Genome Sequence Graphs

Abstract Efforts to incorporate human genetic variation into the reference human genome have converged on the idea of a graph representation of genetic variation within a species, a genome sequence graph. A sequence graph represents a set of individual haploid reference genomes as paths in a single graph. When that set of reference genomes is sufficiently diverse, the sequence graph implicitly contains all frequent human genetic variations, including translocations, inversions, deletions, and insertions. In representing a set of genomes as a sequence graph one encounters certain challenges. One of the most important is the problem of graph linearization, essential both for efficiency of storage and access and for natural graph visualization and compatibility with other tools. The goal of graph linearization is to order nodes of the graph in such a way that operations such as access, traversal and visualization are as efficient and effective as possible. A new algorithm for the linearization of sequence graphs, called the flow procedure, is proposed in this paper. Comparative experimental evaluation of the flow procedure against other algorithms shows that it outperforms its rivals in the metrics most relevant to sequence graphs.
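
To make the idea of linearization concrete, the following is a minimal sketch of how a candidate node ordering might be scored. It is not the flow procedure itself, which the abstract does not describe. The two metrics used here (number of backward edges and average cut width) are assumptions chosen for illustration, and the function names and the toy graph are hypothetical.

```python
from typing import Hashable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge (source, target)


def count_backward_edges(order: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Count edges whose target is placed before its source in the ordering."""
    position = {node: i for i, node in enumerate(order)}
    return sum(1 for u, v in edges if position[v] < position[u])


def average_cut_width(order: Sequence[Hashable], edges: Sequence[Edge]) -> float:
    """Average number of edges crossing each gap between consecutive nodes.

    An edge between positions i and j crosses exactly |i - j| gaps, so the
    total number of crossings is the sum of those distances.
    """
    if len(order) < 2:
        return 0.0
    position = {node: i for i, node in enumerate(order)}
    total_crossings = sum(abs(position[u] - position[v]) for u, v in edges)
    return total_crossings / (len(order) - 1)


# Hypothetical toy graph: a single "bubble" where two alleles (b and c)
# sit between shared flanking nodes a and d.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

good_order = ["a", "b", "c", "d"]  # follows the direction of every edge
poor_order = ["a", "d", "b", "c"]  # places d before its predecessors

good_backward = count_backward_edges(good_order, toy_edges)
good_width = average_cut_width(good_order, toy_edges)
poor_backward = count_backward_edges(poor_order, toy_edges)
poor_width = average_cut_width(poor_order, toy_edges)
```

Under these assumed metrics, the first ordering has no backward edges and a smaller average cut width than the second, which is the sense in which one linearization can be called better than another.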

Big data and innovative bioinformatics approaches in personalized genomic medicine

BIO Web of Conferences ◽

10.1051/bioconf/20214101003 ◽

2021 ◽

Vol 41 ◽

pp. 01003

Author(s):

Joris A. Veltman

Keyword(s):

Genetic Variation ◽

Genome Sequence ◽

Developmental Disorders ◽

Genomic Medicine ◽

Population Variation ◽

Biological Information ◽

Management Approach ◽

Specific Information ◽

A Genome ◽

The Impact

The field of human genetics has been radically changed by the introduction of massively parallel sequencing approaches, also called next-generation sequencing. Instead of studying a single gene or a few genetic variants, nowadays we can study genetic variation present in all genes and even throughout the entire human genome. For the first time in history, we can really study what makes us unique and use that to explain differences in, for example, disease susceptibility or response to treatment. In rare disease, genetics research is essential to identify the molecular diagnosis that provides the basis for a personalized patient management approach. It allows for more precise answers about the underlying cause and family recurrence risk, but also aids in optimizing treatment plans aimed at reducing co-morbidities and providing information about potential drugs or participation in drug trials, with an increasing number focused on gene therapy. These high-throughput sequencing technologies generate enormous amounts of data in order to assemble a genome and identify all of the variation present at different levels, from single nucleotide variations to chromosomal abnormalities. In addition, a genome sequence of a person in itself is not very useful. Value is derived from annotation of all the variation, and integration of the genome sequence with information about the patient involved (clinical information, disease-specific information, family history) as well as biological information (gene- as well as variant-specific information, including population variation frequency, pathogenicity predictions, gene-expression information, etc.). In this presentation, I will give an overview of the impact of genomics on the diagnosis of patients with rare developmental disorders and fertility disorders. I will highlight the importance of innovative bioinformatics approaches to detect and interpret genetic variation in a clinical context. Also, I will highlight some of the challenges that individual research and diagnostics units face in dealing with the data generated, discuss some of the ethical/privacy issues related to these approaches and discuss some of the latest genomics technologies being developed and validated.

Decision letter: A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control

10.7554/elife.01123.008 ◽

2013 ◽

Keyword(s):

Genetic Variation ◽

Genome Analysis ◽

Sequence Diversity ◽

Human Genetic Variation ◽

Viral Control ◽

A Genome ◽

HIV-1

Author response: A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control

10.7554/elife.01123.009 ◽

2013 ◽

Author(s):

István Bartha ◽

Jonathan M Carlson ◽

Chanson J Brumme ◽

Paul J McLaren ◽

Zabrina L Brumme ◽

...

Keyword(s):

Genetic Variation ◽

Genome Analysis ◽

Sequence Diversity ◽

Author Response ◽

Human Genetic Variation ◽

Viral Control ◽

A Genome ◽

HIV-1

A genome-wide survey of segmental duplications that mediate common human genetic variation of chromosomal architecture

Human Genomics ◽

10.1186/1479-7364-1-5-335 ◽

2004 ◽

Vol 1 (5) ◽

pp. 335 ◽

Cited By ~ 40

Author(s):

Michael R Mehan ◽

Nelson B Freimer ◽

Roel A Ophoff

Keyword(s):

Genetic Variation ◽

Segmental Duplications ◽

Human Genetic Variation ◽

Genome Wide ◽

A Genome ◽

Chromosomal Architecture ◽

Genome Wide Survey

Faculty Opinions recommendation of Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.737748193.793574951 ◽

2020 ◽

Author(s):

Zhaosheng Kong

Keyword(s):

Gossypium Hirsutum ◽

Genome Evolution ◽

Genome Sequence ◽

Gossypium Arboreum ◽

A Genome ◽

Gossypium Herbaceum

COVID-19 severity, miR-21 targets, and common human genetic variation. Letter regarding the article ‘Circulating cardiovascular microRNAs in critically ill COVID-19 patients’

European Journal of Heart Failure ◽

10.1002/ejhf.2317 ◽

2021 ◽

Author(s):

Simon A. Dingsdag ◽

Oliver K. Clay ◽

Gustavo A. Quintero

Keyword(s):

Genetic Variation ◽

Critically Ill ◽

Human Genetic Variation

Integrating genome sequence and structural data for statistical learning to predict transcription factor binding sites

Nucleic Acids Research ◽

10.1093/nar/gkaa1134 ◽

2020 ◽

Vol 48 (22) ◽

pp. 12604-12617

Author(s):

Pengpeng Long ◽

Lu Zhang ◽

Bin Huang ◽

Quan Chen ◽

Haiyan Liu

Keyword(s):

Genome Sequence ◽

Energy Function ◽

Structural Information ◽

Structural Data ◽

P Values ◽

A Genome ◽

Z Scores ◽

Transcription Regulators ◽

DNA Specificity ◽

Tetracycline Repressor

Abstract We report an approach to predict DNA specificity of the tetracycline repressor (TetR) family transcription regulators (TFRs). First, a genome sequence-based method was streamlined with quantitative P-values defined to filter out reliable predictions. Then, a framework was introduced to incorporate structural data and to train a statistical energy function to score the pairing between TFR and TFR binding site (TFBS) based on sequences. When the predictions were benchmarked against experiments, TFBSs for 29 out of 30 TFRs were correctly predicted by either the genome sequence-based or the statistical energy-based method. Using P-values or Z-scores as indicators, we estimate that 59.6% of TFRs are covered with relatively reliable predictions by at least one of the two methods, while only 28.7% are covered by the genome sequence-based method alone. Our approach predicts a large number of new TFBSs which cannot be correctly retrieved from public databases such as FootprintDB. High-throughput experimental assays suggest that the statistical energy can model the TFBSs of a significant number of TFRs reliably. Thus the energy function may be applied to search for new TFBSs in respective genomes. It is possible to extend our approach to other transcription factor families with sufficient structural information.

Genomic variation in the American pika: signatures of geographic isolation and implications for conservation

BMC Ecology and Evolution ◽

10.1186/s12862-020-01739-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Kelly B. Klingler ◽

Joshua P. Jahner ◽

Thomas L. Parchman ◽

Chris Ray ◽

Mary M. Peacock

Keyword(s):

Genetic Variation ◽

Genetic Structure ◽

Sierra Nevada ◽

Spatial Genetic Structure ◽

Spatial Scales ◽

Thermal Sensitivity ◽

Genomic Variation ◽

Genome Wide ◽

A Genome ◽

American Pika

Abstract Background: Distributional responses by alpine taxa to repeated, glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results: Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions: Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.

Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes ◽

10.3390/genes12020246 ◽

2021 ◽

Vol 12 (2) ◽

pp. 246

Author(s):

Xiaomeng Chen ◽

Rui Li ◽

Yonglin Wang ◽

Aining Li

Keyword(s):

Genome Sequence ◽

Extracellular Enzymes ◽

De Novo ◽

Whole Genome Sequence ◽

Hybrid Poplars ◽

A Genome ◽

Conserved Genes ◽

Genomic Characterization ◽

Molecular Bases ◽

Insight Into

An emerging poplar canker caused by the gram-negative bacterium Lonsdalea populi has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation indicated that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Comparative genomic analysis found that L. populi exhibited the highest homology with the L. britannica genome and that L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.

Detection of human genetic variation in VAC14 gene by ARMA-PCR technique and relation with typhoid fever infection in patients with gallbladder diseases in Thi-Qar province/Iraq

Materials Today Proceedings ◽

10.1016/j.matpr.2021.05.236 ◽

2021 ◽

Author(s):

Zaman K. Hanan ◽

Manal B. Saleh ◽

Ezat H. Mezal ◽

Abduladheem Turki Jalil

Keyword(s):

Genetic Variation ◽

Typhoid Fever ◽

Human Genetic Variation ◽

Gallbladder Diseases
