A Flow Procedure for the Linearization of Genome Sequence Graphs

2017 ◽  
Author(s):  
David Haussler ◽  
Maciej Smuga-Otto ◽  
Benedict Paten ◽  
Adam M Novak ◽  
Sergei Nikitin ◽  
...  

Abstract Efforts to incorporate human genetic variation into the reference human genome have converged on the idea of a graph representation of genetic variation within a species, a genome sequence graph. A sequence graph represents a set of individual haploid reference genomes as paths in a single graph. When that set of reference genomes is sufficiently diverse, the sequence graph implicitly contains all frequent human genetic variations, including translocations, inversions, deletions, and insertions. In representing a set of genomes as a sequence graph one encounters certain challenges. One of the most important is the problem of graph linearization, essential both for efficiency of storage and access, as well as for natural graph visualization and compatibility with other tools. The goal of graph linearization is to order nodes of the graph in such a way that operations such as access, traversal and visualization are as efficient and effective as possible. A new algorithm for the linearization of sequence graphs, called the flow procedure, is proposed in this paper. Comparative experimental evaluation of the flow procedure against other algorithms shows that it outperforms its rivals in the metrics most relevant to sequence graphs.
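To make the notion of an ordering's quality concrete, the following minimal Python sketch scores a candidate node order for a small directed graph. It is an illustration only and does not implement the flow procedure. The two measures used here (the count of edges that point backward in the order, and the average number of edges crossing each gap between adjacent nodes) are assumptions chosen for the example, and the function name `linearization_metrics` is hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def linearization_metrics(
    order: Sequence[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[int, float]:
    """Score a node ordering of a directed graph.

    Returns (backward_edges, average_cut_width), where:
      * backward_edges counts edges (u, v) with v placed at or before u;
      * average_cut_width is the mean, over the gaps between adjacent
        nodes in the order, of the number of edges spanning that gap.
    Both measures are illustrative assumptions, not the paper's definitions.
    """
    position = {node: i for i, node in enumerate(order)}
    backward_edges = 0
    # cuts[i] counts edges crossing the gap between order[i] and order[i + 1].
    cuts = [0] * max(len(order) - 1, 0)

    for u, v in edges:
        pu, pv = position[u], position[v]
        if pv <= pu:
            backward_edges += 1
        lo, hi = min(pu, pv), max(pu, pv)
        for gap in range(lo, hi):
            cuts[gap] += 1

    average_cut_width = sum(cuts) / len(cuts) if cuts else 0.0
    return backward_edges, average_cut_width


# A single "bubble": two alternative alleles B and C between A and D.
bubble_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

print(linearization_metrics(["A", "B", "C", "D"], bubble_edges))
print(linearization_metrics(["D", "A", "B", "C"], bubble_edges))
```

In this toy case the first order follows the direction of every edge, while the second places D first and so forces two edges to point backward and raises the average cut width.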

2021 ◽  
Vol 41 ◽  
pp. 01003
Author(s):  
Joris A. Veltman

The field of human genetics has been radically changed by the introduction of massively parallel sequencing, also called next-generation sequencing. Instead of studying a single gene or a few genetic variants, we can now study the genetic variation present in all genes and even throughout the entire human genome. For the first time in history, we can really study what makes us unique and use that to explain differences in, for example, disease susceptibility or response to treatment. In rare disease, genetics research is essential to identify the molecular diagnosis that provides the basis for a personalized patient management approach. It allows for more precise answers about the underlying cause and family recurrence risk, but also aids in optimizing treatment plans aimed at reducing co-morbidities and providing information about potential drugs or participation in drug trials, with an increasing number focused on gene therapy. These high-throughput sequencing technologies generate enormous amounts of data in order to assemble a genome and identify all of the variation present at different levels, from single nucleotide variations to chromosomal abnormalities. In addition, the genome sequence of a person is not, in itself, very useful. Value is derived from annotation of all the variation, and integration of the genome sequence with information about the patient involved (clinical information, disease-specific information, family history) as well as biological information (gene as well as variant-specific information, including population variation frequency, pathogenicity predictions, gene-expression information, etc.). In this presentation, I will give an overview of the impact of genomics on the diagnosis of patients with rare developmental disorders and fertility disorders. I will highlight the importance of innovative bioinformatics approaches to detect and interpret genetic variation in a clinical context.
Also, I will highlight some of the challenges that individual research and diagnostics units face in dealing with the data generated, discuss some of the ethical/privacy issues related to these approaches and discuss some of the latest genomics technologies being developed and validated.


2013 ◽  
Author(s):  
István Bartha ◽  
Jonathan M Carlson ◽  
Chanson J Brumme ◽  
Paul J McLaren ◽  
Zabrina L Brumme ◽  
...  

2020 ◽  
Vol 48 (22) ◽  
pp. 12604-12617
Author(s):  
Pengpeng Long ◽  
Lu Zhang ◽  
Bin Huang ◽  
Quan Chen ◽  
Haiyan Liu

Abstract We report an approach to predict DNA specificity of the tetracycline repressor (TetR) family transcription regulators (TFRs). First, a genome sequence-based method was streamlined with quantitative P-values defined to filter out reliable predictions. Then, a framework was introduced to incorporate structural data and to train a statistical energy function to score the pairing between TFR and TFR binding site (TFBS) based on sequences. When the predictions were benchmarked against experiments, TFBSs for 29 out of 30 TFRs were correctly predicted by either the genome sequence-based or the statistical energy-based method. Using P-values or Z-scores as indicators, we estimate that 59.6% of TFRs are covered with relatively reliable predictions by at least one of the two methods, while only 28.7% are covered by the genome sequence-based method alone. Our approach predicts a large number of new TFBSs which cannot be correctly retrieved from public databases such as FootprintDB. High-throughput experimental assays suggest that the statistical energy can model the TFBSs of a significant number of TFRs reliably. Thus the energy function may be applied to search for new TFBSs in the respective genomes. It is possible to extend our approach to other transcription factor families with sufficient structural information.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kelly B. Klingler ◽  
Joshua P. Jahner ◽  
Thomas L. Parchman ◽  
Chris Ray ◽  
Mary M. Peacock

Abstract Background: Distributional responses by alpine taxa to repeated glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results: Our genome-wide analyses corroborate range-wide mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions: Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.
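For readers unfamiliar with the three statistics reported above, the following Python sketch computes nucleotide diversity (π), Watterson's estimator (θW) and Tajima's D from a small set of aligned haplotypes, using the standard formulas from Tajima (1989). It is an illustration under stated assumptions, not the study's pipeline: the function name `tajimas_d` and the toy sequences are hypothetical, the input is assumed to be gap-free and of equal length, and π and θW are returned per locus rather than per site as in the abstract.

```python
from itertools import combinations
from math import sqrt
from typing import Optional, Sequence, Tuple


def tajimas_d(haplotypes: Sequence[str]) -> Tuple[float, float, Optional[float]]:
    """Return (pi, theta_w, D) for aligned, equal-length haplotypes.

    pi      : mean number of pairwise differences (per locus)
    theta_w : Watterson's estimator S / a1 (per locus)
    D       : Tajima's D, or None when there are no segregating sites
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must have equal length")

    # S: number of alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

    # pi: average number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1
    if seg_sites == 0:
        return pi, theta_w, None

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    d = (pi - theta_w) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return pi, theta_w, d


# Toy alignment (hypothetical data): four haplotypes, four sites.
print(tajimas_d(["AAAA", "AAAT", "AATT", "ATTT"]))
```

D is positive whenever π exceeds θW, that is, when intermediate-frequency variants are over-represented relative to rare ones, which is the pattern the abstract interprets as a signal of recent population contraction.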


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 246
Author(s):  
Xiaomeng Chen ◽  
Rui Li ◽  
Yonglin Wang ◽  
Aining Li

An emerging poplar canker caused by the Gram-negative bacterium Lonsdalea populi has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding for adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. Annotation against the antibiotic-resistance database indicated that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Comparative genomic analysis found that L. populi exhibited the highest homology with the L. britannica genome, and that L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan-genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.

