genome representation

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Gabriel Siqueira ◽  
Alexsandro Oliveira Alexandrino ◽  
Andre Rodrigues Oliveira ◽  
Zanoni Dias

Abstract: The rearrangement distance is a method to compare genomes of different species. It is the minimum number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverses a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, which are sequences of nucleotides between genes and at the extremities of the genome. This work explores the transposition and reversal events applied to a genome representation that considers both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a $\Theta(k)$-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where $k$ is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.
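
The two rearrangement events described above can be illustrated with a minimal Python sketch. It is a simplification and not the authors' model: genomes are plain lists of gene labels, gene orientation and intergenic region sizes are ignored, and the function names `reversal`, `transposition`, and `max_copies` are hypothetical.

```python
from collections import Counter


def reversal(genome, i, j):
    """Return a copy of genome with the block genome[i..j] (inclusive) reversed."""
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def transposition(genome, i, j, k):
    """Return a copy of genome with the consecutive blocks
    genome[i:j] and genome[j:k] exchanged."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def max_copies(genome):
    """Maximum number of copies of any gene (the k in the approximation factor)."""
    return max(Counter(genome).values())


genome = list("abcde")
print(reversal(genome, 1, 3))          # block b c d is reversed
print(transposition(genome, 0, 2, 5))  # blocks a b and c d e are exchanged
print(max_copies(list("abca")))        # gene a appears twice
```

In the intergenic model each event would also have to account for the nucleotides in the regions it cuts, which this sketch leaves out.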


2021 ◽  
Author(s):  
Jonathan Sandoval-Castillo ◽  
Luciano B. Beheregaray ◽  
Maren Wellenreuther

Abstract: Growth is one of the most important traits of an organism. For exploited species, this trait has ecological and evolutionary consequences as well as economical and conservation significance. Rapid changes in growth rate associated with anthropogenic stressors have been reported for several marine fishes, but little is known about the genetic basis of growth traits in teleosts. We used reduced genome representation data and genome-wide association approaches to identify growth-related genetic variation in the commercially, recreationally, and culturally important Australian snapper (Chrysophrys auratus, Sparidae). Based on 17,490 high-quality SNPs and 363 individuals representing extreme growth phenotypes from 15,000 fish of the same age and reared under identical conditions in a sea pen, we identified 100 unique candidates that were annotated to 51 proteins. We documented a complex polygenic nature of growth in the species that included several loci with small effects and a few loci with larger effects. Overall heritability was high (75.7%), reflected in the high accuracy of the genomic prediction for the phenotype (small vs large). Although the SNPs were distributed across the genome, most candidates (60%) clustered on chromosome 16, which also explains the largest proportion of heritability (16.4%). This study demonstrates that reduced genome representation SNPs and the right bioinformatic tools provide a cost-efficient approach to identify growth-related loci and to describe genomic architectures of complex quantitative traits. Our results help to inform captive aquaculture breeding programmes and are of relevance to monitor growth-related evolutionary shifts in wild populations in response to anthropogenic pressures.


2021 ◽  
Author(s):  
Jonathan Edward LoTempio ◽  
Emmanuèle Délot ◽  
Eric Vilain

Background: The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly-available data from sample NA12878 generated on Oxford Nanopore and Pacific Biosciences platforms. Each tool that was benchmarked, including GraphMap, minimap2, and NGMLR, produced the same alignment file each time. However, the different tools widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number and locations of discoverable breakpoints. Only minimap2 was computationally lightweight enough for use at scale. No alignment from one tool independently resolved all large structural variants (10,000-100,000 basepairs) present in the Database of Genome Variants (DGV) for sample NA12878. For variants larger than 1,000,000 basepairs, nanopore sequence aligned with minimap2 and NGMLR, and single-molecule real-time sequence aligned with NGMLR contained more breakpoints than are present in DGV. Conclusions: When computational resources are not a limiting factor, it should be best practice to use an analysis pipeline that generates alignments with both minimap2 and NGMLR, as neither results in a comprehensive genome representation. When computational resources are limited, use of minimap2 for human genome alignment produces files sufficient to answer hypotheses and generate new questions.


2021 ◽  
Author(s):  
Moataz Dowaidar

Using pre-existing datasets to combine published information with new metrics would help researchers construct a broader picture of chromatin in disease. One goal of computational biology is the near-real-time integration of epigenomic datasets, irrespective of the laboratory in which they were generated, much as with a blood pressure, ECG, or troponin test. In addition, epigenome modeling must become dynamic, accounting for cell-to-cell variability and changes over time due to normal physiological or pathological stressors. Probabilistic modeling and machine learning can help build such models while finding (and quantifying) previously identified, developing chromatin properties that match changes in heart health. A 3D genome representation, for example, may reveal a structural or accessibility attribute connected to health or disease that no single epigenomic test alone can discover. Such strategies can expand basic knowledge of biology and illness. Incorporating wet-lab and dry-lab components into training schemes would foster more diverse technical repertoires. Data mining and fresh data collection will change how we handle chromatin challenges in the coming years. Knowing how computers solve problems (as opposed to how people do) and how to phrase questions computationally would create a shared vocabulary for completing tasks. Team members do not all need every big-data skill, but a collaborative attitude is important for effective large-scale epigenomic research.
UCLA's QCBio Collaboratory is a platform for teaching non-programmers and facilitating cooperation to resolve biological issues. It also encourages the use of open-source technology by making genomics datasets available to non-experts. There are already many bioinformatics tools, and others will be developed to introduce new understanding, but basic knowledge of how computers work and how to answer big-data questions will continue to empower scientists to test the most meaningful hypotheses with appropriate tools and reveal new insights about cardiac biology.


2020 ◽  
Author(s):  
Benjamin Kaminow ◽  
Sara Ballouz ◽  
Jesse Gillis ◽  
Alexander Dobin

The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current Reference genome, and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the Pan-human, Super-population and Population levels, utilizing variant information from the 1000 Genomes project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the Reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ∼2-3 when the Reference was replaced with the Pan-human consensus genome. Interestingly, we also found that using more population-specific consensuses resulted in little to no increase over using the Pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. To assess the functional impact, we performed transcript expression quantification and found that the Pan-human consensus increases accuracy of transcript quantification for hundreds of transcripts.
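
As a rough illustration of what building a consensus genome involves, the sketch below substitutes the alternate allele wherever it is the majority allele in a population. This is an assumption made for illustration: the abstract does not state the construction rule, and the function name `build_consensus`, the tuple layout of `variants`, and the 0.5 threshold are all hypothetical. Only single-nucleotide substitutions are handled.

```python
def build_consensus(reference, variants, threshold=0.5):
    """Return a consensus sequence built from a reference string.

    variants: iterable of (position, ref_allele, alt_allele, alt_frequency),
    with 0-based positions. The alternate allele replaces the reference
    allele when its population frequency exceeds the threshold.
    """
    sequence = list(reference)
    for position, ref_allele, alt_allele, alt_frequency in variants:
        # Indels are skipped: they would shift coordinates, which this
        # sketch does not track.
        if len(ref_allele) != 1 or len(alt_allele) != 1:
            continue
        if sequence[position] != ref_allele:
            raise ValueError(f"reference mismatch at position {position}")
        if alt_frequency > threshold:
            sequence[position] = alt_allele
    return "".join(sequence)


# Hypothetical variants: the first is a majority allele, the second is not.
variants = [(1, "C", "T", 0.8), (3, "T", "G", 0.2)]
print(build_consensus("ACGT", variants))
```

Applying the same function to allele frequencies computed within a super-population or a single population would give the more specific consensuses compared in the study.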


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Peter D. Olson ◽  
Alan Tracey ◽  
Andrew Baillie ◽  
Katherine James ◽  
Stephen R. Doyle ◽  
...  

Abstract. Background: Chromosome-level assemblies are indispensable for accurate gene prediction, synteny assessment, and understanding higher-order genome architecture. Reference and draft genomes of key helminth species have been published, but little is yet known about the biology of their chromosomes. Here, we present the complete genome of the tapeworm Hymenolepis microstoma, providing a reference quality, end-to-end assembly that represents the first fully assembled genome of a spiralian/lophotrochozoan, revealing new insights into chromosome evolution. Results: Long-read sequencing and optical mapping data were added to previous short-read data enabling complete re-assembly into six chromosomes, consistent with karyology. Small genome size (169 Mb) and lack of haploid variation (1 SNP/3.2 Mb) contributed to exceptionally high contiguity with only 85 gaps remaining in regions of low complexity sequence. Resolution of repeat regions reveals novel gene expansions, micro-exon genes, and spliced leader trans-splicing, and illuminates the landscape of transposable elements, explaining observed length differences in sister chromatids. Syntenic comparison with other parasitic flatworms shows conserved ancestral linkage groups indicating that the H. microstoma karyotype evolved through fusion events. Strikingly, the assembly reveals that the chromosomes terminate in centromeric arrays, indicating that these motifs play a role not only in segregation, but also in protecting the linear integrity and full lengths of chromosomes. Conclusions: Despite strong conservation of canonical telomeres, our results show that they can be substituted by more complex, species-specific sequences, as represented by centromeres. The assembly provides a robust platform for investigations that require complete genome representation.


2019 ◽  
Vol 2 (93) ◽  
pp. 53-58
Author(s):  
О. О. Mazurova ◽  
Т. О. Gordienko

This work studies genetic algorithms, using as an example the search for the best routes in the “Transit Challenge” (or “Subway Challenge”). Based on the rules of the “Transit Challenge” and on the traveling salesman problem, the station problem was formulated: the search for the best way to visit all subway stations in the shortest possible time. A mathematical model of the subway system was developed on the basis of graph theory. To solve the station problem, a genetic algorithm was developed: the method of genome representation, the population mutation rules, and the genome crossover rules were chosen. On the basis of an experimental study of the genetic algorithm, the most effective parameters were selected and recommendations were developed for solving the station problem for “Transit Challenge” instances of different dimensions.
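
The abstract does not say which genome representation or operators were chosen. A common choice for salesman-type problems, shown here purely as an assumed illustration, is a permutation genome (a list of station names in visiting order) with swap mutation and a simplified order crossover. All names below (`route_time`, `swap_mutation`, `order_crossover`) are hypothetical.

```python
import random


def route_time(route, times):
    """Total travel time along a route; times maps station pairs to minutes."""
    total = 0
    for a, b in zip(route, route[1:]):
        key = (a, b) if (a, b) in times else (b, a)
        total += times[key]
    return total


def swap_mutation(genome, rng):
    """Return a copy of the genome with two distinct positions exchanged."""
    child = list(genome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def order_crossover(parent1, parent2, rng):
    """Simplified order crossover: copy a slice from parent1, then fill the
    remaining positions left to right in the order they occur in parent2."""
    n = len(parent1)
    a, b = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[a:b] = parent1[a:b]
    kept = set(parent1[a:b])
    remaining = iter(s for s in parent2 if s not in kept)
    for idx in range(n):
        if child[idx] is None:
            child[idx] = next(remaining)
    return child


rng = random.Random(0)
p1, p2 = list("ABCDE"), list("EDCBA")
child = order_crossover(p1, p2, rng)
print(child, swap_mutation(child, rng))
```

Both operators always return a valid permutation, so every offspring still visits each station exactly once, which is the main reason this encoding is popular for routing problems.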


2019 ◽  
Author(s):  
A. Viehweger ◽  
M. Hoelzer ◽  
C. Brandt

Abstract: Many recent microbial genome collections curate hundreds of thousands of genomes. This volume complicates many genomic analyses such as taxon assignment because the associated computational burden is substantial. However, the number of representatives of each species is highly skewed towards human pathogens and model organisms. Thus many genomes contain little additional information and could be removed. We created a frugal dereplication method that can reduce massive genome collections based on genome sequence alone, without the need for manual curation or taxonomic information. We recently created a genome representation for bacteria and archaea called “nanotext”. This method embeds each genome in a low-dimensional vector of numbers. Extending nanotext, our proposed algorithm called “thinspace” uses these vectors to group and dereplicate similar genomes. We dereplicated the Genome Taxonomy Database (GTDB) from about 150 thousand genomes to fewer than 22 thousand. The resulting collection increases the percentage of classified reads in a metagenomic dataset by a factor of 5 compared to NCBI RefSeq and performs on par with both a larger and a manually curated GTDB subset. With thinspace, massive genome collections can be dereplicated on regular hardware, without affecting downstream results. It is released under a BSD-3 license (github.com/phiweger/thinspace).
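
The idea of dereplicating genomes from their embedding vectors can be sketched as a greedy pass that keeps a genome only if it is far enough from every representative kept so far. This is not the thinspace algorithm, whose details the abstract does not give: the cosine distance, the threshold value, and the names `cosine_distance` and `dereplicate` are assumptions for illustration.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dereplicate(vectors, threshold=0.1):
    """Greedily pick representatives from a dict of name -> embedding.

    A genome becomes a representative only if its distance to every
    existing representative is above the threshold.
    """
    representatives = []
    for name, vector in vectors.items():
        if all(cosine_distance(vector, vectors[r]) > threshold
               for r in representatives):
            representatives.append(name)
    return representatives


# Hypothetical 2-dimensional embeddings; g2 is nearly identical to g1.
embeddings = {"g1": [1.0, 0.0], "g2": [0.99, 0.05], "g3": [0.0, 1.0]}
print(dereplicate(embeddings))
```

The greedy pass compares each genome against the representatives only, so its cost grows with the size of the reduced collection rather than with all pairs of genomes, although the result depends on input order.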


2018 ◽  
Author(s):  
Caiti Smukowski Heil ◽  
Christopher R. L. Large ◽  
Kira Patterson ◽  
Maitreya J. Dunham

Abstract: Interspecific hybridization can introduce genetic variation that aids in adaptation to new or changing environments. Here we investigate how the environment, and more specifically temperature, interacts with hybrid genomes to alter parental genome representation over time. We evolved Saccharomyces cerevisiae x Saccharomyces uvarum hybrids in nutrient-limited continuous culture at 15°C for 200 generations. In comparison to previous evolution experiments at 30°C, we identified a number of temperature-specific responses, including the loss of the S. cerevisiae allele in favor of the cryotolerant S. uvarum allele for several portions of the hybrid genome. In particular, we discovered a genotype by environment interaction in the form of a reciprocal loss of heterozygosity event on chromosome XIII. Which species haplotype is lost or maintained is dependent on the parental species temperature preference and the temperature at which the hybrid was evolved. We show that a large contribution to this directionality is due to temperature sensitivity at a single locus, the high affinity phosphate transporter PHO84. This work helps shape our understanding of what forces impact genome evolution after hybridization, and how environmental conditions may favor or disfavor hybrids over time.

