multiple genome
Recently Published Documents


TOTAL DOCUMENTS

107
(FIVE YEARS 16)

H-INDEX

25
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Joel E. Richardson ◽  
Richard M. Baldarelli ◽  
Carol J. Bult

Abstract: The assembled and annotated genomes for 16 inbred mouse strains (Lilue et al., Nat Genet 50:1574–1583, 2018) and two wild-derived strains (CAROLI/EiJ and PAHARI/EiJ) (Thybert et al., Genome Res 28:448–459, 2018) are valuable resources for mouse genetics and comparative genomics. We developed the multiple genome viewer (MGV; http://www.informatics.jax.org/mgv) to support visualization, exploration, and comparison of genome annotations within and across these genomes. MGV displays chromosomal regions of user-selected genomes as horizontal tracks. Equivalent features across the genome tracks are highlighted using vertical ‘swim lane’ connectors. Navigation across the genomes is synchronized as a researcher uses the scroll and zoom functions. Researchers can generate custom sets of genes and other genome features to be displayed in MGV by entering genome coordinates, function, phenotype, disease, and/or pathway terms. MGV was developed to be genome agnostic and can be used to display homologous features across genomes of different organisms.


Author(s):  
Zhijie Qin ◽  
Shiqin Yu ◽  
Li Liu ◽  
Lingling Wang ◽  
Jian Chen ◽  
...  

2021 ◽  
Author(s):  
Konstantinos Xylogiannopoulos

Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The SARS-CoV-2 pandemic has made such problems more demanding, with hundreds or thousands of new genome variants discovered every week because of constant mutations, and with the need for fast and accurate analyses. Medicines and, mostly, vaccines must be altered to adapt to and efficiently address mutations. Computational tools for genomic analysis, such as sequence alignment, are very important, although in most cases the resources and computational power they need are vast. The presented data structures and algorithms, built specifically for text mining and pattern detection, can help address several bioinformatics problems efficiently. With a single execution of advanced algorithms with limited space and time complexity, it is possible to acquire knowledge of all repeated patterns that exist in multiple genome sequences, and this information can be used for further meta-analyses. The potential of the presented solutions is demonstrated with the analysis of more than 55,000 SARS-CoV-2 genome sequences (collected on March 10, 2021) and the detection of all repeated patterns of length up to 60 nucleotides in these sequences, something practically impossible with other algorithms because of its complexity. These results can help answer questions about patterns common to all variants, sequence alignment, palindrome and tandem repeat detection, genome comparisons, etc.
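
To make the notion of "all repeated patterns up to a given length" concrete, here is a minimal brute-force sketch in Python. It is not the data structure or algorithm presented in the paper, which the abstract does not describe; the function name `repeated_patterns` and its parameters are hypothetical. It simply enumerates every substring up to `max_len` and keeps those that occur at least twice, which is only feasible for toy inputs and illustrates why specialized methods are needed at the scale of tens of thousands of genomes.

```python
from collections import defaultdict


def repeated_patterns(sequences, max_len):
    """Map each repeated substring to its occurrences.

    Hypothetical helper for illustration only: returns
    {pattern: [(sequence_index, start_position), ...]} for every substring
    of length 1..max_len that occurs at least twice across all sequences.
    """
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for length in range(1, max_len + 1):
            # Slide a window of the current length over the sequence.
            for start in range(len(seq) - length + 1):
                pattern = seq[start:start + length]
                occurrences[pattern].append((seq_index, start))
    # Keep only patterns seen two or more times (within or across sequences).
    return {p: locs for p, locs in occurrences.items() if len(locs) >= 2}


if __name__ == "__main__":
    toy_genomes = ["ACGTAC", "GTAC"]
    for pattern, locations in sorted(repeated_patterns(toy_genomes, 4).items()):
        print(pattern, locations)
```

The number of stored substrings in this sketch grows with the total sequence length multiplied by `max_len`, so it would not scale to the dataset described above.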


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael J. Cormier ◽  
Jonathan R. Belyeu ◽  
Brent S. Pedersen ◽  
Joseph Brown ◽  
Johannes Köster ◽  
...  

Abstract: The rapid increase in the amount of genomic data provides researchers with an opportunity to integrate diverse datasets and annotations when addressing a wide range of biological questions. However, genomic datasets are deposited on different platforms and are stored in numerous formats from multiple genome builds, which complicates the task of collecting, annotating, transforming, and integrating data as needed. Here, we developed Go Get Data (GGD) as a fast, reproducible approach to installing standardized data recipes. GGD is available on GitHub (https://gogetdata.github.io/), is extendable to other data types, and can streamline the complexities typically associated with data integration, saving researchers time and improving research reproducibility.


2021 ◽  
Vol 18 (1) ◽  
pp. 33-33
Author(s):  
Lin Tang

2020 ◽  
Author(s):  
Szymon Grabowski ◽  
Tomasz M. Kowalski

Abstract: Summary: Genomes within the same species show high similarity, which is exploited by specialized multiple genome compressors. The existing algorithms and tools are, however, targeted at large (e.g., mammalian) genomes, and their performance on bacterial strains is mediocre. In this work, we propose MBGC, a specialized genome compressor that makes use of the specific redundancy of bacterial genomes. Our tool is not only compression efficient, but also fast. On a collection of 168,311 bacterial genomes, totalling 587 GB, we achieve a compression ratio of around 730 and a compression (resp. decompression) speed of around 1070 MB/s (resp. 740 MB/s) using 8 hardware threads on a computer with a 6-core / 12-thread CPU and a fast SSD, being about 4 times more succinct and more than an order of magnitude faster in compression than our main competitors. Availability and implementation: MBGC is freely available at github.com/kowallus/mbgc.
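
As a rough back-of-the-envelope reading of these figures, the short Python calculation below estimates the compressed archive size and the compression time. It assumes 1 GB = 1000 MB and that the quoted throughput is measured against the uncompressed input; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Figures quoted in the abstract (approximate values).
collection_gb = 587      # uncompressed size of the bacterial collection
ratio = 730              # reported compression ratio
compress_mb_s = 1070     # reported compression speed

# Estimated archive size: uncompressed size divided by the ratio.
compressed_gb = collection_gb / ratio

# Assumption: 1 GB = 1000 MB and speed refers to uncompressed input.
compress_minutes = collection_gb * 1000 / compress_mb_s / 60

print(f"compressed size ~ {compressed_gb:.2f} GB")
print(f"compression time ~ {compress_minutes:.1f} min")
```

Under these assumptions the whole collection would shrink to under 1 GB and compress in roughly nine minutes.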


Nature ◽  
2020 ◽  
Vol 587 (7833) ◽  
pp. 246-251 ◽  
Author(s):  
Joel Armstrong ◽  
Glenn Hickey ◽  
Mark Diekhans ◽  
Ian T. Fiddes ◽  
Adam M. Novak ◽  
...  

Abstract: New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1–3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Arnaud Ceol ◽  
Piero Montanari ◽  
Ilaria Bartolini ◽  
Stefano Ceri ◽  
Paolo Ciaccia ◽  
...  

Abstract: Background: Genome browsers are widely used for locating interesting genomic regions, but their interactive use is limited to inspecting short genomic portions. An ideal interaction is to provide patterns of regions on the browser, and then extract other genomic regions over the whole genome where such patterns occur, ranked by similarity. Results: We developed SimSearch, an optimized pattern-search method and an open source plugin for the Integrated Genome Browser (IGB), to find genomic region sets that are similar to a given region pattern. It provides efficient visual genome-wide analytics computation in large datasets; the plugin supports intuitive user interactions for selecting an interesting pattern on IGB tracks and visualizing the computed occurrences of similar patterns along the entire genome. SimSearch also includes functions for the annotation and enrichment of results, and is enhanced with a Quickload repository including numerous epigenomic feature datasets from ENCODE and Roadmap Epigenomics. The paper also includes some use cases to show multiple genome-wide analyses of biological interest, which can be easily performed by taking advantage of the presented approach. Conclusions: The novel SimSearch method provides innovative support for effective genome-wide pattern search and visualization; its relevance and practical usefulness are demonstrated through a number of significant use cases of biological interest. The SimSearch IGB plugin, documentation, and code are freely available at https://deib-geco.github.io/simsearch-app/ and https://github.com/DEIB-GECO/simsearch-app/.


2020 ◽  
Author(s):  
Michael J. Cormier ◽  
Jonathan R. Belyeu ◽  
Brent S. Pedersen ◽  
Joseph Brown ◽  
Johannes Köster ◽  
...  

Abstract: Genomics research is complicated by the inherent difficulty of collecting, transforming, and integrating the numerous datasets and annotations germane to one’s research. Furthermore, these data exist in disparate sources, and are stored in numerous, often abused formats from multiple genome builds. Since these complexities waste time, inhibit reproducibility, and curtail research creativity, we developed Go Get Data (GGD; https://gogetdata.github.io/) as a fast, reproducible approach to installing standardized data recipes.

