whole genome alignment
Recently Published Documents


TOTAL DOCUMENTS

48
(FIVE YEARS 6)

H-INDEX

9
(FIVE YEARS 0)

Author(s):  
Roman Kotłowski ◽  
Alicja Nowak-Zaleska ◽  
Grzegorz Węgrzyn

Abstract An optimized method for bacterial strain differentiation, based on a combination of Repeated Sequences and Whole Genome Alignment Differential Analysis (RS&WGADA), is presented in this report. In this analysis, 51 Acinetobacter baumannii multidrug-resistant strains from one hospital environment and patients from 14 hospital wards were classified on the basis of polymorphisms of repeated sequences located in the CRISPR region, variation in the gene encoding the EmrA-homologue of E. coli, and antibiotic resistance patterns, in combination with three newly identified polymorphic regions in the genomes of A. baumannii clinical isolates. Differential analysis of two similarity matrices between different genotypes and resistance patterns allowed us to distinguish three significant correlations (p < 0.05) between a 172 bp DNA insertion combined with resistance to chloramphenicol and gentamycin. Interestingly, 45 and 55 bp DNA insertions within the CRISPR region were identified, and combined during analyses with resistance/susceptibility to trimethoprim/sulfamethoxazole. Moreover, 184 or 1374 bp DNA length polymorphisms in the genomic region located upstream of the GTP cyclohydrolase I gene, associated mainly with imipenem susceptibility, were identified. In addition, considerable nucleotide polymorphism of the gene encoding the gamma/tau subunit of DNA polymerase III, an enzyme crucial for bacterial DNA replication, was discovered. The differentiation analysis performed using the above-described approach allowed us to monitor the distribution of A. baumannii isolates in different wards of the hospital in the time frame of several years, indicating that the optimized method may be useful in hospital epidemiological studies, particularly in identification of the source of primary infections.



2021 ◽  
Author(s):  
Bruno Contreras-Moreira ◽  
Carla V Filippi ◽  
Guy Naamati ◽  
Carlos García Girón ◽  
James E Allen ◽  
...  

Abstract The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis or pangenome exploration. While homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here we benchmark a two-step approach, where repeats are first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, using the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat and nrTEplants, curated for this work). We obtained repeated genome fractions that match those reported in the literature, but with shorter repeated elements than those produced with conventional annotators. Inspection of masked regions overlapping genes revealed no preference for specific protein domains. Half of the Red-masked sequences can be successfully classified with nrTEplants, with the complete protocol taking less than 2 h on a desktop Linux box. The repeat library and the scripts to mask and annotate plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
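The first step described above, calling repeats by k-mer counting, can be illustrated with a minimal sketch. This is not the algorithm used by Red; it is a deliberately naive stand-in that soft-masks (lowercases) any base covered by a k-mer seen at least a given number of times. The function name `soft_mask_repeats` and the parameters `k` and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def soft_mask_repeats(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Lowercase every base covered by a k-mer that occurs >= min_count times.

    Illustrative only: real repeat detectors use far more elaborate
    scoring than a raw count threshold.
    """
    seq = seq.upper()
    # Enumerate all overlapping k-mers and count their occurrences.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)

    # Mark every position that falls inside a frequent k-mer.
    masked = [False] * len(seq)
    for i, kmer in enumerate(kmers):
        if counts[kmer] >= min_count:
            for j in range(i, i + k):
                masked[j] = True

    # Soft-masking convention: repeats in lowercase, the rest in uppercase.
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))
```

Soft-masking rather than replacing bases with N keeps the sequence intact, so a second step (such as comparison to a curated library) can still inspect the masked regions.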





2021 ◽  
Author(s):  
Rory J Craig ◽  
Ahmed R Hasan ◽  
Rob W Ness ◽  
Peter D Keightley

Abstract Despite its role as a reference organism in the plant sciences, the green alga Chlamydomonas reinhardtii entirely lacks genomic resources from closely related species. We present highly contiguous and well-annotated genome assemblies for three unicellular C. reinhardtii relatives: Chlamydomonas incerta, Chlamydomonas schloesseri, and the more distantly related Edaphochlamys debaryana. The three Chlamydomonas genomes are highly syntenous with similar gene contents, although the 129.2 Mb C. incerta and 130.2 Mb C. schloesseri assemblies are more repeat-rich than the 111.1 Mb C. reinhardtii genome. We identify the major centromeric repeat in C. reinhardtii as a LINE transposable element homologous to Zepp (the centromeric repeat in Coccomyxa subellipsoidea) and infer that centromere locations and structure are likely conserved in C. incerta and C. schloesseri. We report extensive rearrangements, but limited gene turnover, between the minus mating type loci of these Chlamydomonas species. We produce an eight-species core-Reinhardtinia whole-genome alignment, which we use to identify several hundred false positive and missing genes in the C. reinhardtii annotation and >260,000 evolutionarily conserved elements in the C. reinhardtii genome. In summary, these resources will enable comparative genomics analyses for C. reinhardtii, significantly extending the analytical toolkit for this emerging model system.



Author(s):  
John T Burley ◽  
James R Kellner ◽  
Stephen P Hubbell ◽  
Brant C Faircloth

Abstract The lack of genomic resources for tropical canopy trees is impeding several research avenues in tropical forest biology. We present genome assemblies for two Neotropical hardwood species, Jacaranda copaia and Handroanthus (formerly Tabebuia) guayacan, that are model systems for research on tropical tree demography and flowering phenology. For each species, we combined Illumina short-read data with in vitro proximity-ligation (Chicago) libraries to generate an assembly. For J. copaia, we obtained 104X physical coverage and produced an assembly with N50/N90 scaffold lengths of 1.020 Mbp/0.277 Mbp. For H. guayacan, we obtained 129X coverage and produced an assembly with N50/N90 scaffold lengths of 0.795 Mbp/0.165 Mbp. J. copaia and H. guayacan assemblies contained 95.8% and 87.9% of benchmarking orthologs, although they constituted only 77.1% and 66.7% of the estimated genome sizes of 799 Mbp and 512 Mbp, respectively. These differences were potentially due to high repetitive sequence content (> 59.31% and 45.59%) and high heterozygosity (0.5% and 0.8%) in each species. Finally, we compared each new assembly to a previously sequenced genome for H. impetiginosus using whole-genome alignment. This analysis indicated extensive gene duplication in H. impetiginosus since its divergence from H. guayacan.
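The N50/N90 scaffold lengths quoted above follow a standard definition: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly. The sketch below shows that calculation; the function name `nx` is hypothetical and the scaffold lengths in the usage note are made-up numbers, not data from the paper.

```python
def nx(lengths: list[int], x: int) -> int:
    """Return the Nx statistic (e.g. x=50 for N50) of a set of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    raise AssertionError("unreachable: cumulative length always reaches the total")
```

For example, with scaffold lengths 80, 70, 50, 40, 30, 20 and 10 (total 300), the two longest scaffolds already cover half of the assembly, so N50 is 70, while N90 needs the five longest and is 30.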



IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 161890-161897
Author(s):  
Rostislav Hrivnak ◽  
Petr Gajdos ◽  
Vaclav Snasel


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0244515
Author(s):  
Guanliang Li ◽  
Ziyan Zhou ◽  
Lingrui Liang ◽  
Zhao Song ◽  
Yafei Hu ◽  
...  

The CRISPR/Cas9 system is an efficient genome editing tool that possesses the outstanding advantages of simplicity and high efficiency. Genome-wide identification and specificity analysis of editing sites is an effective approach for mitigating the risk of off-target effects of CRISPR/Cas9 and has been applied in several plant species but has not yet been reported in pepper. In the present study, we first identified genome-wide CRISPR/Cas9 editing sites based on the ‘Zunla-1’ reference genome and then evaluated the specificity of CRISPR/Cas9 editing sites through whole-genome alignment. Results showed that a total of 603,202,314 CRISPR/Cas9 editing sites, including 229,909,837 (~38.11%) NGG-PAM sites and 373,292,477 (~61.89%) NAG-PAM sites, were detectable in the pepper genome, and the systematic characterization of their composition and distribution was performed. Furthermore, 29,623,855 highly specific NGG-PAM sites were identified through whole-genome alignment analysis. There were 26,699,38 (~90.13%) highly specific NGG-PAM sites located in intergenic regions, which was 9.13 times the number in genic regions, but the average density in genic regions was higher than that in intergenic regions. More importantly, 34,251 (~96.93%) out of 35,336 annotated genes exhibited at least one highly specific NGG-PAM site in their exons, and 90.50% of the annotated genes exhibited at least 4 highly specific NGG-PAM sites, indicating that the set of highly specific CRISPR/Cas9 editing sites identified in this study was widely applicable and conducive to the minimization of the off-target effects of CRISPR/Cas9 in pepper.
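The first stage described above, enumerating candidate editing sites next to NGG or NAG PAMs, can be sketched as a pattern scan over both strands. This is an assumption-laden illustration rather than the authors' pipeline: the 20-nt protospacer length is the commonly used default, the function names are hypothetical, and the specificity evaluation by whole-genome alignment is not covered.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(seq: str, pam: str = "NGG", guide_len: int = 20):
    """List candidate sites as (strand, start, protospacer, pam) tuples.

    Start positions are 0-based offsets on the scanned strand, so "-"
    coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # Expand the ambiguity code N into an explicit base class.
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_regex}))")

    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Calling `find_pam_sites(genome, pam="NAG")` scans for the alternative PAM with the same code, since only the pattern changes.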



2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Ilia Minkin ◽  
Paul Medvedev

Abstract Multiple whole-genome alignment is a challenging problem in bioinformatics. Despite many successes, current methods are not able to keep up with the growing number, length, and complexity of assembled genomes, especially when computational resources are limited. Approaches based on compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks have potential for scalability, but current methods do not scale to mammalian genomes. We present an algorithm, SibeliaZ-LCB, for identifying collinear blocks in closely related genomes based on analysis of the de Bruijn graph. We further incorporate this into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows run-time improvements over other methods while maintaining accuracy. On sixteen recently-assembled strains of mice, SibeliaZ runs in under 16 hours on a single machine, while other tools did not run to completion for eight mice within a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms on a single machine.
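For readers unfamiliar with the data structure mentioned above, the following minimal sketch builds a plain (uncompacted) de Bruijn graph from a set of sequences. It is not SibeliaZ-LCB: it ignores reverse complements, performs no compaction, and does not search for collinear blocks. The function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(sequences: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it.

    Every k-mer in the input contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)
```

With the inputs `["ACGT", "ACGA"]` and `k=3`, the shared prefix collapses into the single path AC → CG, and the node CG branches to GT and GA where the two sequences diverge. Shared unbranched paths of this kind are what anchor-based methods extend into larger blocks.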



Nature ◽  
2020 ◽  
Vol 587 (7833) ◽  
pp. 240-245 ◽  
Author(s):  

Abstract The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.



2020 ◽  
Vol 36 (10) ◽  
pp. 3242-3243 ◽  
Author(s):  
Samuel O’Donnell ◽  
Gilles Fischer

Abstract Summary: MUM&Co is a single bash script to detect structural variations (SVs) utilizing whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends upon the WGA and therefore benefits from contiguous de novo assemblies generated by third-generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all tools on simulated SVs in yeast, plant and human genomes and performs similarly in two real human datasets. Additionally, MUM&Co is unique in its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file containing a list of SVs with only necessary genomic details. Availability and implementation: https://github.com/SAMtoBAM/MUMandCo. Supplementary information: Supplementary data are available at Bioinformatics online.


