Perfect Match Genomic Landscape strategy: Refinement and customization of reference genomes

2021, Vol 118 (14), pp. e2025192118
Author(s): Kim Palacios-Flores, Jair García-Sotelo, Alejandra Castillo, Carina Uribe, Lucía Morales, ...

When addressing a genomic question, having a reliable and adequate reference genome is of utmost importance. This drives the need to refine and customize reference genomes (RGs). Our laboratory recently developed a strategy, the Perfect Match Genomic Landscape (PMGL), to detect variation between genomes [K. Palacios-Flores et al., Genetics 208, 1631–1641 (2018)]. The PMGL is precise and sensitive and, in contrast to most currently used algorithms, is nonstatistical in nature. Here we demonstrate the power of PMGL to refine and customize RGs. As a proof of concept, we refined different versions of the Saccharomyces cerevisiae RG. We applied the automatic PMGL pipeline to refine the genomes of microorganisms belonging to the three domains of life: the archaea Methanococcus maripaludis and Pyrococcus furiosus; the bacteria Escherichia coli, Staphylococcus aureus, and Bacillus subtilis; and the eukarya Schizosaccharomyces pombe, Aspergillus oryzae, and several strains of Saccharomyces paradoxus. We analyzed the reference genome of the virus SARS-CoV-2 and previously published viral genomes from samples of patients with COVID-19. We performed a mutation-accumulation experiment in E. coli and show that the PMGL strategy can detect specific mutations generated at any desired step of the whole procedure. We propose that PMGL can be used as a final step for the refinement and customization of any haploid genome, independently of the strategies and algorithms used in its assembly.
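The abstract does not spell out how PMGL detects variation. As a rough illustration of the perfect-match idea only (an assumption for illustration, not the authors' pipeline), the Python sketch below slides a k-mer probe along the reference, counts exact occurrences of each probe among the read k-mers, and reports runs of zero counts. Reverse complements, sequencing errors and copy-number effects are ignored, and `perfect_match_landscape` and `zero_valleys` are hypothetical names.

```python
from collections import Counter


def perfect_match_landscape(reference, reads, k):
    """For each reference position i, count exact (perfect) occurrences of the
    k-mer reference[i:i+k] among all k-mers of the reads (forward strand only)."""
    read_kmers = Counter(
        read[j:j + k] for read in reads for j in range(len(read) - k + 1)
    )
    # Counter returns 0 for probes never seen in the reads.
    return [read_kmers[reference[i:i + k]] for i in range(len(reference) - k + 1)]


def zero_valleys(landscape):
    """Return inclusive (start, end) runs of positions with no perfect match."""
    valleys, start = [], None
    for i, count in enumerate(landscape):
        if count == 0 and start is None:
            start = i                        # a valley opens
        elif count != 0 and start is not None:
            valleys.append((start, i - 1))   # the valley closes
            start = None
    if start is not None:                    # valley runs to the end
        valleys.append((start, len(landscape) - 1))
    return valleys


reference = "AAACCCGGGTTT"
reads = ["AAACCCAGGTTT"]  # same sequence with a G->A substitution at position 6
landscape = perfect_match_landscape(reference, reads, k=4)
print(landscape)                # [1, 1, 1, 0, 0, 0, 0, 1, 1]
print(zero_valleys(landscape))  # [(3, 6)]
```

In this toy case, one substitution produces a zero-count valley spanning the k = 4 probes that overlap it (positions 3 to 6), the kind of exact-match signal a nonstatistical approach can use to flag a discrepancy between a reference and the sequenced genome.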

Author(s): D. Storato, M. Comin

Abstract The major problem when analyzing a metagenomic sample is to taxonomically annotate its reads in order to identify the species it contains. Most currently available methods classify reads using a set of reference genomes and their k-mers. While these methods have reached near-perfect precision, their recall (the proportion of reads that are actually classified) falls to around 50%. One reason is that the sequences in a sample can differ substantially from the corresponding reference genome; for example, viral genomes are highly mutated. To address this issue, in this paper we study metagenomic read classification by augmenting the reference k-mer library with novel discriminative k-mers from the input sequencing reads. We evaluated performance under different conditions against several other tools, and the results showed an improved F-measure, especially when close reference genomes are not available. Availability: https://github.com/davide92/K2Mem.git
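To make the idea of enriching a reference k-mer library concrete, the following Python sketch classifies reads by k-mer majority vote, then adds k-mers from classified reads that are missing from the library and were observed with only one taxon. It is a hypothetical illustration: the names are invented here, and it does not reproduce K2Mem's data structures, taxonomy handling or k-mer selection criteria.

```python
from collections import Counter


def build_library(references, k):
    """Map each k-mer to the single taxon containing it; k-mers found in two
    or more taxa are discarded as non-discriminative."""
    owner = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in owner and owner[kmer] != taxon:
                owner[kmer] = None          # seen in another taxon
            else:
                owner.setdefault(kmer, taxon)
    return {kmer: t for kmer, t in owner.items() if t is not None}


def classify(read, library, k):
    """Assign a read to the taxon with the most k-mer hits, or None."""
    votes = Counter(
        library[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in library
    )
    return votes.most_common(1)[0][0] if votes else None


def augment(library, reads, k):
    """Hypothetical memory step: add k-mers from classified reads that are
    absent from the library and were seen with exactly one taxon."""
    candidates = {}
    for read in reads:
        taxon = classify(read, library, k)
        if taxon is None:
            continue
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in library:
                candidates.setdefault(kmer, set()).add(taxon)
    extended = dict(library)
    extended.update({km: next(iter(ts)) for km, ts in candidates.items() if len(ts) == 1})
    return extended


refs = {"A": "ACGTACGTAC", "B": "TTTTGGGGCC"}
lib = build_library(refs, 4)
reads = ["ACGTACGGGA", "CGGGAT"]   # a divergent read from A, then a read with no known k-mer
print(classify("CGGGAT", lib, 4))  # None
lib2 = augment(lib, reads, 4)
print(classify("CGGGAT", lib2, 4))  # A
```

The second read shares no k-mer with either reference, so it stays unclassified until the first, divergent read contributes its novel k-mers; in miniature, this is how added k-mers can raise recall when close references are missing.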


Metabolites, 2021, Vol 11 (2), pp. 67
Author(s): Snehal R. Jadhav, Rohan M. Shah, Avinash V. Karpe, Robert S. Barlow, Kate E. McMillan, ...

Shiga toxigenic E. coli (STEC) are an important cause of foodborne disease globally, with many outbreaks linked to the consumption of contaminated foods such as leafy greens. Existing methods for STEC detection and isolation are time-consuming, and rapid methods could help prevent contaminated products from reaching consumers. This proof-of-concept study aimed to determine whether a metabolomics approach could be used to detect STEC contamination in spinach. Using untargeted metabolic profiling, the bacterial pellets and supernatants from bacterial and inoculated spinach enrichments were screened for unique metabolites that enabled categorization of three E. coli risk groups. A total of 109 and 471 metabolite features were identified in bacterial and inoculated spinach enrichments, respectively. Supervised OPLS-DA showed clear discrimination between bacterial enrichments containing different risk groups. Further analysis of the spinach enrichments showed that pathogen risk groups 1 and 2 could be easily discriminated from the other groups, although risk groups 1 and 2 partly clustered together, likely reflecting their genomic similarity. Biomarker discovery identified metabolites significantly associated with risk groups that may be appropriate targets for biosensor development. This study confirms that metabolomics can be used to identify the presence of pathogenic E. coli likely to be implicated in human disease.


2021, Vol 11 (1)
Author(s): Masuzu Kikuchi, Keiichi Kojima, Shin Nakao, Susumu Yoshizawa, Shiho Kawanishi, ...

Abstract Microbial rhodopsins are photoswitchable seven-transmembrane proteins that are widely distributed in all three domains of life: archaea, bacteria and eukarya. Proton-pumping rhodopsins transport protons outward across the membrane and are indispensable for light-energy conversion in microorganisms. Archaeal and bacterial proton pump rhodopsins have been characterized using an Escherichia coli expression system, because it enables rapid production of large amounts of recombinant protein, whereas no successful E. coli expression has been reported for eukaryotic rhodopsins. Here, we report a phylogenetically distinct eukaryotic rhodopsin from the dinoflagellate Oxyrrhis marina (O. marina rhodopsin-2, OmR2) that can be expressed in E. coli cells. E. coli cells harboring the OmR2 gene showed outward proton-pumping activity, indicating its functional expression. Spectroscopic characterization of the purified OmR2 protein revealed the following features: (1) an absorption maximum at 533 nm with an all-trans retinal chromophore, (2) a deprotonated counterion (pKa = 3.0) of the protonated Schiff base and (3) a rapid photocycle through several distinct photointermediates. These features are similar to those of known eukaryotic proton pump rhodopsins. Our successful characterization of OmR2 expressed in E. coli cells could provide a basis for understanding and utilizing eukaryotic rhodopsins.
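The low pKa explains why the counterion is described as deprotonated. As a worked illustration, assuming a single titratable group that follows the Henderson–Hasselbalch relation and assuming pH 7 (the abstract does not state the pH of the measurements), the deprotonated fraction is:

```latex
f_{\mathrm{deprot}} = \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}
                   = \frac{1}{1 + 10^{\,3.0 - 7}}
                   = \frac{1}{1 + 10^{-4}} \approx 0.9999
```

Under these assumptions, essentially all of the counterion would be deprotonated near neutral pH.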


2021, Vol 22 (1)
Author(s): Nae-Chyun Chen, Brad Solomon, Taher Mun, Sheila Iyer, Ben Langmead

Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failing to account for genetic variation leads to reference bias and confounds downstream results. Other approaches replace the linear reference with structures, such as graphs, that can include genetic variation, at the cost of major computational overhead. We propose the reference flow alignment method, which uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared with the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance with 14% of the memory footprint and 5.5 times the speed.
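A toy Python sketch of the multi-reference idea, loosely modeled on a two-pass scheme: first align to one reference, then realign poorly aligned reads against population references and keep the best hit. Assumptions: ungapped Hamming-distance alignment stands in for a real aligner, the references differ only by substitutions so coordinates are shared and no liftover is needed, and `reference_flow`, `best_hit` and the `max_mm` threshold are hypothetical.

```python
def best_hit(read, ref):
    """Return (mismatches, offset) of the best ungapped placement of read on ref."""
    if len(read) > len(ref):
        raise ValueError("read longer than reference")
    best = None
    for off in range(len(ref) - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        if best is None or mm < best[0]:  # strict < keeps the leftmost best offset
            best = (mm, off)
    return best


def reference_flow(read, first_pass_ref, population_refs, max_mm=0):
    """Two-pass sketch: align to the first-pass reference; if the hit has more
    than max_mm mismatches, realign to each population reference and keep the
    best hit. Returns (source, offset, mismatches)."""
    mm, off = best_hit(read, first_pass_ref)
    source = "first_pass"
    if mm > max_mm:
        for name, ref in population_refs.items():
            cand_mm, cand_off = best_hit(read, ref)
            if cand_mm < mm:
                mm, off, source = cand_mm, cand_off, name
    return source, off, mm


first = "ACGTACGTTA"
pops = {"popA": "ACGTCCGTTA"}  # same as first except an A->C SNV at position 4
print(reference_flow("GTCCGT", first, pops))  # ('popA', 2, 0)
print(reference_flow("ACGTAC", first, pops))  # ('first_pass', 0, 0)
```

The read carrying the alternative allele aligns with one mismatch to the first-pass reference but perfectly to the population reference, illustrating how extra population references can reduce reference bias without building a graph.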


2007, Vol 189 (20), pp. 7281-7289
Author(s): Myong-Ok Park, Taeko Mizutani, Patrik R. Jones

ABSTRACT The genome sequence of the non-sugar-assimilating mesophile Methanococcus maripaludis contains three genes encoding enzymes that are potentially capable of catalyzing glyceraldehyde-3-phosphate (G3P) metabolism: a nonphosphorylating NADP+-dependent glyceraldehyde-3-phosphate dehydrogenase (GAPN), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and glyceraldehyde-3-phosphate ferredoxin oxidoreductase (GAPOR). GAPOR, whose homologs have been found mainly in archaea, catalyzes the reduction of ferredoxin coupled with the oxidation of G3P. GAPOR has previously been isolated and characterized only from the sugar-assimilating hyperthermophile Pyrococcus furiosus; this enzyme (GAPORPf) contains the rare metal tungsten as an irreplaceable cofactor. Active recombinant M. maripaludis GAPOR (GAPORMm) was purified from Escherichia coli grown in minimal medium containing 100 μM sodium molybdate. In contrast, GAPORMm obtained from cells grown in medium containing tungsten (W), in medium containing both W and molybdenum (Mo), or in medium without added W and Mo displayed no activity. Activities of putative G3P-metabolizing enzymes and transcripts of the corresponding genes were analyzed in M. maripaludis cultured under autotrophic conditions in chemically defined medium. The activity of GAPORMm was constitutive throughout the culture period and exceeded that of GAPDH at all time points. Because GAPDH activity was detected only in the gluconeogenic direction and GAPN activity was completely absent, only GAPORMm catalyzes the oxidation of G3P in M. maripaludis. Recombinant GAPORMm is posttranscriptionally regulated, as it exhibits pronounced and irreversible substrate inhibition and is completely inhibited by 1 μM ATP. With support from flux balance analysis, it is concluded that the major physiological role of GAPORMm in M. maripaludis is most likely confined to nonoptimal growth conditions.


2022, Vol 0 (0)
Author(s): V. Janett Olzog, Lena I. Freist, Robin Goldmann, Jörg Fallmann, Christina E. Weinberg

Abstract Self-cleaving ribozymes are catalytic RNAs found in all domains of life. They catalyze a site-specific cleavage that yields a 5′ fragment with a 2′,3′ cyclic phosphate (2′,3′ cP) and a 3′ fragment with a 5′ hydroxyl (5′ OH) end. Recently, we and others have introduced several strategies to enrich self-cleaving ribozymes using targeted biochemical methods. Here, we develop an alternative strategy in which 5′ OH RNAs are specifically ligated by RtcB ligase, which first guanylates the 3′ phosphate of the adapter and then ligates it directly to RNAs with 5′ OH ends. Our results demonstrate that adapter ligation to highly structured ribozyme fragments is much more efficient with the thermostable RtcB ligase from Pyrococcus horikoshii than with the widely used Escherichia coli enzyme. Moreover, we investigated DNA, RNA and modified RNA adapters for their suitability in RtcB ligation reactions. We used the optimized RtcB-mediated ligation to produce RNA-seq libraries and captured a spiked-in 3′ twister ribozyme fragment from E. coli total RNA. This RNA-seq-based method can detect ribozyme fragments, as well as other cellular RNAs with 5′ OH termini, in total RNA.


2018, Vol 35 (15), pp. 2654-2656
Author(s): Guoli Ji, Wenbin Ye, Yaru Su, Moliang Chen, Guangzao Huang, ...

Abstract Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pairwise alignment of transcript sequences and predicts AS types with a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using AS events collected from the reference genomes of rice and human, as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap identifies many more AS events than the competing method, with comparable or higher accuracy. AStrap also has the unique ability to predict AS types, achieving an overall accuracy of ∼0.87 across species. Extensive evaluation of AStrap with different parameters, sample sizes and machine-learning models on different species also demonstrates its robustness and flexibility. AStrap could be a valuable addition to the community's tools for studying AS in non-model organisms with limited genetic resources. Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information: Supplementary data are available at Bioinformatics online.
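AStrap's actual features and model are not described in enough detail here to reproduce. The Python sketch below only illustrates, under stated assumptions, the kind of pairwise transcript comparison involved: it detects whether one isoform equals another plus a single inserted block, then computes a few toy features a classifier might use, such as whether the block has canonical GT...AG boundaries, as a retained intron would. The function names and features are hypothetical; the real tool integrates more than 500 features.

```python
def insertion_event(short, long):
    """If `long` equals `short` with one contiguous block inserted, return
    (position, block); otherwise return None."""
    ins_len = len(long) - len(short)
    if ins_len <= 0:
        return None
    p = 0
    while p < len(short) and short[p] == long[p]:
        p += 1                      # extend the shared prefix
    # Place the block right after the shared prefix and verify the split.
    if long[:p] + long[p + ins_len:] == short:
        return p, long[p:p + ins_len]
    return None


def event_features(block):
    """Toy features a hypothetical AS-type classifier might consume."""
    return {
        "length": len(block),
        "frame_preserving": len(block) % 3 == 0,
        "gt_ag_boundaries": block.startswith("GT") and block.endswith("AG"),
    }


short = "ATGCCCTTTGGG"
long = "ATGCCC" + "GTAAGTTTAG" + "TTTGGG"  # isoform carrying an extra block
pos, block = insertion_event(short, long)
print(pos, block)              # 6 GTAAGTTTAG
print(event_features(block))   # {'length': 10, 'frame_preserving': False, 'gt_ag_boundaries': True}
```

When the sequence flanking the block is repetitive, the insertion point is ambiguous, and the placement chosen here (immediately after the longest shared prefix) may not coincide with the true splice sites, so a real tool has to consider alternative placements.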


2015, Vol 1, pp. e33
Author(s): Elisha D. Roberson

CRISPR/Cas9 is emerging as one of the most widely used methods of genome modification in organisms ranging from bacteria to human cells. However, editing efficiency varies tremendously from site to site. A recent report identified a novel motif, called the 3′GG motif, that substantially increases editing efficiency at all sites tested in C. elegans. The authors also highlighted that previously published gRNAs with high editing efficiency carried this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof of concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the tool's utility in non-model organisms. I identified more than 60 million single-match 3′GG motifs in these genomes. More than 61% of all protein-coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein-coding genes have at least one unique, overlapping 3′GG gRNA site. These sites can serve as a starting point for gRNA selection, and the ngg2 tool makes it possible to identify 3′GG editing sites in any species with an available genome sequence.
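A minimal Python sketch of scanning a sequence for 3′GG sites, assuming a 3′GG site is a 20-nt protospacer ending in GG and immediately followed by an NGG PAM (the pattern GGNGG where spacer and PAM meet). A lookahead regex reports overlapping sites on both strands. Uniqueness ("single-match") filtering, exon overlap and indexed-FASTA reading are omitted, and `find_3gg_sites` and `revcomp` are hypothetical names, not ngg2's API.

```python
import re

# 18 nt + GG (20-nt spacer ending in GG), then an NGG PAM; the lookahead
# makes the match zero-width so overlapping sites are all reported.
SITE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_3gg_sites(seq):
    """Return (forward_start, strand, spacer) for each 23-nt spacer+PAM site."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SITE.finditer(seq)]
    n = len(seq)
    for m in SITE.finditer(revcomp(seq)):
        # Convert the reverse-strand start of the 23-nt site to forward coordinates.
        hits.append((n - (m.start() + 23), "-", m.group(1)))
    return hits


seq = "A" * 18 + "GG" + "TGG" + "CCCCC"
print(find_3gg_sites(seq))           # [(0, '+', 'AAAAAAAAAAAAAAAAAAGG')]
print(find_3gg_sites(revcomp(seq)))  # [(5, '-', 'AAAAAAAAAAAAAAAAAAGG')]
```

Reverse-strand hits are reported by the forward-strand start of the 23-nt site; counting single-match sites across a genome would additionally require checking each 20-nt spacer for other occurrences.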


2015
Author(s): Farzana Rahman, Mehedi Hassan, Alona Kryshchenko, Inna Dubchak, Tatiana V Tatarinova, ...

In the last decade, a number of algorithms and associated software packages have been developed to align next-generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on a precomputed global alignment of closely related genome sequences. In this paper, we outline the method and present a short report of an experiment performed on five popular alignment tools.
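One way to read the proposed benchmark, sketched in Python under assumptions: reads are drawn from genome A and mapped to a closely related genome B, the precomputed global alignment of A and B translates each read's origin into an expected position in B, and an aligner is scored by how often its reported position agrees. The function names, the gap character and the `tolerance` parameter are hypothetical, not the authors' implementation.

```python
def coordinate_map(aligned_a, aligned_b, gap="-"):
    """Map 0-based positions of ungapped sequence A to positions in B using one
    row pair of a precomputed global alignment; A positions opposite a gap map
    to None."""
    mapping, i, j = {}, 0, 0
    for a, b in zip(aligned_a, aligned_b):
        if a != gap:
            mapping[i] = j if b != gap else None
            i += 1
        if b != gap:
            j += 1
    return mapping


def mapping_accuracy(read_origins, reported, cmap, tolerance=0):
    """Fraction of reads whose reported position in B lies within `tolerance`
    of the position implied by the alignment; unmapped reads count as wrong."""
    correct = 0
    for read_id, pos_a in read_origins.items():
        expected = cmap.get(pos_a)
        got = reported.get(read_id)
        if expected is not None and got is not None and abs(got - expected) <= tolerance:
            correct += 1
    return correct / len(read_origins)


cmap = coordinate_map("ACGT--ACGT", "ACGTTTACGT")
print(cmap[4])                                   # 6
origins = {"r1": 4, "r2": 0}                     # read start positions in A
reported = {"r1": 6, "r2": 40}                   # positions an aligner reported in B
print(mapping_accuracy(origins, reported, cmap))  # 0.5
```

Read r1 starts after the two-base gap in A, so its expected position in B is 6 rather than 4; scoring against raw A coordinates would wrongly penalize an aligner that reported 6.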


mSystems, 2019, Vol 4 (1), pp. e00010-19
Author(s): Sigal Leviatan, Eran Segal

ABSTRACT Shotgun sequencing of samples taken from the human microbiome often shows that only part of the sequenced metagenomic reads map to existing reference genomes. Such partial mappability indicates that many genomes are missing from our reference genome set. This is particularly true for non-Western populations and for samples that do not originate from the gut. Pasolli et al. (E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, et al., Cell, 2019, https://doi.org/10.1016/j.cell.2019.01.001) undertake a major effort to expand the reference set and to better classify its members, revealing a wider pangenome of existing species and identifying new species from previously unknown taxonomic branches.

