Mining bacterial NGS data vastly expands the complete genomes of temperate phages

2021 ◽  
Author(s):  
Xianglilan Zhang ◽  
Ruohan Wang ◽  
Xiangcheng Xie ◽  
Yunjia Hu ◽  
Jianping Wang ◽  
...  

Temperate phages (active prophages induced from bacteria) help control pathogenicity, modulate community structure, and maintain gut homeostasis. Complete phage genome sequences are indispensable for understanding phage biology. Traditional plaque techniques are inapplicable to temperate phages because of their lysogenicity, which curbs their identification and characterization. Existing in silico tools for prophage prediction usually fail to detect accurate and complete temperate phage genomes. In this study, using a novel computational method that mines both the integrated active prophages and their spontaneously induced forms (temperate phages), we obtained 192,326 complete temperate phage genomes from bacterial next-generation sequencing (NGS) data, thereby expanding the existing number of complete temperate phage genomes by more than 100-fold. The reliability of our method was validated by wet-lab experiments. The experiments demonstrated that our method can accurately determine the complete genome sequences of the temperate phages, with exact flanking sites (attP and attB sites), outperforming other state-of-the-art prophage prediction methods. Our analysis indicates that temperate phages are likely to function in the evolution of microbes by 1) cross-infecting different bacterial host species; 2) transferring antibiotic resistance and virulence genes; and 3) interacting with hosts through restriction-modification and CRISPR/anti-CRISPR systems. This work provides a comprehensive database of complete temperate phage genomes and relevant information, which can serve as a valuable resource for phage research.

2019 ◽  
Author(s):  
Angela McGaughran

Abstract Background: Next-generation sequencing (NGS) can recover DNA data from valuable extant and extinct museum specimens. However, archived or preserved DNA is difficult to sequence because of its fragmented, damaged nature, such that the most successful NGS methods for preserved specimens remain sub-optimal. Improving wet-lab protocols and determining the effects of sample age on NGS library quality are therefore of vital importance. Here, I examine the relationship between sample age and various indicators of library quality following targeted NGS sequencing of ~1,300 loci using 271 samples of pinned moth specimens (Helicoverpa armigera) ranging in age from 4 to 116 years. Results: I find that older samples have lower DNA concentrations following extraction and thus require a higher number of indexing PCR cycles during library preparation. When sequenced reads are aligned to a reference genome or to only the targeted region, older samples have a lower number of sequenced and mapped reads, lower mean coverage, and lower estimated library sizes, while the percentage of adapters in sequenced reads increases significantly as samples become older. Older samples also show the poorest capture success, with lower enrichment and a higher improved coverage anticipated from further sequencing. Conclusions: Sample age has significant, measurable impacts on the quality of NGS data following targeted enrichment. However, incorporating a uracil-removing enzyme into the blunt end-repair step during library preparation could help to remove and repair DNA damage, and using a method that prevents adapter-dimer formation may result in improved data yields.


2021 ◽  
Vol 948 (1) ◽  
pp. 012082
Author(s):  
Mahat Magandhi ◽  
Sobir ◽  
Yudiwanti W.E. Kusumo ◽  
Sudarmono ◽  
Deden Derajat Matra

Abstract Durian Kura-kura (Durio testudinarius Becc.) belongs to the Malvaceae family and is an endemic species of Borneo. Recently, genome-based next-generation sequencing (NGS) approaches have been applied in germplasm conservation and plant breeding programs. NGS technologies allow plant genomes to be sequenced quickly and inexpensively and enable the efficient development of SSR markers through in silico approaches. This study aimed to develop and characterize simple sequence repeats (SSRs) from the assembled genome. The Ray assembler produced 1,203,929 scaffolds for the assembled genome. SSRs were identified and extracted using the MISA program, which yielded 4,315 sequences containing SSRs. Six SSR motif classes were identified: 431 dinucleotide sequences (most frequent motif AT), 3,257 trinucleotide sequences (most frequent motif TTA), 516 tetranucleotide sequences (most frequent motif AAAT), 89 pentanucleotide sequences (most frequent motif ATTTT), 18 hexanucleotide sequences, and four heptanucleotide sequences. The new SSR markers will be used in further population genetic studies of D. testudinarius and in plant breeding programs.
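To make the SSR search concrete, the following is a minimal sketch of perfect-repeat detection with regular expressions. It is not the MISA program used in the study; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical choices for illustration, not the settings reported by the authors.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# Illustrative assumptions only, not the parameters used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(min_repeats.items()):
        # One motif of `unit` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" = "AT" x 2).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return hits


def count_by_motif_length(hits):
    """Tally SSRs by motif length (2 = dinucleotide, 3 = trinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Toy scaffold: an (AT)7 repeat and a (TTA)5 repeat separated by filler bases.
scaffold = "GGC" + "AT" * 7 + "CCG" + "TTA" * 5 + "GA"
ssrs = find_ssrs(scaffold)
print(ssrs)
print(count_by_motif_length(ssrs))
```

Tallying hits by motif length, as `count_by_motif_length` does, gives the same kind of per-class summary as the dinucleotide-to-heptanucleotide counts in the abstract.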


2016 ◽  
Author(s):  
Ariya Shajii ◽  
Deniz Yorukoglu ◽  
Y. William Yu ◽  
Bonnie Berger

Abstract Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS). Results: We introduce LAVA (Lightweight Assignment of Variant Alleles), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix’s Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as little as ~5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays. Availability: LAVA software is available at http://[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
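The idea of identifying a locus from k-mers alone, without aligning the full read, can be sketched as follows. This is a simplified illustration and not the LAVA implementation: it uses exact rather than approximate matching, a toy k = 5 instead of k = 32, and hypothetical names (`build_kmer_index`, `assign_read`, `snpA`, `snpB`) and sequences.

```python
from collections import defaultdict


def build_kmer_index(loci, k):
    """Map each k-mer to the set of locus names whose sequence contains it."""
    index = defaultdict(set)
    for name, seq in loci.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def assign_read(read, index, k):
    """Return the locus best supported by the read's unique k-mers, or None."""
    votes = defaultdict(int)
    for i in range(len(read) - k + 1):
        names = index.get(read[i:i + k])
        # Only k-mers that occur in exactly one locus count as evidence.
        if names is not None and len(names) == 1:
            votes[next(iter(names))] += 1
    if not votes:
        return None
    ranked = sorted(votes.values(), reverse=True)
    if len(ranked) > 1 and ranked[0] == ranked[1]:
        return None  # tie between loci: treat the read as ambiguous
    return max(votes, key=votes.get)


# Toy reference windows around two hypothetical SNP loci.
loci = {"snpA": "ACGTACGGTCA", "snpB": "TTGACCATGGA"}
index = build_kmer_index(loci, k=5)
print(assign_read("GTACGGTC", index, k=5))
print(assign_read("GGGGGGGG", index, k=5))
```

Counting only k-mers that occur in a single locus mirrors the abstract's observation that mid-size k-mers are typically unique in the genome. A practical tool would also need to tolerate sequencing errors and the variant allele itself, which this sketch ignores.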


Author(s):  
Sabrina Sprotte ◽  
Erik Brinks ◽  
Natalia Wagner ◽  
Andrew M. Kropinski ◽  
Horst Neve ◽  
...  

Abstract The complete genome sequence of the virulent bacteriophage PMBT3, isolated on the proteolytic Pseudomonas grimontii strain MBTL2-21, showed no significant similarity to other known phage genome sequences, making this phage the first reported to infect a strain of P. grimontii. Electron microscopy revealed PMBT3 to be a member of the family Siphoviridae, with notably long and flexible whiskers. The linear, double-stranded genome of 87,196 bp has a mol% G+C content of 60.4 and contains 116 predicted protein-encoding genes. A putative tellurite resistance (terB) gene, originally reported to occur in the genome of a bacterium, was detected in the genome of phage PMBT3.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 136
Author(s):  
Sandra Parenti ◽  
Claudio Rabacchi ◽  
Marco Marino ◽  
Elena Tenedini ◽  
Lucia Artuso ◽  
...  

Next-generation sequencing (NGS)-based cancer risk screening with multigene panels has become the most successful method for planning cancer prevention strategies. ATM germ-line heterozygosity has been described to increase tumor susceptibility. In particular, families carrying heterozygous germ-line variants of the ATM gene have a 5- to 9-fold risk of developing breast cancer. Recent studies identified ATM as the second most mutated gene after CHEK2 in BRCA-negative patients. To date, more than 170 missense variants and several truncating mutations have been identified in the ATM gene. Here, we present the molecular characterization of a new ATM deletion, identified thanks to the CNV algorithm implemented in the NGS analysis pipeline. An automated workflow implementing the SOPHiA Genetics’ Hereditary Cancer Solution (HCS) protocol was used to generate NGS libraries that were sequenced on the Illumina MiSeq platform. NGS data analysis allowed us to identify a new inactivating deletion of exons 19–27 of the ATM gene. The deletion was characterized at both the DNA and RNA levels.


Cells ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 416
Author(s):  
Lorena Landuzzi ◽  
Maria Cristina Manara ◽  
Pier-Luigi Lollini ◽  
Katia Scotlandi

Osteosarcoma (OS) is a rare malignant primary tumor of mesenchymal origin affecting bone. It is characterized by a complex genotype, mainly due to the high frequency of chromothripsis, which leads to multiple somatic copy number alterations and structural rearrangements. Any effort to design genome-driven therapies must therefore consider such high inter- and intra-tumor heterogeneity. Accordingly, many laboratories and international networks are developing and sharing OS patient-derived xenografts (OS PDXs) to broaden the availability of models that reproduce the complex clinical heterogeneity of OS. OS PDXs, and new cell lines derived from PDXs, faithfully preserve tumor heterogeneity and genetic and epigenetic features and are thus valuable tools for predicting drug responses. Here, we review recent achievements concerning OS PDXs, summarizing the methods used to obtain ectopic and orthotopic xenografts and to fully characterize these models. The availability of OS PDXs across the many international PDX platforms and their possible use in PDX clinical trials are also described. We recommend the coupling of next-generation sequencing (NGS) data analysis with functional studies in OS PDXs, as well as the setup of OS PDX clinical trials and co-clinical trials, to enhance the predictive power of experimental evidence and to accelerate the clinical translation of effective genome-guided therapies for this aggressive disease.


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii348-iii348
Author(s):  
Maria Ejmont ◽  
Małgorzata Rydzanicz ◽  
Wiesława Grajkowska ◽  
Marta Perek-Polnik ◽  
Agnieszka Sowińska ◽  
...  

Abstract INTRODUCTION: Glioblastoma (GBM) remains one of the biggest therapeutic challenges in neuro-oncology. In spite of multimodal treatment approaches, the prognosis of GBM is extremely poor, with median survival estimated at about 12–16 months. Although GBM is one of the most common and malignant primary brain tumors, pediatric glioblastoma, including congenital glioblastoma, is a very rare tumor, with an incidence of about 1.1–3.4 per million live births. Moreover, the mode of presentation, behavior, response to therapy and molecular background of pediatric glioblastomas differ from those of adult-type GBM. Until now, about ten patients with congenital glioblastoma have been described, and germline markers were not examined in any of them. Here we report two patients with GBM, one with a congenital tumor with germline mutations in the MSH2 gene. METHODS: Targeted next-generation sequencing (NGS) of the probands' DNA extracted from leucocytes was performed using the TruSight One sequencing panel on an Illumina HiSeq 1500. The applied gene panel covered the coding sequence and splice sites of 4813 genes associated with known disease phenotypes. The NGS data were analyzed using an in-house procedure. Identified variants were validated by Sanger sequencing. RESULTS: NGS analysis of the patients' constitutional DNA revealed the known pathogenic variants c.940C>T and c.942+3A>T in the MSH2 gene (NM_000251.3), associated with MMR-dependent hereditary cancer syndromes. CONCLUSION: Molecular analyses are badly needed for a better understanding of pediatric GBM etiology and for the implementation of new treatment modalities. Identification of this oncogenic driver may provide insight into the pathogenesis of GBM, including congenital cases. Funded by National Science Centre, Poland (2016/23/B/NZ2/03064 and 2016/21/B/NZ2/01785).


2018 ◽  
Vol 6 (13) ◽  
Author(s):  
My V. T. Phan ◽  
Claudia M. E. Schapendonk ◽  
Bas B. Oude Munnink ◽  
Marion P. G. Koopmans ◽  
Rik L. de Swart ◽  
...  

ABSTRACT Genetic characterization of wild-type measles virus (MV) strains is a critical component of measles surveillance and molecular epidemiology. We have obtained complete genome sequences of six MV strains belonging to different genotypes, using random-primed next generation sequencing.


2005 ◽  
Vol 387 (1) ◽  
pp. 271-280 ◽  
Author(s):  
Seonghun KIM ◽  
Sun Bok LEE

The extremely thermoacidophilic archaeon Sulfolobus solfataricus utilizes D-glucose as a sole carbon and energy source through the non-phosphorylated Entner–Doudoroff pathway. It has been suggested that this micro-organism metabolizes D-gluconate, the oxidized form of D-glucose, to pyruvate and D-glyceraldehyde by using two unique enzymes, D-gluconate dehydratase and 2-keto-3-deoxy-D-gluconate aldolase. In the present study, we report the purification and characterization of D-gluconate dehydratase from S. solfataricus, which catalyses the conversion of D-gluconate into 2-keto-3-deoxy-D-gluconate. D-Gluconate dehydratase was purified 400-fold from extracts of S. solfataricus by ammonium sulphate fractionation and chromatography on DEAE-Sepharose, Q-Sepharose, phenyl-Sepharose and Mono Q. The native protein showed a molecular mass of 350 kDa by gel filtration, whereas SDS/PAGE analysis provided a molecular mass of 44 kDa, indicating that D-gluconate dehydratase is an octameric protein. The enzyme showed maximal activity at temperatures between 80 and 90 °C and pH values between 6.5 and 7.5, and a half-life of 40 min at 100 °C. Bivalent metal ions such as Co2+, Mg2+, Mn2+ and Ni2+ activated, whereas EDTA inhibited the enzyme. A metal analysis of the purified protein revealed the presence of one Co2+ ion per enzyme monomer. Of the 22 aldonic acids tested, only D-gluconate served as a substrate, with Km=0.45 mM and Vmax=0.15 unit/mg of enzyme. From N-terminal sequences of the purified enzyme, it was found that the gene product of SSO3198 in the S. solfataricus genome database corresponded to D-gluconate dehydratase (gnaD). We also found that the D-gluconate dehydratase of S. solfataricus is a phosphoprotein and that its catalytic activity is regulated by a phosphorylation–dephosphorylation mechanism. This is the first report on biochemical and genetic characterization of D-gluconate dehydratase involved in the non-phosphorylated Entner–Doudoroff pathway.
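The octamer assignment follows from the two reported masses. As a worked check (the symbols n, M_native and M_subunit are introduced here for illustration and do not appear in the abstract):

```latex
\[
  n \;=\; \frac{M_{\text{native}}}{M_{\text{subunit}}}
    \;=\; \frac{350\ \text{kDa}}{44\ \text{kDa}}
    \;\approx\; 7.95 \;\approx\; 8
\]
```

Eight 44 kDa subunits would give 352 kDa, in close agreement with the 350 kDa estimated by gel filtration.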

