iMapper: a web application for the automated analysis and mapping of insertional mutagenesis sequence data against Ensembl genomes

2008 ◽  
Vol 24 (24) ◽  
pp. 2923-2925 ◽  
Author(s):  
Jun Kong ◽  
Fei Zhu ◽  
Jim Stalker ◽  
David J. Adams


2017 ◽  
Author(s):  
James Hadfield ◽  
Colin Megill ◽  
Sidney M. Bell ◽  
John Huddleston ◽  
Barney Potter ◽  
...  

Abstract Summary: Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and an interactive visualisation platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualisation integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, publicly available for use by health professionals, epidemiologists, virologists and the public alike. Availability and implementation: All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain and the web-application is available at nextstrain.org.


2020 ◽  
Vol 58 (6) ◽  
Author(s):  
Stefan Moritz Neuenschwander ◽  
Miguel Angel Terrazos Miani ◽  
Heiko Amlang ◽  
Carmen Perroulaz ◽  
Pascal Bittel ◽  
...  

ABSTRACT Amplicon sequencing of the 16S rRNA gene is commonly used for the identification of bacterial isolates in diagnostic laboratories and mostly relies on the Sanger sequencing method. The latter, however, suffers from a number of limitations, with the most significant being the inability to resolve mixed amplicons when closely related species are coamplified from a mixed culture. This often leads to either increased turnaround time or absence of usable sequence data. Short-read next-generation sequencing (NGS) technologies could solve the mixed amplicon issue but would lack both cost efficiency at low throughput and fast turnaround times. Nanopore sequencing developed by Oxford Nanopore Technologies (ONT) could solve those issues by enabling a flexible number of samples per run and an adjustable sequencing time. Here, we report on the development of a standardized laboratory workflow combined with a fully automated analysis pipeline LORCAN (long read consensus analysis), which together provide a sample-to-report solution for amplicon sequencing and taxonomic identification of the resulting consensus sequences. Validation of the approach was conducted on a panel of reference strains and on clinical samples consisting of single or mixed rRNA amplicons associated with various bacterial genera by direct comparison to the corresponding Sanger sequences. Additionally, simulated read and amplicon mixtures were used to assess LORCAN’s behavior when dealing with samples with known cross-contamination levels. We demonstrate that by combining ONT amplicon sequencing results with LORCAN, the accuracy of Sanger sequencing can be closely matched (>99.6% sequence identity) and that mixed samples can be resolved at the single-base resolution level. The presented approach has the potential to significantly improve the flexibility, reliability, and availability of amplicon sequencing in diagnostic settings.
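
As a rough illustration of the consensus idea described in the abstract above, the following sketch builds a per-column majority consensus from reads that are assumed to be already aligned to equal length, and flags positions where a second base is frequent enough to suggest a mixed amplicon. This is not LORCAN's algorithm, which the abstract does not specify; the function name `column_consensus` and the `mixed_threshold` parameter are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads, mixed_threshold=0.2):
    """Return (consensus, mixed_positions) for equal-length aligned reads.

    Hypothetical helper for illustration only: a position is reported as
    mixed when the second most common base reaches mixed_threshold as a
    fraction of all reads.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")

    consensus = []
    mixed_positions = []
    for pos in range(length):
        # Count the bases observed in this alignment column.
        counts = Counter(read[pos] for read in aligned_reads)
        ranked = counts.most_common(2)
        consensus.append(ranked[0][0])
        # Flag the column if a minority base is well supported.
        if len(ranked) > 1 and ranked[1][1] / len(aligned_reads) >= mixed_threshold:
            mixed_positions.append(pos)
    return "".join(consensus), mixed_positions


reads = ["ACGT", "ACGT", "ACTT", "ACGT", "ACTT"]
consensus, mixed = column_consensus(reads)
print(consensus, mixed)
```

In this toy input the third column holds three G and two T bases, so the consensus keeps G while the position is reported as mixed. A real pipeline would also need read alignment, quality filtering and error modelling, none of which is shown here.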


2020 ◽  
Vol 8 (5) ◽  
pp. 669
Author(s):  
Daniela Becker ◽  
Denny Popp ◽  
Hauke Harms ◽  
Florian Centler

Metagenomics analysis revealing the composition and functional repertoire of complex microbial communities typically relies on large amounts of sequence data. Numerous analysis strategies and computational tools are available for their analysis. Fully integrated automated analysis pipelines such as MG-RAST or MEGAN6 are user-friendly but not designed for integrating specific knowledge on the biological system under study. In order to facilitate the consideration of such knowledge, we introduce a modular, adaptable analysis pipeline combining existing tools. We applied the novel pipeline to simulated mock data sets focusing on anaerobic digestion microbiomes and compared results to those obtained with established automated analysis pipelines. We find that the analysis strategy and choice of tools and parameters have a strong effect on the inferred taxonomic community composition, but not on the inferred functional profile. By including prior knowledge, computational costs can be decreased while improving result accuracy. While automated off-the-shelf analysis pipelines are easy to apply and require no knowledge on the microbial system under study, custom-made pipelines require more preparation time and bioinformatics expertise. This extra effort is minimized by our modular, flexible, custom-made pipeline, which can be adapted to different scenarios and can take available knowledge on the microbial system under study into account.


1986 ◽  
Vol 6 (2) ◽  
pp. 380-392 ◽  
Author(s):  
G L Shen-Ong ◽  
H C Morse ◽  
M Potter ◽  
J F Mushinski

Two modes of disruption of the protooncogene c-myb by viral insertional mutagenesis in mouse myeloid tumor cells are described. The first mode was found in six tumors in which a Moloney murine leukemia virus component had inserted in the same transcriptional orientation upstream of the 5'-most exon with v-myb homology (vE1). cDNA sequence data indicate the presence of a truncated c-myb mRNA that is initiated in the upstream 5' long terminal repeat of the integrated provirus and processed via a cryptic splice donor sequence in the gag region to the splice acceptor site in vE1 of the c-myb gene, thus removing the remaining downstream viral and myb intronic sequences. Unlike most gag-onc transcripts, the gag and myb sequences in the hybrid transcript were not in the same reading frame. It is presumed that the gag sequence provides a cryptic translation initiation site for the novel amino-truncated c-myb protein. The second mode of disruption was by downstream virus insertion at the 3' side of the c-myb, which results in the synthesis of a small (approximately 2 kilobase) myb transcript. The 5' long terminal repeat of the inserted provirus provides a TGA termination codon that results in the elimination of 240 normal c-myb amino acid residues from the carboxyl terminus of the tumor-specific myb protein. These results suggest that truncated myb proteins play a role in neoplastic transformation of myeloid cells.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takuma Nishimaki ◽  
Keiko Sato

Abstract Background: Phylogenetic analysis strongly depends on evolutionary models. Most evolutionary models for estimating genetic differences and phylogenetic relationships do not treat gap sites in the alignment of sequences. Appropriately incorporating evolutionary information of sites containing insertions and deletions into genetic difference measures will improve the accuracy of phylogenetic estimates. Results: We introduced a new measure for estimating genetic differences, and presented P*R*O*P, a web application for performing phylogenetic analysis based on genetic difference considering the effect of gaps. As an example of phylogenetic analysis using P*R*O*P, we used complete p53 amino acid sequences of 31 organisms and illustrated that the genetic differences with and without information on sites containing gaps result in trees with different topologies. Conclusions: P*R*O*P is available at https://www.rs.tus.ac.jp/bioinformatics/prop and the user can perform phylogenetic analysis by uploading sequence data on the website. The most distinctive feature of P*R*O*P is its genetic difference measure, which is estimated without eliminating gap sites from aligned sequences and helps users detect meaningful differences in an evolutionary process. The source code is available on GitHub: https://github.com/TUS-Satolab/PROP.
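
To make the effect of gap handling concrete, here is a minimal sketch that compares a plain p-distance computed with gap sites discarded against one that counts a gap opposite a residue as a difference. It is not the measure introduced for P*R*O*P, which the abstract does not define; the function `p_distance` and its `count_gaps` flag are hypothetical names chosen for illustration.

```python
def p_distance(seq_a, seq_b, count_gaps=False):
    """Proportion of differing sites between two aligned sequences.

    With count_gaps=False, any site containing a gap is skipped
    (pairwise deletion). With count_gaps=True, a gap opposite a
    residue counts as one difference. Illustrative only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a shared gap carries no pairwise information
        if (a == "-" or b == "-") and not count_gaps:
            continue  # discard gap sites entirely
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


seq_a = "ACG-TA"
seq_b = "ACGTTC"
print(p_distance(seq_a, seq_b))                   # gap site ignored
print(p_distance(seq_a, seq_b, count_gaps=True))  # gap site counted
```

For this pair, ignoring the gap site gives one difference over five sites, while counting it gives two differences over six, so the two conventions yield different distances and can therefore lead to different tree topologies.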


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 230
Author(s):  
Mauricio Oberti ◽  
Iosif Vaisman

Intrinsically disordered proteins or intrinsically disordered regions (IDR) are segments within a protein chain lacking a stable three-dimensional structure under normal physiological conditions. Accurate prediction of IDRs is challenging due to their genome-wide occurrence and low ratio of disordered residues, making them a difficult target for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. The shiny-pred application is an ab initio sequence-only disorder predictor implemented in R/Shiny. In order to make predictions, it uses convolutional neural network models, trained using PDB sequence data. It can be installed on any operating system on which R can be installed and run locally. A public version of the web application can be accessed at https://gmu-binf.shinyapps.io/shiny-pred


2018 ◽  
Author(s):  
Joshua B Singer ◽  
Emma C Thomson ◽  
John McLauchlan ◽  
Joseph Hughes ◽  
Robert J Gifford

Abstract Background: Virus genome sequences, generated in ever-higher volumes, can provide new scientific insights and inform our responses to epidemics and outbreaks. To facilitate interpretation, such data must be organised and processed within scalable computing resources that encapsulate virology expertise. GLUE (Genes Linked by Underlying Evolution) is a data-centric bioinformatics environment for building such resources. The GLUE core data schema organises sequence data along evolutionary lines, capturing not only nucleotide data but associated items such as alignments, genotype definitions, genome annotations and motifs. Its flexible design emphasises applicability to different viruses and to diverse needs within research, clinical or public health contexts. Results: HCV-GLUE is a case study GLUE resource for hepatitis C virus (HCV). It includes an interactive public web application providing sequence analysis in the form of a maximum-likelihood-based genotyping method, antiviral resistance detection and graphical sequence visualisation. HCV sequence data from GenBank is categorised and stored in a large-scale sequence alignment which is accessible via web-based queries. Whereas this web resource provides a range of basic functionality, the underlying GLUE project can also be downloaded and extended by bioinformaticians addressing more advanced questions. Conclusion: GLUE can be used to rapidly develop virus sequence data resources with public health, research and clinical applications. This streamlined approach, with its focus on reuse, will help realise the full value of virus sequence data.


2021 ◽  
Author(s):  
Matthias Lange ◽  
Blaise Alako ◽  
Guy Cochrane ◽  
Mehmood Ghaffar ◽  
Martin Mascher ◽  
...  

Background: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and re-use in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. Findings: For this data note, we extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. Flat data tables and a data warehouse with an interactive web application were constructed to enable ad hoc exploration of NSD use and summary statistics. Conclusions: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enables scientists around the world to join literature and sequence databases in a multidimensional fashion. As a concrete use case, statistics of country clusters were visualized with respect to NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.


Genetics ◽  
1992 ◽  
Vol 131 (4) ◽  
pp. 939-956 ◽  
Author(s):  
M A Moreno ◽  
J Chen ◽  
I Greenblatt ◽  
S L Dellaporta

Abstract The tendency for Ac to transpose over short intervals has been utilized to develop insertional mutagenesis and fine structure genetic mapping strategies in maize. We recovered excisions of Ac from the P gene and insertions into nearby chromosomal sites. These closely linked Ac elements reinserted into the P gene, reconstituting over 250 unstable variegated alleles. Reconstituted alleles condition a variety of variegation patterns that reflect the position and orientation of Ac within the P gene. Molecular mapping and DNA sequence analyses have shown that reinsertion sites are dispersed throughout a 12.3-kb chromosomal region in the promoter, exons and introns of the P gene, but in some regions insertion sites were clustered in a nonrandom fashion. Transposition profiles and target site sequence data obtained from these studies have revealed several features of Ac transposition including its preference for certain target sites. These results clearly demonstrate the tendency of Ac to transpose to nearby sites in both proximal and distal directions from the donor site. With minor modifications, reconstitutional mutagenesis should be applicable to many Ac-induced mutations in maize and in other plant species and can possibly be extended to other eukaryotic transposon systems as well.


Author(s):  
Joshua Singer ◽  
Robert Gifford ◽  
Matthew Cotten ◽  
David Robertson

Summary: CoV-GLUE is an online web application for the interpretation and analysis of SARS-CoV-2 virus genome sequences, with a focus on amino acid sequence variation. It is based on the GLUE data-centric bioinformatics environment and provides a browsable database of amino acid replacements and coding region indels that have been observed in sequences from the pandemic. Users may also analyse their own SARS-CoV-2 sequences by submitting them to the web application to receive an interactive report containing visualisations of phylogenetic classification and highlighting genomic variation of potentially high impact, for example linked to primer mismatches. Availability and implementation: Available at http://cov-glue.cvr.gla.ac.uk. Implemented using GLUE, an open source framework for the development of virus sequence data resources. Contact [email protected]

