Unsupervised Learning Approach for Comparing Multiple Transposon Insertion Sequencing Studies

mSphere ◽  
2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Troy P. Hubbard ◽  
Jonathan D. D’Gama ◽  
Gabriel Billings ◽  
Brigid M. Davis ◽  
Matthew K. Waldor

ABSTRACT Transposon insertion sequencing (TIS) is a widely used technique for conducting genome-scale forward genetic screens in bacteria. However, few methods enable comparison of TIS data across multiple replicates of a screen or across independent screens, including screens performed in different organisms. Here, we introduce a post hoc analytic framework, comparative TIS (CompTIS), which utilizes unsupervised learning to enable meta-analysis of multiple TIS data sets. CompTIS first implements screen-level principal-component analysis (PCA) and clustering to identify variation between the TIS screens. This initial screen-level analysis facilitates the selection of related screens for additional analyses, reveals the relatedness of complex environments based on growth phenotypes measured by TIS, and provides a useful quality control step. Subsequently, PCA is performed on genes to identify loci whose corresponding mutants lead to concordant/discordant phenotypes across all or in a subset of screens. We used CompTIS to analyze published intestinal colonization TIS data sets from two vibrio species. Gene-level analyses identified both pan-vibrio genes required for intestinal colonization and conserved genes that displayed species-specific requirements. CompTIS is applicable to virtually any combination of TIS screens and can be implemented without regard to either the number of screens or the methods used for upstream data analysis.

IMPORTANCE Forward genetic screens are powerful tools for functional genomics. The comparison of similar forward genetic screens performed in different organisms enables the identification of genes with similar or different phenotypes across organisms. Transposon insertion sequencing is a widely used method for conducting genome-scale forward genetic screens in bacteria, yet few bioinformatic approaches have been developed to compare the results of screen replicates and different screens conducted across species or strains. Here, we used principal-component analysis (PCA) and hierarchical clustering, two unsupervised learning approaches, to analyze the relatedness of multiple in vivo screens of pathogenic vibrios. This analytic framework reveals both shared pan-vibrio requirements for intestinal colonization and strain-specific dependencies. Our findings suggest that PCA-based analytics will be a straightforward widely applicable approach for comparing diverse transposon insertion sequencing screens.
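
The screen-level and gene-level PCA steps described in the abstract can be illustrated with a minimal sketch. It is not the CompTIS implementation: the log2 fold-change matrix, the screen labels, and the `pca` helper are all hypothetical, and the hierarchical clustering step is omitted. The sketch assumes each screen has already been summarized as one value per gene.

```python
import numpy as np


def pca(matrix, n_components=2):
    """Return (scores, explained variance ratios) for the rows of `matrix`."""
    # Center each feature (column) before decomposition.
    centered = matrix - matrix.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical log2 fold changes: 6 genes (rows) x 4 screens (columns).
# Screens A1/A2 and B1/B2 stand in for replicates from two species.
screens = ["A1", "A2", "B1", "B2"]
lfc = np.array([
    [-4.0, -3.8, -4.1, -3.9],  # attenuated in every screen
    [-3.5, -3.6,  0.1, -0.1],  # attenuated only in species A
    [ 0.2,  0.0, -3.7, -3.4],  # attenuated only in species B
    [ 0.1, -0.2,  0.0,  0.2],  # neutral
    [-0.1,  0.1,  0.2, -0.1],  # neutral
    [ 0.0,  0.2, -0.1,  0.1],  # neutral
])

# Screen-level PCA: each screen is an observation, genes are features.
screen_scores, screen_var = pca(lfc.T)

# Gene-level PCA: each gene is an observation, screens are features.
gene_scores, gene_var = pca(lfc)

for name, (pc1, pc2) in zip(screens, screen_scores):
    print(f"{name}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

Transposing the matrix is the only difference between the two analyses. With screens as rows, the leading component separates the two hypothetical species in this toy data; with genes as rows, the components distinguish the gene attenuated everywhere from the genes attenuated in only one species.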

2019 ◽  
Vol 15 (8) ◽  
pp. e1007652 ◽  
Author(s):  
Alyson R. Warr ◽  
Troy P. Hubbard ◽  
Diana Munera ◽  
Carlos J. Blondel ◽  
Pia Abel zur Wiesch ◽  
...  

2019 ◽  
Author(s):  
David A. Baltrus ◽  
John Medlen ◽  
Meara Clark

Abstract Transposon mutagenesis is a widely used tool for carrying out forward genetic screens across systems, but in some cases it can be difficult to identify transposon insertion points after successful phenotypic screens. As an alternative to traditional methods, we report on the efficacy of using an Oxford Nanopore MinION to identify transposon insertions through whole genome sequencing. We also report experiments using CRISPR-Cas to selectively target regions of the genome where a transposon has integrated. Our experiments provide a framework for understanding the efficiency of such techniques for carrying out forward genetic screens and point towards the ability to use CRISPR-based sequence capture to identify the insertion of particular regions of DNA across all genomes, which may enable Tn-Seq experiments using Nanopore-based sequencing.


2019 ◽  
Author(s):  
Alyson R. Warr ◽  
Troy P. Hubbard ◽  
Diana Munera ◽  
Carlos J. Blondel ◽  
Pia Abel zur Wiesch ◽  
...  

Abstract Enterohemorrhagic Escherichia coli O157:H7 (EHEC) is an important food-borne pathogen that colonizes the colon. Transposon-insertion sequencing (TIS) was used to identify genes required for EHEC and commensal E. coli K-12 growth in vitro and for EHEC growth in vivo in the infant rabbit colon. Surprisingly, many conserved loci contribute to EHEC’s but not to K-12’s growth in vitro, suggesting that gene acquisition during EHEC evolution has heightened the pathogen’s reliance on certain metabolic processes that are dispensable for K-12. There was a restrictive bottleneck for EHEC colonization of the rabbit colon, which complicated identification of EHEC genes facilitating growth in vivo. Both a refined version of an existing analytic framework as well as PCA-based analysis were used to compensate for the effects of the infection bottleneck. These analyses confirmed that the EHEC LEE-encoded type III secretion apparatus is required for growth in vivo and revealed that only a few effectors are critical for in vivo fitness. Numerous mutants not previously associated with EHEC survival/growth in vivo also appeared attenuated in vivo, and a subset of these putative in vivo fitness factors were validated. Some were found to contribute to efficient type-three secretion while others, including tatABC, oxyR, envC, acrAB, and cvpA, promote EHEC resistance to host-derived stresses encountered in vivo. cvpA, which is also required for intestinal growth of several other enteric pathogens, proved to be required for EHEC, Vibrio cholerae and Vibrio parahaemolyticus resistance to the bile salt deoxycholate. Collectively, our findings provide a comprehensive framework for understanding EHEC growth in the intestine.

Author Summary Enterohemorrhagic E. coli (EHEC) are important food-borne pathogens that infect the colon. We created a highly saturated EHEC transposon library and used transposon insertion sequencing to identify the genes required for EHEC growth in vitro and in vivo in the infant rabbit colon. We found that there is a large infection bottleneck in the rabbit model of intestinal colonization, and refined two analytic approaches to facilitate rigorous identification of new EHEC genes that promote fitness in vivo. Besides the known type III secretion system, more than 200 additional genes were found to contribute to EHEC survival and/or growth within the intestine. The requirement for some of these new in vivo fitness factors was confirmed, and their contributions to infection were investigated. This set of genes should be of considerable value for future studies elucidating the processes that enable the pathogen to proliferate in vivo and for design of new therapeutics.


2016 ◽  
Author(s):  
Julia Joung ◽  
Silvana Konermann ◽  
Jonathan S. Gootenberg ◽  
Omar O. Abudayyeh ◽  
Randall J. Platt ◽  
...  

Forward genetic screens are powerful tools for the unbiased discovery and functional characterization of specific genetic elements associated with a phenotype of interest. Recently, the RNA-guided endonuclease Cas9 from the microbial immune system CRISPR (clustered regularly interspaced short palindromic repeats) has been adapted for genome-scale screening by combining Cas9 with guide RNA libraries. Here we describe a protocol for genome-scale knockout and transcriptional activation screening using the CRISPR-Cas9 system. Custom or ready-made guide RNA libraries are constructed and packaged into lentivirus for delivery into cells for screening. As each screen is unique, we provide guidelines for determining screening parameters and maintaining sufficient coverage. To validate candidate genes identified from the screen, we further describe strategies for confirming the screening phenotype as well as genetic perturbation through analysis of indel rate and transcriptional activation. Beginning with library design, a genome-scale screen can be completed in 6-10 weeks followed by 3-4 weeks of validation.


mBio ◽  
2016 ◽  
Vol 7 (5) ◽  
Author(s):  
Brian J. Akerley

ABSTRACT The property of transposons to randomly insert into target DNA has long been exploited for generalized mutagenesis and forward genetic screens. Newer applications that monitor the relative abundance of each transposon insertion in large libraries of mutants can be used to evaluate the roles in cellular fitness of all genes of an organism, provided that transposition is in fact random across all genes. In a recent article, Kimura and colleagues identified an important exception to the latter assumption [S. Kimura, T. P. Hubbard, B. M. Davis, M. K. Waldor, mBio 7(4):e01351-16, 2016, doi:10.1128/mBio.01351-16]. They provide evidence that the Mariner transposon exhibits locus-specific site preferences in the presence of the histone-like nucleoid structuring protein H-NS. This effect was shown to bias results for important virulence loci in Vibrio cholerae and to result in misidentification of genes involved in growth in vitro. Fortunately, the bulk of this bacterium’s genome was unaffected by this bias, and recognizing the H-NS effect allows filtering to improve the accuracy of the results.


Author(s):  
Hyeuk Kim

Unsupervised learning divides data into several groups such that observations in the same group share similar characteristics and observations in different groups differ. In this paper, we cluster data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes. We interpret the meaning of the clusters graphically through this procedure. The combination of partitioning around medoids and principal component analysis can be applied to any other data set, and the approach makes it easy to identify the characteristics of each group.
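
As an illustration of partitioning around medoids, here is a minimal sketch of the swap-based search. It is an assumption-laden toy rather than the paper's procedure: the two-dimensional points are invented, the `pam` and `total_cost` helpers are hypothetical, and the initial medoids are simply the first k points instead of the result of a dedicated build step. The PCA plotting step is not covered.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Cluster `points` around k medoids using a greedy swap search.

    Simplification: the search starts from the first k points and accepts
    any swap that lowers the total cost, until no swap helps.
    """
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            # Try replacing the i-th medoid with a non-medoid point.
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    # Assign each point to the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Invented data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
medoids, labels = pam(points, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Unlike k-means centroids, each medoid is an actual observation, so a cluster can be summarized by pointing at a representative member of the data.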


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and to humans. With deepening globalization and the growing abundance of data resources, using big data to predict and analyze animal diseases is becoming increasingly important. Machine learning focuses on enabling computers to learn from data and to use what they have learned for analysis and prediction. This paper first introduces the animal epidemic situation and machine learning, then briefly reviews applications of machine learning to animal disease analysis and prediction. Machine learning is mainly divided into supervised and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and MaxEnt. Through this discussion, readers can gain a clearer understanding of machine learning and its application prospects in animal diseases.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Delphine Larivière ◽  
Laura Wickham ◽  
Kenneth Keiler ◽  
Anton Nekrutenko ◽  

Abstract

Background Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem area” is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis.

Results Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination.

Conclusions Our work provides an assessment of the currently available tools for TIS data analysis. It offers ready-to-use workflows that can be invoked by anyone in the world using our public Galaxy platform (https://usegalaxy.org). To lower the entry barriers, we have also developed interactive tutorials explaining details of TIS data analysis procedures at https://bit.ly/gxy-tis.

