Contamination detection and microbiome exploration with GRIMER

2021 ◽  
Author(s):  
Vitor C. Piro ◽  
Bernhard Y. Renard

Exploring microbiome data is a time-consuming task that can only be partially automated due to the specific requirements and goals of each project. Visualizations and analysis platforms are crucial to better guide this step. Best practices in the field are constantly evolving and many pitfalls can lead to biased outcomes. Compositionality of data and sample contamination are two important points that should be carefully considered in the early stages of microbiome studies. Detecting contamination can be a challenging task, especially in low-biomass samples or in studies lacking proper controls by design. However, external evidence and commonly identified contaminant taxa can be used to discover and mitigate contamination. We propose GRIMER, a tool that automates analysis, generates plots and runs external tools to create a portable dashboard integrating annotation, taxonomy and metadata. It unifies several sources of evidence towards contamination detection. GRIMER is independent of quantification methods and directly analyses contingency tables to create an interactive and offline report. GRIMER reports can be created in seconds and are accessible to non-specialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled an extensive list of common contaminants and possible external contaminant taxa reported in the literature and use it to annotate data. GRIMER is open-source and available at: https://gitlab.com/dacs-hpi/grimer
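To make the idea of annotating a contingency table against a list of reported contaminants concrete, here is a minimal sketch in Python. It is not GRIMER's implementation or API. The function names, the toy count table and the one-entry contaminant set are all hypothetical and chosen only for illustration. The sketch also converts counts to per-sample proportions, which is one simple way to acknowledge that the data are compositional.

```python
from typing import Dict, List, Set


def relative_abundance(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Convert a taxa-by-sample count table into per-sample proportions."""
    n_samples = len(next(iter(counts.values())))
    # Column totals: one total per sample.
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        taxon: [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for taxon, row in counts.items()
    }


def flag_contaminants(counts: Dict[str, List[int]],
                      known_contaminants: Set[str]) -> Dict[str, bool]:
    """Mark each taxon that appears in an external list of reported contaminants."""
    return {taxon: taxon in known_contaminants for taxon in counts}


# Hypothetical toy table: rows are taxa, columns are two samples.
counts = {
    "Escherichia": [80, 10],
    "Ralstonia": [20, 30],
    "Bacteroides": [0, 60],
}
# Illustrative stand-in for a curated list; not the list compiled by the authors.
known_contaminants = {"Ralstonia"}

props = relative_abundance(counts)
flags = flag_contaminants(counts, known_contaminants)

for taxon in counts:
    label = "reported contaminant" if flags[taxon] else "no annotation"
    print(taxon, [round(p, 2) for p in props[taxon]], label)
```

A flag of this kind is only one source of evidence. A taxon on such a list may still be a genuine member of the community, so the annotation is better read as a prompt for closer inspection than as a filter.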

Author(s):  
Prashant Kale ◽  
Harbir Singh

Innovation is critical to the success of large, diversified Indian business groups, and this chapter explores the specific organizational mechanisms they have adopted to enable and foster innovation in their organizations. First, these groups provide internal markets for the capital and talent necessary for innovation, making up for the lack of these institutions externally. In addition, they have pursued the following actions: (a) significantly increased their investments in R&D and innovation, (b) created internal leadership councils to oversee and promote innovation, (c) created an innovation culture that encourages and celebrates entrepreneurship, risk-taking, and tolerance for failure, (d) undertaken formal learning interventions to build the innovation capabilities of their managers, and (e) set up formal units to in-source innovation from external sources. Indian companies are still in the early stages of this journey and will have to sustain these practices to demonstrate durable success with innovation.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Charlie M. Carpenter ◽  
Daniel N. Frank ◽  
Kayla Williamson ◽  
Jaron Arbet ◽  
Brandie D. Wagner ◽  
...  

Abstract Background The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.


Author(s):  
Jonathan M. Smith ◽  
Michael B. Greenwald ◽  
Sotiris Ioannidis ◽  
Angelos D. Keromytis ◽  
Ben Laurie ◽  
...  

This chapter reports on our experiences with POSSE, a project studying “Portable Open Source Security Elements” as part of the larger DARPA effort on Composable High Assurance Trusted Systems. We describe the organization created to manage POSSE and the significant acceleration in producing widely used secure software that has resulted. POSSE’s two main goals were, first, to increase security in open source systems and, second, to more broadly disseminate security knowledge, “best practices,” and working code that reflects these practices. POSSE achieved these goals through careful study of systems (“audit”) and starting from a well-positioned technology base (OpenBSD). We hope to illustrate the advantages of applying OpenBSD-style methodology to secure, open-source projects, and the pitfalls of melding multiple open-source efforts in a single project.


GigaScience ◽  
2019 ◽  
Vol 8 (9) ◽  
Author(s):  
Peter Georgeson ◽  
Anna Syme ◽  
Clare Sloggett ◽  
Jessica Chung ◽  
Harriet Dashnow ◽  
...  

Abstract Background Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. Findings We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. Conclusions Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
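Several of the features listed above (command-line argument parsing, error handling, progress logging, defined exit status values, a version number) can be illustrated with a short Python skeleton. This is a sketch under stated assumptions, not the template that Bionitio generates. The record-counting task, the function names, the exit code constants and the version string are all hypothetical.

```python
import argparse
import logging
from typing import Iterable, List, Optional

# Hypothetical exit status values, defined once so that callers can rely on them.
EXIT_OK = 0
EXIT_FILE_ERROR = 1


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def parse_args(argv: Optional[List[str]]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Count records in a FASTA file.")
    parser.add_argument("fasta", help="path to the input FASTA file")
    # Illustrative version number.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.0")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Run the tool and return an exit status instead of calling sys.exit directly."""
    args = parse_args(argv)
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    logging.info("reading %s", args.fasta)
    try:
        with open(args.fasta) as handle:
            n_records = count_fasta_records(handle)
    except OSError as exc:
        logging.error("cannot read %s: %s", args.fasta, exc)
        return EXIT_FILE_ERROR
    print(n_records)
    return EXIT_OK

# In a script, the entry point would typically be: sys.exit(main())
```

Returning the status from `main` rather than exiting inside it keeps the function easy to call from a test suite, which is another of the practices the abstract lists.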


Author(s):  
Mateusz Kuzak ◽  
Jen Harrow ◽  
Rafael C. Jimenez ◽  
Paula Andrea Martinez ◽  
Fotis E. Psomopoulos ◽  
...  

2018 ◽  
Vol 35 (3) ◽  
pp. 16-22 ◽  
Author(s):  
Bijan Kumar Roy ◽  
Subal Chandra Biswas ◽  
Parthasarathi Mukhopadhyay

Purpose This paper aims to provide an overview of the emergence of resource discovery systems and services along with their advantages and best practices including current landscapes. It reports the development of a resource discovery system by using the “VuFind” software and describes other technological tools, software, standards and protocols required for the development of the prototype. Design/methodology/approach This paper describes the process of integrating VuFind (resource discovery tool) with Koha (integrated library system), DSpace (repository software) and Apache Tika (as full-text extractor for full-text searching), etc. Findings The proposed model performs like other existing commercial and open source Web-scale resource discovery systems and is capable of harvesting resources from different subscribed or external sources replacing a library’s OPAC. Originality/value This discovery system is an important add-on to designing a one-stop access in place of the existing retrieval silos in libraries. This system is capable of indexing a variety of content within and beyond library collections. This work may help library professionals and administrators in designing their discovery system, as well as vendors to improve their products, to provide different library-friendly services.


Author(s):  
Taedong Yun ◽  
Helen Li ◽  
Pi-Chuan Chang ◽  
Michael F. Lin ◽  
Andrew Carroll ◽  
...  

Abstract Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready variants remains challenging. Here we introduce an open-source cohort variant-calling method using the highly-accurate caller DeepVariant and scalable merging tool GLnexus. We optimized callset quality based on benchmark samples and Mendelian consistency across many sample sizes and sequencing specifications, resulting in substantial quality improvements and cost savings over existing best practices. We further evaluated our pipeline in the 1000 Genomes Project (1KGP) samples, showing superior quality metrics and imputation performance. We publicly release the 1KGP callset to foster development of broad studies of genetic variation.
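Mendelian consistency, used above as a quality signal, can be sketched for a single autosomal site in a parent-child trio. The Python below is an illustrative assumption about how such a check might look, not the authors' evaluation code. Genotypes are represented as pairs of allele indices, and the function names and toy trios are hypothetical.

```python
from itertools import product
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # diploid genotype as a pair of allele indices


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have inherited one allele from each parent."""
    return any(
        sorted((m_allele, f_allele)) == sorted(child)
        for m_allele, f_allele in product(mother, father)
    )


def mendelian_consistency_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("at least one trio genotype is required")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return consistent / len(trios)


# Hypothetical toy calls at two sites, each given as (child, mother, father).
toy_trios = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from the mother, 1 from the father
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: the mother carries no 1 allele
]
print(mendelian_consistency_rate(toy_trios))
```

The sketch ignores missing calls, sex chromosomes and de novo mutations, all of which a real evaluation would need to handle.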


2021 ◽  
Author(s):  
Dominique Sydow ◽  
Jaime Rodríguez-Guerra ◽  
Talia B. Kimber ◽  
David Schaller ◽  
Corey J. Taylor ◽  
...  

Computational pipelines have become a crucial part of modern drug discovery campaigns. Setting up and maintaining such pipelines, however, can be challenging and time-consuming, especially for novice scientists in this domain. TeachOpenCADD is a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects. We offer Python-based solutions for common tasks in cheminformatics and structural bioinformatics in the form of Jupyter notebooks and based on open source resources only. Including the 12 newly released additions, TeachOpenCADD now contains 22 notebooks that each cover both theoretical background and hands-on programming. To promote reproducible and reusable research, we apply software best practices to our notebooks, such as testing with automated continuous integration and adhering to a more idiomatic Python style. The new TeachOpenCADD website is available at https://projects.volkamerlab.org/teachopencadd and all code is deposited on GitHub.


Author(s):  
Ricardo Javier Rademacher Mena

With the modification of the 50/50 rule by the Higher Education Reconciliation Act of 2005, the purely online university has become increasingly popular and thus so has the purely online science class. In this chapter, the author will use over a decade of teaching physics and math at traditional offline and pure online universities to compare the two. In the process, the author will uncover what techniques have successfully carried over from the traditional to the online environment and how physics education research and technology are changing the physics classroom. The main purpose of this chapter is to identify best practices in designing and teaching online science courses and to provide recommendations on improving existing online science classrooms. Throughout the chapter, Moodle™, an open source LMS, will be used to showcase and implement the ideas being presented.

