biological data analysis
Recently Published Documents


TOTAL DOCUMENTS

77
(FIVE YEARS 22)

H-INDEX

10
(FIVE YEARS 4)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

Abstract High-throughput biological data analysis commonly involves identifying features, such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion for ensuring the reliability of the analysis is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions about the data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control that does not rely on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.
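
For context, the conventional p-value-based FDR control that this abstract contrasts with Clipper can be sketched with the standard Benjamini-Hochberg step-up procedure. This is not Clipper's method; the function name `benjamini_hochberg` and the default `alpha` are illustrative assumptions, and the sketch assumes valid p-values are already available for every feature.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch, not Clipper).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding feature is called a discovery at target FDR alpha.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the features, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every feature ranked at or below the cutoff is a discovery.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four features.
    print(benjamini_hochberg([0.039, 0.001, 0.216, 0.008], alpha=0.05))
```

The procedure is only as trustworthy as its input p-values, which is the limitation the abstract raises for settings with few replicates or unknown data distributions.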


2021 ◽  
Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats is an essential but difficult task in biological data analysis. The easyfm (easy file manipulation) toolkit (https://github.com/TaekAndBrendan/easyfm) makes manipulating commonly used NGS files more accessible to biologists. It enables them to perform end-to-end reproducible data analyses using a free standalone desktop application (available on Windows, Mac and Linux). Unlike existing tools (e.g. Galaxy), the Graphical User Interface (GUI)-based easyfm does not depend on any high-performance computing (HPC) system and can be operated without an internet connection. This benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


2021 ◽  

Abstract R is an open-source statistical environment modelled after the previously widely used commercial programs S and S-Plus. In addition to powerful statistical analysis tools, it provides powerful graphics output, and it is also a programming language suitable for medium-sized projects. This book presents a set of studies that collectively represent almost all the R operations that beginners need when analysing their own data, up to perhaps the early years of a PhD. Although the chapters are organized around topics such as graphing, classical statistical tests, statistical modelling, mapping and text parsing, the examples have been chosen largely from real scientific studies at the appropriate level, and each nearly always covers more R functions than are strictly necessary to obtain a p-value or a graph. R comes with around a thousand base functions, which are automatically installed when R is downloaded. This book covers those most relevant to biological data analysis, modelling and graphics. Throughout each chapter, the functions introduced and used in that chapter are summarized in Tool Boxes. The book also shows the user how to adapt and write their own code and functions. A selection of base functions relevant to graphics that are not necessarily covered in the main text is described in Appendix 1, and additional housekeeping functions in Appendix 2.


2021 ◽  
Vol 18 (6) ◽  
pp. 8603-8621
Author(s):  
Grigoriy Gogoshin ◽  
Sergio Branciamore ◽  
Andrei S. Rodin

Abstract Bayesian Network (BN) modeling is a prominent and increasingly popular computational systems biology method. It aims to construct network graphs from large heterogeneous biological datasets that reflect the underlying biological relationships. Currently, a variety of strategies exist for evaluating BN methodology performance, ranging from artificial benchmark datasets and models, to specialized biological benchmark datasets, to simulation studies that generate synthetic data from predefined network models. The last is arguably the most comprehensive approach; however, existing implementations often rely on explicit and implicit assumptions that may be unrealistic in a typical biological data analysis scenario, or are poorly equipped for automated arbitrary model generation. In this study, we develop a purely probabilistic simulation framework that addresses the demands of statistically sound simulation studies in an unbiased fashion. Additionally, we expand on our current understanding of the theoretical notions of causality and dependence / conditional independence in BNs and the Markov Blankets within.


Author(s):  
Dong Xu ◽  
Zhuchou Lu ◽  
Kangming Jin ◽  
Wenmin Qiu ◽  
Guirong Qiao ◽  
...  

Abstract Efficiently extracting information from biological big data can be a huge challenge, especially for people who lack programming skills. We developed Sequence Processing and Data Extraction (SPDE) as an integrated tool for sequence processing and data extraction for gene family and omics analyses. Currently, SPDE has seven modules comprising 100 basic functions that range from single gene processing (e.g., translation, reverse complement, and primer design) to genome information extraction. All SPDE functions can be used without the need for programming or command lines. The SPDE interface has enough prompt information to help users run SPDE without barriers. In addition to its own functions, SPDE also incorporates publicly available analysis tools (such as NCBI-blast, HMMER, Primer3 and SAMtools), thereby making SPDE a comprehensive bioinformatics platform for big biological data analysis. Availability: SPDE was built using Python and can be run on 32-bit and 64-bit Windows and macOS systems. It is open-source software that can be downloaded from https://github.com/simon19891216/[email protected]
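
To make one of the single-gene operations named in this abstract concrete, here is a minimal sketch of a DNA reverse complement in plain Python. It is not taken from SPDE; the function name `reverse_complement` and the choice to leave unknown symbols (such as `N`) unchanged are assumptions made for illustration.

```python
# Translation table mapping each DNA base to its complement, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (illustrative sketch).

    Characters outside A/C/G/T (for example N) are left unchanged.
    """
    # Complement every base, then reverse the resulting string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
```

A GUI tool such as the one described wraps operations of this kind so that users do not have to write them by hand.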


Author(s):  
Hasan Balci ◽  
Metin Can Siper ◽  
Nasim Saleh ◽  
Ilkin Safarli ◽  
Ludovic Roy ◽  
...  

Abstract Motivation: Visualization of cellular processes and pathways is a key recurring requirement for effective biological data analysis. There is a considerable need for sophisticated web-based pathway viewers and editors operating with widely accepted standard formats, using the latest visualization techniques and libraries. Results: We developed a web-based tool named Newt for viewing, constructing and analyzing biological maps in standard formats such as SBGN, SBML and SIF. Availability and implementation: Newt’s source code is publicly available on GitHub and freely distributed under the GNU LGPL. Ample documentation on Newt can be found on http://newteditor.org and on YouTube.


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Michael Kluge ◽  
Marie-Sophie Friedl ◽  
Amrei L Menzel ◽  
Caroline C Friedel

Abstract Background: Advances in high-throughput methods have brought new challenges for biological data analysis, often requiring many interdependent steps applied to a large number of samples. To address this challenge, workflow management systems, such as Watchdog, have been developed to support scientists in the (semi-)automated execution of large analysis workflows. Implementation: Here, we present Watchdog 2.0, which implements new developments for module creation, reusability, and documentation and for reproducibility of analyses and workflow execution. Developments include a graphical user interface for semi-automatic module creation from software help pages, sharing repositories for modules and workflows, and a standardized module documentation format. The latter allows generation of a customized reference book of public and user-specific modules. Furthermore, extensive logging of workflow execution, module and software versions, and explicit support for package managers and container virtualization now ensures reproducibility of results. A step-by-step analysis protocol generated from the log file may, e.g., serve as a draft of a manuscript methods section. Finally, 2 new execution modes were implemented. One allows resuming workflow execution after interruption or modification without rerunning successfully executed tasks not affected by changes. The second one allows detaching and reattaching to workflow execution on a local computer while tasks continue running on computer clusters. Conclusions: Watchdog 2.0 provides several new developments that we believe to be of benefit for large-scale bioinformatics analysis and that are not completely covered by other competing workflow management systems. The software itself, module and workflow repositories, and comprehensive documentation are freely available at https://www.bio.ifi.lmu.de/watchdog.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Sunil Nagpal ◽  
Krishanu Das Baksi ◽  
Bhusan K. Kuntal ◽  
Sharmila S. Mande

Abstract Background: Most biological experiments are inherently designed to compare changes or transitions of state between conditions of interest. Advances in data-intensive research have in particular elevated the need for resources and tools enabling comparative analysis of biological data. The complexity of biological systems and the interactions of their various components, such as genes, proteins, taxa, and metabolites, have been inferred, represented, and visualized via graph theory-based networks. Comparisons of multiple networks can help in identifying variations across different biological systems, thereby providing additional insights. However, while a number of online and stand-alone tools exist for generating, analyzing, and visualizing individual biological networks, the ability to batch process and comprehensively compare multiple networks is limited. Results: Here, we present a graphical user interface (GUI)-based web application which implements multiple network comparison methodologies and presents them in the form of organized analysis workflows. Dedicated comparative visualization modules are provided to end-users for obtaining easy-to-comprehend, insightful, and meaningful comparisons of various biological networks. We demonstrate the utility and power of our tool using publicly available microbial and gene expression data. Conclusion: The NetConfer tool was developed with the requirements of researchers who work in biological data analysis with limited programming expertise in mind. It is also expected to be useful for advanced users from biological as well as other domains (working with association networks), who benefit from the provided ready-made workflows, which allow them to focus directly on the results without worrying about the implementation. While the web version allows using this application without installation and dependency requirements, a stand-alone version is also provided to accommodate the offline requirement of processing large networks.


Author(s):  
Mousomi Roy

Biological data analysis is one of the most important and challenging tasks in today's world. Automated analysis of these data is necessary for quick and accurate diagnosis. Intelligent computing-based solutions are needed to reduce both human intervention and analysis time. Artificial intelligence-based methods are frequently used to analyze and mine information from biological data. Several machine learning-based tools are available with which powerful and intelligent automated systems can be developed. In general, the volume of this kind of data is very large and demands sophisticated tools that can handle it efficiently and produce results within a reasonable time by extracting useful information from big data. In this chapter, the authors present a comprehensive study of different computer-aided automated methods and tools for analyzing different types of biological data. Moreover, this chapter gives insight into various types of biological data and their real-life applications.

