Computational Methods for Learning Bayesian Networks from High-Throughput Biological Data

The February 16th, 2001 issue of Science magazine announced the completion of the human genome project, making the entire nucleotide sequence of the genome available (Venter, Adams et al. 2001). For the first time, a comprehensive data set was available with nucleotide sequences for every gene. This marked the beginning of a new era, the "genomics" era, in which molecular biological science began to shift from the investigation of single genes towards the investigation of all genes in an organism simultaneously. Alongside the completion of the genome project came the introduction of new high-throughput experimental approaches such as gene expression microarrays, rapid single nucleotide polymorphism detection, and proteomics methods such as yeast two-hybrid screens (Brown and Botstein 1999; Kwok and Chen 2003; Sharff and Jhoti 2003; Zhu, Bilgin et al. 2003). These methods permitted the investigation of hundreds, if not thousands, of genes simultaneously. With these high-throughput methods, the limiting step in the study of biology began shifting from data collection to data interpretation. To interpret traditional experimental results that addressed the function of only a single gene or a handful of genes, investigators needed to understand in detail only the few genes addressed in the study and perhaps a handful of other related genes. These investigators needed to be familiar with a comparatively small collection of peer-reviewed publications and prior results. Today, new genomics experimental assays, such as gene expression microarrays, are generating data for thousands of genes simultaneously. The increasing complexity and sophistication of these methods make the resulting data extremely unwieldy for manual analysis, since the number and diversity of genes involved exceed the expertise of any single investigator. The only practical way to analyze these types of data sets is to use computational methods that are unhindered by the volume of modern data.
Bioinformatics is a new field that emphasizes computational methods to analyze such data sets (Lesk 2002). Bioinformatics combines the algorithms and approaches employed in computer science and statistics to analyze, understand, and hypothesize about the large repositories of collected biological data and knowledge.

Kinetic Solvent Effects in Organic Reactions

10.26434/chemrxiv.5370778.v1 ◽

2017 ◽

Author(s):

Belinda Slakman ◽

Richard West

Keyword(s):

Solvent Effect ◽

Reaction Kinetics ◽

Computational Methods ◽

High Throughput ◽

Kinetic Modeling ◽

Solvent Effects ◽

Reaction Rates ◽

Prior Work ◽

Solvent Phase ◽

High Throughput Manner

This article reviews prior work studying reaction kinetics in solution, with the goal of using this information to improve detailed kinetic modeling in the solvent phase. Both experimental and computational methods for calculating reaction rates in liquids are reviewed. Previous studies, which used such methods to determine solvent effects, are then analyzed based on reaction family. Many of these studies correlate kinetic solvent effect with one or more solvent parameters or properties of reacting species, but it is not always possible, and investigations are usually done on too few reactions and solvents to truly generalize. From these studies, we present suggestions on how best to use data to generalize solvent effects for many different reaction types in a high throughput manner.

Comparison of Penalty-based Feature Selection Approach on High Throughput Biological Data

Proceedings of the 2020 10th International Conference on Biomedical Engineering and Technology ◽

10.1145/3397391.3397404 ◽

2020 ◽

Author(s):

Ningya Wang ◽

Wenbin Zhou ◽

Jiamin Wu ◽

Shengjia Chen ◽

Ziling Fan

Keyword(s):

Feature Selection ◽

High Throughput ◽

Biological Data ◽

Selection Approach ◽

Feature Selection Approach

iSEE: Interactive SummarizedExperiment Explorer

F1000Research ◽

10.12688/f1000research.14966.1 ◽

2018 ◽

Vol 7 ◽

pp. 741 ◽

Cited By ~ 26

Author(s):

Kevin Rue-Albrecht ◽

Federico Marini ◽

Charlotte Soneson ◽

Aaron T.L. Lun

Keyword(s):

High Throughput ◽

Software Package ◽

Biological Data ◽

Data Exploration ◽

Data Sets ◽

Proteomics Data ◽

Code Tracking ◽

Dynamic Linking ◽

Interactive Visualisation ◽

Visual Interface

Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots and code tracking for reproducibility. We demonstrate the utility and flexibility of iSEE by applying it to explore a range of real transcriptomics and proteomics data sets.

Bioinformatics in otolaryngology research. Part one: concepts in DNA sequencing and gene expression analysis

The Journal of Laryngology & Otology ◽

10.1017/s002221511400200x ◽

2014 ◽

Vol 128 (10) ◽

pp. 848-858 ◽

Cited By ~ 1

Author(s):

T J Ow ◽

K Upadhyay ◽

T J Belbin ◽

M B Prystowsky ◽

H Ostrer ◽

...

Keyword(s):

Gene Expression ◽

Data Storage ◽

Expression Analysis ◽

High Throughput ◽

Gene Expression Analysis ◽

Biological Data ◽

Biological Research ◽

Nucleotide Sequencing ◽

New Era ◽

Crucial Component

Background: Advances in high-throughput molecular biology, genomics and epigenetics, coupled with exponential increases in computing power and data storage, have led to a new era in biological research and information. Bioinformatics, the discipline devoted to storing, analysing and interpreting large volumes of biological data, has become a crucial component of modern biomedical research. Research in otolaryngology has evolved along with these advances. Objectives: This review highlights several modern high-throughput research methods, and focuses on the bioinformatics principles necessary to carry out such studies. Several examples from recent literature pertinent to otolaryngology are provided. The review is divided into two parts; this first part discusses the bioinformatics approaches applied in nucleotide sequencing and gene expression analysis. Conclusion: This paper demonstrates how high-throughput nucleotide sequencing and transcriptomics are changing biology and medicine, and describes how these changes are affecting otorhinolaryngology. Sound bioinformatics approaches are required to obtain useful information from the vast new sources of data.

High-Throughput Screening of Phytochemicals: Application of Computational Methods

Computational Phytochemistry ◽

10.1016/b978-0-12-812364-5.00006-7 ◽

2018 ◽

pp. 165-192

Author(s):

Fyaz M.D. Ismail ◽

Lutfun Nahar ◽

Satyajit D. Sarker

Keyword(s):

Computational Methods ◽

High Throughput ◽

High Throughput Screening

GEView (Gene Expression View) Tool for Intuitive and High Accessible Visualization of Expression Data for Non-Programmer Biologists

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch032 ◽

2020 ◽

pp. 580-592

Author(s):

Libi Hertzberg ◽

Assif Yitzhaky ◽

Metsada Pasmanik-Chor

Keyword(s):

Gene Expression ◽

Quality Control ◽

User Interface ◽

High Throughput ◽

Graphical User Interface ◽

Differential Expression Analysis ◽

Biological Data ◽

Expression Data ◽

Batch Correction ◽

User Friendly

This article describes how the last decade has been characterized by the production of huge amounts of different types of biological data. Following that, a flood of bioinformatics tools has been published. However, many of these tools are commercial or require computational skills. In addition, not all tools provide intuitive and highly accessible visualization of the results. The authors have developed GEView (Gene Expression View), a free, user-friendly tool that brings together several existing algorithms and statistical methods for the analysis of high-throughput gene, microRNA or protein expression data. It can be used to perform basic analyses such as quality control, outlier detection, batch correction and differential expression analysis through a single intuitive graphical user interface. GEView is unique in its simplicity and in the highly accessible visualization it provides. Together with its basic and intuitive functionality, this allows biomedical scientists with no computational skills to independently analyze and visualize high-throughput data produced in their own labs.

Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches

Current Drug Metabolism ◽

10.2174/1389200219666180829121038 ◽

2019 ◽

Vol 20 (3) ◽

pp. 177-184 ◽

Cited By ~ 16

Author(s):

Nantao Zheng ◽

Kairou Wang ◽

Weihua Zhan ◽

Lei Deng

Keyword(s):

Machine Learning ◽

Computational Methods ◽

Protein Interactions ◽

Prediction Models ◽

Learning Algorithms ◽

Biological Data ◽

Machine Learning Algorithms ◽

Host Protein ◽

Protein Protein Interactions ◽

Protein Motifs

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they employ, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs, and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.
