Translational informatics: enabling high-throughput research paradigms

2009 ◽  
Vol 39 (3) ◽  
pp. 131-140 ◽  
Author(s):  
Philip R. O. Payne ◽  
Peter J. Embi ◽  
Chandan K. Sen

A common thread throughout the clinical and translational research domains is the need to collect, manage, integrate, analyze, and disseminate large-scale, heterogeneous biomedical data sets. However, well-established and broadly adopted theoretical and practical frameworks and models intended to address such needs are conspicuously absent from the published literature and other reputable knowledge sources. Instead, the development and execution of multidisciplinary, clinical, or translational studies are significantly limited by the propagation of “silos” of both data and expertise. Motivated by this fundamental challenge, we report on the current state and evolution of biomedical informatics as it pertains to the conduct of high-throughput clinical and translational research, and we present both a conceptual and a practical framework for the design and execution of informatics-enabled studies. The objective of presenting these findings and constructs is to provide the clinical and translational research community with a common frame of reference for discussing and expanding upon such models and methodologies.

2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Andreas Friedrich ◽  
Erhan Kenar ◽  
Oliver Kohlbacher ◽  
Sven Nahnsen

Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data are accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions under study as well as information that may be of interest for failure analysis or follow-up experiments. In addition to the management of this information, researchers urgently need means for integrated design and interfaces for structured data annotation. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
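As a minimal sketch of what a factor-based design expansion can look like (not the authors' implementation; the factor names, replicate count, and output file are illustrative assumptions), one can enumerate all factor combinations and emit one sample-sheet row per combination and replicate:

```python
# Minimal sketch (assumed details, not the published system): expanding a
# factor-based design into a flat sample sheet with one row per combination.
from itertools import product
import csv

# Hypothetical factors for illustration only.
factors = {
    "genotype": ["wildtype", "knockout"],
    "treatment": ["control", "drug_A"],
    "timepoint_h": ["0", "24"],
}
replicates = 3

def build_sample_sheet(factors, replicates):
    rows = []
    names = list(factors)
    for i, combo in enumerate(product(*factors.values()), start=1):
        for rep in range(1, replicates + 1):
            row = {"sample_id": f"S{i:03d}_R{rep}", "replicate": rep}
            row.update(dict(zip(names, combo)))
            rows.append(row)
    return rows

rows = build_sample_sheet(factors, replicates)
with open("sample_sheet.tsv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=rows[0].keys(), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```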


2019 ◽  
Vol 15 ◽  
pp. 117693431984907 ◽  
Author(s):  
Tomáš Farkaš ◽  
Jozef Sitarčík ◽  
Broňa Brejová ◽  
Mária Lucká

Computing similarity between 2 nucleotide sequences is one of the fundamental problems in bioinformatics. Current methods are based mainly on 2 major approaches: (1) sequence alignment, which is computationally expensive, and (2) faster, but less accurate, alignment-free methods based on various statistical summaries, for example, short word counts. We propose a new distance measure based on mathematical transforms from the domain of signal processing. To tolerate large-scale rearrangements in the sequences, the transform is computed across sliding windows. We compare our method on several data sets with current state-of-the-art alignment-free methods. Our method compares favorably in terms of accuracy and outperforms other methods in running time and memory requirements. In addition, it is massively scalable up to dozens of processing units without loss of performance due to communication overhead. Source files and sample data are available at https://bitbucket.org/fiitstubioinfo/swspm/src
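To illustrate the general idea of a signal-transform distance computed over sliding windows (a toy sketch under assumed details, not the published method; the nucleotide encoding, window size, and distance aggregation are placeholders), one might compare magnitude spectra of windowed numeric encodings:

```python
# Illustrative sketch only: map nucleotides to numbers, take the FFT magnitude
# spectrum of sliding windows, and average best-match spectral distances so
# that large-scale rearrangements between sequences are tolerated.
import numpy as np

NUM = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def windows_spectra(seq, win=256, step=128):
    x = np.array([NUM.get(b, 0.0) for b in seq.upper()])
    specs = []
    for start in range(0, max(1, len(x) - win + 1), step):
        w = x[start:start + win]
        if len(w) < win:                      # zero-pad the last window
            w = np.pad(w, (0, win - len(w)))
        specs.append(np.abs(np.fft.rfft(w)))  # magnitude spectrum
    return np.array(specs)

def spectral_distance(seq_a, seq_b, win=256, step=128):
    sa = windows_spectra(seq_a, win, step)
    sb = windows_spectra(seq_b, win, step)
    # For each window of A, distance to its closest window of B, and vice versa.
    d = np.linalg.norm(sa[:, None, :] - sb[None, :, :], axis=2)
    return (d.min(axis=1).mean() + d.min(axis=0).mean()) / 2.0
```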


2018 ◽  
Vol 1 (1) ◽  
pp. 263-274 ◽  
Author(s):  
Marylyn D. Ritchie

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis, along with complex phenomics, will be discussed. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Yipu Zhang ◽  
Ping Wang

ChIP-seq, a high-throughput technique that couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of a transcription factor's binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which is typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging-substring mining strategy: it first finds enriched substrings and then searches their neighborhood instances to construct position weight matrices (PWMs) and cluster motifs of different lengths. FCmotif does not follow the OOPS (one occurrence per sequence) model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detection of the real binding motifs and processing of the full-size data of several megabytes finished within a few minutes. The experimental results show that FCmotif is advantageous for (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME.
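For readers unfamiliar with the (l, d) motif formulation, the following is a toy sketch of the general idea (not the FCmotif algorithm; the seed choice and parameters are illustrative assumptions): pick an enriched l-mer, gather instances within Hamming distance d, and build a PWM from them.

```python
# Toy sketch of the generic (l, d) motif idea, not the FCmotif algorithm:
# count l-mers across sequences, take an enriched seed, collect instances
# within Hamming distance d, and build a position weight matrix (PWM).
from collections import Counter
import numpy as np

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def find_motif(seqs, l=8, d=2):
    counts = Counter(s[i:i + l] for s in seqs for i in range(len(s) - l + 1))
    seed, _ = counts.most_common(1)[0]          # most frequent l-mer as seed
    instances = [s[i:i + l] for s in seqs for i in range(len(s) - l + 1)
                 if hamming(s[i:i + l], seed) <= d]
    bases = "ACGT"
    pwm = np.zeros((l, 4))
    for inst in instances:
        for pos, b in enumerate(inst):
            if b in bases:
                pwm[pos, bases.index(b)] += 1
    pwm /= pwm.sum(axis=1, keepdims=True)       # column-normalize to frequencies
    return seed, pwm
```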


2020 ◽  
Author(s):  
Zeyu Jiao ◽  
Yinglei Lai ◽  
Jujiao Kang ◽  
Weikang Gong ◽  
Liang Ma ◽  
...  

Abstract High-throughput technologies, such as magnetic resonance imaging (MRI) and DNA/RNA sequencing (DNA-seq/RNA-seq), have been increasingly used in large-scale association studies. With these technologies, important biomedical research findings have been generated. The reproducibility of these findings, especially from structural MRI (sMRI) and functional MRI (fMRI) association studies, has recently been questioned. There is an urgent demand for a reliable overall reproducibility assessment for large-scale high-throughput association studies. It is also desirable to understand the relationship between study reproducibility and sample size in an experimental design. In this study, we developed a novel approach: the mixture model reproducibility index (M2RI) for assessing the study reproducibility of large-scale association studies. With M2RI, we performed study reproducibility analysis for several recent large sMRI/fMRI data sets. The advantages of our approach, particularly in comparison with the Dice coefficient (DC), were clearly demonstrated, as were the sample size requirements for different phenotypes. We also applied M2RI to compare two MRI or RNA sequencing data sets. The reproducibility assessment results were consistent with our expectations. In summary, M2RI is a novel and useful approach for assessing study reproducibility, calculating sample sizes and evaluating the similarity between two closely related studies.
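The Dice coefficient (DC) used as a baseline above is a standard overlap measure between the sets of significant findings from two studies; a minimal sketch follows (the M2RI itself is not reproduced here, and the feature sets are hypothetical):

```python
# Minimal sketch of the Dice coefficient baseline for study reproducibility:
# overlap between two studies' sets of significant features (voxels, genes, ...).
def dice_coefficient(sig_a, sig_b):
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Example: features called significant in two replicate analyses.
print(dice_coefficient({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))  # ~0.667
```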


Author(s):  
Huimin Luo ◽  
Min Li ◽  
Mengyun Yang ◽  
Fang-Xiang Wu ◽  
Yaohang Li ◽  
...  

Abstract Drug repositioning can drastically decrease the cost and duration of traditional drug research and development while avoiding unforeseen adverse events. With the rapid advancement of high-throughput technologies and the explosion of various biological and medical data, computational drug repositioning methods have become appealing and powerful techniques to systematically identify potential drug-target and drug-disease interactions. In this review, we first summarize the available biomedical data and public databases related to drugs, diseases and targets. Then, we discuss existing drug repositioning approaches and group them based on their underlying computational models, consisting of classical machine learning, network propagation, matrix factorization and completion, and deep learning based models. We also comprehensively analyze common standard data sets and evaluation metrics used in drug repositioning, and give a brief comparison of various prediction methods on the gold standard data sets. Finally, we conclude our review with a brief discussion of challenges in computational drug repositioning, including reducing the noise and incompleteness of biomedical data, ensembling various computational drug repositioning methods, designing reliable negative sample selection methods, developing new techniques to deal with data sparseness, constructing large-scale and comprehensive benchmark data sets, and analyzing and explaining the underlying mechanisms of predicted interactions.
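As a minimal sketch of one model family named above (matrix factorization for drug-disease association prediction), the following factorizes an association matrix by gradient descent; the rank, learning rate, and data are illustrative assumptions, not values from any cited method:

```python
# Sketch of matrix factorization for drug repositioning: learn low-rank factors
# U (drugs) and V (diseases) so that U @ V.T approximates known associations;
# high scores on unobserved pairs become repositioning candidates.
import numpy as np

def factorize(R, rank=10, lr=0.01, reg=0.05, epochs=200, seed=0):
    """R: drugs x diseases matrix with 1 = known association, 0 = unknown."""
    rng = np.random.default_rng(seed)
    n_drugs, n_dis = R.shape
    U = 0.1 * rng.standard_normal((n_drugs, rank))
    V = 0.1 * rng.standard_normal((n_dis, rank))
    for _ in range(epochs):
        E = R - U @ V.T                      # reconstruction error
        U += lr * (E @ V - reg * U)          # gradient steps with L2 penalty
        V += lr * (E.T @ U - reg * V)
    return U @ V.T                           # scores for ranking candidate pairs

# Usage with a random placeholder association matrix.
R = np.random.default_rng(1).integers(0, 2, (50, 30)).astype(float)
scores = factorize(R)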


Author(s):  
Denny M. Oliveira ◽  
Eftyhia Zesta ◽  
Piyush M. Mehta ◽  
Richard J. Licata ◽  
Marcin D. Pilinski ◽  
...  

Satellites, crewed spacecraft and stations in low-Earth orbit (LEO) are very sensitive to atmospheric drag. Estimates of a satellite's lifetime and its orbital tracking become increasingly inaccurate or uncertain during magnetic storms. Given the planned increase of government and private satellite presence in LEO, the need for accurate density predictions for collision avoidance and lifetime optimization, particularly during extreme events, has become an urgent matter and requires comprehensive international collaboration. Additionally, long-term solar activity models and historical data suggest that solar activity will increase significantly in the coming years and decades. In this article, we briefly summarize the main achievements in research on the thermosphere response to extreme magnetic storms, particularly those occurring after the launch of many satellites carrying state-of-the-art accelerometers, from which high-accuracy densities can be determined. We find that an empirical model with data assimilation performs better than the same model without data assimilation during all extreme storm phases. We discuss how forecasting models can be improved by looking in two directions: first, to the past, by adapting historical extreme storm datasets for density predictions, and second, to the future, by facilitating the assimilation of large-scale thermosphere data sets that will be collected during future events. This topic is therefore relevant to the scientific community, government agencies that operate satellites, and the private sector with assets operating in LEO.
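The sensitivity to thermospheric density comes from the standard drag acceleration relation a_drag = (1/2) ρ C_d (A/m) v²; the small sketch below evaluates it for assumed satellite parameters and order-of-magnitude example densities (not values from the article):

```python
# Sketch of the standard drag acceleration relation; cd, area, and mass are
# illustrative assumptions for a generic LEO satellite.
def drag_acceleration(rho, v, cd=2.2, area=1.0, mass=500.0):
    """rho: density [kg/m^3], v: orbital speed [m/s] -> acceleration [m/s^2]."""
    return 0.5 * rho * cd * (area / mass) * v ** 2

# Quiet-time vs enhanced storm-time density near 400 km (example magnitudes only).
for rho in (3e-12, 1.5e-11):
    print(f"rho={rho:.1e} kg/m^3 -> a_drag={drag_acceleration(rho, 7700):.2e} m/s^2")
```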


2015 ◽  
Author(s):  
Paul D Blischak ◽  
Laura S Kubatko ◽  
Andrea D Wolfe

Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, high-throughput sequencing has seen little application in the study of polyploid populations. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction-enzyme-based techniques, e.g., GBS or RADseq) to allele frequency estimation in a unified inferential framework, using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package POLYFREQS and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond the estimation of allele frequencies for autopolyploids by providing assessments of model adequacy and estimates of heterozygosity.
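To illustrate the core likelihood idea behind such read-count models (a sketch, not the POLYFREQS implementation; the error model and prior are simplifying assumptions), reference-read counts can be modeled as binomial given an unknown allele dosage, which is then marginalized under an allele-frequency-based prior:

```python
# Sketch: reference reads ~ Binomial(coverage, g/ploidy), marginalized over the
# unknown dosage g with a binomial (HWE-like) prior given allele frequency p.
from math import comb

def genotype_prior(g, ploidy, p):
    return comb(ploidy, g) * p**g * (1 - p) ** (ploidy - g)

def read_likelihood(ref_reads, coverage, g, ploidy, err=0.01):
    q = g / ploidy
    q = q * (1 - err) + (1 - q) * err          # sequencing error keeps q off 0/1
    return comb(coverage, ref_reads) * q**ref_reads * (1 - q) ** (coverage - ref_reads)

def individual_likelihood(ref_reads, coverage, ploidy, p):
    # Sum over the unknown allele dosage g = 0..ploidy (the ADU marginalization).
    return sum(genotype_prior(g, ploidy, p) * read_likelihood(ref_reads, coverage, g, ploidy)
               for g in range(ploidy + 1))

# Example: a tetraploid individual with 12 of 20 reads carrying the reference allele.
print(individual_likelihood(12, 20, ploidy=4, p=0.5))
```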


2012 ◽  
Vol 21 (01) ◽  
pp. 135-138 ◽  
Author(s):  
Y. L. Yip ◽  

Summary To review current excellent research and trends in the field of bioinformatics and translational informatics with direct application in the medical domain. Synopsis of the articles selected for the IMIA Yearbook 2012. Six excellent articles were selected in this Yearbook's section on Bioinformatics and Translational Informatics. They exemplify current key advances in the use of patient information for translational research and health surveillance. First, two proof-of-concept studies demonstrated the cross-institutional and cross-geographic use of Electronic Health Records (EHR) for clinical trial subject identification and drug safety signal detection. These reports pave the way for global large-scale population monitoring. Second, there is further evidence of the importance of coupling phenotypic information in EHRs with genotypic information (either in biobanks or in gene association studies) for new biomedical knowledge discovery. Third, patient data gathered via social media and self-reporting were found to be comparable to existing data and less labor-intensive to collect. This alternative means could potentially overcome data collection challenges in cohort and prospective studies. Finally, metagenomic studies are gaining momentum in bioinformatics, and system-level analysis of the human microbiome sheds important light on certain human diseases. The current literature shows that traditional bench-to-bedside translational research is increasingly being complemented by the reverse approach, in which bedside information is used to provide novel biomedical insights.


2017 ◽  
Author(s):  
Carlos Arteta ◽  
Victor Lempitsky ◽  
Jaroslav Zak ◽  
Xin Lu ◽  
J. Alison Noble ◽  
...  

Abstract High-throughput screening (HTS) techniques have enabled large-scale image-based studies, but extracting biological insights from the imaging data in an exploratory setting remains a challenge. Existing packages for this task either require expert annotations, which can bias the outcome of the study, or are completely unsupervised, failing to leverage the information present in the assay design. We present HTX, an interactive tool to aid in the exploration of large microscopy data sets by allowing the visualization of entire image-based assays according to visual similarities between the samples in an intuitive and navigable manner. Underlying HTX is a collection of novel algorithmic techniques for deep texture descriptor learning, 2D data visualization, adversarial suppression of batch effects, and backprop-based image saliency estimation. We demonstrate that HTX can exploit the screen metadata in order to learn screen-specific image descriptors, which are then used to quantify the visual similarity between samples in the assay. Given these similarities and the different visualization resources of HTX, we show that screens of small-molecule libraries on cell data can be easily explored, reproducing the results of previous studies where highly specific domain knowledge was required.
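The exploration idea itself can be sketched independently of the learned HTX descriptors (which are not reproduced here): given per-sample image descriptors, compute pairwise visual similarities and project them to 2D for a navigable layout. The placeholder features below are random and stand in for whatever descriptor a screen-specific model would produce.

```python
# Sketch of descriptor-based exploration: pairwise cosine similarity between
# sample descriptors plus a PCA projection to 2D for plotting/navigation.
import numpy as np

def cosine_similarity_matrix(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def pca_2d(X):
    Xc = X - X.mean(axis=0)
    # Right singular vectors give the principal axes; keep the first two.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

descriptors = np.random.default_rng(0).standard_normal((100, 64))  # placeholder features
S = cosine_similarity_matrix(descriptors)   # sample-by-sample visual similarity
coords = pca_2d(descriptors)                # 2D coordinates for the assay layout
```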

