AN ONTOLOGY FOR SEMANTIC INTEGRATION OF LIFE SCIENCE WEB DATABASES

2003 ◽  
Vol 12 (02) ◽  
pp. 275-294 ◽  
Author(s):  
ZINA BEN MILED ◽  
YUE W. WEBSTER ◽  
YANG LIU ◽  
NIANHUA LI

The incompatibilities among complex data formats and the various schemas used by the biological databases that house these data are becoming a bottleneck in biological research. For example, biological data formats range from simple words (e.g. gene names) and numbers (e.g. molecular weights) to sequence strings (e.g. nucleic acid sequences) and even more complex structures such as taxonomy trees. Some information is embedded in narrative text, such as expert comments and publications. Other information is expressed as graphs or images (e.g. pathway networks). The confederation of heterogeneous web databases has become a crucial issue in today's biological research. In other words, interoperability has to be achieved among the biological web databases and their heterogeneity has to be resolved. This paper presents a biological ontology, BAO, and discusses its advantages in supporting the semantic integration of biological web databases.
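The kind of schema-level mediation such an ontology enables can be sketched minimally as follows. The field names and ontology terms here are hypothetical illustrations, not taken from BAO:

```python
# Map heterogeneous database field names onto shared (hypothetical) ontology
# terms, so records from differently-schemed sources become comparable.
ONTOLOGY_MAP = {
    "gene_symbol": "gene name",      # source A's column name
    "geneName": "gene name",         # source B's column name
    "mol_wt": "molecular weight",
    "mw_kda": "molecular weight",
}

def integrate(records):
    """Rewrite each record's keys into the shared ontology vocabulary."""
    return [{ONTOLOGY_MAP.get(k, k): v for k, v in rec.items()}
            for rec in records]

source_a = [{"gene_symbol": "TP53", "mol_wt": 43.7}]
source_b = [{"geneName": "TP53", "mw_kda": 43.7}]
merged = integrate(source_a) + integrate(source_b)
# Both records now expose the same keys: "gene name", "molecular weight"
```

A real ontology-based integrator would of course also reconcile value formats and units, not just field names.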

2006 ◽  
Vol 14 (02) ◽  
pp. 275-293 ◽  
Author(s):  
CHRISTOPHER S. OEHMEN ◽  
TJERK P. STRAATSMA ◽  
GORDON A. ANDERSON ◽  
GALYA ORR ◽  
BOBBIE-JO M. WEBB-ROBERTSON ◽  
...  

The future of biology will be increasingly driven by a fundamental paradigm shift from hypothesis-driven research to data-driven discovery research, employing the growing volume of biological data coupled with experimental testing of new discoveries. But hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify the key technological developments needed to enable this paradigm shift: (1) the ability to store and manage extremely large datasets dispersed over a wide geographical area, (2) novel analysis and visualization tools capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of algorithms and tools that efficiently utilize high-performance compute power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.


2016 ◽  
Vol 1 ◽  
pp. 25 ◽  
Author(s):  
Aravind Venkatesan ◽  
Jee-Hyub Kim ◽  
Francesco Talo ◽  
Michele Ide-Smith ◽  
Julien Gobeill ◽  
...  

Biological databases are fundamental to biological research and discovery. Database curation adds highly precise and useful information, usually extracted from the literature through experts reading research articles. The significant amount of time and effort put in by curators, against the backdrop of tremendous data growth, makes manual curation a high value task. Therefore, there is an urgent need to find ways to scale curation efforts by improving data integration, linking literature to the underlying data. As part of the development of Europe PMC, we have developed a new platform, SciLite, that overlays text-mined annotations on research articles. The aim is to aid Europe PMC users in finding key concepts more easily and provide links to related resources or tools, bridging the gap between literature and biological data.
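The annotation-overlay idea behind a platform like SciLite can be illustrated with a minimal, self-contained sketch. The concept dictionary, URLs, and markup below are illustrative assumptions, not the SciLite implementation:

```python
import re

# Hypothetical dictionary of text-mined concepts -> linked resource identifiers.
CONCEPTS = {
    "BRCA1": "https://example.org/gene/BRCA1",
    "breast cancer": "https://example.org/disease/breast-cancer",
}

def annotate(text):
    """Wrap each recognized concept in a link tag, longest term first,
    so multi-word concepts are matched before their substrings."""
    for term in sorted(CONCEPTS, key=len, reverse=True):
        text = re.sub(re.escape(term),
                      f'<a href="{CONCEPTS[term]}">{term}</a>', text)
    return text

sentence = "Mutations in BRCA1 raise breast cancer risk."
print(annotate(sentence))
```

A production system would annotate against character offsets supplied by the text-mining pipeline rather than re-matching strings, to avoid touching terms that occur inside other annotations.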


2019 ◽  
Author(s):  
Charles Tapley Hoyt ◽  
Daniel Domingo-Fernández ◽  
Sarah Mubeen ◽  
Josep Marin Llaó ◽  
Andrej Konotopez ◽  
...  

Abstract
Background: The integration of heterogeneous, multiscale, and multimodal knowledge and data has become a common prerequisite for joint analyses that unravel the mechanisms and aetiologies of complex diseases. Because of its unique ability to capture this variety, the Biological Expression Language (BEL) is well suited to serve as a platform for semantic integration and harmonization in networks and systems biology.
Results: We have developed numerous independent packages capable of downloading, structuring, and serializing various biological data sources to BEL. Each Bio2BEL package is implemented in the Python programming language and distributed through GitHub (https://github.com/bio2bel) and PyPI.
Conclusions: The philosophy of Bio2BEL encourages reproducibility, accessibility, and democratization of biological databases. We present several applications of Bio2BEL packages, including their ability to support the curation of pathway mappings, the integration of pathway databases, and machine learning applications.
Tweet: A suite of independent Python packages for downloading, parsing, warehousing, and converting multi-modal and multi-scale biological databases to Biological Expression Language.
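The serialization step can be sketched in a few lines. This is a self-contained illustration of emitting BEL statements from structured records, not the Bio2BEL package API; the example interactions are invented:

```python
# Minimal sketch: serialize (subject, relation, object) protein records
# into BEL statement strings using the p() protein abundance function.
def to_bel(interactions):
    """Render protein-protein relations as BEL statements with HGNC names."""
    return [f"p(HGNC:{subj}) {rel} p(HGNC:{obj})"
            for subj, rel, obj in interactions]

records = [("TP53", "increases", "CDKN1A"),
           ("MDM2", "decreases", "TP53")]
for stmt in to_bel(records):
    print(stmt)
# First line printed: p(HGNC:TP53) increases p(HGNC:CDKN1A)
```

A real converter would also emit namespace definitions and per-statement provenance (citation, evidence) as required by a complete BEL document.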


2011 ◽  
pp. 589-601
Author(s):  
Zina Ben Miled

Life science Web databases are becoming increasingly essential in conducting everyday biological research. With more than 300 life science Web databases and the growing size of the life science data, searching and managing these complex data requires technology beyond that of traditional database systems. The open research issues include the interoperability of geographically distributed autonomous databases, which are generally only Web-accessible, and the seamless semantic-based integration of these databases with total transparency to the user. In this paper, the implementation of a Biological and Chemical Information Integration System (BACIIS) is presented. BACIIS supports the integration of multiple heterogeneous life science Web databases and allows the execution of global applications that extend beyond the boundaries of individual databases. This paper discusses the architecture of BACIIS. It also discusses the techniques used to extract and integrate data from the various life science Web databases.


Author(s):  
PHILIP ADEBO

ABSTRACT
Today, biological research is experiencing explosive growth in the academic, industry, and government sectors. Bioinformatics has emerged to make sense of such high-volume, complex data. It is an interdisciplinary field that combines computer science, biology, engineering, and mathematics to develop methods, techniques, and tools for analyzing and interpreting biological data. It uses computational approaches to solve complex biological problems and analyze large volumes of biological data. This paper provides a primer on bioinformatics.


2021 ◽  
Vol 15 (8) ◽  
pp. 898-911
Author(s):  
Yongqing Zhang ◽  
Jianrong Yan ◽  
Siyu Chen ◽  
Meiqin Gong ◽  
Dongrui Gao ◽  
...  

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performances even on high-dimensional, nonstructural, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field, and outline possible directions for further research.


2020 ◽  
Vol 15 ◽  
Author(s):  
Omer Irshad ◽  
Muhammad Usman Ghani Khan

Aim: To facilitate researchers and practitioners in unveiling the mysterious functional aspects of the human cellular system by enabling exploratory searching over semantically integrated, heterogeneous, and geographically dispersed omics annotations.
Background: Improving the health standards of life is one of the motives that continuously drives researchers and practitioners to uncover the mysterious aspects of the human cellular system. Inferring new knowledge from known facts always requires a reasonably large amount of data in a well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing heterogeneously and across geographies at an astronomical rate. Several data integration systems have been deployed to cope with the issues of data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones, but they still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations naturally possessed by the human cell.
Objective: To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and provides exploratory querying on the integrated data.
Method: We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with one another.
Results: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a model-compliant multi-layered and multi-modular software architecture. The results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects within or across the various associated omics layers.
Conclusion: Formal specifications better facilitate addressing data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
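The core idea of warehousing entities as semantic associations and querying them by aspect can be sketched with plain triples. The entities and predicates below are illustrative, not drawn from the paper's model, and a real system would use RDF and SPARQL rather than Python tuples:

```python
# Minimal sketch: store biological entities as (subject, predicate, object)
# triples and retrieve all recorded aspects of an entity with one query.
TRIPLES = [
    ("TP53", "encodes", "p53"),
    ("p53", "participates_in", "apoptosis"),
    ("p53", "located_in", "nucleus"),
]

def query(subject=None, predicate=None):
    """Return all triples matching the given subject and/or predicate;
    None acts as a wildcard, enabling exploratory searching."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]

# All recorded aspects of p53, across whatever omics layers contributed them:
print(query(subject="p53"))
```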


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rahi Jain ◽  
Wei Xu

Abstract
Background: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. The strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the information in those samples, as in predictive mean matching (PMM) used by MICE. Limitations of these strategies include information loss and the closeness of the imputed values to the true missing values. Further, in scenarios with piecemeal medical data, these strategies must wait for the data collection process to complete before a full dataset is available for statistical modeling.
Method and results: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to prepare the statistical models. It segments the original dataset into small complete datasets, using hierarchical clustering, and then performs Bayesian regression on each of the small complete datasets. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated using both simulated data and real studies, and it shows results better than or on par with other approaches such as CCA and PMM.
Conclusion: The DMU approach provides an alternative to the existing approaches of information elimination and imputation when processing datasets with missing values. While the study applied the approach to continuous cross-sectional data, it can also be applied to longitudinal, categorical, and time-to-event biological data.
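The sequential updating at the heart of DMU can be illustrated with a deliberately simplified sketch: conjugate normal updating of a single regression slope across complete data segments, with known noise. This is an assumption-laden toy, not the paper's implementation (which uses hierarchical clustering and full Bayesian regression):

```python
# Simplified sketch of dynamic model updating: refine a slope estimate by
# folding in each small complete data segment as it becomes available,
# instead of discarding or imputing incomplete rows.
def update(prior_mean, prior_prec, xs, ys, noise_prec=1.0):
    """One conjugate Bayesian update of slope b in y = b*x from one segment."""
    prec = prior_prec + noise_prec * sum(x * x for x in xs)
    mean = (prior_prec * prior_mean
            + noise_prec * sum(x * y for x, y in zip(xs, ys))) / prec
    return mean, prec

# Two "complete" segments drawn from y = 2x (noise-free here, for clarity).
segments = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
mean, prec = 0.0, 1e-6          # vague prior on the slope
for xs, ys in segments:
    mean, prec = update(mean, prec, xs, ys)
print(round(mean, 3))           # converges toward the true slope 2.0
```

The point of the sketch is structural: each segment contributes a posterior that becomes the prior for the next, so no row is ever imputed or thrown away, and the model can be updated as piecemeal data arrives.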


2021 ◽  
Author(s):  
FENG GUO ◽  
HUI-LIN QIN

With the continuous development of information technology, enterprises have gradually entered the era of big data. How to analyze complex data and extract the useful information that promotes enterprise development is becoming more and more important in the modernization of science and technology. This paper expounds the importance of big data applications in enterprise management and the problems they face, and briefly analyzes and discusses their use in enterprises as well as their future direction and trends. With the rapid development of the Internet of Things, cloud computing, and other information technologies, the world has ushered in the era of big data, and promoting the deep integration of the Internet, big data, artificial intelligence, and the real economy has become a trend. Owing to rapid economic development, the amount of data generated in the course of consumption and production is very large, and under the traditional management mode enterprises cannot meet the needs of current social and economic development. The application of big data technology, however, enables enterprises to better analyze and study this information, providing a reliable data basis for their business management decisions.


2020 ◽  
Author(s):  
A. E. Sullivan ◽  
S. J. Tappan ◽  
P. J. Angstman ◽  
A. Rodriguez ◽  
G. C. Thomas ◽  
...  

Abstract
With advances in microscopy and computer science, the technique of digitally reconstructing, modeling, and quantifying microscopic anatomies has become central to many fields of biological research. One such technology, the digital reconstruction file format created and maintained by MBF Bioscience, is broadly utilized by the neuroscience community. MBF Bioscience has chosen to openly document this format as the Neuromorphological File Specification (4.0), available at www.mbfbioscience.com/filespecification (Angstman et al. 2020). The data format's structure and capabilities have evolved since its inception, with modifications made to keep pace with advancements in microscopy and with the scientific questions raised by experts worldwide. More recent modifications ensure the format abides by the Findable, Accessible, Interoperable, and Reusable (FAIR) data standards promoted by the International Neuroinformatics Coordinating Facility (INCF; Wilkinson et al. 2016). The incorporated metadata make it easy to identify and repurpose these data types for downstream application and investigation. This publication describes key elements of the file format and details their structural advantages, in an effort to encourage the reuse of these rich data files for alternative analyses or the reproduction of derived conclusions.

