An open-ended data representation model for EU_LISP

Author(s):  
Christian Queinnec ◽  
Pierre Cointe

Author(s):  
Neha Warikoo ◽  
Yung-Chun Chang ◽  
Wen-Lian Hsu

Abstract

Motivation: Natural Language Processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical domain, relation detection between bio-entities, known as the Bio-Entity Relation Extraction (BRE) task, has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task selective or use external knowledge-based pre-/post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, LBERT, a Lexically aware Transformer-based Bidirectional Encoder Representation model that explores both local and global context representations for sentence-level classification tasks.

Results: This article presents one of the most exhaustive BRE studies ever conducted over five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein–protein interaction (PPI), drug–drug interaction and protein–bio-entity relation classification tasks by 0.02%, 11.2% and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relations for large corpora like PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention in improving prediction performance by learning additional local semantic context along with bi-directionally learned global context.

Availability and implementation: GitHub, https://github.com/warikoone/LBERT.

Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 18 (02) ◽  
pp. 1950022
Author(s):  
Nur Adila Azram ◽  
Rodziah Atan

The volume of data produced by scientific experiments keeps growing. These data come from different experiments carried out on various laboratory instruments and machines. Managing and analysing scientific experimental data has become difficult because the data are heterogeneous in structure and format. This paper proposes a knowledge metadata representation model that standardises the representation of scientific experimental data into a common structure. We discuss the methodology of the proposed model and give an analysis of the results. The evaluation and validation of the knowledge metadata representation model, as well as the verification of the metadata element extraction, show promising results.


Personalized medicine exploits patient data such as genetic composition and key biomarkers. During the data mining process, the key challenges are information loss, the heterogeneity of data types and time-series representation. In this paper, a novel data representation model for personalized medicine is proposed in light of these challenges. The proposed model accounts for structured, temporal and non-temporal data and their types, namely numeric, nominal, date and Boolean. After the date and Boolean data are transformed, the nominal data are treated by dispersion, while several clustering techniques are deployed to control the distribution of the numeric data. Ultimately, the transformation process results in three homogeneous representations, each having only two dimensions, to ease exploration of the represented dataset. Compared to the Symbolic Aggregate Approximation technique, the proposed model preserves the time-series information, conserves as much data as possible and offers multiple simple representations to be explored.


2010 ◽  
Vol 11 (3) ◽  
pp. 333-341 ◽  
Author(s):  
Min CHEN ◽  
Yehua SHENG ◽  
Yongning WEN ◽  
Hongjun SU ◽  
Fei GUO

2012 ◽  
Vol 23 (1) ◽  
pp. 78-102 ◽  
Author(s):  
Avichai Meged ◽  
Roy Gelbard

A novel fuzzy data representation model which enables data mining with standard tools is introduced. Many data elements in the world are fuzzy in nature. There is an obvious need to represent and process such data effectively and efficiently, using the same standard tools for crisp data that are popular with researchers and practitioners alike. Currently, however, standard tools cannot process or analyze data that are not adequately represented. The comprehensive data representation model put forward here extends principles of binary databases and provides a unified approach to all types of data: discrete and continuous, crisp and fuzzy. The model is illustrated on a baseline dataset and tested in clustering experiments matched against controlled groupings and a real dataset. The tests confirm that the implementation of the model not only enables the use of standard tools but also yields better results as regards segmentation and clustering of fuzzy datasets.
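The abstract above does not spell out the encoding, so the following is only a minimal sketch of one plausible reading. It assumes a binary database stores one 0/1 column per possible attribute value, and that the fuzzy extension stores a membership degree in [0, 1] in each column instead. The function names `encode_crisp` and `encode_fuzzy` and the example domain are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the column-per-value layout and the use of
# membership degrees are assumptions, not the authors' published method.

def encode_crisp(value, domain):
    """Binary row: 1.0 in the column of the observed value, 0.0 elsewhere."""
    if value not in domain:
        raise ValueError(f"{value!r} is not in the attribute domain")
    return [1.0 if v == value else 0.0 for v in domain]


def encode_fuzzy(memberships, domain):
    """Row of membership degrees, one column per domain value.

    `memberships` maps a domain value to a degree in [0, 1];
    values that are not mentioned get degree 0.0.
    """
    for v, degree in memberships.items():
        if v not in domain:
            raise ValueError(f"{v!r} is not in the attribute domain")
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"degree {degree} is outside [0, 1]")
    return [float(memberships.get(v, 0.0)) for v in domain]


# Hypothetical attribute with three possible values.
domain = ["low", "medium", "high"]

crisp_row = encode_crisp("medium", domain)
fuzzy_row = encode_fuzzy({"medium": 0.7, "high": 0.3}, domain)

print(crisp_row)  # [0.0, 1.0, 0.0]
print(fuzzy_row)  # [0.0, 0.7, 0.3]
```

Under this assumption, crisp and fuzzy records share the same fixed-width numeric layout, which is one way a standard clustering tool could accept both without modification.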

