Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities

GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Zhen-Hao Guo ◽  
Zhu-Hong You ◽  
Yan-Bin Wang ◽  
De-Shuang Huang ◽  
Hai-Cheng Yi ◽  
...  

Abstract Background The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. Results We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. Conclusions Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
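The two summary metrics quoted in this abstract, area under the ROC curve (AUC) and area under the precision-recall curve, can be illustrated with a minimal pure-Python sketch. The function names and the toy labels and scores below are hypothetical and are not taken from the paper. Average precision is used here as one common step-wise estimate of the area under the precision-recall curve; the abstract does not say which estimator the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive.

    This simple version does not treat tied scores specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Toy data (hypothetical): 1 = relationship exists, 0 = it does not.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}, average precision = {toy_ap:.4f}")
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would be preferable for large test sets.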

2021 ◽  
Author(s):  
Eunsaem Lee ◽  
Se Young Jung ◽  
Hyung Ju Hwang ◽  
Jaewoo Jung

BACKGROUND Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. OBJECTIVE We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms, which are explainable and easily applicable in real-world environments. METHODS As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning–based anomaly detection. Performance was measured with the area under the curve and the area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital. RESULTS The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For the area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. CONCLUSIONS Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performance and explainability, and thus can be distributed easily in real-world environments.


10.2196/29807 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e29807
Author(s):  
Eunsaem Lee ◽  
Se Young Jung ◽  
Hyung Ju Hwang ◽  
Jaewoo Jung

Background Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. Objective We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms, which are explainable and easily applicable in real-world environments. Methods As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning–based anomaly detection. Performance was measured with the area under the curve and the area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital. Results The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For the area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performance and explainability, and thus can be distributed easily in real-world environments.


2019 ◽  
Author(s):  
Zhen-Hao Guo ◽  
Zhu-Hong You ◽  
Yan-Bin Wang ◽  
Hai-Cheng Yi

Abstract The explosive growth of genomic, chemical and pathological data provides new opportunities and challenges to re-examine life activities within human cells. However, few computational models aggregate various biomarkers to comprehensively reveal the physical and functional landscape of the biological system. Here, we construct a graph called the Molecular Association Network (MAN) and a representation method called Biomarker2vec. Specifically, MAN is a heterogeneous attribute network consisting of 18 kinds of edges (relationships) among 8 kinds of nodes (biomarkers). Biomarker2vec is an algorithm that represents the nodes as vectors by integrating biomarker attributes and behavior. After the biomarkers are described as vectors, a random forest classifier is applied to carry out the prediction task. Our approach achieved promising performance on 18 relationships, with an AUC of 0.9608 and an AUPR of 0.9572. We also empirically explored the contribution of the attribute and behavior features of biomarkers to the results. In addition, a drug-disease association prediction case study was performed to validate our method’s ability on a specific object. These results strongly support the conclusion that MAN is a network with rich topological and biological information and that Biomarker2vec can adequately characterize biomarkers. Overall, our method can simultaneously predict both single-type and multi-type relationships, which may inspire relevant scholars and expand the medical research paradigm.


2021 ◽  
Vol 11 (5) ◽  
pp. 2083
Author(s):  
Jia Xie ◽  
Zhu Wang ◽  
Zhiwen Yu ◽  
Bin Guo ◽  
Xingshe Zhou

Ischemic stroke is a typical chronic disease caused by the degeneration of the neural system; it usually causes great harm to patients and significantly reduces quality of life. It is therefore crucial to extract useful predictors from physiological signals, and to diagnose or predict ischemic stroke when there are no apparent symptoms. In this study, we put forward a novel prediction method that explores sleep-related features. First, to characterize the pattern of ischemic stroke accurately, we extract a set of effective features from several aspects, including clinical features, fine-grained sleep structure-related features and electroencephalogram-related features. Second, we design a two-step prediction model that combines commonly used classifiers with a data filter model to optimize the prediction result. We evaluate the framework using a real polysomnogram dataset that contains 20 stroke patients and 159 healthy individuals. Experimental results demonstrate that the proposed model can predict stroke events effectively, and the Precision, Recall, Precision-Recall Curve and Area Under the Curve are 63%, 85%, 0.773 and 0.919, respectively.
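To make the precision and recall figures concrete on a dataset this imbalanced (20 patients versus 159 healthy individuals), here is a small worked sketch. The confusion-matrix counts are hypothetical: they were chosen only because they are consistent with the reported rates, and they are not taken from the paper, whose evaluation protocol is not described in this abstract.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and
    false-negative counts; an empty denominator yields 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for 20 positives and 159 negatives (illustration only).
n_pos, n_neg = 20, 159
tp, fn = 17, 3          # 17 of the 20 patients are flagged
fp = 10                 # 10 healthy individuals are flagged by mistake
tn = n_neg - fp

precision, recall = precision_recall(tp, fp, fn)

# Accuracy of a trivial model that labels everyone as healthy.
baseline_accuracy = n_neg / (n_pos + n_neg)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
print(f"always-negative baseline accuracy = {baseline_accuracy:.3f}")
```

The baseline line shows why accuracy alone is uninformative here: a model that never predicts stroke is already right for most individuals, which is the usual reason for reporting precision and recall on imbalanced data.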


2021 ◽  
Vol 10 (1) ◽  
pp. 70
Author(s):  
Oladosu Oyebisi Oladimeji ◽  
Abimbola Oladimeji ◽  
Oladimeji Olayanju

Introduction: Hepatitis C is a chronic infection caused by the hepatitis C virus, a blood-borne virus; infection therefore occurs through exposure to small quantities of blood. The World Health Organization (WHO) estimates that it has affected 71 million people worldwide. The infection imposes heavy costs on individuals, groups, and governments because no vaccine is yet available. The disease is likely to continue to affect more people because its long asymptomatic phase makes early detection infeasible. Material and Methods: In this study, we present machine learning models to automatically classify hepatitis diagnostic test results, and we rank the test features to show how they contribute to the classification, which supports decision making in the health care industry. The synthetic minority oversampling technique (SMOTE) was used to address dataset imbalance. Results: The models were evaluated using the Matthews correlation coefficient, F-measure, Precision-Recall curve, and Receiver Operating Characteristic Area Under Curve. We found that using SMOTE helped raise the performance of the predictive models. Random forest (RF) had the best performance based on Matthews correlation coefficient (0.99), F-measure (0.99), Precision-Recall curve (1.00), and Receiver Operating Characteristic Area Under Curve (0.99). Conclusion: This finding has the potential to impact clinical practice when health workers aim to classify the diagnosis result of a disease at an early stage.
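The core idea of SMOTE mentioned above is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified pure-Python illustration of that idea, not the authors' pipeline or a library implementation; the function name, parameters and toy points are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote_like_oversample(
    minority: List[Sequence[float]],
    n_new: int,
    k: int = 3,
    seed: int = 0,
) -> List[List[float]]:
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (hypothetical values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote_like_oversample(minority, n_new=6)
print(len(synthetic), "synthetic samples generated")
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic samples.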


Author(s):  
Christopher A. Miller ◽  
Tammy Ott ◽  
Peggy Wu ◽  
Vanessa Vakili

If culture is expressed in the patterns of behavior, values and expectations of a group, then a central element in the practical modeling and understanding of culture is the expression of politeness and its roles in governing and influencing behavior. The authors have been developing computational models of “politeness” and its role in power and familiarity relationships, urgency, indebtedness, etc. Such a model, insofar as it extends to human-machine interactions, will enable better and more effective decision aids. This model, based on a universal theory of human politeness, links aspects of social context (power and familiarity relationships, imposition, character), which have culture-specific values, to produce expectations about the use of polite, redressive behaviors (also culturally defined). The authors have linked this “politeness perception” model to a coarse model of decision making and behavior in order to predict influences of politeness on behavior and attitudes. This chapter describes the algorithm along with results from multiple validation experiments: two addressing the model’s ability to predict perceived politeness and two predicting the impact of perceived politeness on compliance behaviors in response to directives. The authors conclude that their model tracks well with subjective perceptions of American cultural politeness and that its predictions broadly anticipate and explain situations in which perceived politeness in a directive yields improved affect, trust, perceived competence, subjective workload, and compliance, though somewhat decreased reaction time. The model proves better at accounting for the effects of social distance than for power differences.


Zootaxa ◽  
2019 ◽  
Vol 4551 (1) ◽  
pp. 53 ◽  
Author(s):  
GABRIELLE JORGE ◽  
MARÍA LAURA LIBONATTI ◽  
CESAR JOÃO BENETTI ◽  
NEUSA HAMADA

In this paper we describe and illustrate for the first time the immature forms (larva and pupa) of Ora semibrunnea Pic, 1922, including biological information and behavior observed in the laboratory. This is the first record of the occurrence of this species in the Brazilian Amazon region. Larvae and pupae were found in natural lakes associated with macrophyte banks. Pupae are aquatic and have morphological adaptations (well-developed pronotal siphons) to obtain atmospheric O2 at the water surface. 


2019 ◽  
Vol 374 (1774) ◽  
pp. 20180370 ◽  
Author(s):  
Salva Duran-Nebreda ◽  
George W. Bassel

Information processing and storage underpins many biological processes of vital importance to organism survival. Like animals, plants also acquire, store and process environmental information relevant to their fitness, and this is particularly evident in their decision-making. The control of plant organ growth and timing of their developmental transitions are carefully orchestrated by the collective action of many connected computing agents, the cells, in what could be addressed as distributed computation. Here, we discuss some examples of biological information processing in plants, with special interest in the connection to formal computational models drawn from theoretical frameworks. Research into biological processes with a computational perspective may yield new insights and provide a general framework for information processing across different substrates. This article is part of the theme issue ‘Liquid brains, solid brains: How distributed cognitive architectures process information’.

