Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Shun Koyabu ◽  
Thi Thanh Thuy Phan ◽  
Takenao Ohkawa

For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as “bind” or “interact”, plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is initially assumed to be a dominant keyword. Then the classifiers are trained separately on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated according to the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction.

2014 ◽  
Vol 12 (01) ◽  
pp. 1450004 ◽  
Author(s):  
SLAVKA JAROMERSKA ◽  
PETR PRAUS ◽  
YOUNG-RAE CHO

Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins to perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein–protein interaction (PPI) networks. However, predicting directed pathways from undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
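
The iterative selection described in this abstract can be illustrated with a small greedy sketch. This is not the authors' algorithm: the distance condition (each step must reduce the hop count to the ending protein) and the local heuristic (take the highest-weight edge) are both assumptions chosen for illustration, and the protein names and weights are hypothetical.

```python
from collections import deque


def hop_distances(graph, target):
    """Breadth-first search: edge count from every reachable node to target."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def greedy_pathway(graph, start, end):
    """Build a path from start to end, one adjacent protein at a time.

    Assumed distance condition: a candidate must be strictly closer
    (in hops) to the ending protein than the current protein.
    Assumed local heuristic: pick the candidate with the largest edge weight.
    """
    dist = hop_distances(graph, end)
    if start not in dist:
        return None  # the ending protein cannot be reached from start
    path = [start]
    current = start
    while current != end:
        candidates = [
            (weight, neighbor)
            for neighbor, weight in graph[current].items()
            if dist[neighbor] < dist[current]
        ]
        _, current = max(candidates)
        path.append(current)
    return path


# Hypothetical undirected PPI network; weights stand in for semantic similarity.
ppi = {
    "A": {"B": 0.9, "C": 0.4},
    "B": {"A": 0.9, "D": 0.7},
    "C": {"A": 0.4, "D": 0.8},
    "D": {"B": 0.7, "C": 0.8, "E": 0.6},
    "E": {"D": 0.6},
}

print(greedy_pathway(ppi, "A", "E"))
```

Because every step moves one hop closer to the ending protein, the loop always terminates; a real implementation would likely relax this condition to allow longer but stronger paths.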


Yeast ◽  
2001 ◽  
Vol 18 (6) ◽  
pp. 523-531 ◽  
Author(s):  
Haretsugu Hishigaki ◽  
Kenta Nakai ◽  
Toshihide Ono ◽  
Akira Tanigami ◽  
Toshihisa Takagi

2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Jianzhuang Yao ◽  
Hong Guo ◽  
Xiaohan Yang

Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), which combines output from two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross-species PPCM could achieve competitive and even better prediction accuracy compared to the single-species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using the Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2208
Author(s):  
Maria Anna Ferlin ◽  
Michał Grochowski ◽  
Arkadiusz Kwasigroch ◽  
Agnieszka Mikołajczyk ◽  
Edyta Szurowska ◽  
...  

Machine learning-based systems are gaining interest in the field of medicine, mostly in medical imaging and diagnosis. In this paper, we address the problem of automatic cerebral microbleed (CMB) detection in magnetic resonance images. It is challenging because a true CMB is difficult to distinguish from its mimics; however, if successfully solved, it would streamline radiologists' work. To deal with this complex three-dimensional problem, we propose a machine learning approach based on a 2D Faster RCNN network. We aimed to achieve a reliable system, i.e., one with balanced sensitivity and precision. Therefore, we have researched and analysed, among others, the impact of the way the training data are provided to the system, their pre-processing, the choice of model and its structure, and the methods of regularisation. Furthermore, we carefully analysed the network predictions and proposed an algorithm for their post-processing. The proposed approach achieved high precision (89.74%), sensitivity (92.62%), and F1 score (90.84%). The paper presents the main challenges connected with automatic cerebral microbleed detection, an in-depth analysis of them, and the developed system. The conducted research may significantly contribute to automatic medical diagnosis.
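
Precision, sensitivity, and F1 score, the three figures reported in this abstract, are standard detection metrics. As a minimal sketch of their usual definitions, the function below computes them from detection counts; the counts in the example are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, sensitivity, f1) from detection counts.

    tp: true positives (correctly detected lesions)
    fp: false positives (mimics flagged as lesions)
    fn: false negatives (missed lesions)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    total = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


# Hypothetical counts, for illustration only.
precision, sensitivity, f1 = detection_metrics(tp=90, fp=10, fn=5)
print(f"precision={precision:.4f} sensitivity={sensitivity:.4f} f1={f1:.4f}")
```

A system with balanced sensitivity and precision, as the authors aim for, keeps these two values close, so that the F1 score is not dragged down by either one.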


2020 ◽  
Vol 6 (39) ◽  
pp. eaba9338 ◽  
Author(s):  
George W. Ashdown ◽  
Michelle Dimon ◽  
Minjie Fan ◽  
Fernando Sánchez-Román Terán ◽  
Kathrin Witmer ◽  
...  

Drug resistance threatens the effective prevention and treatment of an ever-increasing range of human infections. This highlights an urgent need for new and improved drugs with novel mechanisms of action to avoid cross-resistance. Current cell-based drug screens are, however, restricted to binary live/dead readouts with no provision for mechanism of action prediction. Machine learning methods are increasingly being used to improve information extraction from imaging data. These methods, however, work poorly with heterogeneous cellular phenotypes and generally require time-consuming human-led training. We have developed a semi-supervised machine learning approach, combining human- and machine-labeled training data from mixed human malaria parasite cultures. Designed for high-throughput and high-resolution screening, our semi-supervised approach is robust to natural parasite morphological heterogeneity and correctly orders parasite developmental stages. Our approach also reproducibly detects and clusters drug-induced morphological outliers by mechanism of action, demonstrating the potential power of machine learning for accelerating cell-based drug discovery.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Eunchul Yoon ◽  
Soonbum Kwon ◽  
Unil Yun ◽  
Sun-Yong Kim

In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture to achieve good performance in a mixed-channel scenario in which channel characteristic variables such as Rician K factor, azimuth angle of arrival (AOA) width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. When preprocessing the channel state information (CSI) collected under the mixed-channel scenario, we propose averaged power spectral density (PSD) sequence as high-quality training data in machine learning for Doppler spread estimation. We detail intermediate mathematical derivatives of the machine learning process, making it easy to graft the derived results into other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms the other machine learning approach using the channel frequency response (CFR) sequence as training data and two other existing Doppler estimation approaches.
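
One common way to obtain an averaged PSD sequence is Bartlett's method: split the samples into non-overlapping segments, compute a periodogram for each, and average them bin by bin. The sketch below assumes that construction purely for illustration; the abstract does not state how the authors average, and the input is a hypothetical single-tone channel sequence, not real CSI.

```python
import cmath
import math


def periodogram(segment):
    """Power spectral density estimate |DFT|^2 / N for one segment."""
    n = len(segment)
    psd = []
    for k in range(n):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(segment)
        )
        psd.append(abs(acc) ** 2 / n)
    return psd


def averaged_psd(samples, segment_length):
    """Average the periodograms of non-overlapping segments (Bartlett's method)."""
    segments = [
        samples[i:i + segment_length]
        for i in range(0, len(samples) - segment_length + 1, segment_length)
    ]
    if not segments:
        raise ValueError("need at least one full segment")
    periodograms = [periodogram(s) for s in segments]
    return [sum(column) / len(periodograms) for column in zip(*periodograms)]


# Hypothetical channel samples: one complex tone at DFT bin 2 of an 8-point segment.
samples = [cmath.exp(2j * math.pi * 2 * t / 8) for t in range(32)]
psd = averaged_psd(samples, segment_length=8)
print(psd.index(max(psd)))
```

Averaging reduces the variance of the spectral estimate at the cost of frequency resolution, which is one plausible reason an averaged sequence would make cleaner training data than a raw one.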


2018 ◽  
Author(s):  
Muhao Chen ◽  
Chelsea Jui-Ting Ju ◽  
Guangyu Zhou ◽  
Tianran Zhang ◽  
Xuelu Chen ◽  
...  

Sequence-based protein-protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Hence, we present an end-to-end framework, Lasagna, for PPI predictions using only the primary sequences of a protein pair. Lasagna incorporates a deep residual recurrent convolutional neural network in the Siamese learning architecture, which leverages both robust local features and contextualized information that are significant for capturing the mutual influence of protein sequences. Our framework relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that Lasagna outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.

