Text Mining in Cancer Gene and Pathway Prioritization

2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13874 ◽  
Author(s):  
Yuan Luo ◽  
Gregory Riedlinger ◽  
Peter Szolovits

Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.




2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Michael W. Glier ◽  
Daniel A. McAdams ◽  
Julie S. Linsey

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is the abstraction of the engineering problem into a functional need and then seeking solutions to this function using a keyword type search method on text based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, a NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find biological jargon feature words are correlated with unhelpful sentences.
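The reported F score can be checked against the reported precision and recall. The sketch below assumes the abstract refers to the balanced F score (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_score` and its `beta` parameter are illustrative only.

```python
def f_score(precision, recall, beta=1.0):
    """General F-beta score; beta=1.0 gives the balanced F1 score.

    Assumption: the abstract's "F score" is F1. The abstract does not
    state the variant, so this is only a consistency check.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported for the NB classifier on 60 word stems.
print(round(f_score(0.87, 0.52), 2))  # 0.65
```

Under that assumption, the three reported figures are mutually consistent.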



Author(s):  
Chandrakant Ekkirala

Semantic technologies have gained prominence over the last several years. This chapter explores semantic technologies in detail and outlines the semantic integration of data. It also touches on the various data integration techniques and approaches, and enumerates text mining, its associated algorithms, and the tools and technologies used for it. The chapter has the following sections:
1. Data Integration Techniques
   • Data Integration Technique – Extraction, Transformation and Loading (ETL)
   • Data Integration Technique – Data Federation
2. Data Integration Approaches
   • Need Based Data Integration
   • Periodic Data Integration
   • Continuous Data Integration
3. Semantic Integration
4. Semantic Technologies
5. Semantic Web Technologies
6. Text Mining
7. Text Mining Algorithms
8. Tools and Technologies for Text Mining



Author(s):  
Manish Gupta ◽  
Jiawei Han

Sequential pattern mining methods have been found to be applicable in a large number of domains. Sequential data is omnipresent. Sequential pattern mining methods have been used to analyze this data and identify patterns. Such patterns have been used to implement efficient systems that can recommend based on previously observed patterns, help in making predictions, improve usability of systems, detect events, and in general help in making strategic product decisions. In this chapter, we discuss the applications of sequential data mining in a variety of domains like healthcare, education, Web usage mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera. We conclude with a summary of the work.



Author(s):  
Soumya Raychaudhuri

The genomics era has presented many new high throughput experimental modalities that are capable of producing large amounts of data on comprehensive sets of genes. In time there will certainly be many more new techniques that explore new avenues in biology. In any case, textual analysis will be an important aspect of the analysis. The body of the peer-reviewed scientific text represents all of our accomplishments in biology, and it plays a critical role in hypothesizing and interpreting any data set. To altogether ignore it is tantamount to reinventing the wheel with each analysis. The volume of relevant literature approaches proportions where it is all but impossible to manually search through all of it. Instead we must often rely on automated text mining methods to access the literature efficiently and effectively. The methods we present in this book provide an introduction to the avenues that one can employ to include text in a meaningful way in the analysis of these functional genomics data sets. They serve as a complement to the statistical methods such as classification and clustering that are commonly employed to analyze data sets. We are hopeful that this book will serve to encourage the reader to utilize and further develop text mining in their own analyses.



2020 ◽  
Vol 202 ◽  
pp. 15004
Author(s):  
Aditya Tegar Satria ◽  
Mustafid ◽  
Dinar Mutiara Kusumo Nugraheni

Nowadays, the Internet of Things (IoT) is commonly used in the tourism industry, including aviation, where passengers of flight services can rate their satisfaction with the products and services they use by writing text-based reviews on many popular websites. These passenger reviews are a potential source of big data and can be analyzed to extract meaningful information. Some text mining algorithms are already in common use, including the Bayes formula and Support Vector Machine (SVM) methods. This research proposes an implementation of the Bayes and SVM methods in which the two algorithms operate independently yet are integrated with other modules, such as data input and text pre-processing, and present their output concisely in a single information system. The proposed system successfully processed 1000 passenger review documents as input data. After pre-processing, the Bayes formula was used to classify the reviews into 5 categories: plane condition, flight comfort, staff service, food and entertainment, and price. Simultaneously, the positive and negative sentiment contained in each review was analyzed with the SVM method, which achieved an accuracy of 83.6% for a training-to-testing ratio of 50:50, 82.75% for the 60:40 ratio, and 83.3% for the 70:30 ratio. This research shows that two different text mining algorithms can be implemented simultaneously in an effective and efficient way, while still providing accurate and satisfactory performance in one integrated information system.
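As a rough illustration of the three training-to-testing ratios mentioned above, the following sketch splits 1000 placeholder documents at 50:50, 60:40, and 70:30. The helper `split_by_ratio`, the fixed shuffle seed, and the placeholder review IDs are all hypothetical; the abstract does not describe how the authors partitioned their data.

```python
import random


def split_by_ratio(documents, train_percent, seed=0):
    """Shuffle a copy of the documents and split it into (train, test).

    Hypothetical helper: the splitting procedure used in the study is
    not described in the abstract.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 1000 review documents (not real data).
reviews = [f"review-{i}" for i in range(1000)]
for train_percent in (50, 60, 70):
    train, test = split_by_ratio(reviews, train_percent)
    print(f"{train_percent}:{100 - train_percent} -> "
          f"{len(train)} train, {len(test)} test")
```

With 1000 documents, these ratios leave 500, 400, and 300 documents for testing, so each reported accuracy is measured on a test set of a different size.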



2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Theodosios Theodosiou ◽  
Nikolaos Papanikolaou ◽  
Maria Savvaki ◽  
Giulia Bonetto ◽  
Stella Maxouri ◽  
...  

The in-depth study of protein–protein interactions (PPIs) is of key importance for understanding how cells operate. Therefore, in the past few years, many experimental as well as computational approaches have been developed for the identification and discovery of such interactions. Here, we present UniReD, a user-friendly computational prediction tool that analyses biomedical literature in order to extract known protein associations and suggest undocumented ones. As a proof of concept, we demonstrate its usefulness by experimentally validating six predicted interactions and by benchmarking it against public databases of experimentally validated PPIs, achieving high coverage. We believe that UniReD can become an important and intuitive resource for experimental biologists in their quest to find novel associations within a protein network, and a useful tool to complement experimental approaches (e.g. mass spectrometry) by producing sorted lists of candidate proteins for further experimental validation. UniReD is available at http://bioinformatics.med.uoc.gr/unired/


