A Data-Driven, Flexible Machine Learning Strategy for the Classification of Biomedical Data

Author(s):  
Rajmund L. Somorjai ◽  
Murray E. Alexander ◽  
Richard Baumgartner ◽  
Stephanie Booth ◽  
Christopher Bowman ◽  
...  

2019 ◽  
Author(s):  
Mario Prieto ◽  
Helena Deus ◽  
Anita De Waard ◽  
Erik Schultes ◽  
Beatriz García-Jiménez ◽  
...  

The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly-annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation - a Nanopublication - where certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.


2009 ◽  
Vol 2009 ◽  
pp. 1-13 ◽  
Author(s):  
Leonardo Vanneschi ◽  
Francesco Archetti ◽  
Mauro Castelli ◽  
Ilaria Giordani

Discovering the models that explain the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by the DNA Microarray technique, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we present a comparison between four variants of Genetic Programming for the classification of two different oncologic datasets: the first one contains data from healthy colon tissues and colon tissues affected by cancer; the second one contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with the ones obtained by three nonevolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, seem to hint that Genetic Programming is a promising technique for this kind of classification.
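The two fitness criteria named in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a binary classifier that outputs a real-valued score, summarizes the receiver operating characteristic by the area under the curve (computed here through the rank-based pairwise comparison), and uses an arbitrary 0.5 threshold for the percentage of correctly classified instances. The function names and the toy labels and scores are hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diseased tissue, 0 = healthy tissue.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

# Assumed decision threshold of 0.5 for turning scores into classes.
predictions = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predictions)
auc = roc_auc(labels, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

The two criteria can rank candidate classifiers differently: accuracy depends on the chosen threshold, while the area under the curve depends only on how the scores order the instances.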


2018 ◽  
Vol 9 (5) ◽  
pp. 1289-1300 ◽  
Author(s):  
Félix Musil ◽  
Sandip De ◽  
Jack Yang ◽  
Joshua E. Campbell ◽  
Graeme M. Day ◽  
...  

Polymorphism is common in molecular crystals, whose energy landscapes usually contain many structures with similar stability, but very different physical–chemical properties. Machine-learning techniques can accelerate the evaluation of energy and properties by side-stepping accurate but demanding electronic-structure calculations, and provide a data-driven classification of the most important molecular packing motifs.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8871
Author(s):  
Mario Prieto ◽  
Helena Deus ◽  
Anita de Waard ◽  
Erik Schultes ◽  
Beatriz García-Jiménez ◽  
...  

The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly-annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation—a Nanopublication—where certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.
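To make the term "cross-validation accuracy" concrete, here is a minimal sketch of k-fold cross-validation for a three-category labelling task. It does not reproduce the study's model or corpus: the sentences and labels are invented for illustration, the "model" is a trivial majority-class baseline, and the function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; fold i holds out every k-th item starting at i."""
    for i in range(k):
        test = [j for j in range(n) if j % k == i]
        train = [j for j in range(n) if j % k != i]
        yield train, test


def train_majority(examples):
    """Toy stand-in for a real model: always predicts the most frequent training label."""
    most_common = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda text: most_common


def cross_val_accuracy(examples, train_fn, k):
    """Train on k-1 folds, test on the held-out fold, and pool the correct predictions."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = train_fn([examples[j] for j in train_idx])
        correct += sum(
            1 for j in test_idx if model(examples[j][0]) == examples[j][1]
        )
    return correct / len(examples)


# Invented sentences with invented certainty labels (illustration only).
examples = [
    ("Protein A binds protein B.", "high"),
    ("Gene X is expressed in liver tissue.", "high"),
    ("The mutation abolishes enzyme activity.", "high"),
    ("Treatment reduced tumor volume.", "high"),
    ("These data suggest that A regulates B.", "medium"),
    ("It is possible that X may influence Y.", "low"),
]

cv_accuracy = cross_val_accuracy(examples, train_majority, k=3)
print(f"3-fold cross-validation accuracy = {cv_accuracy:.3f}")
```

Because every sentence is scored only by a model that never saw it during training, the pooled figure estimates performance on unseen text rather than fit to the training set.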



