Introducing the Prototypical Stimulus Characteristics Toolbox: Protosc

Author(s):  
S. M. Stuit ◽  
C. L. E. Paffen ◽  
S. Van der Stigchel

Abstract Many studies use different categories of images to define their conditions. Since any difference between these categories is a valid candidate for explaining category-related behavioral differences, knowledge of the objective image differences between categories is crucial for interpreting the observed behaviors. However, natural images vary in many image features, and not every feature is equally important in describing the differences between the categories. Here, we provide a methodological approach that uses machine-learning performance as a tool to find as many image features as possible that have predictive value for the category the images belong to. In other words, we describe a means of finding the features of a group of images by which the categories can be objectively and quantitatively defined. Note that we are not aiming to provide the best possible decoding performance; instead, our aim is to uncover prototypical characteristics of the categories. To facilitate the use of this method, we offer an open-source, MATLAB-based toolbox that performs such an analysis and aids the user in visualizing the features of relevance. We first applied the toolbox to a mock data set with a known ground truth to show the sensitivity of the approach. Next, we applied the toolbox to a set of natural images as a more practical example.
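The core idea of the abstract above, finding features with predictive value by using decoding performance as the criterion, can be sketched in a few lines. This is an illustrative stand-in (in Python with scikit-learn rather than the MATLAB toolbox itself, on synthetic data with an invented informative feature), not the toolbox's actual algorithm: each feature is scored in isolation by cross-validated classification accuracy, and features scoring clearly above chance are kept as category-defining.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic "images": 200 samples, 5 scalar features, two categories.
# Feature 0 differs between categories by construction; the rest are noise.
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 0] += y * 2.0  # category-related shift in feature 0

# Score each feature in isolation with cross-validated decoding accuracy;
# features scoring above chance carry category information.
chance = 0.5
selected = []
for j in range(X.shape[1]):
    acc = cross_val_score(SVC(kernel="linear"), X[:, [j]], y, cv=5).mean()
    if acc > chance + 0.1:  # crude margin above chance level
        selected.append(j)
```

Note the goal here mirrors the abstract's: identifying which features carry category information, not maximizing overall decoding accuracy.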

2020 ◽  
Author(s):  
Mazin Mohammed ◽  
Karrar Hameed Abdulkareem ◽  
Mashael S. Maashi ◽  
Salama A. Mostafa ◽  
Abdullah Baz ◽  
...  

BACKGROUND In recent times, global concern has been caused by a coronavirus (COVID-19), which is considered a global health threat due to its rapid spread across the globe. Machine learning (ML) is a computational method that can be used to automatically learn from experience and improve the accuracy of predictions. OBJECTIVE In this study, machine learning was applied to a coronavirus dataset of 50 X-ray images to enable the development of detection modalities and directions with risk causes. The dataset contains a wide range of samples of COVID-19 cases alongside SARS, MERS, and ARDS. The experiment was carried out using a total of 50 X-ray images, of which 25 were positive COVID-19 cases and the other 25 were normal cases. METHODS The Orange tool was used for data manipulation. To classify patients as coronavirus carriers or non-carriers, this tool was employed to develop and analyze seven types of predictive models: artificial neural network (ANN), support vector machine (SVM) with linear and radial basis function (RBF) kernels, k-nearest neighbor (k-NN), decision tree (DT), and CN2 rule inducer. Furthermore, the standard InceptionV3 model was used for feature extraction. RESULTS The various machine learning techniques were trained on the coronavirus disease 2019 (COVID-19) dataset with improved ML parameter settings. The data set was divided into two parts, training and testing: the models were trained using 70% of the dataset, while the remaining 30% was used for testing. The results show that the improved SVM achieved an F1 score of 97% and an accuracy of 98%. CONCLUSIONS In this study, seven models were developed to aid the detection of coronavirus.
In such cases, the learning performance can be improved through knowledge transfer, whereby time-consuming data labelling efforts are not required. The evaluations of all the models were done in terms of different parameters. It can be concluded that all the models performed well, but the SVM demonstrated the best result on the accuracy metric. Future work will compare classical approaches with deep learning ones and try to obtain better results. CLINICALTRIAL None
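The study's evaluation protocol, a 70/30 train/test split followed by an SVM scored with F1 and accuracy, can be sketched as follows. This is a minimal illustration on synthetic features (the real study used 50 X-ray images with InceptionV3-extracted features, which are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Stand-in for the 50-image feature set (e.g. InceptionV3 features);
# synthetic data, not the study's actual dataset.
X, y = make_classification(n_samples=50, n_features=20, random_state=42)

# 70/30 train/test split, as described in the abstract.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# RBF-kernel SVM, one of the seven model types listed above.
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
```

Stratifying the split preserves the 25/25 class balance in both partitions, which matters with only 50 samples.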


Author(s):  
A. Lemme ◽  
Y. Meirovitch ◽  
M. Khansari-Zadeh ◽  
T. Flash ◽  
A. Billard ◽  
...  

Abstract This paper introduces a benchmark framework to evaluate the performance of reaching-motion generation approaches that learn from demonstrated examples. The system implements ten different performance measures for typical generalization tasks in robotics using open-source MATLAB software. Systematic comparisons are based on a default training data set of human motions, which specifies the respective ground truth. In technical terms, an evaluated motion generation method needs to compute velocities given a state provided by the simulation system. The framework is agnostic to how the method does this or how it learns from the provided demonstrations. The framework focuses on robustness, which is tested statistically by sampling from a set of perturbation scenarios. These perturbations interfere with motion generation and challenge its generalization ability. The benchmark thus helps to identify the strengths and weaknesses of competing approaches, while giving the user the opportunity to configure the weightings between different measures.
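The evaluation loop described above, a velocity-generating method rolled out in simulation under sampled perturbations, can be sketched schematically. The policy, gains, perturbation distribution, and error measure below are all invented for illustration (the benchmark itself is agnostic to the motion generator and implements ten measures, not the single one used here):

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(state, target):
    """Toy motion generator: velocity proportional to the error to the
    target (a stand-in for a learned reaching model)."""
    return 1.5 * (target - state)

def rollout(start, target, dt=0.05, steps=100, pushes=()):
    """Integrate the policy; apply instantaneous state perturbations."""
    x = np.array(start, float)
    for t in range(steps):
        for when, offset in pushes:
            if t == when:
                x = x + offset  # perturbation interferes with the motion
        x = x + dt * policy(x, np.array(target, float))
    return x

# Robustness measure: final distance to target, averaged over sampled
# perturbation scenarios (random push time and direction).
target = np.array([1.0, 1.0])
errors = []
for _ in range(20):
    push = (rng.integers(10, 60), rng.normal(0, 0.5, 2))
    final = rollout([0.0, 0.0], target, pushes=[push])
    errors.append(np.linalg.norm(final - target))
mean_error = float(np.mean(errors))
```

Sampling many perturbation scenarios and aggregating, rather than testing one fixed disturbance, is what makes the robustness assessment statistical.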


2021 ◽  
Vol 51 (5) ◽  
pp. E7
Author(s):  
Thara Tunthanathip ◽  
Jarunee Duangsuwan ◽  
Niwan Wattanakitrungroj ◽  
Sasiporn Tongman ◽  
Nakornchai Phuenpathom

OBJECTIVE The overuse of head CT examinations has been much discussed, especially those for minor traumatic brain injury (TBI). In the disruptive era, machine learning (ML) is one of the prediction tools that has been used and applied in various fields of neurosurgery. The objective of this study was to compare the predictive performance of ML with that of a nomogram, another prediction tool, for intracranial injury following cranial CT in children with TBI. METHODS Data from 964 pediatric patients with TBI were randomly divided into a training data set (75%) for hyperparameter tuning and supervised learning from 14 clinical parameters, while the remaining data (25%) were used for validation. Moreover, a nomogram was developed from the training data set with the same parameters. Models from various ML algorithms and the nomogram were then built and deployed via a web-based application. RESULTS A random forest classifier (RFC) algorithm established the best performance for predicting intracranial injury following cranial CT of the brain. The area under the receiver operating characteristic curve for the RFC algorithm was 0.80, with 0.34 sensitivity, 0.95 specificity, 0.73 positive predictive value, 0.80 negative predictive value, and 0.79 accuracy. CONCLUSIONS The ML algorithms, particularly the RFC, showed relatively excellent predictive performance that could help physicians reduce the overuse of head CT scans and the treatment costs of pediatric TBI in general practice.
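The diagnostic measures reported above (sensitivity, specificity, positive and negative predictive value, accuracy) are all simple ratios of the four cells of a 2x2 confusion matrix. A small helper makes the definitions explicit; the numbers passed in would come from a model's validation-set predictions, not from this study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic performance measures from a 2x2 confusion
    matrix: tp/fp/tn/fn = true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),          # case detection rate
        "specificity": tn / (tn + fp),          # non-case detection rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

The study's profile (0.34 sensitivity but 0.95 specificity) is the typical trade-off of a conservative classifier: it misses many true injuries but rarely raises a false alarm, which is why the authors frame it as a support tool rather than a replacement for CT.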


2021 ◽  
Vol 11 (12) ◽  
pp. 1645
Author(s):  
Sumit K. Vohra ◽  
Dimiter Prodanov

Image segmentation still represents an active area of research, since no universal solution can be identified. Traditional image segmentation algorithms are problem-specific and limited in scope. Machine learning, on the other hand, offers an alternative paradigm in which predefined features are combined into different classifiers, providing pixel-level classification and segmentation. However, machine learning alone cannot address the question of which features are appropriate for a given classification problem. This article presents an automated image segmentation and classification platform, called Active Segmentation, which is based on ImageJ. The platform integrates expert domain knowledge, providing partial ground truth, with geometrical feature extraction based on multi-scale signal processing combined with machine learning. The approach to image segmentation is exemplified on the ISBI 2012 image segmentation challenge data set. As a second application, we demonstrate whole-image classification functionality based on the same principles, exemplified using the HeLa and HEp-2 data sets. The obtained results indicate that feature-space enrichment, properly balanced with feature-selection functionality, can achieve performance comparable to deep learning approaches. In summary, differential geometry can substantially improve the outcome of machine learning, since it can enrich the underlying feature space with new geometrically invariant objects.
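The pixel-classification paradigm described above can be sketched as follows: compute a stack of multi-scale features per pixel, then train a classifier against a (partial) ground-truth mask. This is a simplified stand-in in Python, not the ImageJ platform's actual feature set; smoothed intensity and gradient magnitude at a few Gaussian scales serve here as a crude proxy for its differential-geometry features:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_gradient_magnitude
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy image: bright square on a noisy background, with a matching
# expert label mask (the "partial ground truth").
img = rng.normal(0, 0.2, (32, 32))
img[8:24, 8:24] += 1.0
labels = np.zeros((32, 32), dtype=int)
labels[8:24, 8:24] = 1

# Multi-scale feature stack: smoothed intensity and gradient magnitude
# at several Gaussian scales; one row of X per pixel.
scales = [1, 2, 4]
feats = [gaussian_filter(img, s) for s in scales]
feats += [gaussian_gradient_magnitude(img, s) for s in scales]
X = np.stack([f.ravel() for f in feats], axis=1)
y = labels.ravel()

# Pixel-level classifier; predicting on the same image yields a
# segmentation mask of the same shape.
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
acc = (clf.predict(X) == y).mean()
```

In practice the classifier is trained on labeled pixels from a few images and applied to unseen ones; training and testing on the same toy image here only illustrates the data layout.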


2021 ◽  
Author(s):  
Sudhir Perincheri ◽  
Angelique Wolf Levi ◽  
Romulo Celli ◽  
Peter Gershkovich ◽  
David Rimm ◽  
...  

Abstract Prostate cancer is a leading cause of morbidity and mortality for adult males in the US. The diagnosis of prostate carcinoma is usually made on prostate core needle biopsies obtained through a transrectal approach. These biopsies may account for a significant portion of pathologists' workload, yet variability in the experience and expertise of pathologists, as well as fatigue, may adversely affect the reliability of cancer detection. Machine-learning algorithms are increasingly being developed as tools to aid and improve diagnostic accuracy in anatomic pathology. The Paige Prostate AI-based digital diagnostic is one such tool, trained on the digital slide archive of New York's Memorial Sloan Kettering Cancer Center (MSKCC), that categorizes a prostate biopsy whole-slide image as either "Suspicious" or "Not Suspicious" for prostatic adenocarcinoma. To evaluate the performance of this program on prostate biopsies secured, processed, and independently diagnosed at an unrelated institution, we used Paige Prostate to review 1876 prostate core biopsy whole-slide images (WSIs) from our practice at Yale Medicine. Paige Prostate categorizations were compared to the pathology diagnosis originally rendered on the glass slides for each core biopsy. Discrepancies between the rendered diagnosis and the categorization by Paige Prostate were each manually reviewed by pathologists with specialized genitourinary pathology expertise. Paige Prostate showed a sensitivity of 97.7% and a positive predictive value of 97.9%, and a specificity of 99.3% and a negative predictive value of 99.2%, in identifying core biopsies with cancer in a data set derived from an independent institution. Areas for improvement were identified in Paige Prostate's handling of poor-quality scans.
Overall, these results demonstrate the feasibility of porting a machine-learning algorithm to an institution remote from its training set, and highlight the potential of such algorithms as a powerful workflow tool for the evaluation of prostate core biopsies in surgical pathology practices.


Biostatistics ◽  
2020 ◽  
Author(s):  
W Katherine Tan ◽  
Patrick J Heagerty

Summary Scalable and accurate identification of specific clinical outcomes has been enabled by machine learning applied to electronic medical record systems. The development of classification models requires the collection of a complete labeled data set, where true clinical outcomes are obtained by human expert manual review. For example, the development of natural language processing algorithms requires the abstraction of clinical text data to obtain the outcome information necessary for training models. However, if the outcome is rare, then simple random sampling results in very few cases and insufficient information to develop accurate classifiers. Since large-scale detailed abstraction is often expensive, time-consuming, and not feasible, more efficient strategies are needed. Under such resource-constrained settings, we propose a class of enrichment sampling designs, where selection for abstraction is stratified by auxiliary variables related to the true outcome of interest. Stratified sampling on highly specific variables results in targeted samples that are more enriched with cases, which we show translates to increased model discrimination and better statistical learning performance. We provide mathematical details and simulation evidence that links sampling designs to their resulting prediction model performance. We discuss the impact of our proposed sampling on both model training and validation. Finally, we illustrate the proposed designs for outcome label collection and subsequent machine learning, using radiology report text data from the Lumbar Imaging with Reporting of Epidemiology study.
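The enrichment idea above can be illustrated with a small simulation: under a fixed abstraction budget, stratifying on a highly specific auxiliary flag yields far more labeled cases than simple random sampling. All quantities below (population size, prevalence, flag specificity, budget split) are invented for illustration, not taken from the paper's designs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population of 10,000 records with a rare outcome (2% prevalence) and a
# noisy auxiliary flag (e.g. a keyword match in the report text) that is
# highly specific for the outcome.
n = 10_000
outcome = rng.random(n) < 0.02
aux = outcome | (rng.random(n) < 0.01)  # flags most cases, few false hits

budget = 200  # records we can afford to manually abstract

# Simple random sampling: expect only ~4 cases among 200 labels.
srs_idx = rng.choice(n, budget, replace=False)

# Enrichment sampling: spend half the budget inside the aux-flagged
# stratum, which is far richer in cases, and the rest in the remainder.
flagged = np.flatnonzero(aux)
rest = np.flatnonzero(~aux)
enr_idx = np.concatenate([
    rng.choice(flagged, budget // 2, replace=False),
    rng.choice(rest, budget // 2, replace=False),
])

srs_cases = int(outcome[srs_idx].sum())
enr_cases = int(outcome[enr_idx].sum())
```

The enriched sample trades representativeness for case yield, which is why, as the paper discusses, the sampling design must be accounted for in both training and validation.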


2019 ◽  
pp. 1-16 ◽  
Author(s):  
Veli-Matti Isoviita ◽  
Liina Salminen ◽  
Jimmy Azar ◽  
Rainer Lehtonen ◽  
Pia Roering ◽  
...  

PURPOSE We have created a cloud-based machine learning system (CLOBNET) that is an open-source, lean infrastructure for electronic health record (EHR) data integration and is capable of extract, transform, and load (ETL) processing. CLOBNET enables comprehensive analysis and visualization of structured EHR data. We demonstrate the utility of CLOBNET by predicting primary therapy outcomes of patients with high-grade serous ovarian cancer (HGSOC) on the basis of EHR data. MATERIALS AND METHODS CLOBNET is built using open-source software to make data preprocessing, analysis, and model training user friendly. The source code of CLOBNET is available on GitHub. The HGSOC data set was based on a prospective cohort of 208 patients with HGSOC who were treated at Turku University Hospital, Finland, from 2009 to 2019 and for whom comprehensive clinical and EHR data were available. RESULTS We trained machine learning (ML) models using clinical data, including a herein-developed dissemination score that quantifies the disease burden at the time of diagnosis, to identify patients with progressive disease (PD) or a complete response (CR) on the basis of RECIST (version 1.1). The best performance was achieved with a logistic regression model, which resulted in an area under the receiver operating characteristic curve (AUROC) of 0.86, with a specificity of 73% and a sensitivity of 89%, when classifying between patients who experienced PD and those with a CR. CONCLUSION We have developed an open-source computational infrastructure, CLOBNET, that enables effective and rapid analysis of EHR and other clinical data. Our results demonstrate that CLOBNET allows predictions to be made on the basis of EHR data to address clinically relevant questions.
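The modeling step reported above, logistic regression evaluated by AUROC, can be sketched as follows. This is a generic illustration on synthetic data of the same cohort size; the actual features (including the dissemination score) and the CLOBNET pipeline are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Stand-in for the structured EHR features of the 208-patient cohort;
# synthetic data, not the study's actual variables.
X, y = make_classification(n_samples=208, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# AUROC is computed from predicted probabilities, not hard labels:
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

A single AUROC summarizes discrimination across all thresholds; the reported specificity/sensitivity pair corresponds to one chosen operating point on that curve.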


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Kamel Mansouri ◽  
Neal F. Cariello ◽  
Alexandru Korotcov ◽  
Valery Tkachenko ◽  
Chris M. Grulke ◽  
...  

Abstract Background The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects its lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects a chemical's absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction. Methods The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN). Results The three methods delivered comparable performance on the training and test sets, with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R2) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and the performance of our models compared favorably to that of the commercial products.
Conclusions This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
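The regression workflow above (molecular descriptors in, pKa out, scored by RMSE and R2) can be sketched generically. This uses synthetic descriptors and scikit-learn's gradient boosting as a stand-in for the paper's XGB model; the actual PaDEL descriptors and DataWarrior pKa values are not reproduced here:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Stand-in for descriptor -> pKa regression; synthetic data only.
X, y = make_regression(n_samples=500, n_features=30, noise=5.0,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# The two headline metrics reported in the Results section:
rmse = mean_squared_error(y_te, pred) ** 0.5
r2 = r2_score(y_te, pred)
```

Reporting both metrics is informative because RMSE is in the units of the target (pKa units in the paper) while R2 is scale-free, so they answer different questions about model quality.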


Author(s):  
Alexander Bagnall ◽  
Gordon Stewart

We present MLCERT, a novel system for practical mechanized proofs of the generalization of learning procedures, bounding expected error in terms of training or test error. MLCERT is mechanized in that we prove generalization bounds inside the theorem prover Coq; thus the bounds are machine-checked by Coq's proof checker. MLCERT is practical in that we extract learning procedures defined in Coq to executable code; thus procedures with proved generalization bounds can be trained and deployed in real systems. MLCERT is well documented and open source; thus we expect it to be usable even by those without Coq expertise. To validate MLCERT, which is compatible with external tools such as TensorFlow, we use it to prove generalization bounds on neural networks trained using TensorFlow on the extended MNIST data set.
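To make "bounding expected error in terms of training or test error" concrete, the simplest bound of this family is Hoeffding's inequality for a single fixed hypothesis: with probability at least 1 − δ over m i.i.d. samples, the true error exceeds the empirical error by at most sqrt(ln(2/δ)/(2m)). This sketch states that classical bound in plain Python; it is not necessarily the exact form MLCERT mechanizes in Coq:

```python
import math

def hoeffding_bound(emp_error, m, delta):
    """Upper bound on the expected (true) error of a fixed hypothesis,
    given its empirical error on m i.i.d. samples, holding with
    probability at least 1 - delta (two-sided Hoeffding inequality)."""
    return emp_error + math.sqrt(math.log(2 / delta) / (2 * m))
```

For example, a model with 10% error on 10,000 held-out samples has true error below roughly 11.4% with 95% confidence; the slack shrinks as 1/sqrt(m). A mechanized system like MLCERT machine-checks exactly this kind of derivation rather than trusting a hand calculation.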


2020 ◽  
Vol 154 (2) ◽  
pp. 242-247
Author(s):  
Robert C Benirschke ◽  
Thomas J Gniadek

Abstract Objectives Preanalytical factors, such as hemolysis, affect many components of a test panel. Machine learning can be used to recognize these patterns, alerting clinicians and laboratories to potentially erroneous results. In particular, machine learning might identify which cases of elevated potassium from a point-of-care (POC) basic metabolic panel are likely erroneous. Methods Plasma potassium concentrations were compared between POC and core laboratory basic metabolic panels to identify falsely elevated POC results. A logistic regression model was created using these labels and the other analytes on the POC panel. Results This model has high predictive power in classifying POC potassium as falsely elevated or not (area under the curve of 0.995 when applied to the test data set). A rule-in and rule-out approach further improves the model’s applicability with a positive predictive value of around 90% and a negative predictive value near 100%. Conclusions Machine learning has the potential to detect laboratory errors based on the recognition of patterns in commonly requested multianalyte panels. This could be used to alert providers at the POC that a result is suspicious or used to monitor the quality of POC results.
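The "rule-in and rule-out approach" described above amounts to applying two thresholds to the model's predicted probability: confident calls at the extremes, and everything in between deferred. A minimal sketch, with illustrative thresholds that are not those of the published model:

```python
def triage(prob, rule_out=0.05, rule_in=0.95):
    """Two-threshold (rule-in / rule-out) decision on a model's predicted
    probability that a POC potassium result is falsely elevated.
    Thresholds are illustrative, not the published model's."""
    if prob >= rule_in:
        return "flag: likely erroneous potassium"
    if prob <= rule_out:
        return "accept result"
    return "indeterminate: manual review"
```

Restricting confident calls to the extremes is what buys the reported ~90% positive predictive value and near-100% negative predictive value: the ambiguous middle band, where errors would concentrate, is routed to human review instead.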
