Story and Task Issue Analysis for Agile Machine Learning Projects

Abstract

Objectives: To detect unilateral vocal fold paralysis (UVFP) from voice recordings using an explainable machine learning model.

Study Design: Case series, retrospective with a control group.

Methods: Patients with UVFP confirmed through endoscopic examination (N=77) and controls with normal voices matched for age and sex (N=77) were included. Two tasks were used to elicit voice samples: reading the Rainbow Passage and sustaining phonation of the vowel /a/. The 88 extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) features were extracted as inputs for four machine learning models of differing complexity. Training and testing were performed using bootstrapped cross-validation. SHAP was used to identify important features.

Results: The median Area Under the Receiver Operating Characteristic Curve (ROC AUC) score ranged from 0.79 to 0.87 depending on model and task. After removing redundant features for explainability, the highest median ROC AUC score was 0.84 using only 13 features for the vowel task and 0.87 using 39 features for the reading task. The most important features included intensity measures, mean MFCC1, mean F1 amplitude and frequency, and shimmer variability, depending on model and task.

Conclusion: Using the largest dataset studying UVFP to date, we achieve high performance from just a few seconds of voice recordings while discovering which acoustic features are important across models. Notably, we demonstrate that the models use different combinations of features to achieve similar effect sizes. Overall, the categories of features related to vocal fold physiology were conserved across the models. Machine learning thus provides a mechanism to detect UVFP and contextualize the accuracy relative to both model architecture and pathophysiology.

Level of Evidence: Type 3
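The reported metric, a median ROC AUC over resampled evaluations, can be illustrated with a minimal sketch. This is not the authors' pipeline: it skips feature extraction and model training and only resamples a fixed set of labels and predicted scores with replacement. The function names `roc_auc` and `bootstrap_median_auc`, the number of resamples, and the seed are all assumptions made for illustration.

```python
import random
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case
    outscores a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(labels, scores, n_boot=200, seed=0):
    """Median ROC AUC over bootstrap resamples of (label, score) pairs.

    Hypothetical helper: resamples predictions only, whereas the study
    retrains and tests models inside bootstrapped cross-validation.
    """
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # a resample with one class has no defined AUC
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    return median(aucs)
```

Reporting the median over resamples, rather than a single score, gives a sense of how stable the performance estimate is for a dataset of this size.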


Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database

Data and Information Management ◽

10.2478/dim-2018-0004 ◽

2018 ◽

Vol 2 (1) ◽

pp. 27-36 ◽

Cited By ~ 1

Author(s):

Neil R. Smalheiser ◽

Aaron M. Cohen

Keyword(s):

Machine Learning ◽

Text Mining ◽

Language Processing ◽

Similarity Measures ◽

Machine Learning Algorithms ◽

Biomedical Literature ◽

Publication Type ◽

Open Platform ◽

Training Examples ◽

Learning Projects

Abstract Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and use machine learning algorithms. At present, each research group tackles each problem from scratch, in isolation from other projects, which causes redundancy and a great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects and as a public repository for their outputs. We initially focus on a specific goal, namely, classifying articles according to publication type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning-based goals and projects and can be used as a public platform for disseminating the results of natural language processing (NLP) tools to end users as well.
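The idea of feeding multiple, heterogeneous similarity measures into a model can be sketched as follows. This is an illustration only, not the platform's design: the metadata fields (`title`, `abstract`, `journal`) and the three measures chosen here are assumptions, and the abstract does not say which measures the authors use.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def jaccard(a, b):
    """Set overlap between two token lists."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similarity_features(article, reference):
    """Feature vector comparing an article with a labeled reference article.

    Each entry is a different kind of similarity, so a downstream
    classifier can weigh them independently.
    """
    return [
        jaccard(tokens(article["title"]), tokens(reference["title"])),
        cosine(tokens(article["abstract"]), tokens(reference["abstract"])),
        1.0 if article["journal"] == reference["journal"] else 0.0,
    ]
```

Computing such a vector against several positive and negative training examples yields one row of classifier input per article.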


TERA: optimizing stochastic regression tests in machine learning projects

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis ◽

10.1145/3460319.3464844 ◽

2021 ◽

Author(s):

Saikat Dutta ◽

Jeeva Selvam ◽

Aryaman Jain ◽

Sasa Misailovic

Keyword(s):

Machine Learning ◽

Stochastic Regression ◽

Learning Projects ◽

Regression Tests


Teaching AI through machine learning projects

Proceedings of the 11th annual SIGCSE conference on Innovation and technology in computer science education - ITICSE '06 ◽

10.1145/1140124.1140230 ◽

2006 ◽

Cited By ~ 2

Author(s):

Ingrid Russell ◽

Zdravko Markov ◽

Todd Neller

Keyword(s):

Machine Learning ◽

Learning Projects


Aspect Based Sentimental Analysis of Hotel Reviews: A Comparative Study

Sukkur IBA Journal of Computing and Mathematical Sciences ◽

10.30537/sjcms.v4i1.567 ◽

2020 ◽

Vol 4 (1) ◽

pp. 11-20

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Sentiment Analysis ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Practical Significance ◽

Support Vector ◽

Use Of The Internet ◽

And Task

The increasing use of the internet enables users to share their opinions about what they like and dislike regarding products and services. For efficient decision making, these reviews need to be analyzed. Sentiment analysis, or opinion mining, is commonly used to detect the polarity (positive or negative) of reviews, but it does not show the aspect or orientation of the text. In this study, state-of-the-art approaches based on supervised machine learning are employed to perform three tasks on the dataset provided by SemEval. Tasks A and B relate to predicting the aspect of restaurant reviews, whereas Task C predicts their polarity. Additionally, this study compares the performance of two feature engineering techniques and five machine learning algorithms on a publicly available dataset, SemEval-2015 Task 12. The experimental results showed that word2vec features used with the support vector machine algorithm performed best, giving overall accuracies of 76%, 72%, and 79% for Task A, Task B, and Task C, respectively. Our comparative study holds practical significance and can be used as a baseline study in the domain of aspect-based sentiment analysis.


Minimum Viable Model Estimates for Machine Learning Projects

10.5121/csit.2020.101803 ◽

2020 ◽

Author(s):

John Hawkins

Keyword(s):

Machine Learning ◽

Open Source ◽

Predictive Model ◽

Management System ◽

Technical Difficulty ◽

Business Case ◽

Performance Characteristics ◽

Learning Projects ◽

Viable Model ◽

Python Package

Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open-source application MinViME (Minimum Viable Model Estimator), which can be installed via the PyPI Python package management system or downloaded directly from the GitHub repository. Available at https://github.com/john-hawkins/MinViME.
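A minimal worked example of the general idea, assuming a much simpler setting than the paper describes: if acting on each correct alert earns a fixed benefit and each false alert incurs a fixed cost, the model breaks even when its precision p satisfies p * benefit = (1 - p) * cost. The function below is hypothetical and is not MinViME's API or its estimation method.

```python
def min_viable_precision(benefit_per_true_positive, cost_per_false_positive):
    """Break-even precision for a model whose alerts are always acted on.

    Solves p * benefit - (1 - p) * cost = 0 for p.
    Illustrative assumption: benefits and costs are fixed per case.
    """
    if benefit_per_true_positive <= 0 or cost_per_false_positive < 0:
        raise ValueError("benefit must be positive and cost non-negative")
    return cost_per_false_positive / (
        benefit_per_true_positive + cost_per_false_positive
    )


def net_value_per_alert(precision, benefit_per_true_positive, cost_per_false_positive):
    """Expected net value of acting on a single alert at a given precision."""
    return (
        precision * benefit_per_true_positive
        - (1 - precision) * cost_per_false_positive
    )
```

For instance, with a benefit of 90 per true positive and a cost of 10 per false positive, any model with precision above 0.1 has positive expected value per alert, which gives a concrete target to assess before modelling starts.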


Planning an Optimal Road Trip Analysis and Drowsiness Detection using Genetic Algorithm

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35054 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 675-680

Author(s):

Kajal Khatri

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Global Search ◽

Forecasting Model ◽

Trip Generation ◽

Road Trip ◽

Drowsiness Detection ◽

The Road ◽

Learning Projects

One machine learning project that can directly affect our lives is the Road Trip Analyzer. With today's reliance on information and applications, travelling to new places has become the domain of the trip analyser. A reliable trip-generation forecasting model is the most essential part of a traffic forecasting model. The project is based on the genetic algorithm, which has strong global search ability and allows the trip-generation forecasting model to improve the accuracy of its predictions. Perhaps the greatest difficulty in planning a trip is choosing where to stop en route. The proposed system attempts to match the drivers' requirements with the quickest route available so that users get the best possible solution.
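How a genetic algorithm can search for a short ordering of stops may be sketched as follows. The abstract gives no implementation details, so everything here is an assumption: stops are 2D points, fitness is the length of the closed tour, and the operators (ordered crossover, swap mutation, keeping the better half of the population) are common textbook choices rather than the author's.

```python
import math
import random


def route_length(order, points):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def ordered_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the rest in parent_b's order."""
    start, end = sorted(rng.sample(range(len(parent_a)), 2))
    middle = parent_a[start:end]
    rest = [stop for stop in parent_b if stop not in middle]
    return rest[:start] + middle + rest[start:]


def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two stops in a copy of the route."""
    order = order[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def genetic_route(points, pop_size=40, generations=200, seed=0):
    """Return a short visiting order found by a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda order: route_length(order, points))
        elite = population[: pop_size // 2]  # survivors, also used as parents
        children = [
            mutate(ordered_crossover(*rng.sample(elite, 2), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return min(population, key=lambda order: route_length(order, points))
```

Because the better half of each generation is carried over unchanged, the best route found never gets worse from one generation to the next; the result is a good route, not a guaranteed optimum.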
