Acoustilytix™: A Web-Based Automated Ultrasonic Vocalization Scoring Platform

Ultrasonic vocalizations (USVs) are known to reflect emotional processing, brain neurochemistry, and brain function. Collecting and processing USV data is manual, time-intensive, and costly, creating a significant bottleneck by limiting researchers’ ability to employ fully effective and nuanced experimental designs and serving as a barrier to entry for other researchers. In this report, we provide a snapshot of the current development and testing of Acoustilytix™, a web-based automated USV scoring tool. Acoustilytix implements machine learning methodology in the USV detection and classification process and is recording-environment-agnostic. We summarize the user features identified as desirable by USV researchers and how these were implemented. These include the ability to easily upload USV files, output a list of detected USVs with associated parameters in csv format, and the ability to manually verify or modify an automatically detected call. With no user intervention or tuning, Acoustilytix achieves 93% sensitivity (a measure of how accurately Acoustilytix detects true calls) and 73% precision (a measure of how accurately Acoustilytix avoids false positives) in call detection across four unique recording environments and was superior to the popular DeepSqueak algorithm (sensitivity = 88%; precision = 41%). Future work will include integration and implementation of machine-learning-based call type classification prediction that will recommend a call type to the user for each detected call. Call classification accuracy is currently in the 71–79% accuracy range, which will continue to improve as more USV files are scored by expert scorers, providing more training data for the classification model. We also describe a recently developed feature of Acoustilytix that offers a fast and effective way to train hand-scorers using automated learning principles without the need for an expert hand-scorer to be present and is built upon a foundation of learning science. The key is that trainees are given practice classifying hundreds of calls with immediate corrective feedback based on an expert’s USV classification. We showed that this approach is highly effective with inter-rater reliability (i.e., kappa statistics) between trainees and the expert ranging from 0.30–0.75 (average = 0.55) after only 1000–2000 calls of training. We conclude with a brief discussion of future improvements to the Acoustilytix platform.

Download Full-text

Heterologous Machine Learning for the Identification of Antimicrobial Activity in Human-Targeted Drugs

Molecules ◽

10.3390/molecules24071258 ◽

2019 ◽

Vol 24 (7) ◽

pp. 1258 ◽

Cited By ~ 2

Author(s):

Rodrigo Nava Lara ◽

Longendri Aguilera-Mendoza ◽

Carlos Brizuela ◽

Antonio Peña ◽

Gabriel Del Rio

Keyword(s):

Machine Learning ◽

Broad Spectrum ◽

Current Knowledge ◽

Human Microbiome ◽

Antibiotic Activity ◽

Training Data ◽

Classification Model ◽

Computer Assisted ◽

Antimicrobial Compounds ◽

Data Set

The emergence of microbes resistant to common antibiotics represent a current treat to human health. It has been recently recognized that non-antibiotic labeled drugs may promote antibiotic-resistance mechanisms in the human microbiome by presenting a secondary antibiotic activity; hence, the development of computer-assisted procedures to identify antibiotic activity in human-targeted compounds may assist in preventing the emergence of resistant microbes. In this regard, it is worth noting that while most antibiotics used to treat human infectious diseases are non-peptidic compounds, most known antimicrobials nowadays are peptides, therefore all computer-based models aimed to predict antimicrobials either use small datasets of non-peptidic compounds rendering predictions with poor reliability or they predict antimicrobial peptides that are not currently used in humans. Here we report a machine-learning-based approach trained to identify gut antimicrobial compounds; a unique aspect of our model is the use of heterologous training sets, in which peptide and non-peptide antimicrobial compounds were used to increase the size of the training data set. Our results show that combining peptide and non-peptide antimicrobial compounds rendered the best classification of gut antimicrobial compounds. Furthermore, this classification model was tested on the latest human-approved drugs expecting to identify antibiotics with broad-spectrum activity and our results show that the model rendered predictions consistent with current knowledge about broad-spectrum antibiotics. Therefore, heterologous machine learning rendered an efficient computational approach to classify antimicrobial compounds.

Download Full-text

Text Classification for Organizational Researchers

Organizational Research Methods ◽

10.1177/1094428117719322 ◽

2017 ◽

Vol 21 (3) ◽

pp. 766-799 ◽

Cited By ~ 18

Author(s):

Vladimer B. Kobayashi ◽

Stefan T. Mol ◽

Hannah A. Berkers ◽

Gábor Kismihók ◽

Deanne N. Den Hartog

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Training Data ◽

Classification Model ◽

Data Preparation ◽

Organizational Research ◽

Job Vacancy ◽

Text Classifiers ◽

Effective Use

Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output.

Download Full-text

Impact of Training Sample Size on the Effects of Regularization in a Convolutional Neural Network-based Dental X-ray Artifact Prediction Model

Journal of Undergraduate Life Sciences ◽

10.33137/juls.v14i1.35883 ◽

2020 ◽

Vol 14 (1) ◽

pp. 5

Author(s):

Adam Adli ◽

Pascal Tyrrell

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Sample Size ◽

Training Sample ◽

Training Data ◽

Classification Model ◽

Sample Sizes ◽

X Ray ◽

Training Sample Size

Introduction: Advances in computers have allowed for the practical application of increasingly advanced machine learning models to aid healthcare providers with diagnosis and inspection of medical images. Often, a lack of training data and computation time can be a limiting factor in the development of an accurate machine learning model in the domain of medical imaging. As a possible solution, this study investigated whether L2 regularization moderate s the overfitting that occurs as a result of small training sample sizes.Methods: This study employed transfer learning experiments on a dental x-ray binary classification model to explore L2 regularization with respect to training sample size in five common convolutional neural network architectures. Model testing performance was investigated and technical implementation details including computation times and hardware considerations as well as performance factors and practical feasibility were described.Results: The experimental results showed a trend that smaller training sample sizes benefitted more from regularization than larger training sample sizes. Further, the results showed that applying L2 regularization did not apply significant computational overhead and that the extra rounds of training L2 regularization were feasible when training sample sizes are relatively small.Conclusion: Overall, this study found that there is a window of opportunity in which the benefits of employing regularization can be most cost-effective relative to training sample size. It is recommended that training sample size should be carefully considered when forming expectations of achievable generalizability improvements that result from investing computational resources into model regularization.

Download Full-text

Variations on Associative Classifiers and Classification Results Analyses

Post-Mining of Association Rules ◽

10.4018/978-1-60566-404-0.ch009 ◽

2009 ◽

pp. 150-172 ◽

Cited By ~ 1

Author(s):

Maria-Luiza Antonie ◽

David Chodos ◽

Osmar Zaïane

Keyword(s):

Association Rules ◽

Model Building ◽

Training Data ◽

Classification Model ◽

Data Sets ◽

Web Based ◽

Building Process ◽

Rule Set ◽

Negative Association Rules ◽

Associative Classifiers

The chapter introduces the associative classifier, a classification model based on association rules, and describes the three phases of the model building process: rule generation, pruning, and selection. In the first part of the chapter, these phases are described in detail, and several variations on the associative classifier model are presented within the context of the relevant phase. These variations are: mining data sets with re-occurring items, using negative association rules, and pruning rules using graph-based techniques. Each of these departs from the standard model in a crucial way, and thus expands the classification potential. The second part of the chapter describes a system, ARC-UI that allows a user to analyze the results of classifying an item using an associative classifier. This system uses an intuitive, Web-based interface and, with this system, the user is able to see the rules that were used to classify an item, modify either the item being classified or the rule set that was used, view the relationship between attributes, rules and classes in the rule set, and analyze the training data set with respect to the item being classified.

Download Full-text

Differentially Private and Fair Classification via Calibrated Functional Mechanism

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5402 ◽

2020 ◽

Vol 34 (01) ◽

pp. 622-629

Author(s):

Jiahao Ding ◽

Xinyue Zhang ◽

Xiaohuan Li ◽

Junyi Wang ◽

Rong Yu ◽

...

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

State Of The Art ◽

Autonomous Driving ◽

Training Data ◽

Classification Model ◽

Privacy Concerns ◽

Polynomial Coefficients ◽

Functional Mechanism ◽

Machine Learning Model

Machine learning is increasingly becoming a powerful tool to make decisions in a wide variety of applications, such as medical diagnosis and autonomous driving. Privacy concerns related to the training data and unfair behaviors of some decisions with regard to certain attributes (e.g., sex, race) are becoming more critical. Thus, constructing a fair machine learning model while simultaneously providing privacy protection becomes a challenging problem. In this paper, we focus on the design of classification model with fairness and differential privacy guarantees by jointly combining functional mechanism and decision boundary fairness. In order to enforce ϵ-differential privacy and fairness, we leverage the functional mechanism to add different amounts of Laplace noise regarding different attributes to the polynomial coefficients of the objective function in consideration of fairness constraint. We further propose an utility-enhancement scheme, called relaxed functional mechanism by adding Gaussian noise instead of Laplace noise, hence achieving (ϵ, δ)-differential privacy. Based on the relaxed functional mechanism, we can design (ϵ, δ)-differentially private and fair classification model. Moreover, our theoretical analysis and empirical results demonstrate that our two approaches achieve both fairness and differential privacy while preserving good utility and outperform the state-of-the-art algorithms.

Download Full-text

Sentimen Analisis Tweet Berbahasa Indonesia Dengan Deep Belief Network

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.24716 ◽

2017 ◽

Vol 11 (2) ◽

pp. 187 ◽

Cited By ~ 2

Author(s):

Ira Zulfa ◽

Edi Winarko

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Representation ◽

Deep Belief Network ◽

Training Data ◽

Classification Model ◽

Experimental Result ◽

Support Vector ◽

Machine Method ◽

Belief Network

Sentiment analysis is a computational research of opinion sentiment and emotion which is expressed in textual mode. Twitter becomes the most popular communication device among internet users. Deep Learning is a new area of machine learning research. It aims to move machine learning closer to its main goal, artificial intelligence. The purpose of deep learning is to change the manual of engineering with learning. At its growth, deep learning has algorithms arrangement that focus on non-linear data representation. One of the machine learning methods is Deep Belief Network (DBN). Deep Belief Network (DBN), which is included in Deep Learning method, is a stack of several algorithms with some extraction features that optimally utilize all resources. This study has two points. First, it aims to classify positive, negative, and neutral sentiments towards the test data. Second, it determines the classification model accuracy by using Deep Belief Network method so it would be able to be applied into the tweet classification, to highlight the sentiment class of training data tweet in Bahasa Indonesia. Based on the experimental result, it can be concluded that the best method in managing tweet data is the DBN method with an accuracy of 93.31%, compared with Naive Bayes method which has an accuracy of 79.10%, and SVM (Support Vector Machine) method with an accuracy of 92.18%.

Download Full-text

Scalable Approach to High Coverages on Oxides via Iterative Training of a Machine-Learning Algorithm

10.26434/chemrxiv.10288514.v1 ◽

2019 ◽

Author(s):

Andrew Medford ◽

Shengchun Yang ◽

Fuzhu Liu

Keyword(s):

Machine Learning ◽

Chemical Potential ◽

Learning Algorithm ◽

Absolute Error ◽

Low Energy ◽

Training Data ◽

High Coverage ◽

Metal Compounds ◽

Adsorption Energies ◽

The Stability

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies on the high coverage and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm to actively identify the training set. The framework is demonstrated for the mixed adsorption of CHx, NHx and OHx species on the oxygen vacancy and pristine rutile TiO2(110) surface sites. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data, and is able to predict differential adsorption energies with a mean absolute error of ~0.3 eV based on <25% of the total DFT data. The algorithm is also used to identify 76% of the low-energy structures based on <30% of the total DFT data, enabling construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potential of C, H, O, and N. Furthermore, the computational scaling indicates the algorithm scales nearly linearly (N1.12) as the number of adsorbates increases. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward the investigation of the behavior of catalysts under high-coverage conditions.

Download Full-text

Optimization of Diabetes Training DATA using Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.283286 ◽

2018 ◽

Vol 6 (2) ◽

pp. 283-286

Author(s):

M. Samba Siva Rao ◽

◽

M.Yaswanth . ◽

K. Raghavendra Swamy ◽

◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data

Download Full-text

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.

Download Full-text

Multilayer Soil Moisture Mapping at a Regional Scale from Multisource Data via a Machine Learning Method

Remote Sensing ◽

10.3390/rs11030284 ◽

2019 ◽

Vol 11 (3) ◽

pp. 284 ◽

Cited By ~ 1

Author(s):

Linglin Zeng ◽

Shun Hu ◽

Daxiang Xiang ◽

Xiang Zhang ◽

Deren Li ◽

...

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Regional Scale ◽

Remotely Sensed ◽

Temporal Variations ◽

Training Data ◽

Estimation Accuracy ◽

Learning Approaches ◽

Remotely Sensed Data ◽

Deep Soil

Soil moisture mapping at a regional scale is commonplace since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for the estimation of deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map the 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, which is one of the machine-learning approaches. In order to investigate the estimation accuracy of the RF method at both a spatial and a temporal scale, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m3/m3) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m3/m3). Then, the data requirements, importance factors, and spatial and temporal variations in estimation accuracy were discussed based on the results using the training data selected by iterated random sampling. The highly accurate estimations of both the surface and the deep soil moisture for the study area reveal the potential of RF methods when mapping soil moisture at a regional scale, especially when considering the high heterogeneity of land-cover types and topography in the study area.

Download Full-text