Cephalopod species identification using integrated analysis of machine learning and deep learning approaches

Background: Despite their high commercial fisheries value and ecological importance as a prey item for higher marine predators, very limited taxonomic work has been done on cephalopods in Malaysia. Because cephalopods are soft-bodied, identifying species from the hard parts of the beak can be more reliable and useful than relying on conventional body morphology. Since the traditional method for species classification is time-consuming, this study aimed to develop an automated identification model that can identify cephalopod species based on beak images. Methods: A total of 174 samples of seven cephalopod species were collected from the west coast of Peninsular Malaysia. Both upper and lower beaks were extracted from the samples, and images of the left lateral views of the upper and lower beaks were acquired. Three types of traditional morphometric features were extracted: grey histogram of oriented gradients (HOG), colour HOG, and morphological shape descriptor (MSD). In addition, deep features were extracted using three pre-trained convolutional neural network (CNN) models: VGG19, InceptionV3, and ResNet50. Eight machine learning approaches were used in the classification step and compared for model performance. Results: The Artificial Neural Network (ANN) model achieved the best testing accuracy of 91.14%, using the deep features extracted by the VGG19 model from lower beak images. The results indicated that the deep features were more accurate than the traditional features in highlighting morphometric differences in the beak images of cephalopod species. In addition, the lower beaks provided better results than the upper beaks, suggesting that the lower beaks possess more significant morphological differences between the studied cephalopod species. Future work should include more cephalopod species and a larger sample size to enhance the identification accuracy and comprehensiveness of the developed model.

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽

10.1108/imds-10-2020-0603 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Li ◽

Desheng Wu

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Short Term Memory ◽

Model Performance ◽

Large Data ◽

Support Vector ◽

Learning Approaches ◽

Content Type ◽

Day To Day Operations ◽

Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed companies, addressing supervision that is currently neither effective nor precise. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Long Short-Term Memory networks (LSTMs) as ISR prediction models. Findings: The results show that the proposed models predict ISRs significantly better when prior infractions are included than when they are not, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, the company's previous infractions, and large data sets. Originality/value: The findings could be used, to a certain degree, to address the problem of identifying listed companies' ISRs. Overall, the results show the value of prior infractions of securities regulations. This demonstrates the importance of including more data sources when constructing distress models, rather than only building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Forecasting Of Covid-19 Cases Using Machine Learning Approach

Current Respiratory Medicine Reviews ◽

10.2174/1573398x17666210129131009 ◽

2021 ◽

Vol 17 ◽

Author(s):

Sachin Kumar ◽

Karan Veer

Keyword(s):

Machine Learning ◽

Regression Model ◽

Model Performance ◽

Real Data ◽

Absolute Error ◽

Viral Disease ◽

Support Vector ◽

Family Welfare ◽

Accuracy Score ◽

Learning Approaches

Aims: The objective of this research is to predict Covid-19 cases in India using machine learning approaches. Background: Covid-19, a respiratory disease caused by a member of the coronavirus family, led to a worldwide pandemic in 2020. The virus was first detected in the city of Wuhan, China, in December 2019, and took less than three months to spread across the globe. Objective: In this paper, we propose a regression model based on the support vector machine (SVM) to forecast the number of deaths, the number of recovered cases, and the total number of confirmed cases for the next 30 days. Method: For prediction, data were collected from GitHub and India's Ministry of Health and Family Welfare, covering March 14, 2020, to December 3, 2020. The model was designed in Python 3.6 in Anaconda to forecast corona trends until September 21, 2020. The proposed methodology predicts values using an SVM-based regression model with polynomial, linear, and rbf kernels. The dataset was divided into train and test datasets with 40% and 60% test sizes and verified against real data. Model performance is evaluated using mean square error, mean absolute error, and percentage accuracy. Results and Conclusion: The results show that the polynomial model obtained an accuracy score above 95%, the linear model scored above 90%, and the rbf model scored above 85% in predicting cumulative deaths, confirmed cases, and recovered cases.
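The three evaluation measures named in this abstract can be sketched in a few lines of plain Python. The abstract does not define "percentage accuracy", so the definition used here (100 minus the mean absolute percentage error) is an assumption, and the numbers are made-up toy values rather than data from the paper.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percentage_accuracy(actual, predicted):
    """ASSUMED definition: 100 * (1 - mean absolute percentage error).

    Requires every actual value to be non-zero.
    """
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * (1.0 - mape)


# Hypothetical cumulative case counts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
acc = percentage_accuracy(actual, predicted)
print(f"MSE={mse:.2f} MAE={mae:.2f} accuracy={acc:.1f}%")
```

Squared error penalises large misses more heavily than absolute error, which is one reason to report both for cumulative counts that grow quickly.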

Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction

Journal of Cheminformatics ◽

10.1186/s13321-019-0407-y ◽

2020 ◽

Vol 12 (1) ◽

Cited By ~ 9

Author(s):

M. Withnall ◽

E. Lindelöf ◽

O. Engkvist ◽

H. Chen

Keyword(s):

Machine Learning ◽

Message Passing ◽

A Priori ◽

Molecular Graph ◽

Model Performance ◽

Learning Approaches ◽

Property Prediction ◽

Physical Chemical ◽

Chemical Descriptor ◽

Hyperparameter Selection

Abstract: Neural Message Passing for graphs is a promising and relatively recent approach for applying Machine Learning to networked data. As molecules can be described intrinsically as a molecular graph, it makes sense to apply these techniques to improve molecular property prediction in the field of cheminformatics. We introduce Attention and Edge Memory schemes to the existing message passing neural network framework, and benchmark our approaches against eight different physical–chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our results consistently perform on-par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.

Upper body activity classification using an inertial measurement unit in court and field-based sports: A systematic review

Proceedings of the Institution of Mechanical Engineers Part P Journal of Sports Engineering and Technology ◽

10.1177/1754337120959754 ◽

2020 ◽

pp. 175433712095975

Author(s):

Joseph McGrath ◽

Jonathon Neville ◽

Tom Stewart ◽

John Cronin

Keyword(s):

Machine Learning ◽

Model Performance ◽

Upper Body ◽

Measurement Unit ◽

Learning Approaches ◽

Classification Problems ◽

Activity Classification ◽

Inertial Measurement ◽

Body Activity ◽

Study Results

Inertial measurement units (IMUs) are becoming increasingly popular in activity classification and workload measurement in sport. This systematic literature review focuses on upper body activity classification in court or field-based sports. The aim of this paper is to provide sport scientists and coaches with an overview of the past research in this area, as well as the processes and challenges involved in activity classification. The SPORTDiscus, PubMed and Scopus databases were searched, resulting in 20 articles. Both manually defined algorithms and machine learning approaches have been used to classify IMU data with varying degrees of success. Manually defined algorithms may offer simplicity and reduced computational demand, whereas machine learning may be beneficial for complex classification problems. Inter-study results show that no one machine learning model is best for activity classification; differences in sensor placement, IMU specification and pre-processing decisions can all affect model performance. Accurate classification of sporting activities could benefit players, coaches and team medical personnel by providing an objective estimate of workload. This could help to prevent injuries, enhance performance and provide valuable data to coaching staff.

Estimation of PM2.5 Concentrations in New York State: Understanding the Influence of Vertical Mixing on Surface PM2.5 Using Machine Learning

Atmosphere ◽

10.3390/atmos11121303 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1303

Author(s):

Wei-Ting Hung ◽

Cheng-Hsuan (Sarah) Lu ◽

Stefano Alessandrini ◽

Rajesh Kumar ◽

Chin-An Lin

Keyword(s):

Machine Learning ◽

New York ◽

New York State ◽

Vertical Mixing ◽

Model Performance ◽

Machine Learning Techniques ◽

Joint Analysis ◽

Ann Model ◽

York State ◽

Ann Models

In New York State (NYS), episodic high fine particulate matter (PM2.5) concentrations associated with aerosols originating from the Midwest, Mid-Atlantic, and Pacific Northwest states have been reported. In this study, machine learning techniques, including multiple linear regression (MLR) and artificial neural network (ANN), were used to estimate surface PM2.5 mass concentrations at air quality monitoring sites in NYS during the summers of 2016–2019. Various predictors were considered, including meteorological, aerosol, and geographic predictors. Vertical predictors, designed as indicators of vertical mixing and aloft aerosols, were also applied. Overall, the ANN models performed better than the MLR models, and the application of vertical predictors generally improved the accuracy of PM2.5 estimation of the ANN models. The leave-one-out cross-validation results showed significant cross-site variations and were able to present the different predictor-PM2.5 correlations at the sites with different PM2.5 characteristics. In addition, a joint analysis of regression coefficients from the MLR model and variable importance from the ANN model provided insights into the contributions of selected predictors to PM2.5 concentrations. The improvements in model performance due to aloft aerosols were relatively minor, probably due to the limited cases of aloft aerosols in current datasets.
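As a rough illustration of leave-one-out cross-validation across monitoring sites, the sketch below holds out one site at a time, fits a one-predictor least-squares line on the remaining sites, and reports the mean absolute error on the held-out site. Treating a whole site as the held-out unit is an assumption (the abstract only mentions cross-site variation), the single-predictor line stands in for the paper's MLR and ANN models, and the site names and values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_site_out(data):
    """data maps site name -> list of (predictor, pm25) pairs.

    Returns the mean absolute error for each held-out site.
    """
    errors = {}
    for held_out in data:
        # Train on every site except the held-out one.
        train = [pt for site, pts in data.items() if site != held_out for pt in pts]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        test = data[held_out]
        errors[held_out] = sum(
            abs(y - (slope * x + intercept)) for x, y in test
        ) / len(test)
    return errors


# Hypothetical toy data: (predictor value, PM2.5 concentration) per site.
toy_sites = {
    "site_a": [(1.0, 2.0), (2.0, 4.0)],
    "site_b": [(3.0, 6.0), (4.0, 8.0)],
    "site_c": [(5.0, 11.0), (6.0, 13.0)],
}
site_errors = leave_one_site_out(toy_sites)
for site, err in site_errors.items():
    print(site, round(err, 3))
```

Reporting the error per held-out site, rather than one pooled number, is what makes cross-site differences visible.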

Using satellite imagery to understand and promote sustainable development

Science ◽

10.1126/science.abe8628 ◽

2021 ◽

Vol 371 (6535) ◽

pp. eabe8628

Author(s):

Marshall Burke ◽

Anne Driscoll ◽

David B. Lobell ◽

Stefano Ermon

Keyword(s):

Machine Learning ◽

Sustainable Development ◽

Satellite Imagery ◽

Model Building ◽

Model Performance ◽

Training Data ◽

Learning Approaches ◽

Research Directions ◽

Development Outcomes ◽

Research And Policy

Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and improving resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of model performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight research directions for the field.

A comparative study of supervised machine learning approaches for slope failure production

E3S Web of Conferences ◽

10.1051/e3sconf/202132501001 ◽

2021 ◽

Vol 325 ◽

pp. 01001

Author(s):

Ashanira Mat Deris ◽

Badariah Solemon ◽

Rohayu Che Omar

Keyword(s):

Machine Learning ◽

Slope Failure ◽

Pressure Ratio ◽

Slope Angle ◽

Friction Angle ◽

Supervised Machine Learning ◽

Unit Weight ◽

Learning Approaches ◽

Ann Model ◽

Grey Relational

Over the years, machine learning, a well-known method in the field of artificial intelligence (AI), has become a new trend and has been applied extensively to solve real-world problems. This includes slope failure prediction. Slope failure is among the major geo-hazard phenomena, with a significant impact on the environment and on human beings. The estimation of slope failure in slope stability analysis is a complex geotechnical engineering problem that involves many factors such as geology, topography, atmosphere, and land occupancy. Generally, slope failure can be estimated using traditional methods such as the limit equilibrium method (LEM) or the finite element method (FEM). However, besides being tedious and time-consuming, LEM and FEM have their own limitations and do not guarantee effectiveness when dealing with problems of varying geometry or assumptions. Hence, machine learning approaches provide alternative tools for the prediction of slope failure. The current study applies two widely used supervised machine learning approaches, support vector machine (SVM) and decision tree (DT), to predict slope failure as a classification problem using historical cases. A total of 148 slope cases with six input parameters (unit weight, cohesion, internal friction angle, slope angle, slope height and pore pressure ratio) and the factor of safety (FOS) as the output parameter were collected from a multinational dataset extracted from the literature. For development of the prediction model, the slope data were divided into 80% training data and 20% testing data. The prediction results on the testing data were validated using statistical analysis. The results show that the SVM model outperformed the DT model, giving a prediction accuracy of 97%.
With the advent of technology and the introduction of computational intelligence methods, the prediction of slope failure using the machine learning (ML) approach has grown rapidly over the past few decades. This study employs an artificial neural network (ANN) to predict slope failures based on historical circular slope cases. Using the feed-forward backpropagation algorithm with a multilayer perceptron network, ANN is a powerful ML method capable of modelling complex slope cases. However, the prediction result of the ANN can be improved by integrating a statistical analysis method, namely grey relational analysis (GRA), into the ANN model. GRA is capable of identifying the influencing factors of the input data based on the correlation level between the reference sequence and the comparability sequences of the dataset. This statistical machine learning model can analyze the slope data and eliminate unnecessary data samples to improve the prediction performance. A grey relational analysis-artificial neural network (GRANN) prediction model was developed based on six slope factors: unit weight, friction angle, cohesion, pore pressure ratio, slope height, and slope angle, with the factor of safety (FOS) as the output factor. The prediction results were analyzed based on accuracy percentage and receiver operating characteristic (ROC) values. The GRANN model outperformed the ANN model, giving 99% accuracy and a 0.999 ROC value, compared with 91% and 0.929.
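The grey relational grade mentioned above can be illustrated with the textbook form of GRA: normalise each sequence, take the absolute deviation from the reference sequence, and average the grey relational coefficients. This is a minimal sketch under stated assumptions. The abstract does not say which normalisation or distinguishing coefficient the authors used, so min-max scaling and zeta = 0.5 are assumptions, and the slope values are invented.

```python
def min_max_normalize(seq):
    """Scale a sequence to the range [0, 1]; the sequence must not be constant."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        raise ValueError("cannot normalise a constant sequence")
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, factors, zeta=0.5):
    """Return the grey relational grade of each factor against the reference.

    reference: list of output values (here, a factor-of-safety column).
    factors:   dict mapping factor name -> list of values, same length.
    zeta:      distinguishing coefficient (0.5 is a common default, assumed here).
    """
    ref = min_max_normalize(reference)
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, min_max_normalize(seq))]
        for name, seq in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every factor matches the reference exactly after normalisation.
        return {name: 1.0 for name in factors}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Hypothetical toy slope cases, for illustration only.
fos = [1.0, 1.5, 2.0]
toy_factors = {
    "cohesion": [10.0, 15.0, 20.0],
    "slope_angle": [45.0, 30.0, 40.0],
}
grades = grey_relational_grades(fos, toy_factors)
for name, grade in sorted(grades.items(), key=lambda kv: -kv[1]):
    print(name, round(grade, 3))
```

A higher grade means the factor's sequence tracks the reference more closely, which is the basis for ranking influencing factors before training a downstream model.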

Dynamic species classification of microorganisms across time, abiotic and biotic environments — a sliding window approach

10.1101/105395 ◽

2017 ◽

Author(s):

Frank Pennekamp ◽

Jason I. Griffiths ◽

Emanuel A. Fronhofer ◽

Aurélie Garnier ◽

Mathew Seymour ◽

...

Keyword(s):

Machine Learning ◽

Visual Information ◽

Evolutionary Ecology ◽

Learning Algorithm ◽

Sliding Window ◽

Learning Approaches ◽

Species Classification ◽

Abiotic Conditions ◽

Ciliate Species ◽

Window Approach

Summary
1. Technological advances have made it much simpler to take and analyze digital images and videos, and ecologists increasingly use these techniques for trait, behavioral and taxonomic analyses. The development of techniques to automate biological measurements from the environment opens up new possibilities to infer species numbers, observe presence/absence patterns and recognize individuals based on audio-visual information.
2. Streams of quantitative data, such as temporal species abundances, are processed by machine learning (ML) algorithms into meaningful information. Machine learning approaches learn to distinguish classes (e.g., species) from observed quantitative features (phenotypes), and in turn predict the distinguished classes in subsequent observations. However, in biological systems, the environment changes, often driving phenotypic changes in behaviour and morphology.
3. Here we describe a framework for classifying species under dynamic biotic and abiotic conditions using a novel sliding window approach. We train a random forest classifier on subsets of the data, covering restricted temporal, biotic and abiotic ranges (i.e. windows). We test our approach by applying the classification framework to experimental microbial communities where results were validated against manual classification. Individuals from one to six ciliate species were monitored over hundreds of generations in dozens of different species combinations and over a temperature gradient. We describe the steps of our classification pipeline and systematically explore the effects of the abiotic and biotic environments as well as temporal effects on classification success.
4. Differences in biotic and abiotic conditions caused simplistic classification approaches to be unsuccessful. In contrast, the sliding window approach allowed classification to be highly successful, because phenotypic differences driven by environmental change could be captured in the learning algorithm. Importantly, automatic classification achieved success comparable to manual identification.
5. Our framework allows for reliable classification even in dynamic environmental contexts, and may help to improve long-term monitoring of species from environmental samples. It therefore has application in disciplines with automatic enumeration and phenotyping of organisms such as eco-toxicology, ecology and evolutionary ecology, and broad-scale environmental monitoring.
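The windowing idea can be sketched as follows: generate overlapping windows along one axis (time is used here) and select, for each window, only the observations that fall inside it as training data for that window's classifier. This is a simplified illustration, not the authors' pipeline. It covers a single temporal axis, the classifier training step is left out, and the field names, window width and step are hypothetical.

```python
def sliding_windows(start, stop, width, step):
    """Return half-open windows [lo, lo + width) that fit inside [start, stop]."""
    windows = []
    lo = start
    while lo + width <= stop:
        windows.append((lo, lo + width))
        lo += step
    return windows


def observations_in_window(observations, window, key="day"):
    """Select the observations whose `key` value lies inside the window."""
    lo, hi = window
    return [obs for obs in observations if lo <= obs[key] < hi]


# Hypothetical observations: one measured individual per day, toy values.
toy_observations = [
    {"day": day, "cell_size": 10 + day, "species": "A" if day % 2 else "B"}
    for day in range(10)
]

windows = sliding_windows(start=0, stop=10, width=4, step=2)
for window in windows:
    subset = observations_in_window(toy_observations, window)
    # A classifier (a random forest in the abstract) would be trained on
    # `subset` here and used only for observations from the same window.
    print(window, [obs["day"] for obs in subset])
```

Because each classifier only sees data from a restricted range, a gradual shift in phenotype over time or temperature does not have to be explained by a single global model.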

Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning System (Preprint)

10.2196/preprints.35406 ◽

2021 ◽

Author(s):

Nathan Chi ◽

Peter Washington ◽

Aaron Kline ◽

Arman Husic ◽

Cathy Hou ◽

...

Keyword(s):

Machine Learning ◽

Neurodevelopmental Disorder ◽

Model Performance ◽

Autism Spectrum ◽

Learning System ◽

Learning Approaches ◽

Mel Frequency Cepstral Coefficients ◽

Audio Features ◽

Specialized Equipment ◽

Child Speech

BACKGROUND Autism spectrum disorder (ASD) is a neurodevelopmental disorder which results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process which requires the work of trained physicians, significant attention has been given to developing systems that automatically diagnose and screen for autism. OBJECTIVE Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies (including echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns). In this work, we present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments. METHODS We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients); second, convolutional neural networks (CNNs) trained on spectrograms; and third, fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based speech recognition model. We train our classifiers on our novel dataset of cellphone-recorded child speech audio curated from Stanford’s Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. RESULTS The Random Forest classifier achieves 70% accuracy, the fine-tuned wav2vec 2.0 model achieves 77% accuracy, and the CNN achieves 79% accuracy when classifying children’s audio as either ASD or NT. We use five-fold cross-validation to evaluate model performance. CONCLUSIONS Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording quality, which may make them more generalizable to real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.

Don’t Overweight Weights: Evaluation of Weighting Strategies for Multi-Task Bioactivity Classification Models

Molecules ◽

10.3390/molecules26226959 ◽

2021 ◽

Vol 26 (22) ◽

pp. 6959

Author(s):

Lina Humbeck ◽

Tobias Morawietz ◽

Noe Sturm ◽

Adam Zalewski ◽

Simon Harnqvist ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Model Performance ◽

Pharmaceutical Companies ◽

Data Sets ◽

Learning Approaches ◽

Classification Models ◽

Real World Data ◽

Robust Model ◽

Performance Results

Machine learning models that predict the bioactivity of chemical compounds are now standard tools for cheminformaticians and computational medicinal chemists. Multi-task and federated learning are promising machine learning approaches that allow privacy-preserving usage of large amounts of data from diverse sources, which is crucial for achieving good generalization and high-performance results. Using large, real-world data sets from six pharmaceutical companies, we investigate different strategies for averaging weighted task loss functions to train multi-task bioactivity classification models. The weighting strategies should be suitable for federated learning and ensure that learning efforts are well distributed even if data are diverse. Comparing several approaches using weights that depend on the number of sub-tasks per assay, task size, and class balance, respectively, we find that a simple sub-task weighting approach leads to robust model performance for all investigated data sets and is especially suited for federated learning.
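To make "averaging weighted task loss functions" concrete, the sketch below computes a weighted mean of per-task losses in which each task's weight depends on how many sub-tasks its assay contributes. The specific rule (weight = 1 / number of sub-tasks in the same assay) is an assumption chosen for illustration; the abstract does not give the formula, and the task names, assay names and loss values are hypothetical.

```python
from collections import Counter


def subtask_weights(task_to_assay):
    """ASSUMED rule: each task gets weight 1 / (number of sub-tasks in its assay),
    so that every assay contributes the same total weight."""
    counts = Counter(task_to_assay.values())
    return {task: 1.0 / counts[assay] for task, assay in task_to_assay.items()}


def weighted_mean_loss(task_losses, weights):
    """Weighted average of per-task losses, normalised by the sum of weights."""
    total_weight = sum(weights[task] for task in task_losses)
    weighted_sum = sum(weights[task] * loss for task, loss in task_losses.items())
    return weighted_sum / total_weight


# Hypothetical setup: assay_A is split into two sub-tasks, assay_B has one.
toy_task_to_assay = {"t1": "assay_A", "t2": "assay_A", "t3": "assay_B"}
toy_losses = {"t1": 0.2, "t2": 0.4, "t3": 0.6}

weights = subtask_weights(toy_task_to_assay)
combined = weighted_mean_loss(toy_losses, weights)
unweighted = sum(toy_losses.values()) / len(toy_losses)
print(f"weighted={combined:.3f} unweighted={unweighted:.3f}")
```

Under this rule an assay that has been split into many sub-tasks does not dominate the combined loss simply because it appears more often.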
