An investigation on machine-learning models for the prediction of cyanobacteria growth

Harmful algal blooms, which endanger the lives of humans and animals, are caused by a sudden increase in the concentration of cyanobacteria in freshwater lakes. Cyanobacteria concentrations can be reliably measured using chemical and biological indicators, but measuring these indicators is either labor-intensive or very costly. These limitations prevent the general public from measuring concentrations, so local health organizations or departments regularly assume the responsibility of measuring water quality. While computational models exist to predict algal concentrations, their limited accuracy and the need to customize them for varied lake conditions mean they are generally not yet reliable. We find that common regression-error functions cannot sufficiently evaluate the performance of cyanobacteria prediction models because harmful algal blooms occur rarely. Therefore, we present a method of forecasting cyanobacteria concentrations in freshwater lakes based on a machine-learning model trained on a dataset from Lake Utah with indicators measured automatically by lake buoys. We compare several models and find that a support vector machine with a radial basis function kernel for regression reliably forecasts harmful algal blooms using comparatively few and easy-to-obtain input parameters. The special feature of the model is that it exclusively uses variables that the general public can measure without great effort or cost, and the amount of data necessary to train such a model is relatively small, allowing different models to be trained to accommodate the nuances of different lakes.
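
The point about regression-error functions and rare blooms can be illustrated with a minimal sketch. It is not the authors' evaluation method: the concentration series, the bloom threshold, and the helper names `rmse` and `bloom_recall` are all hypothetical. A predictor that never forecasts a bloom gets a lower RMSE than one that catches every bloom.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error over paired observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def bloom_recall(y_true, y_pred, threshold):
    """Fraction of true bloom days (value >= threshold) that are also predicted as blooms."""
    bloom_days = [i for i, t in enumerate(y_true) if t >= threshold]
    if not bloom_days:
        return float("nan")
    hits = sum(1 for i in bloom_days if y_pred[i] >= threshold)
    return hits / len(bloom_days)

# Hypothetical data (arbitrary units): 98 calm days and 2 bloom days.
y_true = [1.0] * 98 + [10.0] * 2
THRESHOLD = 5.0  # assumed bloom threshold, for illustration only

always_calm = [1.0] * 100            # never predicts a bloom
detector = [2.5] * 98 + [8.0] * 2    # biased on calm days, but flags both blooms

rmse_calm = rmse(y_true, always_calm)
rmse_detector = rmse(y_true, detector)
recall_calm = bloom_recall(y_true, always_calm, THRESHOLD)
recall_detector = bloom_recall(y_true, detector, THRESHOLD)

print(f"always_calm: RMSE={rmse_calm:.3f}, bloom recall={recall_calm:.2f}")
print(f"detector:    RMSE={rmse_detector:.3f}, bloom recall={recall_detector:.2f}")
```

In this toy setting, RMSE alone would prefer the predictor that misses every bloom, which is why an event-focused measure is reported next to it.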

A Review of Recent Machine Learning Advances for Forecasting Harmful Algal Blooms and Shellfish Contamination

Journal of Marine Science and Engineering ◽

10.3390/jmse9030283 ◽

2021 ◽

Vol 9 (3) ◽

pp. 283

Author(s):

Rafaela C. Cruz ◽

Pedro Reis Costa ◽

Susana Vinga ◽

Ludwig Krippahl ◽

Marta B. Lopes

Keyword(s):

Machine Learning ◽

Graphical Models ◽

Harmful Algal Blooms ◽

Algal Blooms ◽

Data Availability ◽

Mitigation Measures ◽

Support Vector ◽

Governmental Agencies ◽

Vector Machines ◽

Source Data

Harmful algal blooms (HABs) are among the most severe ecological marine problems worldwide. Under favorable climate and oceanographic conditions, toxin-producing microalgae species may proliferate, reach increasingly high cell concentrations in seawater, accumulate in shellfish, and threaten the health of seafood consumers. There is an urgent need for effective tools that help shellfish farmers anticipate and cope with HAB events and shellfish contamination, which frequently lead to significant negative economic impacts. Statistical and machine learning forecasting tools have been developed in an attempt to better inform the shellfish industry, limit damages, improve mitigation measures, and reduce production losses. This study presents a synoptic review covering the trends in machine learning methods for predicting HABs and shellfish biotoxin contamination, with a particular focus on autoregressive models, support vector machines, random forest, probabilistic graphical models, and artificial neural networks (ANN). Over the years, most efforts to forecast HABs have relied on models of increasing complexity, coupled with increased multi-source data availability, with ANN architectures at the forefront of modeling these events. The purpose of this review is to help define machine learning-based strategies that support the shellfish industry in managing its harvesting and production, and that support decision making by governmental agencies with environmental responsibilities.

Machine Learning Classification Algorithms for Predicting Karenia brevis Blooms on the West Florida Shelf

Journal of Marine Science and Engineering ◽

10.3390/jmse9090999 ◽

2021 ◽

Vol 9 (9) ◽

pp. 999

Author(s):

Marvin F. Li ◽

Patricia M. Glibert ◽

Vyacheslav Lyubchich

Keyword(s):

Machine Learning ◽

Harmful Algal Blooms ◽

Algal Blooms ◽

Large River ◽

Karenia Brevis ◽

Machine Learning Algorithms ◽

Support Vector ◽

Economic Damage ◽

The West ◽

Machine Learning Classification

Harmful algal blooms (HABs), events that kill fish, impact human health in multiple ways, and contaminate water supplies, have increased in frequency, magnitude, and impacts in numerous marine and fresh waters around the world. Blooms of the toxic dinoflagellate Karenia brevis have resulted in thousands of tons of dead fish, deaths of many other marine organisms, numerous respiratory-related hospitalizations, and tens to hundreds of millions of dollars in economic damage along the West Florida coast in recent years. Four types of machine learning algorithms, Support Vector Machine (SVM), Relevance Vector Machine (RVM), Naïve Bayes classifier (NB), and Artificial Neural Network (ANN), were developed and compared in their ability to predict these blooms. On a 21-year monitoring dataset of K. brevis abundance, RVM and NB were found to have better skill in bloom prediction than the other two approaches. The importance of upwelling-favorable northerly winds in increasing K. brevis probability, and of onshore westerly winds in preventing blooms from dispersing offshore, was quantified using RVM, and all models were used to explore the importance of large river flows and the nutrients they supply in regulating blooms. These models provide new tools for management of these devastating algal blooms.

A Remote Sensing and Machine Learning-Based Approach to Forecast the Onset of Harmful Algal Bloom

Remote Sensing ◽

10.3390/rs13193863 ◽

2021 ◽

Vol 13 (19) ◽

pp. 3863

Author(s):

Moein Izadi ◽

Mohamed Sultan ◽

Racha El Kadiri ◽

Amin Ghannadi ◽

Karem Abdelmohsen

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Harmful Algal Blooms ◽

Algal Blooms ◽

Harmful Algal Bloom ◽

Lag Time ◽

Controlling Factors ◽

Support Vector ◽

Red Tides ◽

Temporal Models

In the last few decades, harmful algal blooms (HABs, also known as “red tides”) have become one of the most detrimental natural phenomena in Florida’s coastal areas. Karenia brevis produces toxins that have harmful effects on humans, fisheries, and ecosystems. In this study, we developed and compared the efficiency of state-of-the-art machine learning models (e.g., XGBoost, Random Forest, and Support Vector Machine) in predicting the occurrence of HABs. In the proposed models the K. brevis abundance is used as the target, and 10 level-02 ocean color products extracted from daily archival MODIS satellite data are used as controlling factors. The adopted approach addresses two main shortcomings of earlier models: (1) the paucity of satellite data due to cloudy scenes and (2) the lag time between the period at which a variable reaches its highest correlation with the target and the time the bloom occurs. Eleven spatio-temporal models were generated, each from 3 consecutive day satellite datasets, with a forecasting span from 1 to 11 days. The 3-day models addressed the potential variations in lag time for some of the temporal variables. One or more of the generated 11 models could be used to predict HAB occurrences depending on availability of the cloud-free consecutive days. Findings indicate that XGBoost outperformed the other methods, and the forecasting models of 5–9 days achieved the best results. The most reliable model can forecast eight days ahead of time with balanced overall accuracy, Kappa coefficient, F-Score, and AUC of 96%, 0.93, 0.97, and 0.98 respectively. The euphotic depth, sea surface temperature, and chlorophyll-a are always among the most significant controlling factors. The proposed models could potentially be used to develop an “early warning system” for HABs in southwest Florida.
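
As a rough illustration of three of the scores reported above (balanced accuracy, Kappa coefficient, and F-score), the sketch below computes them from a binary confusion matrix using their standard definitions. The counts are hypothetical and unrelated to the study's data, and the function name `confusion_metrics` is an assumption made for this example.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Balanced accuracy, F1 score, and Cohen's kappa from binary confusion counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * tp / (2 * tp + fp + fn)
    observed = (tp + tn) / n              # observed agreement (plain accuracy)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts: 50 bloom samples and 50 non-bloom samples.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa discounts the agreement a classifier would reach by chance, so it is typically lower than accuracy on the same confusion matrix.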

Monitoring Coastal Chlorophyll-a Concentrations in Coastal Areas Using Machine Learning Models

Water ◽

10.3390/w10081020 ◽

2018 ◽

Vol 10 (8) ◽

pp. 1020 ◽

Cited By ~ 7

Author(s):

Yong Kown ◽

Seung Baek ◽

Young Lim ◽

JongCheol Pyo ◽

Mayzonee Ligaray ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Chlorophyll A ◽

Harmful Algal Blooms ◽

Algal Blooms ◽

Support Vector ◽

Landsat 8 ◽

Artificial Neural

Harmful algal blooms have negatively affected the aquaculture industry and aquatic ecosystems globally. Remote sensing using satellite sensor systems has been applied on large spatial scales with high temporal resolutions for effective monitoring of harmful algal blooms in coastal waters. However, oceanic color satellites have limitations, such as low spatial resolution of sensor systems and the optical complexity of coastal waters. In this study, bands 1 to 4, obtained from Landsat-8 Operational Land Imager satellite images, were used to evaluate the performance of empirical ocean chlorophyll algorithms using machine learning techniques. Artificial neural network and support vector machine techniques were used to develop an optimal chlorophyll-a model. Four-band, four-band-ratio, and mixed reflectance datasets were tested to select the appropriate input dataset for estimating chlorophyll-a concentration using the two machine learning models. While the ocean chlorophyll algorithm application on Landsat-8 Operational Land Imager showed relatively low performance, the machine learning methods showed improved performance during both the training and validation steps. The artificial neural network and support vector machine demonstrated a similar level of prediction accuracy. Overall, the support vector machine showed slightly superior performance to that of the artificial neural network during the validation step. This study provides practical information about effective monitoring systems for coastal algal blooms.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
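
The Y-randomization test mentioned above can be sketched as follows: the response values are shuffled, the model is refit, and the resulting R2 is compared with the R2 obtained on the real responses. This is a minimal sketch that swaps in a one-descriptor least-squares line for the study's SVM; the data and the helper names (`fit_line`, `r_squared`, `fit_and_score`) are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_and_score(x, y):
    """Fit the line on (x, y) and return its R2 on the same data."""
    slope, intercept = fit_line(x, y)
    return r_squared(y, [slope * a + intercept for a in x])

# Hypothetical data: one descriptor with an exactly linear response.
x = [float(i) for i in range(20)]
y = [2.0 * a + 1.0 for a in x]

r2_real = fit_and_score(x, y)

# Y-randomization: shuffle the responses, refit, and record R2 each time.
rng = random.Random(0)
r2_scrambled = []
for _ in range(100):
    y_perm = y[:]
    rng.shuffle(y_perm)
    r2_scrambled.append(fit_and_score(x, y_perm))

mean_scrambled = sum(r2_scrambled) / len(r2_scrambled)
print(f"R2 on real responses: {r2_real:.3f}")
print(f"mean R2 on scrambled responses: {mean_scrambled:.3f}")
```

If the scrambled fits scored about as well as the real one, the apparent skill would be suspect; a large gap suggests the model captures a genuine relationship between descriptors and response.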

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
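
The trade-off described above, where higher sensitivity comes at the cost of lower specificity and more false alarms, can be illustrated by applying two decision thresholds to the same classifier scores. This is a minimal sketch: the labels, scores, thresholds, and the function name `rates_at` are hypothetical and not taken from the study.

```python
def rates_at(threshold, labels, scores):
    """Sensitivity, specificity, and precision when score >= threshold means 'bloom'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, specificity, precision

# Hypothetical unbalanced sample: 3 bloom cases (1) and 7 non-bloom cases (0).
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.55, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]

low = rates_at(0.35, labels, scores)   # permissive threshold
high = rates_at(0.65, labels, scores)  # strict threshold

print("threshold 0.35: sensitivity=%.2f specificity=%.2f precision=%.2f" % low)
print("threshold 0.65: sensitivity=%.2f specificity=%.2f precision=%.2f" % high)
```

The permissive threshold catches every bloom but raises false alarms, while the strict threshold raises none and misses two of the three blooms.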

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine for treatment of smoking cessation.
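
The ROC value (AUC) used above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison, counting ties as one half. The scores are hypothetical and the function name `pairwise_auc` is an assumption; this is not the study's code.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for patients who quit (positive) and who did not (negative).
quit_scores = [0.9, 0.8, 0.4]
not_quit_scores = [0.7, 0.3, 0.2, 0.1]

auc = pairwise_auc(quit_scores, not_quit_scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is acceptable for a small illustration; rank-based formulas scale better on large samples.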

Machine Learning-Based Prediction of Air Quality

Applied Sciences ◽

10.3390/app10249151 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9151

Author(s):

Yun-Chia Liang ◽

Yona Maimury ◽

Angela Hsiang-Ling Chen ◽

Josue Rodolfo Cuevas Juarez

Keyword(s):

Machine Learning ◽

Air Quality ◽

Random Forest ◽

Prediction Models ◽

Superior Performance ◽

Support Vector ◽

Economic Activities ◽

Adaptive Boosting ◽

Series Of Experiments ◽

Artificial Neural Network Ann

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R2 and RMSE, while AdaBoost provides the best results for MAE.
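
That one model can lead on RMSE while another leads on MAE follows from how the two errors weight large misses: RMSE squares each error, so a single large miss costs more. A minimal sketch with hypothetical AQI values and two hypothetical prediction sets (named `model_a` and `model_b` for this example only):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical AQI observations and two hypothetical sets of predictions.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 21.0, 31.0, 41.0]   # small error on every sample
model_b = [10.0, 20.0, 30.0, 43.0]   # exact on three samples, one larger miss

mae_a, rmse_a = mae(observed, model_a), rmse(observed, model_a)
mae_b, rmse_b = mae(observed, model_b), rmse(observed, model_b)

print(f"model_a: MAE={mae_a:.2f}, RMSE={rmse_a:.2f}")
print(f"model_b: MAE={mae_b:.2f}, RMSE={rmse_b:.2f}")
```

Here `model_b` has the lower MAE and `model_a` the lower RMSE, so the ranking of models depends on which error is reported.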

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽

10.1108/imds-10-2020-0603 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lei Li ◽

Desheng Wu

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Short Term Memory ◽

Model Performance ◽

Large Data ◽

Support Vector ◽

Learning Approaches ◽

Content Type ◽

Day To Day Operations ◽

Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed corporations, addressing the problem that financial supervision is not effective or precise. Design/methodology/approach: The overall proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data split, prediction approach application, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network, and Long Short-Term Memory Networks (LSTMs) as ISR prediction models. Findings: The research results show that the proposed models predict ISRs significantly better when prior infractions are included than when they are not, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, we should pay attention to novel artificial intelligence methods, previous infractions of the company, and large data sets. Originality/value: The findings could be utilized, to a certain degree, to address the problem of identifying listed corporations' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations. This shows the importance of including more data sources when constructing distress models, rather than focusing only on building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and prediction of cancer types have become essential in cancer research, since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer patients into higher- or lower-risk categories has prompted numerous research teams from the bioscience and genomics fields to investigate the use of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we present the most current papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.
