Air Pollution Modelling by Machine Learning Methods

Petra Vidnerová; Roman Neruda

doi:10.3390/modelling2040035

Urban Air Pollution Monitoring by Neural Networks and Wireless Sensor Networks Based on LoRa

Advances in Intelligent Systems and Computing - Proceedings of the Future Technologies Conference (FTC) 2020, Volume 2 ◽

10.1007/978-3-030-63089-8_59 ◽

2020 ◽

pp. 907-919

Author(s):

Vanessa Alvear-Puertas ◽

Paul D. Rosero-Montalvo ◽

Jaime R. Michilena-Calderón ◽

Ricardo P. Arciniega-Rocha ◽

Vanessa C. Erazo-Chamorro

Keyword(s):

Neural Networks ◽

Air Pollution ◽

Wireless Sensor Networks ◽

Sensor Networks ◽

Urban Air Pollution ◽

Pollution Monitoring ◽

Wireless Sensor ◽

Urban Air ◽

Air Pollution Monitoring

A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL to evaluate machine learning methods of interest to them; the largest of these studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint, and compared our proprietary software Assay Central™ with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area under the curve, F1 score, Cohen's kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed on any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date.
Future studies will likely evaluate additional databases, descriptors and algorithms, and further refine methods for evaluating and comparing models.
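
The cross-validation metrics named in this abstract have standard definitions in terms of a binary confusion matrix. The sketch below is a minimal illustration, assuming labels coded 1 = active and 0 = inactive and contiguous (unshuffled) folds; the function names are hypothetical and this is not Assay Central™ code. AUC is left out because it needs ranked scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels, 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1(tp, fp, fn, tn):
    # Harmonic mean of precision and recall, written in count form.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cohens_kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; 0.0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size
```

For example, `confusion_counts([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1])` gives `(3, 1, 1, 3)`, for which F1 is 0.75 and both Cohen's kappa and MCC are 0.5. In practice each metric would be computed per test fold and then averaged.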

Global Horizontal and Direct Normal Solar Irradiance Modeling by the Machine Learning Methods Xgboost and Deep Neural Networks with CNN-LSTM Layers: A Case Study Using the GOES-16 Satellite Imagery

SSRN Electronic Journal ◽

10.2139/ssrn.3957836 ◽

2021 ◽

Author(s):

Paulo Alexandre Rocha ◽

Victor Santos

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Satellite Imagery ◽

Solar Irradiance ◽

Deep Neural Networks ◽

Learning Methods ◽

Machine Learning Methods


Potential for Vertical Heterogeneity Prediction in Reservoir Basing on Machine Learning Methods

Geofluids ◽

10.1155/2020/3713525 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Hongqing Song ◽

Shuyi Du ◽

Ruifei Wang ◽

Jiulong Wang ◽

Yuhe Wang ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Neural Networks ◽

Rapid Development ◽

Petroleum Industry ◽

Production Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Vertical Heterogeneity ◽

Dynamic Production

With the rapid development of computer technology, machine learning methods have gradually been integrated into the petroleum industry and have achieved some success in both conventional and unconventional reservoirs. This paper presents an alternative method for predicting the vertical heterogeneity of a reservoir using various deep neural networks based on dynamic production data. A numerical simulation technique was adopted to obtain the required dataset, which contains dynamic production data calculated under different heterogeneous reservoir conditions. Machine learning models were established with deep neural networks, which better learn and capture the relationship between dynamic production data and reservoir heterogeneity, so as to invert the vertical permeability. Model validation shows that the machine learning methods perform excellently in predicting heterogeneity, with an RMSE of 12.71 mD, effectively estimating the permeability of the entire reservoir. Moreover, the overall AARD of the predictions obtained by the CNN method was held to 11.51%, the highest accuracy compared with BP and LSTM neural networks. The permeability contrast, an important parameter for characterizing heterogeneity, can also be predicted precisely, with a deviation below 10%. This study demonstrates the potential of machine learning methods for predicting vertical heterogeneity in reservoirs.
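
The abstract reports accuracy as RMSE (in mD) and AARD (in percent). A minimal sketch of these two error measures follows, assuming their usual definitions, RMSE = sqrt(mean((predicted - actual)^2)) and AARD = (100/n) * sum(|predicted - actual| / |actual|); the paper's exact formulas are not reproduced in this listing, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error, in the units of the inputs (e.g. mD for permeability)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)

def aard_percent(actual, predicted):
    """Average absolute relative deviation, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / n
```

For true permeabilities `[100, 200]` mD and predictions `[110, 180]` mD, both relative deviations are 10%, so the AARD is 10%, while the RMSE is sqrt(250), about 15.8 mD. RMSE is dominated by errors in high-permeability layers, whereas AARD weights every layer by its relative error, which is why the two figures are reported together.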

An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Sustainability ◽

10.3390/su11030699 ◽

2019 ◽

Vol 11 (3) ◽

pp. 699 ◽

Cited By ~ 13

Author(s):

Lkhagvadorj Munkhdalai ◽

Tsendsuren Munkhdalai ◽

Oyun-Erdene Namsrai ◽

Jong Lee ◽

Keun Ryu

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Random Forest ◽

Deep Neural Networks ◽

Credit Scoring ◽

Support Vector ◽

Learning Approaches ◽

Learning Methods ◽

Human Expert ◽

Machine Learning Methods

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Therefore, our main goal in this study is to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. As the SCF data are non-synthetic and consist of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and compared them: the first used hypothesis tests, correlation and random forest-based feature importance measures, and the second was a new random forest-based approach (NAP). We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrate that if lending institutions in the 2001s had used their own credit scoring model constructed by the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural networks and XGBoost algorithms trained on the subset selected by NAP achieve the highest area under the curve (AUC) and accuracy, respectively.
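
The abstract notes that one model had the highest AUC and another the highest accuracy. A small sketch shows how this can happen: AUC is threshold-free (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, ties counting one half), whereas accuracy depends on a decision threshold. The label coding (1 = default), the 0.5 threshold and the function names are assumptions for illustration, not the paper's setup.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.3, 0.4, 0.6]`, three of the four positive/negative pairs are ranked correctly (AUC 0.75), yet only two of four cases fall on the correct side of 0.5 (accuracy 0.5). The pairwise loop is quadratic and meant only to make the definition explicit.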

Urban Air Pollution Mapping Using Fleet Vehicles as Mobile Monitors and Machine Learning

Environmental Science & Technology ◽

10.1021/acs.est.0c08034 ◽

2021 ◽

Vol 55 (8) ◽

pp. 5579-5588

Author(s):

Bu Zhao ◽

Long Yu ◽

Chunyan Wang ◽

Chenyang Shuai ◽

Ji Zhu ◽

...

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Urban Air Pollution ◽

Urban Air

Acoustic feature-based sentiment analysis of call center data

10.32469/10355/66751 ◽

2017 ◽

Author(s):

Zeshan Peng

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Call Center ◽

Machine Learning Algorithms ◽

Language Recognition ◽

Acoustic Features ◽

Learning Methods ◽

Machine Learning Methods

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics from vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features from audio data that can indicate or predict a customer's attitude, opinion, or emotional state. For audio signals, acoustic features have been widely used in many machine learning applications, such as music classification, language recognition, and emotion recognition. For emotion recognition, previous work shows that pitch and speech rate are important features. This thesis focuses on determining sentiment from call center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features and different machine learning methods, including classical machine learning algorithms and deep neural networks, have been implemented in the pipeline. In our deep neural network method, the feature vectors of audio segments are stacked in temporal order into a feature matrix, which is fed into deep convolutional neural networks as input. Experimental results based on real data show that acoustic features, such as Mel frequency cepstral coefficients, timbre and Chroma features, are good indicators of sentiment.
Temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
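
The input-preparation step described above (stacking per-segment feature vectors in temporal order into one matrix for a convolutional network) can be sketched with NumPy. This is a minimal illustration, not the thesis code: it assumes each segment is given as a (start time, feature vector) pair, and the fixed row count with zero-padding and truncation is a hypothetical choice made so every record yields a matrix of the same shape. The CNN itself is not shown.

```python
import numpy as np

def stack_segment_features(segments, max_segments, n_features):
    """Stack per-segment feature vectors in temporal order into a fixed-size matrix.

    segments: list of (start_time_seconds, feature_vector) pairs, in any order.
    Rows beyond max_segments are dropped and missing rows are left as zeros
    (an assumed padding policy, so all records share one input shape).
    """
    ordered = sorted(segments, key=lambda seg: seg[0])  # order by speaker-turn start
    matrix = np.zeros((max_segments, n_features), dtype=np.float32)
    for row, (_, vec) in enumerate(ordered[:max_segments]):
        matrix[row] = np.asarray(vec, dtype=np.float32)
    return matrix
```

For instance, two segments starting at 5.0 s and 0.0 s with feature vectors `[3, 4]` and `[1, 2]` become the rows `[1, 2]`, `[3, 4]`, followed by a zero row when `max_segments=3`. A CNN can then convolve along the first (time) axis to pick up the temporal patterns the abstract refers to.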

Machine Learning Methods Applied for Modeling the Process of Obtaining Bricks Using Silicon-Based Materials

Materials ◽

10.3390/ma14237232 ◽

2021 ◽

Vol 14 (23) ◽

pp. 7232

Author(s):

Costel Anton ◽

Silvia Curteanu ◽

Cătălin Lisa ◽

Florin Leon

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Raw Materials ◽

Optimization Procedure ◽

Exhaust Emission ◽

Energy Potential ◽

Learning Methods ◽

Machine Learning Methods ◽

Emission Changes ◽

The Impact

In most cases, industrial brick manufacturing facilities are designed and commissioned for a particular type of manufacturing mix and a particular type of firing process. Maintaining and improving productivity and product quality is a challenge for process engineers. Our paper uses machine learning methods to evaluate the impact of adding new auxiliary materials on the amount of exhaust emissions. Experimental determinations made under similar conditions enabled us to build a database containing information about 121 brick batches. Various models (artificial neural networks and regression algorithms) were designed to predict exhaust emission changes when auxiliary materials are introduced into the manufacturing mix. The best models were feed-forward neural networks with two hidden layers, with MSE < 0.01 and r² > 0.82, and, among the regression models, kNN, with error < 0.6. In addition, an optimization procedure incorporating the best models was developed to determine the optimal parameter values that ensure minimum gas emissions. The Pareto front obtained in the multi-objective optimization, conducted with the grid search method, allows the user to choose the most convenient values for the dry product mass, clay, ash and organic raw materials, which minimize gas emissions with energy potential.
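
The last step of this abstract, a grid search whose non-dominated results form a Pareto front, can be sketched generically. In the paper the objectives would be the trained neural network and kNN models predicting emissions; here they are replaced by arbitrary callables, and the parameter names and toy objective functions are hypothetical. Every objective is assumed to be minimized.

```python
import itertools

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front_grid(grid, objectives):
    """Evaluate every grid point and keep the non-dominated ones.

    grid: dict mapping parameter name -> list of candidate values.
    objectives: list of callables, each taking a parameter dict and
    returning a value to minimize (stand-ins for trained surrogate models).
    """
    names = list(grid)
    evaluated = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        evaluated.append((params, tuple(f(params) for f in objectives)))
    return [
        (params, scores)
        for params, scores in evaluated
        if not any(dominates(other, scores) for _, other in evaluated)
    ]
```

As a usage example with toy objectives, `pareto_front_grid({"x": [0, 1, 2], "y": [0, 1]}, [lambda p: p["x"] + p["y"], lambda p: 2 - p["x"] + p["y"]])` keeps only the three points with `y = 0`: each point with `y = 1` is beaten on both objectives by its `y = 0` counterpart, while the `y = 0` points trade one objective against the other. Exhaustive evaluation is simple and fits a small grid, but its cost grows multiplicatively with each added parameter.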

A Low-Cost Solution for Urban Air-Pollution Monitoring Using Existing Infrastructure and Loosely Connected Ground Based Sensing Equipment

2005 International Conference on Information and Communication Technologies ◽

10.1109/icict.2005.1598577 ◽

2006 ◽

Cited By ~ 2

Author(s):

M.J. Ikram ◽

A.A. Akram ◽

M. Amin

Keyword(s):

Air Pollution ◽

Low Cost ◽

Urban Air Pollution ◽

Pollution Monitoring ◽

Urban Air ◽

Air Pollution Monitoring
