Modeling dengue vector population with earth observation data and a generalized linear model

Mosquitoes propagate many human diseases, some of them widespread and without vaccines. The Ae. aegypti mosquito vector transmits the Zika, Chikungunya, and Dengue viruses. Effective public health interventions to control the spread of these diseases and protect the population require models that explain the core environmental drivers of the vector population. Field campaigns are expensive, and data from the meteorological sites that feed models with the required environmental data often lack detail. As a consequence, we explore temporal modeling of the population of the Ae. aegypti mosquito vector species as a function of environmental conditions (temperature, moisture, precipitation, and vegetation) that have been shown to have significant effects. We use earth observation (EO) data as our source for estimating these biotic and abiotic environmental variables based on proxy features, namely: Normalized Difference Vegetation Index, Normalized Difference Water Index, Precipitation, and Land Surface Temperature. We obtained our response variable from field-collected mosquito population data measured weekly using 791 mosquito traps in Vila Velha city, Brazil, for 36 weeks in 2017 and 40 weeks in 2018. Recent similar studies have used machine learning (ML) techniques for this task. However, these techniques are neither intuitive nor explainable from an operational point of view. We therefore use a Generalized Linear Model (GLM) to model this relationship because of its fitness for modeling count response variables, its interpretability, and the ability to visualize the confidence intervals for all inferences. To improve our model, we also use the Akaike Information Criterion to select the most informative environmental features. Finally, we show how to improve the quality of the model by weighting our GLM. Our resulting weighted GLM compares well in quality with the ML techniques Random Forest and Support Vector Machines.
These results provide an advancement with regard to qualitative and explainable epidemiological risk modeling in urban environments.
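The abstract above names three ingredients: a GLM for count data, AIC-based feature selection, and weighting. The following is a minimal sketch of how these fit together, not the authors' implementation. It assumes a Poisson family with a log link and a single covariate, treats the weights as prior observation weights (the abstract does not state the weighting scheme), and uses the hypothetical function name `fit_poisson_glm`.

```python
import math

def fit_poisson_glm(x, y, weights=None, n_iter=50):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; return (b0, b1, aic).

    Assumptions (not from the source): Poisson family, log link,
    one covariate, and weights used as prior observation weights.
    """
    n = len(y)
    w = weights if weights is not None else [1.0] * n
    # Start from the intercept-only solution: log of the weighted mean count.
    b0 = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the weighted log-likelihood.
        g0 = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y, mu))
        g1 = sum(wi * (yi - mi) * xi for wi, yi, mi, xi in zip(w, y, mu, x))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(wi * mi for wi, mi in zip(w, mu))
        h01 = sum(wi * mi * xi for wi, mi, xi in zip(w, mu, x))
        h11 = sum(wi * mi * xi * xi for wi, mi, xi in zip(w, mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    loglik = sum(wi * (yi * math.log(mi) - mi - math.lgamma(yi + 1))
                 for wi, yi, mi in zip(w, y, mu))
    aic = 2 * 2 - 2 * loglik  # two estimated parameters: b0 and b1
    return b0, b1, aic

# Synthetic toy data (made up for illustration): covariate and weekly counts.
temperature_anomaly = [0.0, 1.0, 2.0, 3.0]
trap_counts = [1, 2, 4, 8]
b0, b1, aic = fit_poisson_glm(temperature_anomaly, trap_counts)
```

In an AIC-driven selection, one would fit the model once per candidate feature subset and keep the subset with the lowest AIC. Because the link is logarithmic, `exp(b1)` reads as the multiplicative change in the expected count per unit increase in the covariate, which is the kind of interpretability the abstract refers to.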


Modeling dengue vector population with earth observation data and a generalized linear model

Acta Tropica ◽

10.1016/j.actatropica.2020.105809 ◽

2021 ◽

Vol 215 ◽

pp. 105809

Author(s):

Oladimeji Mudele ◽

Alejandro C. Frery ◽

Lucas F.R. Zanandrez ◽

Alvaro E. Eiras ◽

Paolo Gamba

Keyword(s):

Linear Model ◽

Generalized Linear Model ◽

Earth Observation ◽

Observation Data ◽

Dengue Vector ◽

Vector Population ◽

Earth Observation Data


Analisis Trend Topik Penelitian pada Web Of Science dan SINTA untuk Penentuan Tema Tugas Akhir Mahasiswa AMIK Indonesia Banda Aceh

Jurnal SAINTEKOM ◽

10.33020/saintekom.v10i1.91 ◽

2020 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Bahruni Bahruni ◽

Fathurrahmad Fathurrahmad

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Deep Learning ◽

Decision Tree ◽

Linear Model ◽

Generalized Linear Model ◽

Core Collection ◽

Web Of Science ◽

Support Vector ◽

Large Margin

This study applies web-based mining to collect information from Web of Science and SINTA. The Cross Industry Standard Process for Data Mining (CRISP-DM) methodology is used both as the standard data mining process and as the research method. The researchers collected data from the Web of Science and SINTA journal lists. To track research topic trends, the researchers chose the period from 2018 to 2019 and exported data from the Web of Science Core Collection in April 2019. A total of 38,162 publications were retrieved in the Web of Science-defined category Computer Science and Information Systems, and 230 were retrieved from the SINTA website. However, the authors took only the 20 journals with the highest H-index in the Web of Science Core Collection. For SINTA, the authors likewise took 20 journals ranked SINTA 1 and 2. The study summarizes the research topics in Web of Science journals and relates them to research topic trends; the most frequently occurring terms are learning, network, analysis, system, control, data, image, optimization, systems, and neural. Classification used the Naive Bayes, Generalized Linear Model, Logistic Regression, Fast Large Margin, Deep Learning, Decision Tree, Random Forest, Gradient Boosted Trees, and Support Vector Machine models. In terms of accuracy, the Generalized Linear Model and Decision Tree reached 94.3%, while Gradient Boosted Trees reached 93.8%. Naive Bayes showed an accuracy of 91.4%, followed by Fast Large Margin, Deep Learning, Random Forest, and Support Vector Machine, also at 91.4%. The lowest accuracy, 65.2%, was obtained with Logistic Regression.
This shows that the highest accuracy is obtained with the Generalized Linear Model and Decision Tree, so that their predictions are reasonably accurate.


Modeling the Temporal Population Distribution of Ae. aegypti Mosquito using Big Earth Observation Data.

10.36227/techrxiv.11086871.v3 ◽

2020 ◽

Author(s):

Oladimeji Mudele ◽

Fabio M. Bayer ◽

Lucas Zanandrez ◽

Alvaro Eiras ◽

Paolo Gamba

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Earth Observation ◽

Machine Learning Techniques ◽

Distribution Model ◽

Support Vector ◽

World Population ◽

Observation Data ◽

Vector Population ◽

Learning Techniques

Over 50% of the world population is at risk of mosquito-borne diseases. Female Ae. aegypti mosquito species transmit Zika, Dengue, and Chikungunya. The spread of these diseases correlates positively with the vector population, and this population depends on biotic and abiotic environmental factors including temperature, vegetation condition, humidity and precipitation. To combat virus outbreaks, information about vector population is required. To this aim, Earth observation (EO) data provide fast, efficient and economically viable means to estimate environmental features of interest. In this work, we present a temporal distribution model for adult female Ae. aegypti mosquitoes based on the joint use of the Normalized Difference Vegetation Index, the Normalized Difference Water Index, the Land Surface Temperature (both at day and night time), along with the precipitation information, extracted from EO data. The model was applied separately to data obtained during three different vector control and field data collection condition regimes, and used to explain the differences in environmental variable contributions across these regimes. To this aim, a random forest (RF) regression technique and its nonlinear features importance ranking based on mean decrease impurity (MDI) were implemented. To prove the robustness of the proposed model, other machine learning techniques, including support vector regression, decision trees and k-nearest neighbor regression, as well as artificial neural networks, and statistical models such as the linear regression model and generalized linear model were also considered. Our results show that machine learning techniques perform better than linear statistical models for the task at hand, and RF performs best. By ranking the importance of all features based on MDI in RF and selecting the subset comprising the most


Modeling the Temporal Population Distribution of Ae. aegypti Mosquito using Big Earth Observation Data.

10.36227/techrxiv.11086871 ◽

2020 ◽

Author(s):

Oladimeji Mudele ◽

Fabio M. Bayer ◽

Lucas Zanandrez ◽

Alvaro Eiras ◽

Paolo Gamba

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Earth Observation ◽

Machine Learning Techniques ◽

Distribution Model ◽

Support Vector ◽

World Population ◽

Observation Data ◽

Vector Population ◽

Learning Techniques

Over 50% of the world population is at risk of mosquito-borne diseases. Female Ae. aegypti mosquito species transmit Zika, Dengue, and Chikungunya. The spread of these diseases correlates positively with the vector population, and this population depends on biotic and abiotic environmental factors including temperature, vegetation condition, humidity and precipitation. To combat virus outbreaks, information about vector population is required. To this aim, Earth observation (EO) data provide fast, efficient and economically viable means to estimate environmental features of interest. In this work, we present a temporal distribution model for adult female Ae. aegypti mosquitoes based on the joint use of the Normalized Difference Vegetation Index, the Normalized Difference Water Index, the Land Surface Temperature (both at day and night time), along with the precipitation information, extracted from EO data. The model was applied separately to data obtained during three different vector control and field data collection condition regimes, and used to explain the differences in environmental variable contributions across these regimes. To this aim, a random forest (RF) regression technique and its nonlinear features importance ranking based on mean decrease impurity (MDI) were implemented. To prove the robustness of the proposed model, other machine learning techniques, including support vector regression, decision trees and k-nearest neighbor regression, as well as artificial neural networks, and statistical models such as the linear regression model and generalized linear model were also considered. Our results show that machine learning techniques perform better than linear statistical models for the task at hand, and RF performs best. By ranking the importance of all features based on MDI in RF and selecting the subset comprising the most


Mapping wind erosion hazard with regression-based machine learning algorithms

Scientific Reports ◽

10.1038/s41598-020-77567-0 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Hamid Gholami ◽

Aliakbar Mohammadifar ◽

Dieu Tien Bui ◽

Adrian L. Collins

Keyword(s):

Neural Network ◽

Machine Learning ◽

Linear Model ◽

Generalized Linear Model ◽

Wind Erosion ◽

Generalized Additive Model ◽

Additive Model ◽

Support Vector ◽

Erosion Hazard ◽

Stochastic Gradient Boosting

Land susceptibility to wind erosion hazard in Isfahan province, Iran, was mapped by testing 16 advanced regression-based machine learning methods: Robust linear regression (RLR), Cforest, Non-convex penalized quantile regression (NCPQR), Neural network with feature extraction (NNFE), Monotone multi-layer perceptron neural network (MMLPNN), Ridge regression (RR), Boosting generalized linear model (BGLM), Negative binomial generalized linear model (NBGLM), Boosting generalized additive model (BGAM), Spline generalized additive model (SGAM), Spike and slab regression (SSR), Stochastic gradient boosting (SGB), Support vector machine (SVM), Relevance vector machine (RVM), Cubist, and Adaptive network-based fuzzy inference system (ANFIS). Thirteen factors controlling wind erosion were mapped, and multicollinearity among these factors was quantified using the tolerance coefficient (TC) and variance inflation factor (VIF). Model performance was assessed by RMSE, MAE, MBE, and a Taylor diagram using both training and validation datasets. The results showed that five models (MMLPNN, SGAM, Cforest, BGAM and SGB) are capable of delivering a high prediction accuracy for land susceptibility to wind erosion hazard. DEM, precipitation, and vegetation (NDVI) are the most critical factors controlling wind erosion in the study area. Overall, regression-based machine learning models are efficient techniques for mapping land susceptibility to wind erosion hazards.
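The three scalar error measures named in this abstract (RMSE, MAE, MBE) can be sketched in a few lines. This is an illustrative sketch using the standard textbook definitions, not code from the paper. The sign convention for MBE (predicted minus observed, so a positive value means over-prediction) is an assumption, and the function names are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error: average magnitude of the errors."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

def mbe(observed, predicted):
    """Mean bias error, assumed here as mean(predicted - observed)."""
    n = len(observed)
    return sum(p - o for o, p in zip(observed, predicted)) / n
```

Reporting the three together is informative because MBE can be near zero while RMSE and MAE are large, which indicates errors that cancel out on average rather than a good fit.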


Modelling world energy security data from multinomial distribution by generalized linear model under different cumulative link functions

Open Chemistry ◽

10.1515/chem-2018-0053 ◽

2018 ◽

Vol 16 (1) ◽

pp. 377-385

Author(s):

Neslihan Iyit

Keyword(s):

Linear Model ◽

Energy Security ◽

Generalized Linear Model ◽

Multinomial Distribution ◽

Energy Performance ◽

Link Function ◽

Response Variable ◽

Ordinal Response ◽

Link Functions ◽

Cumulative Logit

Energy security is one of the major components of energy sustainability in the world’s energy performance. In this study, energy security is taken as an ordinal response variable coming from the multinomial distribution with the energy grade levels A, B, C, and D. Thereafter, the world energy security data are statistically modelled using the generalized linear model (GLM) approach for the ordinal response variable under different cumulative link functions. The cumulative link functions compared in this study are cumulative logit, cumulative probit, cumulative complementary log-log, cumulative Cauchit, and cumulative negative log-log. In order to avoid a multicollinearity problem in the data structure, the principal component analysis (PCA) technique is integrated with the GLM approach for the ordinal response variable. In this study, statistically, the importance of determining the best cumulative link function on the accuracy of parameter estimates, confidence intervals, and hypothesis tests in the GLM for the multinomially distributed response variable is highlighted. In terms of energy evaluation, by using cumulative logit as the best cumulative link function, energy sources consumptions, electricity productions from nuclear energy, natural gas, oil, coal, and hydroelectric, energy use per capita, and energy imports are found to have statistically significant effects on energy security in the world’s energy performance.
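For reference, the five cumulative link functions named in this abstract can be written out in their standard textbook forms. The notation below is an assumption rather than the paper's own: $\gamma_j(x) = P(Y \le j \mid x)$ is the cumulative probability for grade level $j$, $\theta_j$ are ordered thresholds, and $\beta$ is the coefficient vector. The sign in front of $x^\top\beta$ varies between textbooks and software.

```latex
% Cumulative link model for an ordinal response with J levels (here J = 4: A, B, C, D)
g\bigl(\gamma_j(x)\bigr) = \theta_j - x^{\top}\beta, \qquad j = 1, \dots, J-1

% Standard link functions g applied to a cumulative probability \gamma \in (0, 1)
\begin{aligned}
\text{logit:} \quad & g(\gamma) = \log\frac{\gamma}{1-\gamma} \\
\text{probit:} \quad & g(\gamma) = \Phi^{-1}(\gamma) \\
\text{complementary log-log:} \quad & g(\gamma) = \log\bigl(-\log(1-\gamma)\bigr) \\
\text{negative log-log:} \quad & g(\gamma) = -\log\bigl(-\log\gamma\bigr) \\
\text{Cauchit:} \quad & g(\gamma) = \tan\bigl(\pi(\gamma - \tfrac{1}{2})\bigr)
\end{aligned}
```

Here $\Phi^{-1}$ is the standard normal quantile function. The logit and probit links are symmetric around $\gamma = 1/2$, the two log-log links are asymmetric in opposite directions, and the Cauchit link has heavier tails, which is why the choice of link can change parameter estimates and their confidence intervals.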


Avaliando aprendizado de máquina na previsão de curto prazo de séries temporais de energia solar

Revista Brasileira de Computação Aplicada ◽

10.5335/rbca.v13i2.12581 ◽

2021 ◽

Vol 13 (2) ◽

pp. 105-112

Author(s):

Naylene Fraccanabbia ◽

Viviana Cocco Mariani

Keyword(s):

Neural Network ◽

Support Vector Regression ◽

Principal Components Analysis ◽

Linear Model ◽

Principal Components ◽

Generalized Linear Model ◽

Support Vector ◽

Components Analysis

Alternative energy sources are becoming increasingly common; they aim to reduce environmental pollution and are well suited to overcoming the energy crisis, and in this context solar energy stands out for its abundance. Because of the high uncertainty of the factors that directly affect solar power generation, such as temperature and solar radiation, forecasting solar energy with high accuracy is a challenge. The objective of this article is therefore to develop a time-series forecasting model that predicts solar energy production 1, 3, and 6 steps ahead, emphasizing the potential of the neural network, using a database from a photovoltaic plant located in Uruguay. To develop the proposal, preprocessing techniques were combined with the forecasting methods Support Vector Regression (SVR), Bayesian Regularized Neural Network (BRNN, a multilayer perceptron neural network with Bayesian regularization), and Generalized Linear Model (GLM). Finally, these combinations were compared using performance measures. The combination of Principal Components Analysis (PCA) and the Bayesian Regularized Neural Network achieved the best results on all three performance measures.
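Forecasting 1, 3, and 6 steps ahead is commonly set up by turning the series into lagged input windows paired with a target that lies `horizon` steps after the last input. The sketch below illustrates that framing only. The abstract does not describe the authors' windowing, so the direct (one model per horizon) strategy, the number of lags, and the function name `make_supervised` are all assumptions.

```python
def make_supervised(series, n_lags, horizon):
    """Build (inputs, targets) pairs for direct h-step-ahead forecasting.

    Each input is a window of `n_lags` consecutive values; its target is
    the value `horizon` steps after the last value in the window.
    """
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window_end = start + n_lags          # exclusive end of the window
        inputs.append(list(series[start:window_end]))
        targets.append(series[window_end - 1 + horizon])
    return inputs, targets

# Synthetic series (made up for illustration), one dataset per horizon.
production = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
datasets = {h: make_supervised(production, n_lags=3, horizon=h) for h in (1, 3, 6)}
```

Any regressor (SVR, a neural network, or a GLM) could then be trained on each horizon's pairs. Longer horizons yield fewer training pairs from the same series, which is one reason accuracy tends to degrade as the horizon grows.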
