Improved Prediction of Dimethyl Sulfide (DMS) Distributions in the NE Subarctic Pacific using Machine Learning Algorithms

2021 ◽  
Author(s):  
Brandon McNabb ◽  
Philippe Tortell

Abstract. Dimethyl sulfide (DMS) is a volatile biogenic gas with the potential to influence regional climate as a source of atmospheric aerosols and cloud condensation nuclei (CCN). The complexity of the oceanic DMS cycle presents a challenge in accurately predicting sea-surface concentrations and sea-air fluxes of this gas. In this study, we applied machine learning methods to model the distribution of DMS in the NE Subarctic Pacific (NESAP), a global DMS hot-spot. Using nearly two decades of ship-based DMS observations, combined with satellite-derived oceanographic data, we constructed ensembles of 1000 machine-learning models using two techniques, random forest regression (RFR) and artificial neural networks (ANN). Our models dramatically improve upon existing statistical DMS models, capturing up to 62 % of observed DMS variability in the NESAP and demonstrating notable regional patterns that are associated with mesoscale oceanographic variability. In particular, our results indicate a strong coherence between DMS concentrations, sea surface nitrate (SSN) concentrations, photosynthetically active radiation (PAR) and sea surface height anomalies (SSHA), suggesting that NESAP DMS cycling is primarily influenced by heterogeneous nutrient availability, light-dependent processes and physical mixing. Based on our model output, we derive summertime sea-air flux estimates ranging from 0.5 to 2.0 Tg S yr⁻¹ in the NESAP. Our work demonstrates a new approach to capturing spatial and temporal patterns in DMS variability, which is likely applicable to other oceanic regions.

2021 ◽  
Author(s):  
El houssaine Bouras ◽  
Lionel Jarlan ◽  
Salah Er-Raki ◽  
Riad Balaghi ◽  
Abdelhakim Amazirh ◽  
...  

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is important for decision makers to have reliable forecasts of the annual cereal production in order to pre-empt importation needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature) and climate data (pseudo-oscillation indices including NAO and the leading modes of sea surface temperature, SST, in the mid-latitudes and in the tropical area) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine, SVM; Random Forest, RF; and eXtreme Gradient Boost, XGBoost) in addition to Multiple Linear Regression (MLR). We also evaluated the models for different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that the combination of data from the different sources outperformed the use of a single dataset, with the highest accuracy obtained when all three data sources were considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvesting) with an R² = 0.90 and an RMSE of about 3.4 Qt.ha⁻¹. Comparing the models' performance, XGBoost is the best one for predicting yields. Also, considering specific models for each province separately improves the statistical metrics by approximately 10-50%, depending on the province, relative to one global model applied to all the provinces. The results of this study indicate that machine learning is a promising tool for cereal yield forecasting. The proposed methodology can also be extended to different crops and different regions for crop yield forecasting.


2021 ◽  
Author(s):  
Elizaveta Felsche ◽  
Ralf Ludwig

Abstract. There is strong scientific and social interest in understanding the factors leading to extreme events in order to improve the management of risks associated with hazards like droughts. In this study, artificial neural networks are applied to predict the occurrence of a drought in two contrasting European domains, Munich and Lisbon, with a lead time of one month. The approach takes into account a list of 30 atmospheric and soil variables as input parameters from a single-model initial condition large ensemble (CRCM5-LE). The data were produced in the context of the ClimEx project by Ouranos with the Canadian Regional Climate Model (CRCM5) driven by 50 members of the Canadian Earth System Model (CanESM2). Drought occurrence was defined using the Standardized Precipitation Index. The best performing machine learning algorithms managed to obtain a correct classification of drought or no drought for a lead time of one month for around 55–60 % of the events of each class for both domains. Explainable AI methods like SHapley Additive exPlanations (SHAP) were applied to gain a better understanding of the trained algorithms. Variables like the North Atlantic Oscillation Index and air pressure one month before the event proved to be of high importance for the prediction. The study showed that seasonality has a strong influence on the goodness of drought prediction, especially for the Lisbon domain.


Author(s):  
D. Vito

Abstract. Natural disasters such as floods are regarded as being caused by extreme weather conditions as well as by changes in global and regional climate. Predicting incoming floods is a key factor in ensuring civil protection in case of emergency and in providing effective early warning systems. The risk of flood is affected by several factors, such as land use, meteorological events, hydrology and the topology of the land. Predicting such a risk implies the use of data coming from different sources, such as satellite images, water basin levels, and meteorological and GIS data, which nowadays are easily produced thanks to the availability of new satellite portals such as SENTINEL and of distributed sensor networks in the field. To obtain a comprehensive and accurate prediction of flood risk, it is essential to perform selective and multivariate analyses among the different types of inputs. Multivariate analysis refers to all statistical techniques that simultaneously analyse multiple variables. Among multivariate analyses, machine learning provides increasing levels of accuracy, precision and efficiency by discovering patterns in large and heterogeneous input datasets. Basically, machine learning algorithms automatically acquire experience from data. This is done through the process of learning, by which the algorithm can generalize beyond the examples given by the input training data. Machine learning is interesting for predictions because it adapts the resolution strategies to the features of the data. This peculiarity can be used to predict extremes from highly variable data, as in the case of floods. This work proposes strategies and case studies on the application of machine learning algorithms to flood event prediction. In particular, the study will focus on the application of Support Vector Machines and Artificial Neural Networks to a multivariate set of data related to the river Seveso, in order to propose a more general framework from the case study.


2021 ◽  
Author(s):  
Magdalena Arnal Segura ◽  
Dietmar Fernandez ◽  
Claudia Giambartolomei ◽  
Giorgio Bini ◽  
Eleftherios Samaras ◽  
...  

INTRODUCTION Genome-wide association studies (GWAS) in late onset Alzheimer's disease (LOAD) provide lists of individual genetic determinants. However, GWAS are not good at capturing the synergistic effects among multiple genetic variants and lack good specificity. METHODS We applied tree-based machine learning algorithms (MLs) to discriminate LOAD (>700 individuals) and age-matched unaffected subjects using single nucleotide variants (SNVs) from AD studies, obtaining specific genomic profiles with the prioritized SNVs. RESULTS The MLs prioritized a set of SNVs located in the closely situated genes PVRL2, TOMM40, APOE and APOC1. The captured genomic profiles in this region showed a clear interaction between rs405509 and rs1160985. Additionally, rs405509, located in the APOE promoter, interacts with rs429358 among others, seemingly neutralizing their predisposing effect. Interactions are characterized by their association with specific comorbidities and the presence of eQTLs and sQTLs. DISCUSSION Our approach efficiently discriminates LOAD from controls, capturing genomic profiles defined by interactions among SNVs in a hot-spot region.


2020 ◽  
Vol 12 (21) ◽  
pp. 9186
Author(s):  
Jiman Park ◽  
Byungyun Yang

Despite the growing interest in digital twins (DTs) in geospatial technology, the scientific literature is still at an early stage, and concepts of DTs vary. From a common perspective, the primary goal of DTs is to reduce the uncertainty of physical systems in real-world projects in order to reduce cost. Thus, this study is aimed at developing a structural schematic of a geographic information system (GIS)-enabled DT system and exploring geospatial technologies that can aid in deploying a DT system for a real-world project, in particular for the sustainable evaluation of carbon emissions. The schematic includes three major phases: (1) data collection and visualization, (2) analytics, and (3) deployment. Three steps are designed to propose an optimal strategy to reduce carbon emissions in an urban area. In the analytics phase, mapping, machine learning algorithms, and spatial statistics are applied, mapping an ideal counterpart to physical assets. Furthermore, GIS maps can not only analyze geographic data that represent the counterparts of physical assets but also display and analyze spatial relationships between physical assets. In the first step of the analytics phase, a GIS map spatially represented the most vulnerable area based on the values of carbon emissions computed according to the Intergovernmental Panel on Climate Change (IPCC) guidelines. Next, the radial basis function (RBF) kernel algorithm, a machine learning technique, was used to forecast spatial trends of carbon emissions. A backpropagation neural network (BPNN) was used to quantitatively determine which factor was the most influential among the four data sources: electricity, city gas, household waste, and vehicle. Then, a hot spot analysis was used to assess where high values of carbon emissions clustered in the study area. This study on the development of DTs contributes the following. First, with DTs, sustainable urban management systems will be improved and new insights developed more publicly. Ultimately, such improvements can reduce the failures of projects associated with urban planning and management. Second, the structural schematic proposed here is a data-driven approach; consequently, its outputs are more reliable and feasible. Ultimately, innovative approaches become available and services are transformed. Consequently, urban planners or policy makers can apply the system to scenario-based approaches.


2021 ◽  
Vol 13 (16) ◽  
pp. 3101
Author(s):  
El houssaine Bouras ◽  
Lionel Jarlan ◽  
Salah Er-Raki ◽  
Riad Balaghi ◽  
Abdelhakim Amazirh ◽  
...  

Accurate seasonal forecasting of cereal yields is an important decision support tool for countries, such as Morocco, that are not self-sufficient, in order to predict importation needs as early as possible. This study aims to develop an early forecasting model of cereal yields (soft wheat, barley and durum wheat) at the scale of the agricultural province, considering the 15 most productive provinces over 2000–2017 (i.e., 15 × 18 = 270 yield values). To this end, we built on previous works that showed a tight linkage between cereal yields and various datasets including weather data (rainfall and air temperature), regional climate indices (North Atlantic Oscillation in particular), and drought indices derived from satellite observations in different wavelengths. The combination of the latter three data sets is assessed to predict cereal yields using linear (Multiple Linear Regression, MLR) and non-linear (Support Vector Machine, SVM; Random Forest, RF; and eXtreme Gradient Boost, XGBoost) machine learning algorithms. The calibration of the algorithmic parameters of the different approaches is carried out using a 5-fold cross validation technique, and a leave-one-out method is implemented for model validation. The statistical metrics of the models are first analyzed as a function of the input datasets that are used, and as a function of the lead times, from 4 months to 2 months before harvest. The results show that combining data from multiple sources outperformed models based on one dataset only. In addition, the satellite drought indices are a major source of information for cereal prediction when the forecasting is carried out close to harvest (2 months before), while weather data and, to a lesser extent, climate indices, are key variables for earlier predictions. The best models can accurately predict yield in January (4 months before harvest) with an R² = 0.88 and RMSE around 0.22 t ha⁻¹. The XGBoost method exhibited the best metrics. Finally, training a specific model separately for each group of provinces, instead of one global model, improved the prediction performance by reducing the RMSE by 10% to 35% depending on the provinces. In conclusion, the results of this study indicate that combining remote sensing drought indices with climate and weather variables using a machine learning technique is a promising approach for cereal yield forecasting.
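The leave-one-out validation and the R² and RMSE metrics mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps in a one-variable least-squares line for the SVM, RF and XGBoost models, takes R² to be the coefficient of determination (the abstract does not say which definition was used), and all function names and the rainfall and yield numbers are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(x, y):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return predictions

# Hypothetical data: seasonal rainfall (mm) and cereal yield (t/ha).
rainfall = [120.0, 200.0, 150.0, 310.0, 260.0, 90.0]
yields = [0.8, 1.4, 1.0, 2.1, 1.9, 0.6]

predicted = leave_one_out_predictions(rainfall, yields)
print(f"LOO R2 = {r_squared(yields, predicted):.2f}, "
      f"RMSE = {rmse(yields, predicted):.2f} t/ha")
```

Because each prediction comes from a model that never saw the corresponding sample, the resulting metrics estimate out-of-sample skill rather than goodness of fit.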


2021 ◽  
Vol 21 (12) ◽  
pp. 3679-3691
Author(s):  
Elizaveta Felsche ◽  
Ralf Ludwig

Abstract. There is a strong scientific and social interest in understanding the factors leading to extreme events in order to improve the management of risks associated with hazards like droughts. In this study, artificial neural networks are applied to predict the occurrence of a drought in two contrasting European domains, Munich and Lisbon, with a lead time of 1 month. The approach takes into account a list of 28 atmospheric and soil variables as input parameters from a single-model initial-condition large ensemble (CRCM5-LE). The data were produced in the context of the ClimEx project by Ouranos, with the Canadian Regional Climate Model (CRCM5) driven by 50 members of the Canadian Earth System Model (CanESM2). Drought occurrence is defined using the standardized precipitation index. The best-performing machine learning algorithms manage to obtain a correct classification of drought or no drought for a lead time of 1 month for around 55 %–57 % of the events of each class for both domains. Explainable AI methods like SHapley Additive exPlanations (SHAP) are applied to understand the trained algorithms better. Variables like the North Atlantic Oscillation index and air pressure 1 month before the event prove essential for the prediction. The study shows that seasonality strongly influences the performance of drought prediction, especially for the Lisbon domain.


Molecules ◽  
2018 ◽  
Vol 23 (10) ◽  
pp. 2535 ◽  
Author(s):  
Siyu Liu ◽  
Chuyao Liu ◽  
Lei Deng

Hot spots are the subset of interface residues that account for most of the binding free energy, and they play essential roles in the stability of protein binding. Effectively identifying which specific interface residues of protein–protein complexes form the hot spots is critical for understanding the principles of protein interactions, and it has broad application prospects in protein design and drug development. Experimental methods like alanine scanning mutagenesis are labor-intensive and time-consuming. At present, the experimentally measured hot spots are very limited. Hence, the use of computational approaches to predicting hot spots is becoming increasingly important. Here, we describe the basic concepts and recent advances of machine learning applications in inferring the protein–protein interaction hot spots, and assess the performance of widely used features, machine learning algorithms, and existing state-of-the-art approaches. We also discuss the challenges and future directions in the prediction of hot spots.


2021 ◽  
Vol 13 (23) ◽  
pp. 4826
Author(s):  
Haojie Wang ◽  
Limin Zhang ◽  
Lin Wang ◽  
Jian He ◽  
Hongyu Luo

Snow preserves fresh water and impacts regional climate and the environment. Enabled by modern satellite Earth observations, fast and accurate automated snow mapping is now possible. In this study, we developed the Automated Snow Mapper Powered by Machine Learning (AutoSMILE), which is the first machine learning-based open-source system for snow mapping. It is built in a Python environment based on object-based analysis. AutoSMILE was first applied in a mountainous area of 1002 km² in Bome County, eastern Tibetan Plateau. A multispectral image from Sentinel-2B, a digital elevation model, and machine learning algorithms such as random forest and convolutional neural networks were utilized. Taking only 5% of the study area as the training zone, AutoSMILE yielded an extraordinarily satisfactory result over the rest of the study area: the producer’s accuracy, user’s accuracy, intersection over union and overall accuracy reached 99.42%, 98.78%, 98.21% and 98.76%, respectively, at object level, corresponding to 98.84%, 98.35%, 97.23% and 98.07%, respectively, at pixel level. The model trained in Bome County was subsequently used to map snow at the Qimantag Mountain region in the northern Tibetan Plateau, and a high overall accuracy of 97.22% was achieved. AutoSMILE outperformed threshold-based methods at both sites and exhibited superior performance especially in handling complex land covers. The outstanding performance and robustness of AutoSMILE in the case studies suggest that AutoSMILE is a fast and reliable tool for large-scale high-accuracy snow mapping and monitoring.
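The four scores reported in this abstract follow from a binary snow / no-snow confusion matrix. The sketch below uses the standard remote-sensing definitions; the function name and the counts are hypothetical and are not taken from AutoSMILE or its results.

```python
def snow_mapping_metrics(tp, fp, fn, tn):
    """Accuracy scores for a binary snow map from confusion-matrix counts.

    tp: snow correctly mapped as snow
    fp: non-snow wrongly mapped as snow
    fn: snow wrongly mapped as non-snow
    tn: non-snow correctly mapped as non-snow
    """
    return {
        # Share of reference snow that the map recovers (recall).
        "producers_accuracy": tp / (tp + fn),
        # Share of mapped snow that is truly snow (precision).
        "users_accuracy": tp / (tp + fp),
        # Overlap between mapped and reference snow over their union.
        "intersection_over_union": tp / (tp + fp + fn),
        # Share of all units (objects or pixels) labeled correctly.
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts, for illustration only.
scores = snow_mapping_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

The same function applies at object level or at pixel level; only the unit being counted changes.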

