Use of machine learning to establish limits in the classification of hyperaccumulator plants growing on serpentine, gypsum and dolomite soils

2021, Vol. 42, pp. e67609
Author(s): Marina Mota-Merlo, Vanessa Martos

So-called hyperaccumulator plants can store quantities of heavy metals hundreds or thousands of times greater than normal plants, which makes them very useful in fields such as phytoremediation and phytomining. Many of these plants are serpentinophytes, i.e., plants that grow exclusively on ultramafic rocks, which produce soils with a high proportion of heavy metals. Although multiple classifications exist, the lack of consensus on which parameters should determine whether a plant is a hyperaccumulator, together with the arbitrariness of the established thresholds, creates the need for more objective criteria. To this end, mineral composition data from different plant species were analysed using machine learning techniques in three complementary case studies. First, plants were classified according to three soil types: dolomite, gypsum and serpentine. Second, Ni composition data from normal and hyperaccumulator plants were analysed with machine learning to find differentiated subgroups. Lastly, association studies were carried out on mineral composition and soil type. The classification task achieved a success rate above 75%. Clustering plants by Ni concentration in parts per million (ppm) yielded four groups, with cut-off points at 2.25 ppm, 100 ppm (accumulators) and 3,000 ppm (hyperaccumulators). Associations with a confidence above 90% were found between high Ni levels and serpentine soils, and between jointly high Ni and Zn levels and the same soil type. Overall, this work demonstrates the potential of machine learning for analysing plant mineral composition data. Finally, a review of the IUCN Red List and the national red lists of countries with a high richness of hyperaccumulator species makes it evident that greater effort should be made to establish the conservation status of this type of flora.
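
As an illustration of the clustering step described above, the following sketch groups one-dimensional Ni concentrations and derives cut-off points between adjacent groups. The study does not publish its exact pipeline, so the choice of k-means on log-scaled values, the number of groups and the synthetic example data are assumptions made purely for illustration.

```python
# Hypothetical sketch: clustering Ni concentrations (ppm) into four groups and
# deriving cut-off points between adjacent clusters. K-means on log-scaled
# values is an assumption made here for illustration, not the study's method.
import numpy as np
from sklearn.cluster import KMeans

def ni_cutoffs(ni_ppm, n_groups=4, random_state=0):
    """Cluster 1-D Ni concentrations and return the cut-offs between groups."""
    x = np.log10(np.asarray(ni_ppm, dtype=float) + 1e-9).reshape(-1, 1)
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=random_state).fit_predict(x)
    # Order clusters by mean concentration, then place each cut-off halfway
    # (in log space) between the max of one group and the min of the next.
    order = np.argsort([x[labels == k].mean() for k in range(n_groups)])
    cutoffs = []
    for lo, hi in zip(order[:-1], order[1:]):
        midpoint = (x[labels == lo].max() + x[labels == hi].min()) / 2.0
        cutoffs.append(10 ** midpoint)
    return cutoffs

# Example: synthetic concentrations spanning normal plants to hyperaccumulators.
print(ni_cutoffs([0.5, 1, 3, 8, 50, 150, 800, 2500, 5000, 12000]))
```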

2021
Author(s): Carlotta Valerio, Graciela Gómez Nicola, Rocío Aránzazu Baquero Noriega, Alberto Garrido, Lucia De Stefano

Since 1970, freshwater species have suffered a decline of 83% worldwide, and anthropogenic activities are considered to be major drivers of ecosystem degradation. Linking the ecological response to the multiple anthropogenic stressors acting on a system is essential for designing effective policy measures to restore riverine ecosystems. However, obtaining quantitative links between stressors and ecological status remains challenging, given the non-linearity of the ecosystem response and the need to consider the multiple factors at play. This study applies machine learning techniques to explore the relationships between anthropogenic pressures and the composition of fish communities in the river basins of Castilla-La Mancha, a region covering nearly 79,500 km² in central Spain. During the past two decades, this region has experienced an alarming decline in the conservation status of native fish species. The starting point for the analysis is a 10x10 km grid recording, for each cell, the presence or absence of several fish species before and after 2001. This database was used to characterize the evolution of several metrics of fish species richness over time, accounting for species origin (native or alien), species traits (e.g. pollution tolerance) and habitat preferences. Random Forest and Gradient Boosted Regression Trees algorithms were used to relate the resulting metrics to stressor variables describing the anthropogenic pressures acting on the rivers, such as urban wastewater discharges, land use cover, hydro-morphological degradation and alteration of the river flow regime. The study provides new, quantitative insights into pressure-ecosystem relationships in rivers and reveals the main factors driving the decline of fish richness in Castilla-La Mancha, which could help inform environmental policy initiatives.
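
A minimal sketch of the modelling step described in this abstract is given below: a fish-richness metric is related to anthropogenic stressor variables with the two tree-ensemble algorithms named above, and the stressors are then ranked by importance. The column names, input file and response metric are placeholders, not the study's actual dataset.

```python
# Sketch under stated assumptions: relate a fish-richness metric to stressor
# variables with Random Forest and Gradient Boosted Regression Trees, then
# rank the stressors by feature importance.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

stressors = ["wastewater_load", "agricultural_cover", "hydromorph_alteration",
             "flow_regime_alteration"]                       # assumed predictors
cells = pd.read_csv("grid_cells.csv")                        # placeholder: one row per 10x10 km cell
X, y = cells[stressors], cells["native_richness_change"]     # assumed response metric

for name, model in [("Random Forest", RandomForestRegressor(n_estimators=500, random_state=1)),
                    ("GBRT", GradientBoostingRegressor(random_state=1))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    importances = model.fit(X, y).feature_importances_
    ranking = sorted(zip(stressors, importances), key=lambda t: -t[1])
    print(name, round(r2, 3), ranking)
```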


2020, Vol. 32 (1), pp. 39-53
Author(s): Dalia Shanshal, Ceni Babaoglu, Ayşe Başar

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model, using data analytics and machine learning techniques that learn from past patterns, to provide additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationships between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. The exploratory data analysis reveals both spatio-temporal and behavioural patterns, such as the prevalence of collisions at intersections and in spring and summer, and of aggressive and inattentive driving behaviours. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest, with accuracies of 0.80, 0.89 and 0.80, respectively. The proposed methods demonstrate the effectiveness of applying machine learning to traffic and collision data, for both exploratory and predictive analytics.
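
The comparison of the two predictive models could look roughly like the sketch below. The feature names and input file are assumptions, and Lasso is represented as an L1-penalised logistic regression because the reported metric (accuracy) implies a classification setting; the authors' exact preprocessing is not reproduced here.

```python
# Illustrative comparison of the two model families named in the abstract on an
# assumed binary injury-severity label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

collisions = pd.read_csv("toronto_collisions.csv")            # placeholder file
X = pd.get_dummies(collisions[["road_class", "light", "season", "driver_action"]])
y = collisions["severe_injury"]                                # assumed binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
for name, model in [("Lasso (L1 logistic)", LogisticRegression(penalty="l1", solver="liblinear")),
                    ("Random Forest", RandomForestClassifier(n_estimators=300, random_state=42))]:
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 2))
```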


Author(s): Bhavani Thuraisingham

Data mining is the process of posing queries against large quantities of data and extracting often previously unknown information using mathematical, statistical, and machine-learning techniques. Data mining has applications in a number of areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups as well as possibly predict terrorist events based on past experience. One particular data-mining technique being investigated a great deal for homeland security is link analysis, where links are drawn between various nodes, possibly detecting connections that would otherwise remain hidden.
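
A toy illustration of the link-analysis idea is sketched below: entities become nodes, observed co-occurrences become edges, and candidate hidden links between unconnected nodes are scored by the number of intermediaries they share. The data and the common-neighbour heuristic are invented here for illustration and are not drawn from any specific system described in the text.

```python
# Toy link-analysis sketch: score non-adjacent node pairs by shared neighbours.
# Real systems use far richer link-prediction models; this only shows the idea.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("E", "D")])

candidates = [
    (u, v, len(list(nx.common_neighbors(g, u, v))))
    for u in g for v in g
    if u < v and not g.has_edge(u, v)
]
for u, v, score in sorted(candidates, key=lambda t: -t[2]):
    print(f"possible hidden link {u}-{v}: {score} shared contacts")
```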


Author(s): Jonathan Becker, Aveek Purohit, Zheng Sun

The USARSim group at NIST developed a simulated robot that operates in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
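
The abstract does not detail the RL formulation, so the following is only a schematic sketch of a tabular Q-learning controller of the kind that could replace the PID loop, written in Python rather than the authors' Matlab. The state and action discretisation, reward and hyperparameters are illustrative assumptions.

```python
# Minimal tabular Q-learning sketch for a discretised control task; the state,
# action, reward and simulator response below are illustrative stand-ins.
import random
from collections import defaultdict

ACTIONS = [-1.0, 0.0, 1.0]          # steer left / hold / steer right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q = defaultdict(float)              # (state, action) -> estimated value

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)          # explore
    return max(ACTIONS, key=lambda a: q[(state, a)])  # exploit

def update(state, action, reward, next_state):
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

# One simulated control step: the state is a bucketed heading error (faked
# here), and the reward penalises the remaining error after acting.
state = 3
action = choose_action(state)
next_state = state - int(action)     # stand-in for the simulator's response
update(state, action, -abs(next_state), next_state)
```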


2021, Vol. 13 (6), pp. 0-0

Network proxies and Virtual Private Networks (VPNs) are tools used every day to facilitate various business functions. However, they have gained popularity amongst unintended user bases as tools that can mask identities when using websites and web services. Anonymising proxies and/or VPNs act as an intermediary between a user and a web server, with a proxy and/or VPN IP address taking the place of the user's IP address forwarded to the web server. This paper presents computational models based on intelligent machine learning techniques to address the limitations currently experienced by unauthorised-user detection systems. A model to detect the usage of anonymising proxies was developed using a multi-layer perceptron neural network trained on data found in the Transmission Control Protocol (TCP) headers of captured network packets.
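
A hedged sketch of such a detection model is shown below: a multi-layer perceptron trained on features taken from captured TCP headers. The feature names, input file and network size are placeholders; the paper's exact feature set and training procedure are not reproduced.

```python
# Sketch under stated assumptions: an MLP classifier over TCP-header features
# with a binary proxy/no-proxy label.
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

packets = pd.read_csv("tcp_headers.csv")                    # placeholder capture export
features = ["window_size", "ttl", "header_length", "flags_syn", "flags_ack"]
X, y = packets[features], packets["via_anonymising_proxy"]  # assumed binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=7))
print("held-out accuracy:", mlp.fit(X_tr, y_tr).score(X_te, y_te))
```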


2020
Author(s): Yu Xu, Dragana Vuckovic, Scott C Ritchie, Parsa Akbari, Tao Jiang, ...

Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because the selection of variants and the modelling of their interactions in PGSs may be limited by univariate analysis, such a conventional method may yield sub-optimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in constructing PGSs for 26 blood cell traits, using data from UK Biobank (n ≈ 400,000) and INTERVAL (n ≈ 40,000). Our results show that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. The study also demonstrates that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. As the study found genetic correlations between the PGSs for blood cell traits and the PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), blood cell traits may be indicators and/or mediators of a variety of common disorders via shared genetic variants and functional pathways.
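
To make the contrast concrete, the sketch below compares a conventional univariate PGS (a weighted sum of allele dosages using per-variant effect sizes) with a learning-based score fitted directly on the dosages, which can pick up an interaction term that the additive score misses. The simulated data, effect sizes and choice of learner are illustrative assumptions, not the study's pipeline, and the correlations are computed in-sample.

```python
# Schematic contrast: additive (univariate-weight) PGS vs. a learned score that
# can model variant interactions, on simulated dosage data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_people, n_variants = 1000, 50
dosages = rng.integers(0, 3, size=(n_people, n_variants)).astype(float)  # 0/1/2 allele counts
gwas_beta = rng.normal(0, 0.05, n_variants)                              # univariate effect sizes
trait = dosages @ gwas_beta + 0.3 * dosages[:, 0] * dosages[:, 1] + rng.normal(0, 0.1, n_people)

# Conventional PGS: linear combination of dosages with fixed per-variant weights.
pgs_univariate = dosages @ gwas_beta

# Learning-based PGS: fitted on the dosages, able to capture the interaction term.
model = GradientBoostingRegressor(random_state=0).fit(dosages, trait)
pgs_learned = model.predict(dosages)
print(np.corrcoef(pgs_univariate, trait)[0, 1], np.corrcoef(pgs_learned, trait)[0, 1])
```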


2020, Vol. 36 (5), pp. 675-689
Author(s): A. Salazar, F. Xiao

Existing numerical schemes used to solve the governing equations for compressible flow suffer from dissipation errors that tend to smear out sharp discontinuities. Hybrid schemes show potential for improvement on this challenging problem; however, the solution quality of a hybrid scheme depends heavily on the criterion used to switch between the different candidate reconstruction functions. This work presents a new type of switching criterion (or selector) built with machine learning techniques. The selector is trained on randomly generated samples of continuous and discontinuous data profiles, using the exact solution of the governing equation as a reference. Neural networks and random forests were used as the machine learning frameworks to train the selector, which was then implemented as the indicator function in a hybrid scheme with THINC and WENO-Z as the candidate reconstruction functions. The trained selector has been verified to be an effective and reliable switching criterion in the hybrid scheme, significantly improving the solution quality for both advection and Euler equations.
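
A conceptual sketch of such a learned switching criterion follows: a classifier is trained on small stencils sampled from smooth and discontinuous profiles, and its prediction selects which candidate reconstruction to apply at a cell interface. The stencil width, training profiles and classifier settings are assumptions, and the actual THINC and WENO-Z reconstructions are only named, not implemented.

```python
# Sketch under stated assumptions: train a classifier on labelled stencils and
# use it to switch between candidate reconstructions in a hybrid scheme.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def sample_stencil(discontinuous):
    """Return a 5-point stencil from either a smooth sine or a step profile."""
    x = np.linspace(0, 1, 5) + rng.uniform(-0.05, 0.05)
    if discontinuous:
        return (x > rng.uniform(0.2, 0.8)).astype(float) + rng.normal(0, 0.01, 5)
    return np.sin(2 * np.pi * rng.uniform(0.5, 2.0) * x)

X = np.array([sample_stencil(i % 2 == 1) for i in range(4000)])
y = np.arange(4000) % 2                       # 0 = smooth, 1 = discontinuous
selector = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1).fit(X, y)

def reconstruct(stencil):
    # Switch between candidate reconstructions based on the trained selector:
    # THINC near discontinuities, WENO-Z in smooth regions (names only).
    return "THINC" if selector.predict(stencil.reshape(1, -1))[0] == 1 else "WENO-Z"

print(reconstruct(sample_stencil(True)), reconstruct(sample_stencil(False)))
```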

