Use of machine learning to establish limits in the classification of hyperaccumulator plants growing on serpentine, gypsum and dolomite soils

2021, Vol. 42, pp. e67609
Author(s): Marina Mota-Merlo, Vanessa Martos

So-called hyperaccumulator plants can store quantities of heavy metals hundreds or thousands of times greater than normal plants, which makes them very useful in fields such as phytoremediation and phytomining. Many of these plants are serpentinophytes, i.e., plants that grow exclusively on ultramafic rocks, which produce soils with a high proportion of heavy metals. Although multiple classifications exist, the lack of consensus on which parameters should determine whether a plant is a hyperaccumulator, together with the arbitrariness of the established thresholds, creates the need for more objective criteria. To this end, mineral composition data from different plant species were analysed using machine learning techniques in three complementary case studies. First, plants were classified according to three soil types: dolomite, gypsum and serpentine. Second, Ni composition data from normal and hyperaccumulator plants were analysed with machine learning to find differentiated subgroups. Lastly, association studies were carried out on mineral composition and soil type. The classification task achieved a success rate above 75%. Clustering plants by Ni concentration in parts per million (ppm) yielded four groups, with cut-off points at 2.25 ppm, 100 ppm (accumulators) and 3,000 ppm (hyperaccumulators). Associations with a confidence above 90% were found between high Ni levels and serpentine soils, and between jointly high Ni and Zn levels and the same soil type. Overall, this work demonstrates the potential of machine learning for analysing plant mineral composition data. Finally, a review of the IUCN Red List and the national red lists of countries with a high richness of hyperaccumulator species makes it evident that greater effort should be made to establish the conservation status of this type of flora.
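
As an illustration of the clustering step described above, the following sketch groups one-dimensional Ni concentrations and derives cut-off points between adjacent groups. The study does not publish its exact pipeline, so the choice of k-means on log-scaled values, the number of groups and the synthetic example data are assumptions made purely for illustration.

```python
# Hypothetical sketch: clustering Ni concentrations (ppm) into four groups and
# deriving cut-off points between adjacent clusters. K-means on log-scaled
# values is an assumption made here for illustration, not the study's method.
import numpy as np
from sklearn.cluster import KMeans

def ni_cutoffs(ni_ppm, n_groups=4, random_state=0):
    """Cluster 1-D Ni concentrations and return the cut-offs between groups."""
    x = np.log10(np.asarray(ni_ppm, dtype=float) + 1e-9).reshape(-1, 1)
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=random_state).fit_predict(x)
    # Order clusters by mean concentration, then place each cut-off halfway
    # (in log space) between the max of one group and the min of the next.
    order = np.argsort([x[labels == k].mean() for k in range(n_groups)])
    cutoffs = []
    for lo, hi in zip(order[:-1], order[1:]):
        midpoint = (x[labels == lo].max() + x[labels == hi].min()) / 2.0
        cutoffs.append(10 ** midpoint)
    return cutoffs

# Example: synthetic concentrations spanning normal plants to hyperaccumulators.
print(ni_cutoffs([0.5, 1, 3, 8, 50, 150, 800, 2500, 5000, 12000]))
```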

2021
Author(s): Carlotta Valerio, Graciela Gómez Nicola, Rocío Aránzazu Baquero Noriega, Alberto Garrido, Lucia De Stefano

Since 1970, freshwater species have suffered a decline of 83% worldwide, and anthropogenic activities are considered to be major drivers of ecosystem degradation. Linking the ecological response to the multiple anthropogenic stressors acting on a system is essential for designing effective policy measures to restore riverine ecosystems. However, obtaining quantitative links between stressors and ecological status remains challenging, given the non-linearity of the ecosystem response and the need to consider the multiple factors at play. This study applies machine learning techniques to explore the relationships between anthropogenic pressures and the composition of fish communities in the river basins of Castilla-La Mancha, a region covering nearly 79,500 km² in central Spain. During the past two decades, this region has experienced an alarming decline in the conservation status of native fish species. The starting point for the analysis is a 10x10 km grid recording, for each cell, the presence or absence of several fish species before and after 2001. This database was used to characterize the evolution of several metrics of fish species richness over time, accounting for species origin (native or alien), species traits (e.g. pollution tolerance) and habitat preferences. Random Forest and Gradient Boosted Regression Trees algorithms were used to relate the resulting metrics to stressor variables describing the anthropogenic pressures acting on the rivers, such as urban wastewater discharges, land use cover, hydro-morphological degradation and alteration of the river flow regime. The study provides new, quantitative insights into pressure-ecosystem relationships in rivers and reveals the main factors driving the decline of fish richness in Castilla-La Mancha, which could help inform environmental policy initiatives.
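
A minimal sketch of the modelling step described in this abstract is given below: a fish-richness metric is related to anthropogenic stressor variables with the two tree-ensemble algorithms named above, and the stressors are then ranked by importance. The column names, input file and response metric are placeholders, not the study's actual dataset.

```python
# Sketch under stated assumptions: relate a fish-richness metric to stressor
# variables with Random Forest and Gradient Boosted Regression Trees, then
# rank the stressors by feature importance.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

stressors = ["wastewater_load", "agricultural_cover", "hydromorph_alteration",
             "flow_regime_alteration"]                       # assumed predictors
cells = pd.read_csv("grid_cells.csv")                        # placeholder: one row per 10x10 km cell
X, y = cells[stressors], cells["native_richness_change"]     # assumed response metric

for name, model in [("Random Forest", RandomForestRegressor(n_estimators=500, random_state=1)),
                    ("GBRT", GradientBoostingRegressor(random_state=1))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    importances = model.fit(X, y).feature_importances_
    ranking = sorted(zip(stressors, importances), key=lambda t: -t[1])
    print(name, round(r2, 3), ranking)
```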


2020, Vol. 32 (1), pp. 39-53
Author(s): Dalia Shanshal, Ceni Babaoglu, Ayşe Başar

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model, using data analytics and machine learning techniques that learn from past patterns, to provide additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationships between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. The exploratory data analysis reveals both spatio-temporal and behavioural patterns, such as the prevalence of collisions at intersections and in spring and summer, and of aggressive and inattentive driving behaviours. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest, with accuracies of 0.80, 0.89 and 0.80, respectively. The proposed methods demonstrate the effectiveness of applying machine learning to traffic and collision data, for both exploratory and predictive analytics.
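
The comparison of the two predictive models could look roughly like the sketch below. The feature names and input file are assumptions, and Lasso is represented as an L1-penalised logistic regression because the reported metric (accuracy) implies a classification setting; the authors' exact preprocessing is not reproduced here.

```python
# Illustrative comparison of the two model families named in the abstract on an
# assumed binary injury-severity label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

collisions = pd.read_csv("toronto_collisions.csv")            # placeholder file
X = pd.get_dummies(collisions[["road_class", "light", "season", "driver_action"]])
y = collisions["severe_injury"]                                # assumed binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
for name, model in [("Lasso (L1 logistic)", LogisticRegression(penalty="l1", solver="liblinear")),
                    ("Random Forest", RandomForestClassifier(n_estimators=300, random_state=42))]:
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 2))
```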


Author(s): Bhavani Thuraisingham

Data mining is the process of posing queries against large quantities of data and extracting often previously unknown information using mathematical, statistical, and machine-learning techniques. Data mining has applications in a number of areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups as well as possibly predict terrorist events based on past experience. One particular data-mining technique being investigated a great deal for homeland security is link analysis, where links are drawn between various nodes, possibly detecting connections that would otherwise remain hidden.
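
A toy illustration of the link-analysis idea is sketched below: entities become nodes, observed co-occurrences become edges, and candidate hidden links between unconnected nodes are scored by the number of intermediaries they share. The data and the common-neighbour heuristic are invented here for illustration and are not drawn from any specific system described in the text.

```python
# Toy link-analysis sketch: score non-adjacent node pairs by shared neighbours.
# Real systems use far richer link-prediction models; this only shows the idea.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("E", "D")])

candidates = [
    (u, v, len(list(nx.common_neighbors(g, u, v))))
    for u in g for v in g
    if u < v and not g.has_edge(u, v)
]
for u, v, score in sorted(candidates, key=lambda t: -t[2]):
    print(f"possible hidden link {u}-{v}: {score} shared contacts")
```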


Author(s): Jonathan Becker, Aveek Purohit, Zheng Sun

The USARSim group at NIST developed a simulated robot that operates in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
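
The abstract does not detail the RL formulation, so the following is only a schematic sketch of a tabular Q-learning controller of the kind that could replace the PID loop, written in Python rather than the authors' Matlab. The state and action discretisation, reward and hyperparameters are illustrative assumptions.

```python
# Minimal tabular Q-learning sketch for a discretised control task; the state,
# action, reward and simulator response below are illustrative stand-ins.
import random
from collections import defaultdict

ACTIONS = [-1.0, 0.0, 1.0]          # steer left / hold / steer right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q = defaultdict(float)              # (state, action) -> estimated value

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)          # explore
    return max(ACTIONS, key=lambda a: q[(state, a)])  # exploit

def update(state, action, reward, next_state):
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

# One simulated control step: the state is a bucketed heading error (faked
# here), and the reward penalises the remaining error after acting.
state = 3
action = choose_action(state)
next_state = state - int(action)     # stand-in for the simulator's response
update(state, action, -abs(next_state), next_state)
```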


2021, Vol. 13 (6), pp. 0-0

Network proxies and Virtual Private Networks (VPNs) are tools used every day to facilitate various business functions. However, they have gained popularity amongst unintended user bases as tools that can mask identities when using websites and web services. Anonymising proxies and/or VPNs act as an intermediary between a user and a web server, with a proxy and/or VPN IP address taking the place of the user's IP address forwarded to the web server. This paper presents computational models based on intelligent machine learning techniques to address the limitations currently experienced by unauthorised-user detection systems. A model to detect the usage of anonymising proxies was developed using a multi-layer perceptron neural network trained on data found in the Transmission Control Protocol (TCP) headers of captured network packets.
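
A hedged sketch of such a detection model is shown below: a multi-layer perceptron trained on features taken from captured TCP headers. The feature names, input file and network size are placeholders; the paper's exact feature set and training procedure are not reproduced.

```python
# Sketch under stated assumptions: an MLP classifier over TCP-header features
# with a binary proxy/no-proxy label.
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

packets = pd.read_csv("tcp_headers.csv")                    # placeholder capture export
features = ["window_size", "ttl", "header_length", "flags_syn", "flags_ack"]
X, y = packets[features], packets["via_anonymising_proxy"]  # assumed binary label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=7))
print("held-out accuracy:", mlp.fit(X_tr, y_tr).score(X_te, y_te))
```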


2020
Author(s): Yu Xu, Dragana Vuckovic, Scott C Ritchie, Parsa Akbari, Tao Jiang, ...

Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because the selection of variants and the modelling of their interactions in PGSs may be limited by univariate analysis, such a conventional method may yield sub-optimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in constructing PGSs for 26 blood cell traits, using data from UK Biobank (n ≈ 400,000) and INTERVAL (n ≈ 40,000). Our results show that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. The study also demonstrates that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. As the study found genetic correlations between the PGSs for blood cell traits and the PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), blood cell traits may be indicators and/or mediators of a variety of common disorders via shared genetic variants and functional pathways.
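
To make the contrast concrete, the sketch below compares a conventional univariate PGS (a weighted sum of allele dosages using per-variant effect sizes) with a learning-based score fitted directly on the dosages, which can pick up an interaction term that the additive score misses. The simulated data, effect sizes and choice of learner are illustrative assumptions, not the study's pipeline, and the correlations are computed in-sample.

```python
# Schematic contrast: additive (univariate-weight) PGS vs. a learned score that
# can model variant interactions, on simulated dosage data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_people, n_variants = 1000, 50
dosages = rng.integers(0, 3, size=(n_people, n_variants)).astype(float)  # 0/1/2 allele counts
gwas_beta = rng.normal(0, 0.05, n_variants)                              # univariate effect sizes
trait = dosages @ gwas_beta + 0.3 * dosages[:, 0] * dosages[:, 1] + rng.normal(0, 0.1, n_people)

# Conventional PGS: linear combination of dosages with fixed per-variant weights.
pgs_univariate = dosages @ gwas_beta

# Learning-based PGS: fitted on the dosages, able to capture the interaction term.
model = GradientBoostingRegressor(random_state=0).fit(dosages, trait)
pgs_learned = model.predict(dosages)
print(np.corrcoef(pgs_univariate, trait)[0, 1], np.corrcoef(pgs_learned, trait)[0, 1])
```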


2020, Vol. 36 (5), pp. 675-689
Author(s): A. Salazar, F. Xiao

Existing numerical schemes used to solve the governing equations for compressible flow suffer from dissipation errors that tend to smear out sharp discontinuities. Hybrid schemes show potential for improvement on this challenging problem; however, the solution quality of a hybrid scheme depends heavily on the criterion used to switch between the different candidate reconstruction functions. This work presents a new type of switching criterion (or selector) built with machine learning techniques. The selector is trained on randomly generated samples of continuous and discontinuous data profiles, using the exact solution of the governing equation as a reference. Neural networks and random forests were used as the machine learning frameworks to train the selector, which was then implemented as the indicator function in a hybrid scheme with THINC and WENO-Z as the candidate reconstruction functions. The trained selector has been verified to be an effective and reliable switching criterion in the hybrid scheme, significantly improving the solution quality for both advection and Euler equations.
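
A conceptual sketch of such a learned switching criterion follows: a classifier is trained on small stencils sampled from smooth and discontinuous profiles, and its prediction selects which candidate reconstruction to apply at a cell interface. The stencil width, training profiles and classifier settings are assumptions, and the actual THINC and WENO-Z reconstructions are only named, not implemented.

```python
# Sketch under stated assumptions: train a classifier on labelled stencils and
# use it to switch between candidate reconstructions in a hybrid scheme.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def sample_stencil(discontinuous):
    """Return a 5-point stencil from either a smooth sine or a step profile."""
    x = np.linspace(0, 1, 5) + rng.uniform(-0.05, 0.05)
    if discontinuous:
        return (x > rng.uniform(0.2, 0.8)).astype(float) + rng.normal(0, 0.01, 5)
    return np.sin(2 * np.pi * rng.uniform(0.5, 2.0) * x)

X = np.array([sample_stencil(i % 2 == 1) for i in range(4000)])
y = np.arange(4000) % 2                       # 0 = smooth, 1 = discontinuous
selector = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1).fit(X, y)

def reconstruct(stencil):
    # Switch between candidate reconstructions based on the trained selector:
    # THINC near discontinuities, WENO-Z in smooth regions (names only).
    return "THINC" if selector.predict(stencil.reshape(1, -1))[0] == 1 else "WENO-Z"

print(reconstruct(sample_stencil(True)), reconstruct(sample_stencil(False)))
```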

