Ensemble Methods in Environmental Data Mining

Data Mining ◽  
2018 ◽  
Author(s):  
Goksu Tuysuzoglu ◽  
Derya Birant ◽  
Aysegul Pala
2020 ◽  
Vol 13 (5) ◽  
pp. 818-826
Author(s):  
Ranjan Kumar Panda ◽  
A. Sai Sabitha ◽  
Vikas Deep

Sustainability is defined as the practice of protecting natural resources for future use without harming nature. Sustainable development encompasses the environmental, social, political, and economic issues that humans face in order to exist. Water is the most vital resource for living beings on Earth. Natural resources are being exploited as the world population grows, and a shortfall of these resources may threaten humanity in the future. Water sustainability is a part of environmental sustainability. The water crisis is gradually worsening in many parts of the world because of agricultural and industrial usage and rapid urbanization. Data mining tools and techniques provide a powerful methodology for understanding water sustainability issues using rich environmental data, and they also help in building models for possible optimization and reengineering. This research work presents a review of the use of supervised and unsupervised learning algorithms for water sustainability issues such as water quality assessment, wastewater collection systems, and water consumption. Advanced technologies have also helped to resolve major water sustainability issues. Some major data mining optimization algorithms used in piped water distribution networks are also compared.


Author(s):  
Muhammad Imran ◽  
Shahzad Latif ◽  
Danish Mehmood ◽  
Muhammad Saqlain Shah

Automatic student performance prediction is a crucial task because of the large volume of data in educational databases. This task is addressed by educational data mining (EDM). EDM develops methods for exploring data derived from educational environments. These methods are used to understand students and their learning environment. Educational institutions often want to know how many students will pass or fail so that they can make the necessary arrangements. Previous studies show that many researchers focus on selecting an appropriate algorithm for classification alone and ignore the problems that arise during the data mining phases, such as high data dimensionality, class imbalance, and classification error. Such problems reduce the accuracy of the model. Several well-known classification algorithms have been applied in this domain, but this paper proposes a student performance prediction model based on a supervised learning decision tree classifier. In addition, an ensemble method is applied to improve the performance of the classifier. Ensemble methods are designed to solve classification and prediction problems. This study demonstrates the importance of data preprocessing and algorithm fine-tuning for resolving data quality issues. The experimental dataset used in this work comes from the Alentejo region of Portugal and was obtained from the UCI Machine Learning Repository. Three supervised learning algorithms (J48, NNge, and MLP) are employed in this study for experimental purposes. The results show that J48 achieved the highest accuracy of the three, 95.78%.


2021 ◽  
Author(s):  
Ekaterina Chuprikova ◽  
Abraham Mejia Aguilar ◽  
Roberto Monsorno

Increasing agricultural production challenges, such as climate change, environmental concerns, energy demands, and growing expectations from consumers triggered the necessity for innovation using data-driven approaches such as visual analytics. Although the visual analytics concept was introduced more than a decade ago, the latest developments in the data mining capacities made it possible to fully exploit the potential of this approach and gain insights into high complexity datasets (multi-source, multi-scale, and different stages). The current study focuses on developing prototypical visual analytics for an apple variety testing program in South Tyrol, Italy. Thus, the work aims (1) to establish a visual analytics interface enabled to integrate and harmonize information about apple variety testing and its interaction with climate by designing a semantic model; and (2) to create a single visual analytics user interface that can turn the data into knowledge for domain experts.

This study extends the visual analytics approach with a structural way of data organization (ontologies), data mining, and visualization techniques to retrieve knowledge from an extensive collection of apple variety testing program and environmental data. The prototype stands on three main components: ontology, data analysis, and data visualization. Ontologies provide a representation of expert knowledge and create standard concepts for data integration, opening the possibility to share the knowledge using a unified terminology and allowing for inference. Building upon relevant semantic models (e.g., agri-food experiment ontology, plant trait ontology, GeoSPARQL), we propose to extend them based on the apple variety testing and climate data. Data integration and harmonization through developing an ontology-based model provides a framework for integrating relevant concepts and relationships between them, data sources from different repositories, and defining a precise specification for the knowledge retrieval. Besides, as the variety testing is performed on different locations, the geospatial component can enrich the analysis with spatial properties. Furthermore, the visual narratives designed within this study will give a better-integrated view of data entities' relations and the meaningful patterns and clustering based on semantic concepts.

Therefore, the proposed approach is designed to improve decision-making about variety management through an interactive visual analytics system that can answer "what" and "why" about fruit-growing activities. Thus, the prototype has the potential to go beyond the traditional ways of organizing data by creating an advanced information system enabled to manage heterogeneous data sources and to provide a framework for more collaborative scientific data analysis. This study unites various interdisciplinary aspects and, in particular: Big Data analytics in the agricultural sector and visual methods; thus, the findings will contribute to the EU priority program in digital transformation in the European agricultural sector.

This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 894215.


Author(s):  
Ondrej Habala ◽  
Martin Šeleng ◽  
Viet Tran ◽  
Branislav Šimo ◽  
Ladislav Hluchý

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for convenient mining and integration of large, distributed data sets. One prospective application domain for such methods and tools is environmental applications, which often use data sets from different vendors and where data mining is becoming increasingly popular as more computing power becomes available. The authors present a set of experimental environmental scenarios and the application of ADMIRE technology in these scenarios. The scenarios try to predict meteorological and hydrological phenomena, which currently cannot be or are not predicted, by using data mining of distributed data sets from several providers in Slovakia. The scenarios have been designed by environmental experts; apart from serving as testing grounds for the ADMIRE technology, their results are of particular interest to the experts who designed them.


Author(s):  
Zhiyuan Chen ◽  
Aryya Gangopadhyay ◽  
George Karabatis ◽  
Michael McGuire ◽  
Claire Welty

Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships that are formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution browsing and data mining. The data semantics are captured and utilized in terms of their patterns and trends at multiple levels of resolution. We present the efficacy of our methodology through experimental results.


2016 ◽  
Vol 7s2 ◽  
pp. BECB.S36277 ◽  
Author(s):  
Anna L Buczak ◽  
Benjamin Baugher ◽  
Erhan Guven ◽  
Linda Moniz ◽  
Steven M. Babin ◽  
...  

Influenza is a highly contagious disease that causes seasonal epidemics with significant morbidity and mortality. The ability to predict influenza peak several weeks in advance would allow for timely preventive public health planning and interventions to be used to mitigate these outbreaks. Because influenza may also impact the operational readiness of active duty personnel, the US military places a high priority on surveillance and preparedness for seasonal outbreaks. A method for creating models for predicting peak influenza visits per total health-care visits (ie, activity) weeks in advance has been developed using advanced data mining techniques on disparate epidemiological and environmental data. The model results are presented and compared with those of other popular data mining classifiers. By rigorously testing the model on data not used in its development, it is shown that this technique can predict the week of highest influenza activity for a specific region with overall better accuracy than other methods examined in this article.


Author(s):  
Nikunj C. Oza

Ensemble data mining methods, also known as committee methods or model combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: Each member of the committee should be as competent as possible, but the members should complement one another. If the members are not complementary, that is, if they always agree, then the committee is unnecessary — any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
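
The committee argument above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the abstract: it assumes a binary task, an odd number of members, and members whose errors are statistically independent (an idealized stand-in for "complementary"). The function names `majority_vote` and `majority_vote_accuracy` are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most committee members."""
    return Counter(predictions).most_common(1)[0][0]


def majority_vote_accuracy(n_members, member_accuracy):
    """Probability that a strict majority of members is correct.

    Assumption (not from the source): every member is right with the same
    probability and members err independently of one another.
    """
    if n_members % 2 == 0:
        raise ValueError("use an odd number of members to avoid ties")
    needed = n_members // 2 + 1  # smallest strict majority
    return sum(
        comb(n_members, k)
        * member_accuracy ** k
        * (1 - member_accuracy) ** (n_members - k)
        for k in range(needed, n_members + 1)
    )


if __name__ == "__main__":
    print(majority_vote(["pass", "fail", "pass"]))   # pass
    print(round(majority_vote_accuracy(1, 0.7), 3))  # 0.7
    print(round(majority_vote_accuracy(3, 0.7), 3))  # 0.784
    print(round(majority_vote_accuracy(5, 0.7), 3))  # 0.837
```

Under these assumptions, three members that are each right 70% of the time outvote their own mistakes often enough to reach about 78% as a committee. If the members always agreed, the committee would stay at 70%, which is the "any one member is sufficient" case described above.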


2018 ◽  
Vol 7 (2) ◽  
pp. 44-47
Author(s):  
Mudasir Ashraf ◽  
Majid Zaman ◽  
Muheet Ahmed

Educational data mining has seen increasing demand for extracting and manipulating data from academic settings to generate useful information that is indispensable for decision making. In this paper, therefore, an attempt has been made to apply various data mining techniques, including base and meta learning classifiers, to our pedagogical dataset to predict the performance of students. Among several contemporary ensemble approaches, researchers have widely used learning classifiers such as boosting to predict student performance. Because ensemble methods are considered significant in classification and prediction, the same method (boosting) has been applied to our pedagogical dataset. All results were evaluated with 10-fold cross-validation after the pedagogical dataset had been subjected to base classifiers including J48, random tree, naive Bayes, and kNN. In addition, techniques such as oversampling (SMOTE) and undersampling (spread subsampling) were employed to further compare ensemble classifiers with base classifiers. These methods were used with the key objective of observing any improvement in the accuracy of student performance prediction.
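
For readers unfamiliar with the 10-fold cross-validation protocol mentioned in this abstract, the following is a minimal sketch of how the folds can be formed. It is not the authors' code: the helper name `k_fold_indices` is hypothetical, and the sketch assumes the samples have already been shuffled (it performs no shuffling or class stratification itself).

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Assumption: the caller has already shuffled the data; folds are taken
    as contiguous blocks whose sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    folds = list(k_fold_indices(25, k=10))
    print(len(folds))   # 10
    print(folds[0][1])  # [0, 1, 2]
```

Each sample lands in exactly one test fold, so every prediction used in the accuracy estimate comes from a model that never saw that sample during training. With imbalanced classes, resampling steps such as oversampling are usually applied to the training folds only, so that the test fold keeps the original class distribution.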


2008 ◽  
pp. 356-363 ◽  
Author(s):  
Nikunj C. Oza

Ensemble data mining methods, also known as committee methods or model combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: Each member of the committee should be as competent as possible, but the members should complement one another. If the members are not complementary, that is, if they always agree, then the committee is unnecessary — any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.

