Predicting multi-species bark beetle (Coleoptera: Curculionidae: Scolytinae) occurrence in Alaska: First use of open access big data mining and open source GIS to provide robust inference and a role model for progress in forest conservation

2021 ◽  
Vol 16 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Khodabakhsh Zabihi ◽  
Falk Huettmann ◽  
Brian Young

Native bark beetles (Coleoptera: Curculionidae: Scolytinae) are a multi-species complex that ranks among the key disturbances of coniferous forests of western North America. Many landscape-level variables are known to influence beetle outbreaks, such as suitable climatic conditions, the spatial arrangement of incipient populations, topography, the abundance of mature host trees, and disturbance history, including former outbreaks and fire. We assembled the first open access dataset, usable in open source GIS platforms, for understanding the ecology of bark beetles in Alaska. We used boosted classification and regression trees, a machine learning data mining algorithm, to model and predict the relationship between 14 environmental variables, as model predictors, and 838 occurrence records of 68 bark beetle species compared to pseudo-absence locations across the state of Alaska. The model predictors include topography- and climate-related predictors as well as feature proximities and anthropogenic factors. We were able to model, predict, and map multi-species bark beetle occurrence across the state of Alaska at a 1-km spatial resolution, in addition to providing a good-quality environmental dataset freely accessible to the public. About 16% of the mixed forest and 59% of the evergreen forest are expected to be occupied by bark beetles based on current climatic conditions and biophysical attributes of the landscape. The open access dataset that we prepared, and the machine learning modeling approach that we used, can provide a foundation for future research not only on scolytines but also on other multi-species questions of concern, such as forest defoliators and small and big game wildlife species worldwide.
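The presence versus pseudo-absence setup described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' pipeline: scikit-learn's `GradientBoostingClassifier` substitutes for the boosted classification and regression trees, and the 14 predictors and 838 occurrence records are replaced by synthetic data.

```python
# Sketch: presence vs. pseudo-absence modeling with boosted trees.
# Synthetic stand-in data; the paper's 14 real predictors (topography,
# climate, proximities, anthropogenic factors) are random features here.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n_presence, n_absence, n_predictors = 838, 838, 14

# Presence points skewed toward higher values of the first predictor
# (think: host-tree abundance); pseudo-absences drawn uniformly.
X_presence = rng.uniform(0, 1, (n_presence, n_predictors))
X_presence[:, 0] += 0.5
X_absence = rng.uniform(0, 1, (n_absence, n_predictors))

X = np.vstack([X_presence, X_absence])
y = np.concatenate([np.ones(n_presence), np.zeros(n_absence)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC: {auc:.2f}")
```

In a real workflow, the fitted model would then be applied to every cell of a 1-km raster grid to produce the statewide occurrence map.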

2019 ◽  
Vol 1 ◽  
pp. 1-2
Author(s):  
Jan Wilkening

<p><strong>Abstract.</strong> Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly caused by the rise of big data &ndash; data that is big in terms of volume, variety and velocity. Consequently, data scientists are required to make sense of these large datasets. Companies have problems acquiring talented people to solve data science problems. This is not surprising, as employers often expect a skillset that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).</p><p>As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. The cartographers’ skill levels often depend on the degree to which cartography programs at universities focus on the “front end” (visualisation) of spatial data and leave the “back end” (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.</p><p>In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that are often beyond their university curricula. These skills include programming, machine learning and data mining.
These are important technologies for extracting knowledge from big spatial data sets, and thereby the logical advancement of “traditional” geoprocessing, which focuses on “traditional” (small, structured, static) datasets such as shapefiles or feature classes.</p><p>To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of “spatial data science” (Figure 2).</p><p>Spatial sciences focus on causality and theory-based approaches to explain why things are happening in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining &ndash; often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, the modifiable areal unit problem, map projections and coordinate systems, uncertainty in measurement, etc.).</p><p>Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.</p><p>There are many interesting ongoing projects at the interface of spatial and data science.
Examples from the ArcGIS platform include:</p><ul><li>Integration of Python GIS APIs with machine learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks</li><li>Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks</li><li>Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets</li><li>Dashboards for visualizing real-time sensor data and integrating it with other data sources</li><li>Applications for interactive data exploration</li><li>GIS tools for machine learning tasks such as prediction, clustering and classification of spatial data</li><li>GIS integration for Hadoop</li></ul><p>While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it has to be stated that (a) many ArcGIS projects are actually open-source and (b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed consideration). At any rate, cartography and GIS tools are the essential technology blocks for solving the (80% spatial) data science problems of the future.</p>
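The first bullet above, combining Python GIS APIs with machine learning libraries in notebooks, can be illustrated with a minimal sketch. No GIS API is actually called here; the point locations are synthetic stand-ins for features that would normally be fetched with, for example, the arcgis or geopandas packages.

```python
# Sketch: a "spatial data science" step in plain Python -- clustering
# point features by location plus an attribute. In a real notebook the
# points would come from a GIS API; here they are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Two synthetic "neighborhoods": (x, y, attribute) triples.
cluster_a = rng.normal([0.0, 0.0, 10.0], 0.5, (50, 3))
cluster_b = rng.normal([5.0, 5.0, 20.0], 0.5, (50, 3))
points = np.vstack([cluster_a, cluster_b])

# Scaling matters: raw coordinates and attribute values sit on
# different units -- exactly the kind of spatial subtlety that pure
# data-science pipelines can overlook.
scaled = StandardScaler().fit_transform(points)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

print("cluster sizes:", np.bincount(labels))
```

A production version would also account for map projections and spatial autocorrelation, the concepts the abstract lists as commonly ignored.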


Nowadays, Data Mining is used everywhere to extract information from data and, in turn, acquire knowledge for decision making: it analyzes patterns in order to extract information and knowledge for making decisions. Many open source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for Data Mining and predictive analysis. This paper discusses the different tools available for Data Mining and Machine Learning, followed by a description of each tool and its pros and cons. The article provides details of the algorithms these tools support for tasks such as classification, regression, characterization, discretization, clustering, visualization, and feature selection. It will help readers make decisions efficiently and suggests which tool is suitable for a given requirement.
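As a rough illustration, the classify-and-evaluate workflow that GUI tools like Weka, KNIME, and Orange automate looks like the following when scripted directly. The dataset (scikit-learn's bundled iris data) and the choice of a decision tree are illustrative assumptions, not taken from the paper.

```python
# Sketch: a classify-then-evaluate task of the kind these Data Mining
# tools wrap in a graphical workflow, written with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validated accuracy: the standard evaluation such tools
# report for a chosen algorithm.
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```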


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0242253
Author(s):  
Zhigang Zhou ◽  
Yanyan Liu ◽  
Hao Yu ◽  
Lihua Ren

The aims are to explore the construction of a knowledge management model for engineering cost consulting enterprises and to expand the application of data mining techniques and machine learning methods in constructing such a model. Through a questionnaire survey, the construction of the knowledge management model of construction-related enterprises and engineering cost consulting enterprises is discussed. First, through analysis and discussion of the ontology-based data mining (OBDM) algorithm and the association analysis (Apriori) algorithm, a data mining algorithm (the ML-AR algorithm) based on ontology-based multilayer association analysis and machine learning is proposed, and the performance of the various algorithms is compared and analyzed. Second, at the knowledge management level, analysis and statistics are conducted on the levels of knowledge acquisition, sharing, storage, and innovation. Finally, building on the foregoing, the knowledge management model for engineering cost consulting enterprises is built and analyzed. The results show that the reliability coefficient of the questionnaire is above 0.8 and the average extracted value is above 0.7, verifying excellent reliability and validity. The efficiency of the ML-AR algorithm in terms of both the number of transactions and the support level is better than that of the other two algorithms, so it is expected to be applicable to the enterprise knowledge management model. There is a positive correlation between each level of knowledge management; among them, the positive correlation between knowledge acquisition and knowledge sharing is the strongest. The enterprise knowledge management model has a positive impact on promoting organizational innovation capability and industrial development. This research work provides a direction for the development of enterprise knowledge management and the improvement of innovation ability.
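The Apriori component mentioned above rests on level-wise frequent-itemset search. A minimal pure-Python sketch follows; the transactions are invented toy "knowledge item" sets, and the ontology-based multilayer extension of the ML-AR algorithm is not reproduced here.

```python
# Sketch: the core Apriori idea -- frequent k-itemsets are built only
# from frequent (k-1)-itemsets (the Apriori pruning property).
transactions = [
    {"cost_db", "contract", "sharing"},
    {"cost_db", "contract"},
    {"cost_db", "sharing"},
    {"contract", "sharing"},
    {"cost_db", "contract", "sharing"},
]
min_support = 0.6  # itemset must appear in >= 60% of transactions

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent = {}
level = [frozenset([i]) for i in items]
k = 1
while level:
    level = [s for s in level if support(s) >= min_support]
    frequent.update({s: support(s) for s in level})
    k += 1
    # Candidate generation: unions of surviving itemsets of size k.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = list(candidates)

for itemset, sup in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

Association rules (such as "cost_db implies contract") are then derived from these frequent itemsets by comparing supports.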


2016 ◽  
Vol 2016 ◽  
pp. 1-13
Author(s):  
Munish Saini ◽  
Sandeep Mehmi ◽  
Kuljit Kaur Chahal

Source code management systems (such as Concurrent Versions System (CVS), Subversion, and Git) record changes to the code repositories of open source software projects. This study explores a fuzzy data mining algorithm for time series data to generate association rules for evaluating the trend and regularity in the evolution of an open source software project. A fuzzy data mining algorithm for time series data was chosen because of the stochastic nature of the open source software development process. The commit activity of an open source project indicates the activeness of its development community, and an active development community is a strong contributor to the success of an open source project. Therefore, commit activity analysis, along with trend and regularity analysis of that activity, acts as an important indicator to project managers and analysts regarding the evolutionary prospects of the project.
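One way to picture the fuzzy treatment of commit time series is to map raw monthly commit counts onto linguistic labels via triangular membership functions; fuzzy association rules are then mined over these labels rather than over raw counts. The labels and breakpoints below are illustrative assumptions, not the study's.

```python
# Sketch: fuzzifying monthly commit counts into linguistic labels,
# the first step of a fuzzy time-series association-rule pipeline.
def triangular(x, a, b, c):
    """Membership of x in a triangle peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzy sets over monthly commit counts (roughly 0..200).
fuzzy_sets = {
    "low":    (-1, 0, 60),
    "medium": (30, 90, 150),
    "high":   (120, 200, 281),
}

commits = [12, 85, 140, 45, 190]  # toy monthly commit series
for month, count in enumerate(commits, 1):
    memberships = {name: round(triangular(count, *abc), 2)
                   for name, abc in fuzzy_sets.items()}
    label = max(memberships, key=memberships.get)
    print(f"month {month}: {count} commits -> {label} {memberships}")
```

Rules like "a medium month tends to be followed by a high month" would then be mined over the resulting label sequence.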


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11830
Author(s):  
Richard B. Robold ◽  
Falk Huettmann

American red squirrels (Tamiasciurus hudsonicus) are small mammals that are abundantly distributed throughout North America. Urbanization in the Anthropocene is now a global process, and squirrels live in the affected landscapes, forcing them to adjust to human developments. Not much is known about the distribution of squirrels and squirrel middens near humans, especially in subarctic and sub-urbanized regions. Although this species is hunted, to our knowledge there are no publicly available distribution and abundance estimates, management plans, or bag limits for squirrels in Alaska or elsewhere in the United States, except for the endangered Mt. Graham red squirrel. In general, insufficient squirrel conservation research is carried out; squirrels are underrepresented in research and its literature. To further science-based management for such species, this study aims to generate the first digital open access workflow as a generic research template for small mammal work, including the latest machine learning of open source, high-resolution LIDAR data in an open source Geographic Information System (QGIS) and ArcGIS. Machine learning has proven to be less modeler-biased and to improve the accuracy of analysis outcomes; it is therefore the preferred approach. This template is designed to be rapid, simple, robust, generic, and effective for use by a global audience. As a showcase, a squirrel midden survey was carried out over two years (2016 and 2017). Squirrel middens were detected in a research area of 45.5 hectares (0.455 km²) in downtown Fairbanks, in the interior boreal forest of Alaska, U.S. Transect distances were geo-referenced with a GPS and adjusted to the visual conditions to count all squirrel middens within the survey area. Different layers of proximity to humans and habitat characteristics were assembled using aerial imagery and LIDAR data (3D data needed for an arboreal species like the red squirrel) at a 3 × 3 m resolution. The layer data were used to train a predictive distribution model for red squirrel middens with machine learning. The model mapped the relative index of occurrence (RIO) and identified canopy height, distance to trails, canopy density, and distance to a lake, together, as the strongest predictors of squirrel midden distribution, whereas open landscapes and disturbed areas are avoided. It is concluded that squirrels select high, dense forest for middens while avoiding human disturbance. This study presents a machine learning template for easily and rapidly producing an accurate abundance prediction that can be used for management implications.
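The midden-modeling step can be compressed into a short sketch. The habitat features, the occurrence rule, and the random forest model are synthetic stand-ins for the study's surveyed middens and 3 × 3 m LIDAR-derived layers.

```python
# Sketch: predicting a relative index of occurrence (RIO) from habitat
# layers. All data here are synthetic stand-ins for the real survey.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 400

# Per-cell habitat features: canopy height (m), canopy density (0-1),
# distance to nearest trail (m).
canopy_height = rng.uniform(0, 30, n)
canopy_density = rng.uniform(0, 1, n)
dist_trail = rng.uniform(0, 300, n)
X = np.column_stack([canopy_height, canopy_density, dist_trail])

# Synthetic rule echoing the paper's finding: middens occur in tall,
# dense forest away from trails.
y = ((canopy_height > 15) & (canopy_density > 0.5)
     & (dist_trail > 50)).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# RIO for two hypothetical cells: dense tall forest vs. open trailside.
cells = np.array([[25.0, 0.9, 200.0], [3.0, 0.1, 5.0]])
rio = model.predict_proba(cells)[:, 1]
print("RIO (forest, trailside):", rio.round(2))
```

Applying the fitted model to every raster cell yields the RIO map described in the abstract.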


Malware is a widespread problem today. A malware file may reside on a client machine and can pose a serious, irreversible risk to the safety and privacy of personal workstation users as malicious threats expand. This paper explains malware threat detection using data mining and machine learning. It describes malware detection algorithms that combine machine learning approaches with data files, explains how to break down executable files and create instruction sets, and examines different machine learning and data mining algorithms for feature extraction and reduction in malware detection. The system precisely distinguishes both new and known malware instances, even though the binary difference between malware and legitimate software is ordinarily small. There is a demand for a framework that can detect new malicious executable files.
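A minimal, hypothetical version of the feature-extraction-plus-classification idea: byte bigrams stand in for the instruction-set features, and the "executables" are tiny invented byte strings rather than real PE files.

```python
# Sketch: byte-bigram feature extraction plus a classifier, a toy
# stand-in for the executable-mining pipeline the abstract describes.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

def bigram_counts(data: bytes) -> Counter:
    """Count overlapping byte bigrams (hex-encoded) as n-gram features."""
    return Counter(data[i:i + 2].hex() for i in range(len(data) - 1))

# Toy corpus: the "malicious" samples share a NOP-sled-like 0x90 pattern.
malicious = [b"\x90\x90\x90\x90\xcc\x90\x90", b"\x90\x90\x90\xeb\x90\x90"]
benign = [b"\x4d\x5a\x00\x01\x02\x03\x04", b"\x4d\x5a\x10\x11\x12\x13"]
samples = malicious + benign
labels = [1, 1, 0, 0]  # 1 = malware, 0 = legitimate

vec = DictVectorizer()
X = vec.fit_transform([bigram_counts(s) for s in samples])
clf = MultinomialNB().fit(X, labels)

unknown = b"\x90\x90\x90\x90\x90"  # unseen, NOP-heavy sample
pred = clf.predict(vec.transform([bigram_counts(unknown)]))[0]
print("predicted class:", pred)
```

A real system would disassemble executables into instruction sequences and apply feature reduction before classification, as the abstract outlines.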

