Towards Spatial Data Science: Bridging the Gap between GIS, Cartography and Data Science

2019, Vol 1, pp. 1-2
Author(s):  
Jan Wilkening

Abstract. Data is regarded as the oil of the 21st century, and the concept of data science has received increasing attention in recent years. These trends are mainly driven by the rise of big data – data that is big in terms of volume, variety and velocity. Consequently, data scientists are needed to make sense of these large datasets. Companies struggle to find talented people to solve data science problems. This is not surprising, as employers often expect a skillset that can hardly be found in one person: not only does a data scientist need a solid background in machine learning, statistics and various programming languages, but often also in IT systems architecture, databases and complex mathematics. Above all, she should have strong non-technical domain expertise in her field (see Figure 1).

As it is widely accepted that 80% of data has a spatial component, developments in data science could provide exciting new opportunities for GIS and cartography: cartographers are experts in spatial data visualization, and often also very skilled in statistics, data pre-processing and analysis in general. Cartographers' skill levels often depend on the degree to which university cartography programs focus on the "front end" (visualization) of spatial data and leave the "back end" (modelling, gathering, processing, analysis) to GIScientists. In many university curricula, these front-end and back-end distinctions between cartographers and GIScientists are not clearly defined, and the boundaries are somewhat blurred.

In order to become good data scientists, cartographers and GIScientists need to acquire certain additional skills that often lie beyond their university curricula. These skills include programming, machine learning and data mining. These are important technologies for extracting knowledge from big spatial datasets, and thus the logical advancement of "traditional" geoprocessing, which focuses on "traditional" (small, structured, static) datasets such as shapefiles or feature classes.

To bridge the gap between spatial sciences (such as GIS and cartography) and data science, we need an integrated framework of "spatial data science" (Figure 2).

Spatial sciences focus on causality, using theory-based approaches to explain why things happen in space. In contrast, the scope of data science is to find similar patterns in big datasets with techniques of machine learning and data mining – often without considering spatial concepts (such as topology, spatial indexing, spatial autocorrelation, the modifiable areal unit problem, map projections and coordinate systems, uncertainty in measurement, etc.).

Spatial data science could become the core competency of GIScientists and cartographers who are willing to integrate methods from the data science knowledge stack. Moreover, data scientists could enhance their work by integrating important spatial concepts and tools from GIS and cartography into data science workflows. A non-exhaustive knowledge stack for spatial data scientists, including typical tasks and tools, is given in Table 1.

There are many interesting ongoing projects at the interface of spatial and data science. Examples from the ArcGIS platform include:

- Integration of Python GIS APIs with machine learning libraries, such as scikit-learn or TensorFlow, in Jupyter Notebooks (see the sketch after this abstract)
- Combination of R (advanced statistics and visualization) and GIS (basic geoprocessing, mapping) in ModelBuilder and other automation frameworks
- Enterprise GIS solutions for distributed geoprocessing operations on big, real-time vector and raster datasets
- Dashboards for visualizing real-time sensor data and integrating it with other data sources
- Applications for interactive data exploration
- GIS tools for machine learning tasks such as prediction, clustering and classification of spatial data
- GIS integration for Hadoop

While the discussion about proprietary (ArcGIS) vs. open-source (QGIS) software is beyond the scope of this article, it should be noted that (a) many ArcGIS projects are actually open source and (b) using a complete GIS platform instead of several open-source pieces has several advantages, particularly in efficiency, maintenance and support (see Wilkening et al. (2019) for a more detailed discussion). At any rate, cartography and GIS tools are essential technology blocks for solving the (80% spatial) data science problems of the future.
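As a minimal sketch of the first bullet point above (a Python GIS API combined with scikit-learn in a notebook): the shapefile name wells.shp is a hypothetical stand-in, and geopandas is used here in place of any specific vendor GIS API.

```python
# Sketch: spatial clustering of a point dataset with scikit-learn.
# "wells.shp" is a hypothetical input; geopandas stands in for a GIS API.
import geopandas as gpd
from sklearn.cluster import DBSCAN

# Read the points and project to a metric CRS so that distances
# (and hence the DBSCAN eps parameter) are in meters, not degrees.
points = gpd.read_file("wells.shp").to_crs(epsg=3857)

# Extract coordinates as a plain (n, 2) array for scikit-learn.
coords = [(geom.x, geom.y) for geom in points.geometry]

# Density-based clustering: points with at least 5 neighbors within
# 500 m form a cluster; isolated points are labeled -1 (noise).
points["cluster"] = DBSCAN(eps=500, min_samples=5).fit_predict(coords)

# Back to the cartographic side: a quick plot of the clusters.
points.plot(column="cluster", legend=True)
```

Spatial clustering of this kind is one place where the spatial-science concerns listed earlier (projections, spatial autocorrelation) directly affect a standard data science algorithm: DBSCAN's distance threshold is only meaningful once the data is in a suitable coordinate system.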

Author(s):  
Brian Granger
Fernando Pérez

Project Jupyter is an open-source project for interactive computing widely used in data science, machine learning, and scientific computing. We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: interactive computing, computational narratives, and the idea that Jupyter is more than software. We then show the impact of these dimensions on a community of practice in Earth and climate science.


Author(s):  
Sabitha Rajagopal

Data Science employs techniques and theories to create data products. A data product is merely a data application that acquires its value from the data itself and creates more data as a result; it is not just an application with data. Data science involves the methodical study of digital data employing techniques of observation, development, analysis, testing and validation. It tackles real-time challenges by adopting a holistic approach. It 'creates' knowledge about large and dynamic databases, 'develops' methods to manage data and 'optimizes' processes to improve performance. The goal includes vital investigation and innovation in conjunction with functional exploration intended to inform decision-making for individuals, businesses, and governments. This paper discusses the emergence of Data Science and its subsequent developments in the fields of Data Mining and Data Warehousing. The research focuses on the need, challenges, impact, ethics and progress of Data Science. Finally, insights into the subsequent phases in the research and development of Data Science are provided.


Author(s):  
Pushpa Singh
Rajeev Agrawal

This article focuses on the prospects of open-source software and tools for maximizing user expectations in heterogeneous networks. The open-source language Python is used as the software tool in this research work for implementing a machine learning technique for the categorization of user types in a heterogeneous network (HN). A KNN classifier implemented in Python determines the user category in real time, predicting the users available in a particular category in order to maximize profit for a business organization.
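As an illustration of the approach described above, here is a minimal sketch of user categorization with scikit-learn's KNN classifier. The features (bandwidth demand, mobility, session length) and the category labels are hypothetical stand-ins, since the abstract does not publish the paper's actual schema.

```python
# Sketch of KNN-based user categorization in a heterogeneous network.
# Features (Mbps, km/h, minutes) and labels are hypothetical stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X_train = np.array([
    [50.0, 10.0, 120.0],  # high-bandwidth, mostly stationary users
    [45.0, 12.0, 110.0],
    [2.0, 60.0, 10.0],    # low-bandwidth, highly mobile users
    [3.0, 55.0, 12.0],
    [10.0, 5.0, 300.0],   # moderate-bandwidth, long-session users
    [12.0, 4.0, 280.0],
])
y_train = ["premium", "premium", "roaming", "roaming", "regular", "regular"]

# Scale features so no single unit dominates the distance metric.
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_train), y_train)

# Categorize a newly observed user in (near) real time.
new_user = [[48.0, 11.0, 115.0]]
print(knn.predict(scaler.transform(new_user)))  # e.g. ['premium']
```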


2016, Vol 21 (3), pp. 525-547
Author(s):  
Scott Tonidandel
Eden B. King
Jose M. Cortina

Advances in data science, such as data mining, data visualization, and machine learning, are extremely well-suited to address numerous questions in the organizational sciences given the explosion of available data. Despite these opportunities, few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on the topic of big data and big data's reciprocal impact on our science. The purpose of this paper is to provide an overview of the big data phenomenon and its potential for impacting organizational science in both positive and negative ways. We identify the biggest opportunities afforded by big data along with the biggest obstacles, and we discuss specifically how we think our methods will be most impacted by the data analytics movement. We also provide a list of resources to help interested readers incorporate big data methods into their existing research. Our hope is that we stimulate interest in big data, motivate future research using big data sources, and encourage the application of associated data science techniques more broadly in the organizational sciences.


Author(s):  
Greg Lawrance
Raphael Parra Hernandez
Khalegh Mamakani
Suraiya Khan
Brent Hills
...  

Introduction: Ligo is an open-source application that provides a framework for managing and executing administrative data linking projects. Ligo provides an easy-to-use web interface that lets analysts select among data linking methods, including deterministic, probabilistic and machine learning approaches, and use these in a documented, repeatable, tested, step-by-step process.

Objectives and Approach: The linking application has two primary functions: identifying common entities within a dataset (de-duplication) and identifying common entities between datasets (linking). The application is being built from the ground up in a partnership between the Province of British Columbia's Data Innovation (DI) Program and Population Data BC, with input from data scientists. The simple web interface allows analysts to streamline the processing of multiple datasets in a straightforward and reproducible manner.

Results: Built in Python and implemented as a desktop-capable and cloud-deployable containerized application, Ligo includes many of the latest data-linking comparison algorithms, with a plugin architecture that supports the simple addition of new formulae. Currently, deterministic approaches to linking have been implemented and probabilistic methods are in alpha testing. A fully functional alpha, including deterministic and probabilistic methods, is expected to be ready in September, with a machine learning extension expected soon after.

Conclusion/Implications: Ligo has been designed with enterprise users in mind. The application is intended to make the processes of data de-duplication and linking simple, fast and reproducible. By making the application open source, we encourage feedback and collaboration from across the population research and data science community.
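To make the deterministic approach concrete, here is a sketch of what one deterministic linking rule of the kind Ligo supports might look like. The field names, normalization choices, and match key are hypothetical illustrations; the abstract does not publish Ligo's actual plugin interface.

```python
# Hypothetical sketch of a deterministic record-linkage rule; the fields
# and normalization are illustrative, not Ligo's actual implementation.
import hashlib

def normalize(record):
    """Canonicalize the fields used as a deterministic match key."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower()[:1],  # first initial only
        record["date_of_birth"],                   # ISO yyyy-mm-dd
        record["postal_code"].replace(" ", "").upper(),
    )

def match_key(record):
    """Hash the normalized fields so keys can be compared across
    datasets without exposing the underlying identifiers."""
    return hashlib.sha256("|".join(normalize(record)).encode()).hexdigest()

def link(dataset_a, dataset_b):
    """Return pairs of records that share the same deterministic key."""
    index = {}
    for rec in dataset_a:
        index.setdefault(match_key(rec), []).append(rec)
    return [(a, b) for b in dataset_b for a in index.get(match_key(b), [])]
```

A probabilistic method, by contrast, would score partial agreement across fields rather than require an exact key match, which is why the two approaches are offered as separate, selectable methods.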


Nowadays, Data Mining is used everywhere to extract information from data and, in turn, acquire knowledge for decision making. Data Mining analyzes patterns that are used to extract information and knowledge for making decisions. Many open-source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for Data Mining and predictive analysis. This paper discusses the different tools available for Data Mining and Machine Learning, followed by a description and the pros and cons of each tool. The article provides details of the supported techniques, such as classification, regression, characterization, discretization, clustering, visualization and feature selection, for these Data Mining and Machine Learning tools. It will help people make efficient decisions and suggests which tool is suitable for a given requirement.
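To ground two of the technique families listed above (classification and clustering), here is a minimal example on the classic Iris dataset. It uses scikit-learn, a scriptable open-source counterpart to the GUI-oriented tools surveyed, rather than one of the four tools themselves.

```python
# Classification and clustering on the Iris dataset with scikit-learn,
# an open-source library counterpart to GUI tools like Weka or Orange.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: 5-fold cross-validated accuracy of a decision tree.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(f"decision tree accuracy: {scores.mean():.2f}")

# Clustering: unsupervised grouping into three clusters.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(clusters[:10])
```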


2019, Vol 8 (2S11), pp. 2342-2345

TensorFlow is an open-source machine learning library for research and production. TensorFlow offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. Google's TensorFlow is best known for its deep learning applications. Deep learning excels at pattern recognition/machine perception, and it is being applied to images, video, sound, voice, text and time-series data. It classifies and clusters such data, at times with superhuman accuracy. This can be applied to the recognition of different objects, such as a ball, cat, bottle or car. It can use Android as its platform, using the phone's camera to train on the datasets and recognize different objects in real time.
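As a concrete illustration of the pattern-recognition use case described above, here is a minimal image classifier using TensorFlow's Keras API on the built-in MNIST digit dataset. The object-recognition scenario from the abstract (ball, cat, bottle, car) would use the same structure with a camera-collected dataset and, typically, convolutional layers.

```python
# Minimal TensorFlow/Keras image classifier on the built-in MNIST digits;
# an object recognizer (ball, cat, bottle, car, ...) would use the same
# structure with a different, camera-collected dataset.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))  # [test loss, test accuracy]
```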


2021
Author(s):  
Luc Thomès
Rebekka Burkholz
Daniel Bojar

Abstract. As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
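A minimal sketch of how the motif-annotation workflow described above might be invoked. The module path and function name are assumptions based on the package's documentation and should be checked against the linked repository for the current API.

```python
# Hedged sketch of motif annotation with glycowork; verify the module path
# and signature against https://github.com/BojarLab/glycowork/.
from glycowork.motif.annotate import annotate_dataset  # assumed entry point

# Glycans in IUPAC-condensed notation, the format used throughout glycowork.
glycans = [
    "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
    "Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
]

# Returns a glycan-by-motif table of annotated substructures, which can
# then feed the heatmap and statistical enrichment routines.
motif_table = annotate_dataset(glycans)
print(motif_table.head())
```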


2021, Vol 73 (04), pp. 41-41
Author(s):  
Doug Lehr

In the 2020 Completions Technology Focus, I stated that digitization will forever change how the most complex problems in our industry are solved. And, despite another severe downturn in the upstream industry, data science continues to provide solutions for complex unconventional well problems.

Casing Damage
Casing collapse is an ongoing problem and almost always occurs in the heel of the well. It prevents passage of frac plugs and milling tools. Forcing a frac plug through the collapsed section damages the plug, predisposing it to failure, which leads to more casing damage and poor stimulation. One team has developed a machine-learning (ML) model showing a positive correlation between zones with high fracturing gradients and collapsed casing. The objective is a predictive tool that enables a completion design that avoids these zones.

Fracture-Driven Interactions (FDIs) Can Be Avoided in Real Time
Pressurized fracturing fluids from one well can communicate with fractures in a nearby well or can intersect that wellbore. Such FDIs can occur while fracturing a child well and can negatively affect production in the parent well. FDIs are caused by well spacing, depletion, or completion design but, until recently, were not quickly diagnosed. Analytics and machine learning now are being used to analyze streaming data sets during a frac job to detect FDIs. A recently piloted detection system alerts the operator in real time, which enables avoidance of FDIs on the fly.

Data Science Provides the Tools
Analyzing casing damage and FDIs is a complex task involving large amounts of data that are already available or easily acquired. Tools such as ML perform the data analysis and enable decision making. Data science is enabling the unconventional "onion" to be peeled many layers at a time.

Recommended additional reading at OnePetro: www.onepetro.org.
SPE 199967 - Artificial Intelligence for Real-Time Monitoring of Fracture-Driven Interactions and Simultaneous Completion Optimization by Hayley Stephenson, Baker Hughes, et al.
SPE 201615 - Novel Completion Design To Bypass Damage and Increase Reservoir Contact: A Middle Magdalena, Central Colombian Case History by Rosana Polo, Oxy, et al.
SPE 202966 - Well Completion Optimization in Canada Tight Gas Fields Using Ensemble Machine Learning by Lulu Liao, Sinopec, et al.
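As an illustration only (the model and data cited under Casing Damage above are the team's own and are not published here), a predictive tool of that kind could start as simply as a classifier relating fracturing gradient to observed collapse, sketched below on synthetic data.

```python
# Illustrative only: a toy classifier relating fracturing gradient to
# casing collapse. The real model and data in the column are proprietary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic zones: fracturing gradient (psi/ft); in this toy setup,
# collapse becomes more likely as the gradient rises.
frac_gradient = rng.uniform(0.6, 1.1, size=200).reshape(-1, 1)
collapsed = (frac_gradient[:, 0] + rng.normal(0, 0.08, 200) > 0.95).astype(int)

model = LogisticRegression().fit(frac_gradient, collapsed)

# Screen a planned completion: flag zones whose predicted collapse risk
# is high so the design can avoid setting plugs there.
candidate_zones = np.array([[0.75], [0.92], [1.05]])
print(model.predict_proba(candidate_zones)[:, 1])
```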

