The R package eseis – a comprehensive software toolbox for environmental seismology

2018, Vol 6 (3), pp. 669-686
Author(s): Michael Dietze

Abstract. Environmental seismology is the study of the seismic signals emitted by Earth surface processes. This emerging research field is at the intersection of seismology, geomorphology, hydrology, meteorology, and further Earth science disciplines. It amalgamates a wide variety of methods from across these disciplines and ultimately fuses them in a common analysis environment. This overarching scope of environmental seismology requires coherent yet integrative software that is accepted by many of the involved scientific disciplines. The statistical software R has gained paramount importance in the majority of data science research fields. R has well-justified advantages over other, mostly commercial, software, which makes it the ideal language to base a comprehensive analysis toolbox on. The article introduces the avenues and needs of environmental seismology and how these are met by the R package eseis. The conceptual structure, example data sets, and available functions are demonstrated. Worked examples illustrate possible applications of the package and provide in-depth descriptions of the flexible use of its functions. The package has a registered DOI, is available under the GPL licence on the Comprehensive R Archive Network (CRAN), and is maintained on GitHub.
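To give a flavour of the package in use, the following is a hedged sketch of a typical eseis processing chain. Function names follow the package's documented prefix convention (read_*, signal_*, plot_*), but the file name is a placeholder and the arguments should be checked against the current package manual.

```r
# Hedged sketch of a typical eseis workflow; the SAC file name is a
# placeholder and default deconvolution settings are assumed.
library(eseis)

s <- read_sac(file = "station_2017_05_01.sac")  # import a seismic trace

s <- signal_deconvolve(data = s)                # remove instrument response
s <- signal_filter(data = s, f = c(1, 90))      # bandpass filter, 1-90 Hz

p <- signal_spectrogram(data = s)               # time-frequency decomposition
plot_spectrogram(data = p)                      # visualise the spectrogram
```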


2020, Vol 2020, pp. 1-13
Author(s): Kehua Miao, Jie Li, Wenxing Hong, Mingtao Chen

The rapid development of data science and big data technology stacks has driven continuous, iterative updates of data science research and working methods. The division of labor between data science and big data work is now increasingly fine-grained. Traditional approaches, from building the infrastructure environment to data modelling and analysis, greatly reduce working and research efficiency. In this paper, we focus on friendly collaboration within data science teams and build a data science and big data analysis application platform based on a microservices architecture for education and non-specialist research fields. Because a microservices environment makes it easy to update individual components, the platform offers a personal code-experiment environment that integrates JupyterHub on top of Spark and HDFS for multi-user work, as well as visual modelling tools that follow the modular design of data science engineering and build on Greenplum in-database analysis. The entire web service system is developed with Spring Boot.
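The platform exposes its Spark/HDFS experiment environment through JupyterHub; as a rough analogue, and keeping with the R examples used throughout this listing, a comparable session could be sketched with the sparklyr package. This is illustrative only, not the platform's own code; the cluster manager setting and HDFS path are assumptions.

```r
# Illustrative only: an R session against a Spark cluster backed by HDFS,
# analogous to the platform's notebook environment. The master URL and
# file path are placeholders, not values from the paper.
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "yarn")  # assumed cluster manager

# Read a data set stored on HDFS into a Spark DataFrame
events <- spark_read_csv(sc, name = "events",
                         path = "hdfs:///data/events.csv")

# Aggregation runs on the cluster; only the summary is collected into R
events %>%
  group_by(category) %>%
  summarise(n = n(), mean_value = mean(value, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```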


2020
Author(s): Angela Schäfer, Norbert Anselm, Janik Eilers, Stephan Frickenhaus, Peter Gerchow, ...

Today's fast digital growth has made data the most essential tool for scientific progress in Earth System Science. Hence, we strive to assemble a modular research infrastructure comprising a collection of tools and services that allow researchers to turn big data into scientific outcomes.

Major roadblocks are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous, project-driven requirements towards, e.g., satellite data, sensor monitoring, quality assessment and control, processing, analysis, and visualization, and (iii) the demand for near-real-time analyses.

These requirements have led us to build a generic and cost-effective framework, O2A (Observation to Archive), to enable, control, and access the flow of sensor observations to archives and repositories.

By establishing O2A within major cooperative projects like MOSES and Digital Earth in the research field Earth and Environment of the German Helmholtz Association, we extend research data management services, computing power, and skills to connect with the evolving software and storage services for data science. This fully supports the typical scientific workflow from its very beginning to its very end, that is, from data acquisition to final data publication.

The key modules of O2A's digital research infrastructure, established by AWI to enable Digital Earth science, implement the FAIR principles:

- Sensor Web, to register sensor applications and capture controlled metadata before and alongside any measurement in the field
- Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, at best in controlled NRT data streams
- Dashboards, allowing researchers to find and access data and to share and collaborate among partners
- Workspace, enabling researchers to access and use data with research software in a cloud-based virtualized infrastructure that allows massive amounts of data to be analysed on the spot
- Archiving and publishing data via repositories and digital object identifiers (DOIs)


2019
Author(s): Zachary B. Abrams, Caitlin E. Coombes, Suli Li, Kevin R. Coombes

Abstract
Summary: Unsupervised data analysis in many scientific disciplines is based on calculating distances between observations and finding ways to visualize those distances. These kinds of unsupervised analyses help researchers uncover patterns in large-scale data sets. However, researchers can select from a vast number of different distance metrics, each designed to highlight different aspects of different data types. There are also numerous visualization methods with their own strengths and weaknesses. To help researchers perform unsupervised analyses, we developed the Mercator R package. Mercator enables users to see important patterns in their data by generating multiple visualizations using different standard algorithms, making it particularly easy to compare and contrast the results arising from different metrics. By allowing users to select the distance metric that best fits their needs, Mercator helps researchers perform unsupervised analyses that use pattern identification through computation and visual inspection.
Availability and Implementation: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
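The kind of comparison Mercator streamlines can be sketched in base R, independently of the package's own API (which is deliberately not reproduced here): compute several distance metrics on the same binary matrix and visualize each one by hierarchical clustering and classical multidimensional scaling.

```r
# Generic base-R illustration of the workflow Mercator automates:
# one data set, several distance metrics, several visualizations.
set.seed(42)
X <- matrix(rbinom(100 * 20, size = 1, prob = 0.3), nrow = 100)

for (metric in c("euclidean", "manhattan", "binary")) {
  d <- dist(X, method = metric)          # distance matrix under this metric
  plot(hclust(d), labels = FALSE,
       main = paste("hclust,", metric))  # dendrogram view
  mds <- cmdscale(d, k = 2)              # classical multidimensional scaling
  plot(mds, main = paste("MDS,", metric),
       xlab = "dim 1", ylab = "dim 2")   # 2-D embedding view
}
```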


Author(s): Alfredo Cuzzocrea, Svetlana Mansmann

The problem of efficiently visualizing multidimensional data sets produced by scientific and statistical tasks and processes is becoming increasingly challenging and is attracting the attention of a wide multidisciplinary community of researchers and practitioners. Basically, this problem consists in visualizing multidimensional data sets while capturing the dimensionality of the data, which is the most difficult aspect to handle. Human analysts interacting with high-dimensional data often experience disorientation and cognitive overload.

Analysis of high-dimensional data is a challenge encountered in a wide set of real-life applications, such as (i) biological databases storing massive gene and protein data sets, (ii) real-time monitoring systems accumulating data sets produced by multiple, multi-rate streaming sources, and (iii) advanced Business Intelligence (BI) systems collecting business data for decision-making purposes. Traditional DBMS front-end tools, which are usually tuple-bag-oriented, are completely inadequate to fulfill the requirements posed by interactive exploration of high-dimensional data sets, for two major reasons: (i) DBMSs implement the OLTP paradigm, which is optimized for transaction processing and deliberately neglects the dimensionality of data; and (ii) DBMS operators are very limited and offer nothing beyond the capability of conventional SQL statements, which makes such tools very inefficient for visualizing and, above all, interacting with multidimensional data sets embedding a large number of dimensions.

Despite the practical relevance of the problem of visualizing multidimensional data sets, the literature in this field is rather scarce because, for many years, the problem was of relevance for life science research communities only, and their interaction with the computer science research community was insufficient. Following the enormous growth of scientific disciplines like bioinformatics, the problem has since become a fundamental topic in academic as well as industrial computer science research. At the same time, a number of proposals dealing with the multidimensional data visualization problem have appeared in the literature, stimulating novel and exciting application fields such as the visualization of data mining results generated by challenging techniques like clustering and association rule discovery. These issues underline the high relevance and attractiveness of the problem of visualizing multidimensional data sets at present and in the future, with challenging research findings accompanied by significant spin-offs in the Information Technology (IT) industry.

A possible way to tackle this problem is offered by well-known OLAP techniques (Codd et al., 1993; Chaudhuri & Dayal, 1997; Gray et al., 1997), which focus on obtaining very efficient representations of multidimensional data sets, called data cubes. This has led to the research field known in the literature as OLAP Visualization or Visual OLAP, terms that are used interchangeably in the remainder of the article.
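For a concrete sense of the data cube representation referred to above (an illustration added here, not taken from the article), the cube operator aggregates a measure over every subset of the chosen dimensions, including the grand total; in R this can be sketched with data.table's cube():

```r
# Illustrative data cube over two dimensions; the data are invented.
library(data.table)

sales <- data.table(
  region  = c("EU", "EU", "US", "US"),
  product = c("A",  "B",  "A",  "B"),
  value   = c(10, 20, 30, 40)
)

# cube() aggregates over every grouping subset of 'by', including the
# grand total; NA in a dimension column marks an aggregated-out level.
cube(sales, j = list(total = sum(value)), by = c("region", "product"))
```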


2019, Vol 27 (4), pp. 254-264
Author(s): Jakub Trojan, Sven Schade, Rob Lemmens, Bohumil Frantál

Abstract. Issues related to the evolving role of citizen science and open science are reviewed and discussed in this article. We focus on the changing approaches to science, research, and development related to the turn to openness and transparency, which has made science more open and inclusive, even for non-researchers. Reproducible and collaborative research, driven by open-access principles, involves citizens in many research fields. The article shows how international support is pushing citizen science forward and how citizens' involvement is becoming more important. A basic scientometric analysis (based on the Web of Science Core Collection as the source of peer-reviewed articles) provides a first insight into the diffusion of the citizen science concept in the field of geography, mapping the growth of citizen science articles over time, the spectrum of geographical journals that publish them, and their citation rate compared to other scientific disciplines. The authors also discuss future challenges of citizen science and its potential, which for the time being seems not to be fully utilized in some fields, including geographical research.


2020
Author(s): Brian A. Nosek, Stasa Milojevic, Valentin Pentchev, Xiaoran Yan, David M. Litherland, ...

With funding from the National Science Foundation, the Center for Open Science (COS) and Indiana University will create a dynamic, distributed, and heterogeneous data source for the advancement of science-of-science research. This will be achieved by using, enhancing, and combining the capabilities of the Open Science Framework (OSF) and the Collaborative Archive & Data Research Environment (CADRE). With over 200,000 users (currently growing by more than 220 per day), many thousands of projects, registrations, and papers, millions of files stored and managed, and rich metadata tracking researcher actions, the OSF is already a very rich data set for investigating the research lifecycle, researcher behaviors, and how those behaviors evolve in the social network. As a cross-university effort, CADRE provides an integrated data mining and collaborative environment for big bibliographic data sets. While still under development, the CADRE platform has already attracted long-term financial commitments from 10 research-intensive universities, with additional support from multiple infrastructure and industry partners. Connecting these efforts will catalyze transformative research on human networks in the science of science.


2021, Vol 16 (1), pp. 117-144
Author(s): Michał Bednarczyk

This paper describes JupyQgis, a new Python library for the Jupyteo IDE that enables interoperability with the QGIS system. Jupyteo is an online integrated development environment for Earth observation data processing, available on a cloud platform. It is targeted at remote sensing experts, scientists, and users who can develop Jupyter notebooks by reusing embedded open-source tools, WPS interfaces, and existing notebooks. In recent years, data science methods have grown increasingly popular and have become the focus of many organizations. Many scientific disciplines are facing a significant transformation due to data-driven solutions. This is especially true of geodesy, environmental sciences, and Earth sciences, where large data sets, such as Earth observation satellite data (EO data) and GIS data, are used. Previous experience in using Jupyteo, among both the users of this platform and its creators, indicates the need to supplement its functionality with GIS analytical tools. This study analyzed the most efficient way to combine the functionality of the QGIS system with that of the Jupyteo platform in one tool. It was found that the most suitable solution is a custom library providing an API for collaboration between the two environments. The resulting library makes the work much easier and simplifies the source code of the resulting Python scripts. The functionality of the developed solution is illustrated with a test use case.


Eos, 2021, Vol 102
Author(s): Rebekah Esmaili

A new book presents an example-driven collection of basic methods, applications, and visualizations to process satellite data sets for Earth science research.


2021, Vol 2021, pp. 1-21
Author(s): Goded Shahaf

Scientists rely more and more upon computerized data mining and artificial intelligence to analyze data sets and identify association rules, which serve as the basis of evolving theories. This tendency is likely to expand, and computerized intelligence is likely to take a leading role in scientific theorizing. While the ever-advancing technology can be of great benefit, scientists in many research fields do not necessarily understand thoroughly enough the various assumptions that underlie different data mining methods and that pose significant limitations on the association rules that can be identified in the first place. There seems to be a need for a comprehensive framework that presents the various possible technological aids in the context of our neurocognitive process of theorizing and identifying association rules. Such a framework could be used to understand, identify, and overcome the limitations of the currently fragmented processes of technology-based theorizing and the formation of association rules in any research field. To this end, we divide theorizing into its underlying neurocognitive components, describe their current technological expansions and limitations, and offer a possible comprehensive computational framework for each component and their combination.
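As a concrete illustration of computerized association rule identification (added here as an example, not part of the author's framework), the R package arules mines rules subject to user-set support and confidence thresholds; these thresholds are exactly the kind of underlying assumption that determines which rules can be identified in the first place.

```r
# Minimal association-rule mining sketch with the arules package.
# Thresholds are illustrative; note how the support/confidence settings
# predetermine which rules can surface at all.
library(arules)

data("Groceries")  # example transaction data shipped with arules

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))

inspect(head(sort(rules, by = "lift"), 5))  # top five rules by lift
```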

