ChR: Dynamic Functional Constraints Checking in R

Affordances of Data Science in Agriculture, Manufacturing, and Education

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch002 ◽

2017 ◽

pp. 14-40 ◽

Cited By ~ 2

Author(s):

Krishnan Umachandran ◽

Debra Sharon Ferdinand-James

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Data Generation ◽

Large Scale Data ◽

Big Data Applications ◽

Effective Decision ◽

Effective Decision Making ◽

Text Images ◽

Scale Data

Continued technological advancements of the 21st century enable massive data generation across sectors of the economy, including agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making is an evolving science that requires knowledge of Big Data management and analytics. Big Data in agriculture, manufacturing, and education take varied forms, such as voluminous text, images, and graphs. Applying Big Data science techniques (e.g., functional algorithms) to extract intelligence from data gives decision makers a quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter employs data science to explore potential solutions for Big Data applications in agriculture, manufacturing and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.

Scalable data analysis in proteomics and metabolomics using BioContainers and workflows engines

10.1101/604413 ◽

2019 ◽

Author(s):

Yasset Perez-Riverol ◽

Pablo Moreno

Keyword(s):

Data Analysis ◽

Large Scale ◽

Data Science ◽

Proteomics Data ◽

Computational Proteomics ◽

New Approach ◽

Large Scale Data ◽

Desktop Application ◽

Key Steps ◽

Scale Data

Abstract: Recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming an increasingly complex and convoluted process involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics in recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single desktop applications, which limits the scalability and reproducibility of data analysis. In this paper we give an overview of the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis. We discuss the combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis. Finally, we introduce to the proteomics and metabolomics communities a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments: Galaxy and Nextflow.

Configuration Models as an Urn Problem: The Generalized Hypergeometric Ensemble of Random Graphs

10.21203/rs.3.rs-254843/v1 ◽

2021 ◽

Author(s):

Giona Casiraghi ◽

Vahan Nanumyan

Keyword(s):

Large Scale ◽

Data Science ◽

Graph Model ◽

Configuration Model ◽

World Systems ◽

Fundamental Issue ◽

Random Graph Model ◽

Large Scale Data ◽

Standard Configuration ◽

Scale Data

Abstract: A fundamental issue of network data science is the ability to discern observed features that can be expected at random from those beyond such expectations. Configuration models play a crucial role there, allowing us to compare observations against degree-corrected null-models. Nonetheless, existing formulations have limited large-scale data analysis applications either because they require expensive Monte-Carlo simulations or lack the required flexibility to model real-world systems. With the generalized hypergeometric ensemble, we address both problems. To achieve this, we map the configuration model to an urn problem, where edges are represented as balls in an appropriately constructed urn. Doing so, we obtain a random graph model reproducing and extending the properties of standard configuration models, with the critical advantage of a closed-form probability distribution.
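The urn mapping described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `sample_urn_graph` is hypothetical, and the rule that each ordered node pair (i, j) contributes `out_deg[i] * in_deg[j]` balls is an assumption made for illustration, since the abstract only says the urn is "appropriately constructed". Self-loops are allowed here for simplicity.

```python
import random
from collections import Counter


def sample_urn_graph(out_deg, in_deg, m, seed=None):
    """Draw m directed edges without replacement from an urn.

    Assumption (illustrative, not stated in the abstract): the urn holds
    out_deg[i] * in_deg[j] balls for every ordered node pair (i, j).
    Returns a Counter mapping (i, j) to the number of sampled edges.
    """
    rng = random.Random(seed)
    # Build the urn explicitly: one list entry per ball.
    urn = [
        (i, j)
        for i, k_out in enumerate(out_deg)
        for j, k_in in enumerate(in_deg)
        for _ in range(k_out * k_in)
    ]
    if m > len(urn):
        raise ValueError("cannot draw more edges than balls in the urn")
    # Sampling without replacement gives multivariate hypergeometric counts.
    return Counter(rng.sample(urn, m))


# Toy degree sequences (made up for the example).
out_deg = [2, 1, 1]
in_deg = [1, 2, 1]
edges = sample_urn_graph(out_deg, in_deg, m=4, seed=0)
```

Because the draw is without replacement from a fixed urn, the edge counts follow a multivariate hypergeometric distribution, which is what makes a closed-form probability available instead of Monte-Carlo estimation. Building the urn as an explicit list only suits toy sizes.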

Learnings from developing an applied data science curricula for undergraduate and graduate students

MRS Advances ◽

10.1557/adv.2020.135 ◽

2020 ◽

Vol 5 (7) ◽

pp. 347-353

Author(s):

Roger H. French ◽

Laura S. Bruckman

Keyword(s):

Domain Knowledge ◽

Large Scale ◽

Data Science ◽

Engineering Students ◽

Source Coding ◽

Model Development ◽

Science And Engineering ◽

Analysis Techniques ◽

Large Scale Data ◽

Scale Data

Abstract: Data science has advanced significantly in recent years and allows scientists to harness large-scale data analysis techniques using open source coding frameworks. Data science is a tool that should be taught to science and engineering students in addition to their chosen domain knowledge. An applied data science minor allows students to understand data and data handling as well as statistics and model development. This move will improve reproducibility and openness of research as well as allow for greater interdisciplinarity and more analyses focusing on critical scientific challenges.

Efficient Graph Analytics in Python for Large-Scale Data Science

10.1007/978-3-030-86534-4_15 ◽

2021 ◽

pp. 158-164

Author(s):

Xiantian Zhou ◽

Carlos Ordonez

Keyword(s):

Large Scale ◽

Data Science ◽

Graph Analytics ◽

Large Scale Data ◽

Scale Data

Debugging large-scale data science pipelines using dagger

Proceedings of the VLDB Endowment ◽

10.14778/3415478.3415527 ◽

2020 ◽

Vol 13 (12) ◽

pp. 2993-2996

Author(s):

El Kindi Rezig ◽

Ashrita Brahmaroutu ◽

Nesime Tatbul ◽

Mourad Ouzzani ◽

Nan Tang ◽

...

Keyword(s):

Large Scale ◽

Data Science ◽

Large Scale Data ◽

Scale Data

Packaging data analytical work reproducibly using R (and friends)

10.7287/peerj.preprints.3192 ◽

2018 ◽

Author(s):

Ben Marwick ◽

Carl Boettiger ◽

Lincoln Mullen

Keyword(s):

Large Scale ◽

Research Process ◽

R Programming Language ◽

Large Scale Data ◽

Software Packages ◽

Analytical Work ◽

Computer Based ◽

R Packages ◽

R Programming ◽

Central Tool

Computers are a central tool in the research process, enabling complex and large scale data analysis. As computer-based research has increased in complexity, so have the challenges of ensuring that this research is reproducible. To address this challenge, we review the concept of the research compendium as a solution for providing a standard and easily recognisable way for organising the digital materials of a research project to enable other researchers to inspect, reproduce, and extend the research. We investigate how the structure and tooling of software packages of the R programming language are being used to produce research compendia in a variety of disciplines. We also describe how software engineering tools and services are being used by researchers to streamline working with research compendia. Using real-world examples, we show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.

Configuration models as an urn problem

Scientific Reports ◽

10.1038/s41598-021-92519-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Giona Casiraghi ◽

Vahan Nanumyan

Keyword(s):

Large Scale ◽

Data Science ◽

Graph Model ◽

Configuration Model ◽

World Systems ◽

Fundamental Issue ◽

Random Graph Model ◽

Large Scale Data ◽

Standard Configuration ◽

Scale Data

Abstract: A fundamental issue of network data science is the ability to discern observed features that can be expected at random from those beyond such expectations. Configuration models play a crucial role there, allowing us to compare observations against degree-corrected null-models. Nonetheless, existing formulations have limited large-scale data analysis applications either because they require expensive Monte-Carlo simulations or lack the required flexibility to model real-world systems. With the generalized hypergeometric ensemble, we address both problems. To achieve this, we map the configuration model to an urn problem, where edges are represented as balls in an appropriately constructed urn. Doing so, we obtain the generalized hypergeometric ensemble of random graphs: a random graph model reproducing and extending the properties of standard configuration models, with the critical advantage of a closed-form probability distribution.
