A Multi-Year Assessment of Phytoplankton Fluorescence in a Large Temperate River Reveals the Importance of Scale-Dependent Temporal Patterns Associated With Temperature and Other Physicochemical Variables

An integrated temporal study of a long-term ecological research and monitoring database of the St. Lawrence River was carried out. A long and mostly uninterrupted high temporal resolution record of fluorometric data from 2014 to 2018 was used to examine phytoplankton fluorometric variables at several scales and to identify temporal patterns and their main environmental drivers. Sets of temporal eigenvectors were used as modulating variables in a multiscale codependence analysis to relate the fluorometric variables and various environmental variables at different temporal scales. Fluorometric patterns of phytoplankton biomass in the St. Lawrence River are characterized by large, yearly-scale patterns driven by seasonal changes in water temperature, and to a lesser extent water discharge, over which finer-scale temporal patterns related to colored dissolved organic matter and weather variables can be discerned at shorter time scales. The results suggest that such an approach to characterize phytoplankton biomass in large rivers may be useful for processing large data sets from remote sensing efforts for detecting subtle large-scale changes in water quality due to land use practices and climate change.

Download Full-text

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$ , identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$ .

Download Full-text

Compartment and hub definitions tune metabolic networks for metabolomic interpretations

GigaScience ◽

10.1093/gigascience/giz137 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

T Cameron Waller ◽

Jordan A Berg ◽

Alexander Lex ◽

Brian E Chapman ◽

Jared Rutter

Keyword(s):

Large Scale ◽

Metabolic Networks ◽

Shortest Paths ◽

Large Data ◽

Differential Regulation ◽

Large Data Sets ◽

Data Sets ◽

Human Metabolism ◽

Experimental Conditions ◽

Systemic Model

Abstract Background Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism’s cells. They offer biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments. Results We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, more dense, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications. Conclusions Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has potential to identify differential regulation of individual genes, transcripts, and proteins.

Download Full-text

Flexible Non-Intrusive Heat Flux Instrumentation for the AFRL Research Turbine

Volume 5: Heat Transfer, Parts A and B ◽

10.1115/gt2011-46853 ◽

2011 ◽

Cited By ~ 7

Author(s):

Richard J. Anthony ◽

John P. Clark ◽

Stephen W. Kennedy ◽

John M. Finnegan ◽

Dean Johnson ◽

...

Keyword(s):

Thin Film ◽

Heat Flux ◽

Large Scale ◽

Thermal Loading ◽

Pressure Sensors ◽

Large Data ◽

Fast Response ◽

Data Sets ◽

Unsteady Aerodynamic ◽

Analysis System

This paper describes a large scale heat flux instrumentation effort for the AFRL HIT Research Turbine. The work provides a unique amount of high frequency instrumentation to acquire fast response unsteady heat flux in a fully rotational, cooled turbine rig along with unsteady pressure data to investigate thermal loading and unsteady aerodynamic airfoil interactions. Over 1200 dynamic sensors are installed on the 1 & 1/2 stage turbine rig. Airfoils include 658 double-sided thin film gauges for heat flux, 289 fast-response Kulite pressure sensors for unsteady aerodynamic measurements, and over 40 thermocouples. An overview of the instrumentation is given with in-depth focus on the non-commercial thin film heat transfer sensors designed and produced in the Heat Flux Instrumentation Laboratory at WPAFB. The paper further describes the necessary upgrade of data acquisition systems and signal conditioning electronics to handle the increased channel requirements of the HIT Research Turbine. More modern, reliable, and efficient data processing and analysis code provides better handling of large data sets and allows easy integration with the turbine design and analysis system under development at AFRL. Example data from cooled transient blowdown tests in the TRF are included along with measurement uncertainty.

Download Full-text

Research Challenges in Big Data Analytics

Decision Management ◽

10.4018/978-1-5225-1837-2.ch006 ◽

2017 ◽

pp. 83-99

Author(s):

Sivamathi Chokkalingam ◽

Vijayarani S.

Keyword(s):

Big Data ◽

Data Analytics ◽

Large Scale ◽

New Technologies ◽

Big Data Analytics ◽

Large Data ◽

Data Sets ◽

Data Types ◽

Customer Preferences ◽

Research Challenges

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets which contains a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is new emerging field, there is a need for development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about various research challenges of Big Data analytics. A brief overview of various types of Big Data analytics is discussed in this paper. For each analytics, the paper describes process steps and tools. A banking application is given for each analytics. Some of research challenges and possible solutions for those challenges of big data analytics are also discussed.

Download Full-text

Not all written in stone: interdisciplinary syntheses in echinoderm paleontology

Canadian Journal of Zoology ◽

10.1139/z00-217 ◽

2001 ◽

Vol 79 (7) ◽

pp. 1209-1231 ◽

Cited By ~ 16

Author(s):

Rich Mooi

Keyword(s):

Evolutionary History ◽

Large Scale ◽

Body Wall ◽

Developmental Trajectories ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Primary Target ◽

The Past ◽

Broad Scale

The fossil record of the Echinodermata is relatively complete, and is represented by specimens retaining an abundance of features comparable to that found in extant forms. This yields a half-billion-year record of evolutionary novelties unmatched in any other major group, making the Echinodermata a primary target for studies of biological change. Not all of this change can be understood by studying the rocks alone, leading to synthetic research programs. Study of literature from the past 20 years indicates that over 1400 papers on echinoderm paleontology appeared in that time, and that overall productivity has remained almost constant. Analysis of papers appearing since 1990 shows that research is driven by new finds including, but not restricted to, possible Precambrian echinoderms, bizarre new edrioasteroids, early crinoids, exquisitely preserved homalozoans, echinoids at the K-T boundary, and Antarctic echinoids, stelleroids, and crinoids. New interpretations of echinoderm body wall homologies, broad-scale syntheses of embryological information, the study of developmental trajectories through molecular markers, and the large-scale ecological and phenotypic shifts being explored through morphometry and analyses of large data sets are integrated with study of the fossils themselves. Therefore, recent advances reveal a remarkable and continuing synergistic expansion in our understanding of echinoderm evolutionary history.

Download Full-text

Performance Optimization System for Hadoop and Spark Frameworks

Cybernetics and Information Technologies ◽

10.2478/cait-2020-0056 ◽

2020 ◽

Vol 20 (6) ◽

pp. 5-17

Author(s):

Hrachya Astsatryan ◽

Aram Kocharyan ◽

Daniel Hagimont ◽

Arthur Lalayan

Keyword(s):

Performance Optimization ◽

Large Scale ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Apache Hadoop ◽

Compression Factor ◽

Large Scale Data ◽

Additional Processing ◽

Mapreduce Model

AbstractThe optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper aims to present a system enabling the selection of the compression tools and tuning the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures based on simulation analyzes.

Download Full-text

Large-scale sequence comparisons with sourmash

10.1101/687285 ◽

2019 ◽

Author(s):

N. Tessa Pierce ◽

Luiz Irber ◽

Taylor Reiter ◽

Phillip Brooks ◽

C. Titus Brown

Keyword(s):

Software Package ◽

Large Scale ◽

Sequence Similarity ◽

Protein Sequences ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Sequence Comparisons ◽

Large Databases ◽

Scale Sequence

The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.

Download Full-text

Space-Time Unit-Level EBLUP for Large Data Sets

Journal of Official Statistics ◽

10.1515/jos-2017-0004 ◽

2017 ◽

Vol 33 (1) ◽

pp. 61-77 ◽

Cited By ~ 2

Author(s):

Michele D’Aló ◽

Stefano Falorsi ◽

Fabrizio Solari

Keyword(s):

Mixed Models ◽

Small Area ◽

Large Scale ◽

Large Data ◽

Cross Product ◽

Large Data Sets ◽

Data Sets ◽

Unit Level ◽

Product Estimator ◽

New Formulation

Abstract Most important large-scale surveys carried out by national statistical institutes are the repeated survey type, typically intended to produce estimates for several parameters of the whole population, as well as parameters related to some subpopulations. Small area estimation techniques are becoming more and more important for the production of official statistics where direct estimators are not able to produce reliable estimates. In order to exploit data from different survey cycles, unit-level linear mixed models with area and time random effects can be considered. However, the large amount of data to be processed may cause computational problems. To overcome the computational issues, a reformulation of predictors and the correspondent mean cross product estimator is given. The R code based on the new formulation enables the elaboration of about 7.2 millions of data records in a matter of minutes.

Download Full-text

Large-scale sequence comparisons with sourmash

F1000Research ◽

10.12688/f1000research.19675.1 ◽

2019 ◽

Vol 8 ◽

pp. 1006 ◽

Cited By ~ 18

Author(s):

N. Tessa Pierce ◽

Luiz Irber ◽

Taylor Reiter ◽

Phillip Brooks ◽

C. Titus Brown

Keyword(s):

Software Package ◽

Large Scale ◽

Sequence Similarity ◽

Protein Sequences ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Sequence Comparisons ◽

Large Databases ◽

Scale Sequence

Download Full-text

Synthesizing Observations and Theory to Understand Galactic Magnetic Fields: Progress and Challenges

Galaxies ◽

10.3390/galaxies8010004 ◽

2019 ◽

Vol 8 (1) ◽

pp. 4 ◽

Cited By ~ 11

Author(s):

Rainer Beck ◽

Luke Chamandy ◽

Ed Elson ◽

Eric G. Blackman

Keyword(s):

Magnetic Fields ◽

Large Scale ◽

Mean Field ◽

Large Data ◽

Data Sets ◽

Dynamo Theory ◽

Galactic Magnetic Fields ◽

Ngc 6946 ◽

Upper Limits ◽

New Instruments

Constraining dynamo theories of magnetic field origin by observation is indispensable but challenging, in part because the basic quantities measured by observers and predicted by modelers are different. We clarify these differences and sketch out ways to bridge the divide. Based on archival and previously unpublished data, we then compile various important properties of galactic magnetic fields for nearby spiral galaxies. We consistently compute strengths of total, ordered, and regular fields, pitch angles of ordered and regular fields, and we summarize the present knowledge on azimuthal modes, field parities, and the properties of non-axisymmetric spiral features called magnetic arms. We review related aspects of dynamo theory, with a focus on mean-field models and their predictions for large-scale magnetic fields in galactic discs and halos. Furthermore, we measure the velocity dispersion of H i gas in arm and inter-arm regions in three galaxies, M 51, M 74, and NGC 6946, since spiral modulation of the root-mean-square turbulent speed has been proposed as a driver of non-axisymmetry in large-scale dynamos. We find no evidence for such a modulation and place upper limits on its strength, helping to narrow down the list of mechanisms to explain magnetic arms. Successes and remaining challenges of dynamo models with respect to explaining observations are briefly summarized, and possible strategies are suggested. With new instruments like the Square Kilometre Array (SKA), large data sets of magnetic and non-magnetic properties from thousands of galaxies will become available, to be compared with theory.

Download Full-text