Data compilation on the biological response to ocean acidification: an update

Y. Yang; L. Hansson; J.-P. Gattuso

doi:10.5194/essdd-8-889-2015

Data compilation on the biological response to ocean acidification: an update

Earth System Science Data Discussions ◽

10.5194/essdd-8-889-2015 ◽

2015 ◽

Vol 8 (2) ◽

pp. 889-912 ◽

Cited By ~ 4

Author(s):

Y. Yang ◽

L. Hansson ◽

J.-P. Gattuso

Keyword(s):

Best Practices ◽

Ocean Acidification ◽

Biological Response ◽

Data Sets ◽

Data Archiving ◽

Data Comparison ◽

Data Compilation ◽

Data Points ◽

Study Sites ◽

Natural Communities

Abstract. The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years since its first description by Nisumaa et al. (2010). Most of study sites from which data archived are still in the Northern Hemisphere and the number of archived data from studies from the Southern Hemisphere and polar oceans are still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcomed shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, and help develop standard vocabularies describing the variables and define best practices for archiving ocean acidification data.

Download Full-text

Data compilation on the biological response to ocean acidification: an update

Earth System Science Data ◽

10.5194/essd-8-79-2016 ◽

2016 ◽

Vol 8 (1) ◽

pp. 79-87 ◽

Cited By ~ 9

Author(s):

Y. Yang ◽

L. Hansson ◽

J.-P. Gattuso

Keyword(s):

Best Practices ◽

Ocean Acidification ◽

Biological Response ◽

Data Sets ◽

Data Archiving ◽

Data Comparison ◽

Data Compilation ◽

Data Points ◽

Study Sites ◽

Natural Communities

Abstract. The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation 5 years since its first description by Nisumaa et al. (2010). Most of the study sites from which data have been archived are in the Northern Hemisphere and the number of archived data from studies from the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, and help develop standard vocabularies describing the variables and define best practices for archiving ocean acidification data.

Download Full-text

ON GENERATING DIGITAL ELEVATION MODELS FROM LIDAR DATA – RESOLUTION VERSUS ACCURACY AND TOPOGRAPHIC WETNESS INDEX INDICES IN NORTHERN PEATLANDS

Geodesy and Cartography ◽

10.3846/20296991.2012.702983 ◽

2012 ◽

Vol 38 (2) ◽

pp. 57-69 ◽

Cited By ~ 12

Author(s):

Abdulghani Hasan ◽

Petter Pilesjö ◽

Andreas Persson

Keyword(s):

Large Scale ◽

Drainage Area ◽

Data Sets ◽

Topographic Wetness Index ◽

Absolute Deviation ◽

Digital Elevation ◽

Elevation Data ◽

Scale Modelling ◽

Data Points ◽

Emission Modelling

Global change and GHG emission modelling are dependent on accurate wetness estimations for predictions of e.g. methane emissions. This study aims to quantify how the slope, drainage area and the TWI vary with the resolution of DEMs for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radiuses. The relationship between accuracy of the DEM and the slope was tested. The LiDAR elevation data was divided into two data sets. The number of data points facilitated an evaluation dataset with data points not more than 10 mm away from the cell centre points in the interpolation dataset. The DEM was evaluated using a quantile-quantile test and the normalized median absolute deviation. It showed independence of the resolution when using the same search radius. The accuracy of the estimated elevation for different slopes was tested using the 0.5 meter DEM and it showed a higher deviation from evaluation data for steep areas. The slope estimations between resolutions showed differences with values that exceeded 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model ability to generate drainage area at each resolution was tested by pair wise comparison of three data subsets and showed differences of more than 50% in 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity for the use of slope, drainage area and TWI data in large scale modelling.

Download Full-text

PS2-54: Best Practices: Improving Quality and Reliability in Research Data Sets

Clinical Medicine & Research ◽

10.3121/cmr.2013.1176.ps2-54 ◽

2013 ◽

Vol 11 (3) ◽

pp. 157-157

Author(s):

L. McFarland ◽

J. Richter ◽

C. Bredfeldt

Keyword(s):

Best Practices ◽

Research Data ◽

Data Sets ◽

Quality And Reliability

Download Full-text

Inter- and Intralaboratory Comparison of JC Polyomavirus Antibody Testing Using Two Different Virus-Like Particle-Based Assays

Clinical and Vaccine Immunology ◽

10.1128/cvi.00489-14 ◽

2014 ◽

Vol 21 (11) ◽

pp. 1581-1588 ◽

Cited By ~ 11

Author(s):

Piotr Kardas ◽

Mohammadreza Sadeghi ◽

Fabian H. Weissbach ◽

Tingting Chen ◽

Lea Hedman ◽

...

Keyword(s):

Risk Stratification ◽

Data Sets ◽

Antibody Testing ◽

Jc Polyomavirus ◽

Virus Like Particle ◽

Interlaboratory Variability ◽

Increased Risk ◽

Basic Protocol ◽

Data Points ◽

Reference Serum

ABSTRACTJC polyomavirus (JCPyV) can cause progressive multifocal leukoencephalopathy (PML), a debilitating, often fatal brain disease in immunocompromised patients. JCPyV-seropositive multiple sclerosis (MS) patients treated with natalizumab have a 2- to 10-fold increased risk of developing PML. Therefore, JCPyV serology has been recommended for PML risk stratification. However, different antibody tests may not be equivalent. To study intra- and interlaboratory variability, sera from 398 healthy blood donors were compared in 4 independent enzyme-linked immunoassay (ELISA) measurements generating >1,592 data points. Three data sets (Basel1, Basel2, and Basel3) used the same basic protocol but different JCPyV virus-like particle (VLP) preparations and introduced normalization to a reference serum. The data sets were also compared with an independent method using biotinylated VLPs (Helsinki1). VLP preadsorption reducing ≥35% activity was used to identify seropositive sera. The results indicated that Basel1, Basel2, Basel3, and Helsinki1 were similar regarding overall data distribution (P= 0.79) and seroprevalence (58.0, 54.5, 54.8, and 53.5%, respectively;P= 0.95). However, intra-assay intralaboratory comparison yielded 3.7% to 12% discordant results, most of which were close to the cutoff (0.080 < optical density [OD] < 0.250) according to Bland-Altman analysis. Introduction of normalization improved overall performance and reduced discordance. The interlaboratory interassay comparison between Basel3 and Helsinki1 revealed only 15 discordant results, 14 (93%) of which were close to the cutoff. Preadsorption identified specificities of 99.44% and 97.78% and sensitivities of 99.54% and 95.87% for Basel3 and Helsinki1, respectively. Thus, normalization to a preferably WHO-approved reference serum, duplicate testing, and preadsorption for samples around the cutoff may be necessary for reliable JCPyV serology and PML risk stratification.

Download Full-text

A Support Based Initialization Algorithm for Categorical Data Clustering

Journal of Information Technology Research ◽

10.4018/jitr.2018040104 ◽

2018 ◽

Vol 11 (2) ◽

pp. 53-67

Author(s):

Ajay Kumar ◽

Shishir Kumar

Keyword(s):

Categorical Data ◽

Selection Process ◽

Numerical Data ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Data Object ◽

Data Points ◽

Wu Method ◽

Selection Algorithms

Several initial center selection algorithms are proposed in the literature for numerical data, but the values of the categorical data are unordered so, these methods are not applicable to a categorical data set. This article investigates the initial center selection process for the categorical data and after that present a new support based initial center selection algorithm. The proposed algorithm measures the weight of unique data points of an attribute with the help of support and then integrates these weights along the rows, to get the support of every row. Further, a data object having the largest support is chosen as an initial center followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.

Download Full-text

Mahalanobis distance informed by clustering

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iay011 ◽

2018 ◽

Vol 8 (2) ◽

pp. 377-406

Author(s):

Almog Lahav ◽

Ronen Talmon ◽

Yuval Kluger

Keyword(s):

Mahalanobis Distance ◽

High Dimensional Data ◽

Hidden Variables ◽

Real Data ◽

Risk Groups ◽

High Dimensional ◽

Data Sets ◽

Kaplan Meier ◽

Data Points ◽

Survival Plot

Abstract A fundamental question in data analysis, machine learning and signal processing is how to compare between data points. The choice of the distance metric is specifically challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored, which is the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects to risk groups with a good separation between their Kaplan–Meier survival plot.

Download Full-text

Analysis of erroneous data entries in paper based and electronic data collection

10.21203/rs.2.11983/v4 ◽

2019 ◽

Author(s):

Benedikt Ley ◽

Komal Raj Rijal ◽

Jutta Marfurt ◽

Nabaraj Adhikari ◽

Megha Banjara ◽

...

Keyword(s):

Data Collection ◽

Data Entry ◽

Categorical Variables ◽

Data Sets ◽

Continuous Variables ◽

Electronic Data ◽

Suitable Alternative ◽

Electronic Data Collection ◽

Data Points ◽

Time Variables

Abstract Objective: Electronic data collection (EDC) has become a suitable alternative to paper based data collection (PBDC) in biomedical research even in resource poor settings. During a survey in Nepal, data were collected using both systems and data entry errors compared between both methods. Collected data were checked for completeness, values outside of realistic ranges, internal logic and date variables for reasonable time frames. Variables were grouped into 5 categories and the number of discordant entries were compared between both systems, overall and per variable category. Results: Data from 52 variables collected from 358 participants were available. Discrepancies between both data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% (643/3,580) of continuous variables, 15.8% of time variables (113/716), 13.0% of date variables (140/1,074), 12.0% of text variables (86/716), and 10.9% of categorical variables (1,370/12,530). Overall 64% (1,499/2,352) of all discrepancies were due to data omissions, 76.6% (1,148/1,499) of missing entries were among categorical data. Omissions in PBDC (n=1002) were twice as frequent as in EDC (n=497, p<0.001). Data omissions, specifically among categorical variables were identified as the greatest source of error. If designed accordingly, EDC can address this short fall effectively.

Download Full-text

VOLUME BASED DTM GENERATION FROM VERY HIGH RESOLUTION PHOTOGRAMMETRIC DSMS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b3-83-2016 ◽

2016 ◽

Vol XLI-B3 ◽

pp. 83-90 ◽

Cited By ~ 5

Author(s):

B. Piltz ◽

S. Bayer ◽

A. M. Poznanska

Keyword(s):

Ground Surface ◽

Processing Parameters ◽

Data Sets ◽

Maximum Width ◽

Directional Filtering ◽

Very High Spatial Resolution ◽

Efficient Scheme ◽

Data Points ◽

Minimum Height ◽

Very High

In this paper we propose a new algorithm for digital terrain (DTM) model reconstruction from very high spatial resolution digital surface models (DSMs). It represents a combination of multi-directional filtering with a new metric which we call normalized volume above ground to create an above-ground mask containing buildings and elevated vegetation. This mask can be used to interpolate a ground-only DTM. The presented algorithm works fully automatically, requiring only the processing parameters minimum height and maximum width in metric units. Since slope and breaklines are not decisive criteria, low and smooth and even very extensive flat objects are recognized and masked. The algorithm was developed with the goal to generate the normalized DSM for automatic 3D building reconstruction and works reliably also in environments with distinct hillsides or terrace-shaped terrain where conventional methods would fail. A quantitative comparison with the ISPRS data sets Potsdam and Vaihingen show that 98-99% of all building data points are identified and can be removed, while enough ground data points (~66%) are kept to be able to reconstruct the ground surface. Additionally, we discuss the concept of size dependent height thresholds and present an efficient scheme for pyramidal processing of data sets reducing time complexity to linear to the number of pixels, O(WH).

Download Full-text

Summary of Affinity Propagation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.811 ◽

2011 ◽

Vol 268-270 ◽

pp. 811-816

Author(s):

Yong Zhou ◽

Yan Xing

Keyword(s):

Clustering Algorithm ◽

Large Data ◽

Large Data Sets ◽

Affinity Propagation ◽

Damping Factor ◽

Data Sets ◽

Similarity Matrix ◽

Data Points

Affinity Propagation(AP)is a new clustering algorithm, which is based on the similarity matrix between pairs of data points and messages are exchanged between data points until clustering result emerges. It is efficient and fast , and it can solve the clustering on large data sets. But the traditional Affinity Propagation has many limitations, this paper introduces the Affinity Propagation, and analyzes in depth the advantages and limitations of it, focuses on the improvements of the algorithm — improve the similarity matrix, adjust the preference and the damping-factor, combine with other algorithms. Finally, discusses the development of Affinity Propagation.

Download Full-text

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.

Download Full-text