Assessing statistical and spatial validity of sediment survey design and sampling densities: examples from Lake Erie

2018 ◽  
Vol 53 (3) ◽  
pp. 118-132
Author(s):  
Danielle E. Mitchell ◽  
K. Wayne Forsythe ◽  
Chris H. Marvin ◽  
Debbie A. Burniston

Abstract Spatial interpolation methods translate sediment contamination point data into informative area-based visualizations. Lake Erie was first sampled in 1971 on a survey grid of 263 locations. Due to procedural costs, the 2014 survey was reduced to 34 sampling locations, mostly in deep offshore regions of the lake. Using the 1971 dataset, this study identifies the minimum sampling density at which statistically valid and spatially accurate predictions can be made using ordinary kriging. Randomly down-sampled subsets of the 1971 survey were created at 10% intervals, including at least one subset with a smaller sample size than the 2014 dataset. Regression analyses of predicted contamination values assessed spatial autocorrelation between kriged surfaces created from the down-sampled subsets and from the original dataset. Subsets at 10% and 20% of the original data density accurately predicted 51% and 75%, respectively, of the original dataset's predictions; subsets at 70%, 80% and 90% accurately predicted 88%, 90% and 97%. Although all subsets proved statistically valid, sampling densities below 0.002 locations/km² are likely to produce very generalized contamination maps from which environmental decisions might not be justifiable.
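The workflow described above, kriging a down-sampled subset and comparing its surface against the full-density surface, can be sketched as follows. This is a minimal ordinary-kriging illustration in plain NumPy with an assumed linear variogram and synthetic coordinates and values; it is not the authors' code, and with real data the variogram would be fitted to the empirical semivariance rather than assumed.

```python
import numpy as np

def ordinary_krige(xy, z, grid, nugget=0.0, slope=1.0):
    """Ordinary kriging with an assumed linear variogram gamma(h) = nugget + slope*h."""
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    gamma = nugget + slope * d
    # Augment the system with a Lagrange-multiplier row/column
    # that enforces the unbiasedness constraint (weights sum to 1).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma
    A[n, n] = 0.0
    preds = []
    for g in grid:
        h = np.linalg.norm(xy - g, axis=1)
        b = np.append(nugget + slope * h, 1.0)
        w = np.linalg.solve(A, b)
        preds.append(w[:n] @ z)
    return np.array(preds)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(263, 2))        # stand-in for the 1971 survey grid
z = np.sin(xy[:, 0] / 20) + 0.1 * rng.normal(size=263)
grid = rng.uniform(0, 100, size=(50, 2))       # prediction locations

full = ordinary_krige(xy, z, grid)             # surface from the full dataset
idx = rng.choice(263, size=26, replace=False)  # ~10% down-sampled subset
sub = ordinary_krige(xy[idx], z[idx], grid)    # surface from the subset
r = np.corrcoef(full, sub)[0, 1]               # agreement between the two surfaces
```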

With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract knowledgeable information, but in the process of extraction the data can become publicly discoverable, leading to breaches of private information. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies from a given dataset, such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss in utility [8]. This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes, selected on the basis of information gain. The approach works on both categorical and numerical attributes. The perturbed dataset does not distort the original dataset: it maintains the same, or very comparable, patterns as the original data. Hence the utility of the application remains high compared with other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using data mining classification algorithms.
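The information-gain criterion mentioned for choosing dependency attributes can be sketched as a standard entropy-reduction computation. The toy microdata and attribute names below are hypothetical, and the (l, d) inference model itself is not implemented here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(rows, attr_idx, class_idx):
    """Entropy reduction of the class attribute obtained by splitting on one attribute."""
    total = entropy([r[class_idx] for r in rows])
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr_idx], []).append(r[class_idx])
    remainder = sum(len(v) / len(rows) * entropy(v) for v in by_value.values())
    return total - remainder

# hypothetical microdata: (zip code, gender, sensitive attribute)
rows = [("10001", "F", "flu"), ("10001", "M", "flu"),
        ("10002", "F", "cold"), ("10002", "M", "cold")]
gain_zip = information_gain(rows, 0, 2)   # zip fully determines the sensitive value
gain_sex = information_gain(rows, 1, 2)   # gender carries no information about it
```

Attributes with high information gain relative to the sensitive attribute are the natural candidates for perturbation, since they are the ones an adversary could exploit.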


Author(s):  
Yiyang Yang ◽  
Zhiguo Gong ◽  
Qing Li ◽  
Leong Hou U ◽  
Ruichu Cai ◽  
...  

Point of Interest (POI) identification using social media data (e.g. Flickr, microblogs) has been one of the most popular research topics in recent years. However, such crowd-contributed collections contain large amounts of noise (POI-irrelevant data). Traditional solutions to this problem set a global density threshold and remove a data point as noise if its density is lower than the threshold. However, density values vary significantly among POIs; as a result, some POIs with relatively low density cannot be identified. To solve this problem, we propose a technique based on local drastic changes in data density. First, we define the local maxima of the density function as urban POIs, and a gradient ascent algorithm is exploited to assign data points to different clusters. To remove noise, we incorporate the Laplacian zero-crossing points along the gradient ascent process as the boundaries of a POI; points located outside the POI region are regarded as noise. The technique is then extended to the joint geographical and textual space so that it can make use of the heterogeneous features of social media. Experimental results show the effectiveness of the proposed approach in removing noise.
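The gradient-ascent assignment step, where each point climbs the estimated density surface to a local maximum (a candidate POI), corresponds to mean shift and can be sketched in NumPy. The synthetic clusters and bandwidth below are stand-ins, and the Laplacian zero-crossing boundary test for noise removal is not implemented in this sketch.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Gradient ascent on a Gaussian kernel density estimate:
    every point iteratively climbs towards a local maximum of the density."""
    modes = points.copy()
    for _ in range(iters):
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))     # kernel weights
        modes = (w[:, :, None] * points[None, :, :]).sum(1) / w.sum(1, keepdims=True)
    return modes

rng = np.random.default_rng(1)
poi_a = rng.normal([0, 0], 0.3, size=(100, 2))     # dense cluster: candidate POI
poi_b = rng.normal([5, 5], 0.3, size=(100, 2))     # second, equally dense POI
noise = rng.uniform(-2, 7, size=(10, 2))           # sparse background points
pts = np.vstack([poi_a, poi_b, noise])

modes = mean_shift(pts)   # each row: the density mode its point converged to
```

Points whose converged modes coincide belong to the same cluster; the paper's boundary test would then decide which of them lie outside the POI region.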


Author(s):  
Tianhang Zheng ◽  
Changyou Chen ◽  
Kui Ren

Recent work on adversarial attacks has shown that the Projected Gradient Descent (PGD) adversary is a universal first-order adversary, and that a classifier adversarially trained with PGD is robust against a wide range of first-order attacks. It is worth noting that the original objective of an attack/defense model relies on a data distribution p(x), typically in the form of risk maximization/minimization, e.g., max/min E_p(x) L(x), with p(x) some unknown data distribution and L(·) a loss function. However, since PGD generates attack samples independently for each data sample based on L(·), the procedure does not necessarily lead to good generalization in terms of risk optimization. In this paper, we achieve this goal by proposing the distributionally adversarial attack (DAA), a framework that solves for an optimal adversarial-data distribution: a perturbed distribution that satisfies the L∞ constraint but deviates from the original data distribution so as to maximally increase the generalization risk. Algorithmically, DAA performs optimization over the space of potential data distributions, which introduces direct dependency between all data points when generating adversarial samples. DAA is evaluated by attacking state-of-the-art defense models, including the adversarially trained models provided by MIT MadryLab. Notably, DAA ranks first on MadryLab's white-box leaderboards, reducing the accuracy of their secret MNIST model to 88.56% (with l∞ perturbations of ε = 0.3) and the accuracy of their secret CIFAR model to 44.71% (with l∞ perturbations of ε = 8.0). Code for the experiments is released at https://github.com/tianzheng4/Distributionally-Adversarial-Attack.
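For context, the per-sample PGD baseline that DAA generalizes can be sketched on a toy logistic classifier. The weights, step sizes and sample below are hypothetical stand-ins; DAA itself additionally couples all samples through a distributional objective, which this sketch does not implement.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, steps=40):
    """PGD in the l_inf ball: ascend the cross-entropy loss of a logistic
    classifier, projecting back into the eps-ball after every step."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(x_adv @ w + b)))    # P(y = 1 | x_adv)
        grad = (p - y) * w                        # d loss / d x for cross-entropy
        x_adv = x_adv + alpha * np.sign(grad)     # signed ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the eps-ball
    return x_adv

w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5]); y = 1.0        # correctly classified: w @ x + b = 1.5 > 0
x_adv = pgd_attack(x, y, w, b)

margin_before = x @ w + b
margin_after = x_adv @ w + b             # attack pushes the margin towards 0
```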


2011 ◽  
Vol 15 (3) ◽  
pp. 715-727 ◽  
Author(s):  
S. Castiglioni ◽  
A. Castellarin ◽  
A. Montanari ◽  
J. O. Skøien ◽  
G. Laaha ◽  
...  

Abstract. Recent studies highlight that spatial interpolation techniques of point data can be effectively applied to the problem of regionalization of hydrometric information. This study compares two innovative interpolation techniques for the prediction of low-flows in ungauged basins. The first one, named Physiographical-Space Based Interpolation (PSBI), performs the spatial interpolation of the desired streamflow index (e.g., annual streamflow, low-flow index, flood quantile, etc.) in the space of catchment descriptors. The second technique, named Topological kriging or Top-kriging, predicts the variable of interest along river networks taking both the area and nested nature of catchments into account. PSBI and Top-kriging are applied for the regionalization of Q355 (i.e., a low-flow index that indicates the streamflow that is equalled or exceeded 355 days in a year, on average) over a broad geographical region in central Italy, which contains 51 gauged catchments. The two techniques are cross-validated through a leave-one-out procedure at all available gauges and applied to a subregion to produce a continuous estimation of Q355 along the river network extracted from a 90 m elevation model. The results of the study show that Top-kriging and PSBI present complementary features. Top-kriging outperforms PSBI at larger river branches while PSBI outperforms Top-kriging for headwater catchments. Overall, they have comparable performances (Nash-Sutcliffe efficiencies in cross-validation of 0.89 and 0.83, respectively). Both techniques provide plausible and accurate predictions of Q355 in ungauged basins and represent promising opportunities for regionalization of low-flows.
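The leave-one-out cross-validation and Nash-Sutcliffe scoring used above can be sketched generically. The predictor below is a simple nearest-neighbour lookup in a synthetic catchment-descriptor space, a crude stand-in for PSBI or Top-kriging, and all data are fabricated for illustration.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 matches the mean-of-observations benchmark."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def loo_predictions(descriptors, q):
    """Leave-one-out: predict each gauge from its nearest neighbour in descriptor space."""
    preds = []
    for i in range(len(q)):
        d = np.linalg.norm(descriptors - descriptors[i], axis=1)
        d[i] = np.inf                 # exclude the gauge being predicted
        preds.append(q[np.argmin(d)])
    return np.array(preds)

rng = np.random.default_rng(2)
desc = rng.uniform(0, 1, size=(51, 3))   # e.g. area, elevation, rainfall (normalised)
q355 = desc @ np.array([1.0, 0.5, 0.2]) + 0.05 * rng.normal(size=51)
eff = nse(q355, loo_predictions(desc, q355))
```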


1993 ◽  
Vol 70 (1) ◽  
pp. 140-149 ◽  
Author(s):  
James D. Kelly

An experiment using a 3 × 3 × 3 Latin square design tested the effects of the number of data points and the type of statistical display on the time spent answering questions about the information. The design allowed within-subject comparisons of main effects, and the procedure was administered by a Macintosh computer. The results, which show that tables and graphs are processed more efficiently than text presentations of the same data, partially confirm earlier studies that used information recall as the dependent variable, but suggest that response time is a more realistic measure of cognitive processing effort.
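A Latin square of the kind underlying the 3 × 3 × 3 design can be generated by cyclic shifts. This sketch shows only the balancing property (each treatment appears once per row and column), not the full three-factor experimental design.

```python
def latin_square(n):
    """Cyclic n x n Latin square: each symbol appears exactly once per row and column."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]

sq = latin_square(3)
# balance check: every row and every column is a permutation of {0, 1, 2}
rows_ok = all(sorted(row) == [0, 1, 2] for row in sq)
cols_ok = all(sorted(col) == [0, 1, 2] for col in zip(*sq))
```

In a within-subject design, rows would index subjects (or presentation orders), columns positions, and symbols the display conditions, so order and condition effects are not confounded.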


1979 ◽  
Vol 16 (3) ◽  
pp. 568-574 ◽  
Author(s):  
Peter J. Barnett

Glacial Lake Whittlesey is related to the glacial advance during the Port Huron Stadial, approximately 13 000 radiocarbon years BP. This episode has previously been thought to be represented, in the eastern end of the Lake Erie basin, by the Paris and Galt Moraines and the deposition of the Wentworth Till in southern Ontario. However, the location of Whittlesey shoreline features suggests that this lake existed during the subsequent advance which deposited the Halton Till. A 'Halton' ice-contact delta, at Summit, Ontario, coincides with the Lake Whittlesey water-plane curve constructed from 72 data points around Lakes Huron and Erie. It is suggested, therefore, that the glacial ice advance represented by the Halton Till, based on its relationship to Lake Whittlesey, has an age of about 13 000 radiocarbon years BP (Port Huron Stadial) and that the Wentworth Till (Paris and Galt Moraines) is of Port Bruce Stadial age.


2014 ◽  
Vol 513-517 ◽  
pp. 1072-1076
Author(s):  
Qiang Gao ◽  
Yuan Li Gu ◽  
Teng Hua Zhang

Identification and correction of real-time traffic data is a basic and critical part of intelligent transportation systems. Through research on a large volume of data, the original data are divided into correct data, irregular time-point data, inaccurate detection data, missing data and event data. The Etkin interpolation algorithm obtains the values of specified missing points by a successive approximation method with high-order polynomials, implemented using successive approximation of multiple linear combinations. This paper selects an improved Etkin interpolation algorithm to correct the traffic data, using as an example the 2001 detector data from 728 meters north of the DongZhiMen Bridge. The algorithm not only considers practicability in engineering practice, but also improves the accuracy of real-time data.
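The gap-filling idea, estimating a missing time point from a polynomial fitted to neighbouring observations, can be sketched as below. This is a generic polynomial imputation in NumPy, not the Etkin successive-approximation algorithm itself; the series and polynomial degree are illustrative.

```python
import numpy as np

def fill_missing_poly(series, deg=3):
    """Fill NaN gaps by fitting one polynomial to the observed points
    and evaluating it at the missing time indices."""
    t = np.arange(len(series))
    mask = ~np.isnan(series)
    coeffs = np.polyfit(t[mask], series[mask], deg)  # fit on observed points only
    filled = series.copy()
    filled[~mask] = np.polyval(coeffs, t[~mask])     # evaluate at the gaps
    return filled

# hypothetical flow readings with two missing detector values
flow = np.array([10.0, 12.0, np.nan, 15.0, 16.0, np.nan, 17.0, 16.5])
clean = fill_missing_poly(flow)
```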


2010 ◽  
Vol 7 (5) ◽  
pp. 7231-7261 ◽  
Author(s):  
S. Castiglioni ◽  
A. Castellarin ◽  
A. Montanari ◽  
J. O. Skøien ◽  
G. Laaha ◽  
...  

Abstract. Recent studies highlight that geostatistical interpolation, which was originally developed for the spatial interpolation of point data, can be effectively applied to the problem of regionalization of hydrometric information. This study compares two innovative geostatistical approaches for the prediction of low-flows in ungauged basins. The first one, named Physiographic-Space Based Interpolation (PSBI), performs the spatial interpolation of the desired streamflow index (e.g., annual streamflow, low-flow index, flood quantile, etc.) in the space of catchment descriptors. The second technique, named Topological kriging or Top-Kriging, predicts the variable of interest along river networks taking both the area and nested nature of catchments into account. PSBI and Top-Kriging are applied for the regionalization of Q355 (i.e., the streamflow that is equalled or exceeded 355 days in a year, on average) over a broad geographical region in central Italy, which contains 51 gauged catchments. Both techniques are cross-validated through a leave-one-out procedure at all available gauges and applied to a subregion to produce a continuous estimation of Q355 along the river network extracted from a 90 m DEM. The results of the study show that Top-Kriging and PSBI present complementary features and have comparable performances (Nash-Sutcliffe efficiencies in cross-validation of 0.89 and 0.83, respectively). Both techniques provide plausible and accurate predictions of Q355 in ungauged basins and represent promising opportunities for regionalization of low-flows.


2020 ◽  
Vol 21 (1) ◽  
pp. 83-91 ◽  
Author(s):  
Sigurdur Gudjonsson ◽  
Kari Kristinsson ◽  
Haukur Freyr Gylfason ◽  
Inga Minelgaite

The purpose of this article is to investigate whether female presence in microfinance institutions' management teams, i.e. among board members, managers and loan officers, improves their financial performance. We combine financial data on MFIs available from the MIX Market database with original data on the gender composition of MFIs' management teams. This original dataset of 223 MFIs is analyzed using Logit-Tobit regression models, with return on assets (ROA) as the dependent variable and the proportions of female board members, female loan officers and female managers as the main independent variables. We find that higher proportions of female managers and female loan officers improve financial performance in microfinance, while a higher proportion of female board members does not. Our results indicate that a major contributor to the financial sustainability of microfinance institutions is a higher rate of women in vital decision-making roles, especially lower-level management positions.


2019 ◽  
Vol 67 (10) ◽  
pp. 833-842
Author(s):  
Timm J. Peter ◽  
Oliver Nelles

Abstract The task of data reduction is discussed, and a novel selection approach is proposed that allows control over the point distribution of the selected data subset. The proposed approach utilizes the estimation of probability density functions (pdfs). Due to its structure, the new method is capable of selecting a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The new strategy evaluates the estimated pdfs solely at the selected data points, resulting in a simple and efficient algorithm with low computational and memory demand. The performance of the new approach is investigated for two different scenarios. For representative subset selection of a dataset, the new approach is compared to a recently proposed, more complex method and shows comparable results. To demonstrate the capability of matching a target pdf, a uniform distribution is chosen as an example. Here the new method is compared to strategies for space-filling design of experiments and shows convincing results.
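A simplified version of selecting a subset to match a target pdf can be sketched with a greedy histogram-filling rule. This is a crude stand-in for the paper's pdf-estimation-based method, shown here with a uniform target as in the second scenario; the bin count and data are illustrative.

```python
import numpy as np

def select_subset(data, k, bins=10):
    """Greedily pick k points so the subset histogram approaches a
    uniform target density (equal mass in every bin)."""
    edges = np.linspace(data.min(), data.max(), bins + 1)
    which = np.clip(np.digitize(data, edges) - 1, 0, bins - 1)  # bin of each point
    counts = np.zeros(bins)
    chosen, remaining = [], list(range(len(data)))
    target = k / bins                       # uniform target: k/bins points per bin
    for _ in range(k):
        deficits = target - counts
        # take a point from the currently most under-filled bin that still has points
        i = max(remaining, key=lambda j: deficits[which[j]])
        chosen.append(i)
        remaining.remove(i)
        counts[which[i]] += 1
    return np.array(chosen)

rng = np.random.default_rng(3)
data = rng.normal(0, 1, size=1000)          # Gaussian original dataset
idx = select_subset(data, 100)              # subset pushed towards uniformity
sub = data[idx]
```

The paper's method instead evaluates estimated pdfs at the selected points directly, which avoids committing to a fixed binning.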

