Give Chance a Chance: Modeling Density to Enhance Scatter Plot Quality through Random Data Sampling

2006 ◽  
Vol 5 (2) ◽  
pp. 95-110 ◽  
Author(s):  
Enrico Bertini ◽  
Giuseppe Santucci

The problem of visualizing huge amounts of data is well known in information visualization. Dealing with a large number of items forces almost any kind of Infovis technique to reveal its limits in terms of expressivity and scalability. In this paper we focus on 2D scatter plots, proposing a ‘feature preservation’ approach, based on the idea of modeling the visualization in a virtual space in order to analyze its features (e.g., absolute density, relative density, etc.). In this way we provide a formal framework to measure the visual overlapping, obtaining precise quality metrics about the visualization degradation and devising automatic sampling strategies able to improve the overall image quality. Metrics and algorithms have been improved through suitable user studies.
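The 'feature preservation' idea above can be illustrated with a minimal density-aware sampling sketch (not the authors' exact algorithm): estimate absolute density on a virtual grid, then thin dense cells while leaving sparse ones intact, so low-density features of the scatter plot survive. The grid resolution and the `max_per_cell` cap are illustrative assumptions.

```python
import numpy as np

def density_aware_sample(x, y, bins=32, max_per_cell=10, rng=None):
    # Estimate absolute density on a virtual grid, then keep each point
    # with probability min(1, max_per_cell / cell_count): dense cells are
    # thinned toward roughly max_per_cell points, sparse cells are kept.
    rng = np.random.default_rng(rng)
    counts, xe, ye = np.histogram2d(x, y, bins=bins)
    ix = np.digitize(x, xe[1:-1])   # cell index 0..bins-1 for each point
    iy = np.digitize(y, ye[1:-1])
    cell = counts[ix, iy]           # local density at each point's cell
    keep = rng.random(len(x)) < np.minimum(1.0, max_per_cell / cell)
    return keep                     # boolean mask of retained points
```

Isolated points fall in cells with count 1, so they are kept with probability 1, which is the sense in which sparse features are preserved while overplotted regions are subsampled.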

2018 ◽  
Vol 35 (7) ◽  
pp. 1505-1519 ◽  
Author(s):  
Yu-Chiao Liang ◽  
Matthew R. Mazloff ◽  
Isabella Rosso ◽  
Shih-Wei Fang ◽  
Jin-Yi Yu

The ability to construct nitrate maps in the Southern Ocean (SO) from sparse observations is important for marine biogeochemistry research, as it offers a geographical estimate of biological productivity. The goal of this study is to infer the skill of constructed SO nitrate maps using varying data sampling strategies. The mapping method uses multivariate empirical orthogonal functions (MEOFs) constructed from nitrate, salinity, and potential temperature (N-S-T) fields from a biogeochemical general circulation model simulation. Synthetic N-S-T datasets are created by sampling modeled N-S-T fields in specific regions, determined either by random selection or by selecting regions over a certain threshold of nitrate temporal variances. The first 500 MEOF modes, determined by their capability to reconstruct the original N-S-T fields, are projected onto these synthetic N-S-T data to construct time-varying nitrate maps. Normalized root-mean-square errors (NRMSEs) are calculated between the constructed nitrate maps and the original modeled fields for different sampling strategies. The sampling strategy based on nitrate variances is shown to yield maps with lower NRMSEs than random sampling. A k-means cluster method that considers the combined N-S-T variances to identify key regions in which to insert data is most effective in reducing the mapping errors. These findings are further quantified by a series of mapping error analyses that also address the significance of data sampling density. The results provide a sampling framework to prioritize the deployment of biogeochemical Argo floats for constructing nitrate maps.
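The core mapping step, fitting truncated mode amplitudes to sparse observations and scoring the reconstruction with an NRMSE, can be sketched as follows. The least-squares projection and the range-based normalization are assumptions for illustration, not the study's exact procedure.

```python
import numpy as np

def reconstruct_from_sparse(modes, obs_idx, obs_vals):
    # Fit mode amplitudes to the sparse observations by least squares,
    # then reconstruct the full field as modes @ amplitudes.
    amps, *_ = np.linalg.lstsq(modes[obs_idx, :], obs_vals, rcond=None)
    return modes @ amps

def nrmse(truth, estimate):
    # Root-mean-square error normalized by the range of the true field.
    rmse = np.sqrt(np.mean((truth - estimate) ** 2))
    return rmse / (truth.max() - truth.min())
```

Comparing `nrmse` for index sets chosen at random versus sets concentrated where the field's temporal variance is high is a small-scale analogue of the sampling-strategy comparison in the abstract.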


2018 ◽  
Vol 37 (3) ◽  
pp. 625-662 ◽  
Author(s):  
M. Behrisch ◽  
M. Blumenschein ◽  
N. W. Kim ◽  
L. Shao ◽  
M. El-Assady ◽  
...  

Author(s):  
Gulsebnem Bishop

Statistics can be used to describe, model, and predict archaeological data, provided that the analyst has an understanding of the strengths and limitations of their data type and has a well-defined statistical population. This chapter discusses the major types of archaeological data, sampling strategies, and statistics appropriate for both describing and predicting outcomes for simple and complex ceramic datasets. Description and modeling of complex data can be done with many tools, ranging from simple charts and histograms to more complicated methods such as the T-Test, Chi-Square Test, Multi-Response Permutation Procedure (MRPP), and Kernel Density Estimation (KDE), as well as Principal Components Analysis (PCA).
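As a worked example of one of the tests mentioned, the Pearson chi-square statistic for a contingency table of sherd counts (ceramic types by site; the table in the test is a hypothetical dataset) can be computed directly:

```python
import numpy as np

def chi_square_stat(table):
    # Pearson chi-square for an r x c contingency table of counts:
    # expected[i, j] = row_total[i] * col_total[j] / grand_total,
    # statistic = sum over cells of (observed - expected)^2 / expected.
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, dof
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to test whether ceramic type frequencies are independent of site.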


Author(s):  
Peter Holtz ◽  
Nicole Kronberger ◽  
Wolfgang Wagner

Within Internet forums, members of certain (online) communities discuss matters of concern to the respective groups, with comparatively few social restraints. For radical, extremist, and other ideologically “sensitive” groups and organizations in particular, Internet forums are a very efficient and widely used tool to connect members, inform others about the group’s agenda, and attract new members. Whereas members of such groups may be reluctant to express their opinions in interviews or surveys, we argue that Internet forums can yield an abundance of useful “natural” discursive data for social scientific research. Based on two exemplary studies, we present a practical guide for the analysis of such data, including data-sampling strategies, the refinement of the data for computer-assisted qualitative and quantitative analysis, and strategies for in-depth analysis. The first study is an in-depth analysis of discourses within a German neo-Nazi discussion board. In the second, nine online forums for young German Muslims were analyzed and compared. Advantages and potential issues with analyzing Internet forums are discussed.


2020 ◽  
Vol 34 (04) ◽  
pp. 5989-5996 ◽  
Author(s):  
Xiaoyu Tao ◽  
Xiaopeng Hong ◽  
Xinyuan Chang ◽  
Yihong Gong

In this paper, we propose a novel single-task continual learning framework named Bi-Objective Continual Learning (BOCL). BOCL aims both to consolidate historical knowledge and to learn from new data. On one hand, we propose to preserve the old knowledge using a small set of pillars, and develop the pillar consolidation (PLC) loss to preserve the old knowledge and alleviate the catastrophic forgetting problem. On the other hand, we develop the contrastive pillar (CPL) loss term to improve the classification performance, and examine several data sampling strategies for efficient onsite learning from ‘new’ data with a reasonable amount of computational resources. Comprehensive experiments on CIFAR10/100, CORe50, and a subset of ImageNet validate the BOCL framework. We also compare the accuracy of different sampling strategies when used to fine-tune a given CNN model. The code will be released.
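One simple candidate for the kind of data sampling strategy examined here is uniform reservoir sampling over the incoming stream, which maintains a fixed-size rehearsal buffer in which every past example is retained with equal probability. This is a generic sketch, not the paper's pillar-selection scheme:

```python
import random

class ReservoirBuffer:
    # Uniform reservoir sampling: after n items have streamed past, the
    # buffer of size k holds each of them with probability k / n. This
    # keeps memory constant regardless of how much 'new' data arrives.
    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.seen = 0
        self.items = []
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)       # fill phase: keep everything
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item      # replace a random slot
```

During training, minibatches mixing fresh data with draws from the buffer are a common way to apply such a buffer to rehearsal-based continual learning.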


Author(s):  
Zhi-Jiang Liu ◽  
Vera Levina ◽  
Yuliya Frolova

The rapid development of computer visualization techniques as well as virtual and augmented reality has led to the possibility of perfect data visualization and the creation of a special virtual space for educating the new generation. Simultaneously, the increase in the amount of data to be processed requires a proper selection and presentation of data for solving specific problems. Education sets such tasks as 1) improving the efficiency of presenting information and its assimilation by students, and 2) increasing the convenience and quality of the teachers’ work. The purpose of this study is to test whether more productive visualized learning material accelerates and improves the teacher’s response to students. Meanwhile, the created visualization system was based on minimizing the effort and cost of its preparation and ongoing support: only free cloud-based services and visualization tools were used. Students were given the opportunity to constantly control their learning process in real time and to create education markers with the help of a perspicuous visual environment. To create the visualization system, already existing work on the implementation and verification of such systems was used. The study was based on the results of applying this technology: a survey of 300 students from three universities in China, Russia, and Kazakhstan was conducted. The control group consisted of 150 students from the same universities who did not use visualization to master the same educational material. According to the results of the study, students who used information visualization showed a sharp increase in the subjective assessment of the speed and quality of their learning (58.58% and 37.73%, respectively, of the total number of participants gave a high rating, while in the control group only 12.25% did). Further, the level of anxiety associated with the assimilation of new language material decreased significantly (13.54% in the study group did not feel anxiety, compared with only 7% in the control group).

