How Bad Data Quality Can Turn a Simulation Into a Dissimulation that Shapes the Future

Futures ◽  
2021 ◽  
Author(s):  
Christof Kuhbandner ◽  
Stefan Homburg ◽  
Harald Walach ◽  
Stefan Hockertz
Keyword(s):  


Metabolomics ◽  
2014 ◽  
Vol 10 (4) ◽  
pp. 539-540 ◽  
Author(s):  
Daniel W. Bearden ◽  
Richard D. Beger ◽  
David Broadhurst ◽  
Warwick Dunn ◽  
Arthur Edison ◽  
...  

Author(s):  
Samir Tata ◽  
Zakaria Maamar ◽  
Djamel Belaïd ◽  
Khouloud Boukadi

This paper presents the concepts, definitions, issues, and solutions that revolve around the adoption of capacity-driven Web services. Because of their intrinsic characteristics compared to regular, mono-capacity Web services, capacity-driven Web services are examined differently, across four steps: description, discovery, composition, and enactment. Implemented as operations to execute at run time, the capacities that empower a Web service are selected with respect to the requirements placed on this Web service, such as data quality and network bandwidth. In addition, this paper reports first on the experiments that were conducted to demonstrate the feasibility of capacity-driven Web services, and then on the research opportunities that will be pursued in the future.
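A minimal sketch, not code from the paper, of how run-time selection of a capacity against requirements such as data quality and network bandwidth could look; the Capacity class, its fields, and the select_capacity function are assumed names for illustration only.

# Illustrative sketch only; the names below are assumptions, not identifiers from the paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Capacity:
    name: str
    data_quality: float         # quality score this capacity can guarantee (0..1)
    bandwidth_needed_kbps: int  # network bandwidth this capacity requires

def select_capacity(capacities, required_quality: float,
                    available_bandwidth_kbps: int) -> Optional[Capacity]:
    # Return the first capacity that meets the quality requirement and fits
    # within the available bandwidth; None if no capacity qualifies.
    for capacity in capacities:
        if (capacity.data_quality >= required_quality
                and capacity.bandwidth_needed_kbps <= available_bandwidth_kbps):
            return capacity
    return None

capacities = [
    Capacity("full-resolution", data_quality=0.95, bandwidth_needed_kbps=2000),
    Capacity("compressed", data_quality=0.80, bandwidth_needed_kbps=300),
]
print(select_capacity(capacities, required_quality=0.8, available_bandwidth_kbps=500))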


2021 ◽  
Author(s):  
Aaron J Moss ◽  
Cheskie Rosenzweig ◽  
Shalom Noach Jaffe ◽  
Richa Gautam ◽  
Jonathan Robinson ◽  
...  

Online data collection has become indispensable to the social sciences, polling, marketing, and corporate research. However, in recent years, online data collection has been inundated with low-quality data. Low-quality data threatens the validity of online research and, at times, invalidates entire studies. It is often assumed that random, inconsistent, and fraudulent data in online surveys come from ‘bots.’ But little is known about whether bad data are caused by bots or by ill-intentioned or inattentive humans. We examined this issue on Mechanical Turk (MTurk), a popular online data collection platform. In the summer of 2018, researchers noticed a sharp increase in the number of data quality problems on MTurk, problems that were commonly attributed to bots. Despite this assumption, few studies have directly examined whether problematic data on MTurk come from bots or inattentive humans, even though identifying the source of bad data has important implications for creating the right solutions. Using CloudResearch’s data quality tools to identify problematic participants in 2018 and 2020, we provide evidence that many of the data quality problems on MTurk can be tied to fraudulent users from outside the U.S. who pose as American workers. Hence, our evidence strongly suggests that the source of low-quality data is real humans, not bots. We additionally present evidence that these fraudulent users are behind data quality problems on other platforms.
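A hypothetical screening sketch, not CloudResearch’s actual tooling: it only illustrates the kind of checks (shared IP addresses, failed attention checks) that are commonly used to flag low-quality or fraudulent survey submissions; the field names and function are assumptions.

# Hypothetical illustration of simple response screening; not any platform's real API.
from collections import Counter

def flag_suspicious(responses):
    # Count how many submissions share each IP address.
    ip_counts = Counter(r["ip"] for r in responses)
    flagged = []
    for r in responses:
        shares_ip = ip_counts[r["ip"]] > 1
        failed_check = not r["passed_attention_check"]
        if shares_ip or failed_check:
            flagged.append(r["worker_id"])
    return flagged

responses = [
    {"worker_id": "A1", "ip": "203.0.113.7", "passed_attention_check": True},
    {"worker_id": "A2", "ip": "203.0.113.7", "passed_attention_check": True},
    {"worker_id": "A3", "ip": "198.51.100.2", "passed_attention_check": False},
]
print(flag_suspicious(responses))  # ['A1', 'A2', 'A3']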


2018 ◽  
Vol 2 ◽  
pp. e25970
Author(s):  
Andrew Bentley

The recent incorporation of standardized data quality metrics into the GBIF, iDigBio, and ALA portal infrastructures provides data providers with useful information they can use to clean or augment Darwin Core data at the source based on these recommendations. Numerous taxonomy- and geography-based metrics give useful information on the quality of various Darwin Core fields in this realm, while also providing input on Darwin Core compliance for others. As a provider/data manager for the Biodiversity Institute, University of Kansas, I have spent some time evaluating their efficacy and reliability; this presentation will highlight some of the positive and negative aspects of my experience with specific examples, while raising concerns regarding the user experience and the standardization of these metrics across the aggregator landscape. These metrics have revealed both data and publishing issues whose correction has increased the utility and cleanliness of our data, while also highlighting batch-processing challenges and issues with the process of inferring "bad" data. The integration of these metrics into source database infrastructure will also be postulated, with Specify Software as an example.
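An illustrative sketch, not the actual flagging code used by GBIF, iDigBio, or ALA: a simple geographic check over the Darwin Core coordinate terms decimalLatitude and decimalLongitude; the flag names below are made up for this example.

# Illustration of geographic data quality flags on Darwin Core fields; flag names are assumptions.
def geographic_flags(record):
    flags = []
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return ["COORDINATES_MISSING"]
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("COORDINATES_OUT_OF_RANGE")
    if lat == 0 and lon == 0:
        flags.append("ZERO_COORDINATES")
    return flags

print(geographic_flags({"decimalLatitude": 95.0, "decimalLongitude": 23.4}))
# ['COORDINATES_OUT_OF_RANGE']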


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 374 ◽  
Author(s):  
Azeroual

Scientific institutions that document their research information comprehensively and maintain it well in a current research information system (CRIS) have the best prerequisites for implementing text and data mining (TDM) methods. Using TDM helps to better identify and eliminate errors, improve processes, develop the business, and make informed decisions. In addition, TDM increases understanding of the data and its context. This improves not only the quality of the data itself, but also the institution’s handling of the data and, consequently, the analyses. The present paper deploys TDM in a CRIS to analyze, quantify, and correct unstructured data and its quality issues. Bad data leads to increased costs or wrong decisions. Ensuring high data quality is an essential requirement when setting up a CRIS project. User acceptance of a CRIS depends, among other things, on data quality: not only objective data quality is decisive, but also the subjective quality that individual users assign to the data.
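A minimal rule-based profiling sketch, not the TDM method described in the paper: the field names (title, authors, year, doi) and checks are assumptions, not the schema or rules of any specific CRIS, and only show the kind of completeness and plausibility checks such profiling might quantify.

# Assumed record fields and rules; illustrative only.
import re

REQUIRED_FIELDS = ("title", "authors", "year", "doi")

def profile_records(records):
    # For each record, list missing required fields and implausible year values.
    report = []
    for rec in records:
        issues = [f"missing:{field}" for field in REQUIRED_FIELDS if not rec.get(field)]
        year = rec.get("year")
        if year and not re.fullmatch(r"(19|20)\d{2}", str(year)):
            issues.append("implausible_year")
        report.append((rec.get("title", "?"), issues))
    return report

records = [
    {"title": "A CRIS record", "authors": "Doe, J.", "year": "2019", "doi": "10.1234/example"},
    {"title": "Untitled draft", "authors": "", "year": "219"},
]
for title, issues in profile_records(records):
    print(title, issues)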


1961 ◽  
Vol 13 ◽  
pp. 29-41
Author(s):  
Wm. Markowitz
Keyword(s):  

A symposium on the future of the International Latitude Service (I. L. S.) is to be held in Helsinki in July 1960. My report for the symposium consists of two parts. Part I, denoted (Mk I), was published [1] earlier in 1960 under the title “Latitude and Longitude, and the Secular Motion of the Pole”. Part II is the present paper, denoted (Mk II).


1978 ◽  
Vol 48 ◽  
pp. 387-388
Author(s):  
A. R. Klemola
Keyword(s):  

Second-epoch photographs have now been obtained for nearly 850 of the 1246 fields of the proper motion program with centers at declination -20° and northwards. For the sky at 0° and northward only 130 fields remain to be taken in the next year or two. The 270 southern fields with centers at -5° to -20° remain for the future.


Author(s):  
Godfrey C. Hoskins ◽  
Betty B. Hoskins

Metaphase chromosomes from human and mouse cells in vitro are isolated by micrurgy, fixed, and placed on grids for electron microscopy. Interpretations of electron micrographs by current methods indicate the following structural features. Chromosomal spindle fibrils about 200Å thick form fascicles about 600Å thick, wrapped by dense spiraling fibrils (DSF) less than 100Å thick as they near the kinomere. Such a fascicle joins the future daughter kinomere of each metaphase chromatid with those of adjacent non-homologous chromatids to either side. Thus, four fascicles (SF, 1-4) attach to each metaphase kinomere (K). It is thought that fascicles extend from the kinomere poleward, fray out to let chromosomal fibrils act as traction fibrils against polar fibrils, then regroup to join the adjacent kinomere.

