Data Visualization and Statistical Literacy for Open and Big Data - Advances in Data Mining and Database Management
Total documents: 12 (five years: 0) · H-index: 1 (five years: 0)

Published by IGI Global

ISBN: 9781522525127, 9781522525134

Author(s): Franck Cotton, Daniel Gillman

Linked Open Statistical Metadata (LOSM) is Linked Open Data (LOD) applied to statistical metadata. LOD is a model for identifying, structuring, interlinking, and querying data published directly on the web, building on the semantic web standards defined by the W3C. LOD uses the Resource Description Framework (RDF), a simple data model that expresses content as predicates linking resources to each other or to literal properties. The simplicity of the model makes it capable of representing any data, including metadata. We define statistical data as data produced through some statistical process or intended for statistical analysis, and statistical metadata as metadata describing statistical data. LOSM promotes automated discovery of the meaning and structure of statistical data. Consequently, it helps with understanding and interpreting data and with preventing inadequate or flawed visualizations of statistical data. This enhances statistical literacy and supports efforts to visualize statistics.
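The triple model that underlies RDF can be sketched in a few lines of plain Python. The graph below is a hypothetical illustration only (the URIs, prefixes such as `ex:` and `dct:`, and the `match` helper are invented for this sketch, not part of any standard vocabulary or library); it shows how subject-predicate-object triples describe a statistical dataset and how a wildcard pattern, akin to a SPARQL basic graph pattern, queries them.

```python
# Minimal sketch of the RDF triple model behind LOD, using plain tuples.
# All URIs and predicate names here are hypothetical, for illustration only.

# An RDF graph is a set of (subject, predicate, object) triples; objects
# may be other resources (URIs) or literal values.
graph = {
    ("ex:dataset/cpi", "rdf:type", "qb:DataSet"),
    ("ex:dataset/cpi", "dct:title", '"Consumer Price Index"'),
    ("ex:dataset/cpi", "dct:publisher", "ex:org/nsi"),
    ("ex:org/nsi", "dct:title", '"National Statistical Institute"'),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Who publishes the CPI dataset?
publishers = match(graph, s="ex:dataset/cpi", p="dct:publisher")
print(publishers)  # [('ex:dataset/cpi', 'dct:publisher', 'ex:org/nsi')]
```

Because metadata about a dataset uses the same triple form as any other data, the same pattern matching that finds a dataset's publisher can also find its structure, which is what makes automated discovery possible.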


Author(s):  
Jacques Raubenheimer

Spreadsheets were arguably the first information calculation and analysis tools employed by microcomputer users, and today they remain ubiquitous for that purpose. The fluctuating fortunes of PC makers coincided with those of spreadsheet applications, although the last two decades have seen Microsoft Excel dominate the spreadsheet market. This chapter traces the historical development of spreadsheets in general, and Excel in particular, highlighting how new features have enabled new forms of data analysis in the spreadsheet environment. Microsoft has clearly positioned Excel as a tool for the analysis of big data by adding and developing features aimed at reporting on data too large for a spreadsheet. The chapter discusses Excel's ability to handle such data by means of an applied example, and considers data visualization through charts and dashboards as a common strategy for dealing with large volumes of data.


Author(s): Busayasachee Puang-Ngern, Ayse A. Bilgin, Timothy J. Kyng

There is currently a shortage of graduates with the necessary skills for jobs in data analytics and “Big Data”. Recently, many new university degrees have been created to address the skills gap, but they are mostly computer science based, with little coverage of statistics. In this chapter, the perceptions of graduates and academics about the types of expertise and software skills required in this field are documented, based on two online surveys in Australia and New Zealand. The results showed that statistical analysis and statistical software skills were the most necessary types of expertise. Graduates in industry identified SQL as the most necessary software skill, while academics teaching in relevant disciplines identified R programming as the most necessary software skill for Big Data analysis. The authors recommend multidisciplinary degrees in which the appropriate combination of skills in statistics and computing can be provided to future graduates.


Author(s): Belinda A. Chiera, Malgorzata W. Korolkiewicz

Technological advances have led to increasingly more data becoming available, a phenomenon known as Big Data. The volume of Big Data is of the order of zettabytes, offering the promise of valuable insights, with visualisation the key to unlocking them; however, the size and variety of Big Data pose significant challenges. The fundamental principles behind tried-and-tested methods for visualising data are as relevant as ever, although the emphasis necessarily shifts to why visualisation is being attempted. This chapter outlines the use of graph semiotics to build data visualisations for exploration and decision-making, and the formulation of elementary-, intermediate- and overall-level analytical questions. The public scanner database Dominick's Finer Foods, consisting of approximately 98 million observations, is used as a demonstrative case study. Common Big Data analytic tools (SAS, R and Python) are used to produce visualisations, and exemplars of student work based on the outlined visualisation approach are presented.
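The three question levels mentioned above can be made concrete on a toy dataset. The records below are invented (they are not drawn from Dominick's Finer Foods), and the three computations are only a sketch of what an elementary-level question (a single data point), an intermediate-level question (a comparison between groups), and an overall-level question (a pattern across the whole series) might look like before being turned into a visualisation.

```python
# Toy illustration of the three analytical question levels on hypothetical
# scanner records of the form (store, week, sales); all figures invented.
from collections import defaultdict

records = [
    ("store_A", 1, 120), ("store_A", 2, 135), ("store_A", 3, 150),
    ("store_B", 1, 200), ("store_B", 2, 190), ("store_B", 3, 185),
]

# Elementary level: read off one data point -- sales at store_A in week 2.
week2_a = next(s for store, wk, s in records if store == "store_A" and wk == 2)

# Intermediate level: compare groups -- total sales per store.
totals = defaultdict(int)
for store, _, sales in records:
    totals[store] += sales

# Overall level: a global pattern -- is store_A trending up week on week?
a_series = [s for store, wk, s in sorted(records) if store == "store_A"]
trending_up = all(a < b for a, b in zip(a_series, a_series[1:]))

print(week2_a, dict(totals), trending_up)
```

At the 98-million-observation scale of the real database, the same aggregations would be pushed into SAS, R, or Python tooling before any chart is drawn, but the question being asked determines the shape of the visualisation in the same way.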


Author(s):  
Elaine M. Barclay

An understanding of crime data and analysis is central to any Criminology degree. Graduates need to know how and where to access a wide variety of secondary data sources, and understand how to read and critically evaluate crime statistics, crime maps, and quantitative research publications, and through assessment, know how to apply this learning to understanding crime rates within a community. This chapter reviews the various types of data and analysis that form a substantial part of content within a Bachelor of Criminology degree. Several types of assessment are described as examples of how to engage students in practical exercises to show them how data and analysis can provide fascinating insight into the social life of their own community.


Author(s): Antonino Virgillito, Federico Polidoro

Following the advent of Big Data, statistical offices have been exploring the use of the Internet as a data source for modernizing their data collection processes. In particular, several statistical institutes collect prices online through a technique known as web scraping. The objective of this chapter is to discuss the challenges of web scraping for setting up a continuous data collection process, exploring and classifying the most widespread techniques and presenting how they are used in practical cases. The main technical notions behind web scraping are presented and explained so as to give even readers with no IT background the elements needed to fully comprehend scraping techniques, promoting the mix of skills that is at the core of modern data science. Challenges for official statistics deriving from the use of web scraping are briefly sketched. Finally, research ideas for overcoming the limitations of current techniques are presented and discussed.
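A minimal scraping step can be sketched with the standard library alone. The HTML snippet and its CSS class names below are invented for illustration; in a continuous collection process the page would instead be fetched over HTTP (e.g. with `urllib.request`), subject to the site's robots.txt and terms of use, and the extraction rules would have to be maintained as the site's markup changes.

```python
# Minimal price-scraping sketch using only the standard library.
# The sample HTML and its class names are invented for illustration.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Milk 1L</span>
       <span class="price">1.29</span></div>
  <div class="product"><span class="name">Bread</span>
       <span class="price">2.10</span></div>
</body></html>
"""

class PriceScraper(HTMLParser):
    """Collects the text content of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.prices.append(float(data.strip()))
            self.in_price = False

scraper = PriceScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.prices)  # [1.29, 2.1]
```

The fragility of this coupling between the scraper and the page layout is precisely the continuity challenge the chapter discusses: a redesign of the site silently breaks the extraction rules.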


Author(s): Thida Chaw Hlaing, Julian Prior

Statistical literacy illuminates many aspects of food security in the world. It highlights weaknesses, creates awareness of threats in current situations, helps overcome challenges, and creates opportunities for the future. Statistical data analysis enables existing food security interventions and programs to be reviewed and revised, and this better understanding of current situations supports more authoritative and relevant decision-making for the future. Statistical literacy involves skills and expertise in data description and interpretation (in words as well as in numbers) to name, explore and amend beliefs, opinions and suggestions. It aids decision-making about food security at sub-national, national and regional levels, as well as globally. This chapter demonstrates the importance of open data and visualization, including its challenges and opportunities, in the food security context at national and global levels, to make decision-makers aware of the need to enhance their capacity for, and investment in, statistical literacy.


Author(s): Frederic Clarke, Chien-Hung Chien

This chapter outlines recent research at the Australian Bureau of Statistics (ABS) in applying data visualisation to the analysis of big data for official statistics. Examples are presented from the application of a prototype analytical platform created by the ABS to two significant big data use cases. This platform, the Graphically Linked Information Discovery Environment (GLIDE), demonstrates a new approach to representing, integrating and exploring complex information from diverse sources. The chapter discusses the role of data visualisation in meeting the analytical challenges of big data and describes the entity-relationship network model and data visualisation features implemented in GLIDE, together with examples drawn from two recent projects. It concludes with an outline of future directions.
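An entity-relationship network of the kind GLIDE explores can be sketched as typed nodes joined by labelled edges, integrated from different sources. The sketch below is hypothetical (GLIDE's internal model is not described here; the entity names, relations, and the breadth-first `reachable` helper are invented) and only illustrates the general idea of linked-entity exploration.

```python
# Hypothetical sketch of an entity-relationship network of the kind a
# platform like GLIDE visualises; entities and relations are invented.
from collections import deque

# Each edge: (source entity, relation label, target entity).
edges = [
    ("Business_1", "employs", "Person_A"),
    ("Business_1", "located_in", "Region_X"),
    ("Person_A", "resides_in", "Region_X"),
    ("Business_2", "supplies", "Business_1"),
]

# Build an undirected adjacency list so exploration can move both ways
# along a relationship, as interactive network browsing requires.
adj = {}
for src, rel, dst in edges:
    adj.setdefault(src, []).append((rel, dst))
    adj.setdefault(dst, []).append((rel, src))

def reachable(start):
    """Breadth-first exploration: every entity connected to `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        for _, nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable("Person_A")))
```

Starting from one person, the traversal surfaces the employer, the region, and a supplier two hops away, which is the kind of cross-source connection a graph visualisation makes immediately visible.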


Author(s):  
Jane Watson

This chapter focuses on statistical literacy and the practice of statistics from the perspective of middle school students, and on how their experiences can be enhanced by the availability of open data. The open data sets selected illustrate the types of contexts available and their connections to the Australian school curriculum. The importance of visualisation is stressed, with the software TinkerPlots used by students to create representations and develop the understanding necessary to analyse data and draw conclusions. Building appreciation of the practice of statistics in this way further helps students become critical thinkers, able as statistically literate adults to judge the claims of others.


Author(s): Kees Zeelenberg, Barteld Braaksma

Big data come in high volume, high velocity and high variety. Their high volume may lead to better accuracy and more detail, their high velocity to more frequent and timely statistical estimates, and their high variety to opportunities for statistics in new areas. But there are also many challenges: uncontrolled changes in sources threaten continuity and comparability, and the data often refer only indirectly to phenomena of statistical interest. Furthermore, big data may be highly volatile and selective: the coverage of the population to which they refer may change from day to day, leading to inexplicable jumps in time series. Very often, the individual observations in these big data sets lack the variables that would allow them to be linked to other datasets or population frames, which severely limits the possibilities for correcting selectivity and volatility. In this chapter, we describe and discuss opportunities for big data in official statistics.
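The coverage problem described above can be illustrated with a few invented numbers: a perfectly stable phenomenon observed through a source whose population coverage changes produces a jump in the series that, without a link to a population frame, is indistinguishable from a real change.

```python
# Illustrative sketch (all numbers invented): how a change in source
# coverage produces a jump in a big-data time series even when the
# underlying phenomenon is stable.
true_daily_activity = [100.0] * 10      # the stable target phenomenon
coverage = [0.60] * 5 + [0.80] * 5      # share of the population observed;
                                        # jumps at day 5 (source change)

observed = [round(a * c, 1) for a, c in zip(true_daily_activity, coverage)]
print(observed)
# [60.0, 60.0, 60.0, 60.0, 60.0, 80.0, 80.0, 80.0, 80.0, 80.0]

# Without linking variables to a population frame, the 60 -> 80 jump
# cannot be separated from a genuine change in the phenomenon measured.
```

This is why the missing linkage variables matter: correcting for selectivity requires knowing which part of the population each observation represents, and that is exactly the information these sources tend to lack.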

