Visual Data Science

2021 ◽  
Author(s):  
Johanna Schmidt

Organizations are collecting an increasing amount of data every day. To make use of this rich source of information, more and more employees have to deal with data analysis and data science. Exploring data, understanding its structure, and finding new insights can be greatly supported by data visualization. Therefore, the increasing interest in data science and data analytics also leads to a growing interest in data visualization and exploratory data analysis. We will outline how existing data visualization techniques are already successfully employed in different data science workflow stages. In some cases, visualization is already beneficial, while other categories still need further research. The vast number of libraries and applications available for data visualization has fostered its usage in data science. We will highlight the differences among the libraries and applications currently available. Unfortunately, there is still a clear gap between visualization research developments over the past decades and the features provided by commonly used tools and data science applications. Although basic charting options are commonly available, more advanced visualization techniques have hardly been integrated as new features yet.

2021 ◽  
pp. 107-132
Author(s):  
Magy Seif El-Nasr ◽  
Truong Huy Nguyen Dinh ◽  
Alessandro Canossa ◽  
Anders Drachen

This chapter discusses how visualization techniques can be used to analyze game data. Specifically, the chapter delves into the development of heatmaps to analyze spatio-temporal data. The chapter also discusses spatio-temporal visualizations and state-action transition visualizations. We also discuss two visualization systems that we have developed within the GUII lab: Stratmapper and Glyph. We provide you with a link that allows you to explore the use of these visualizations with real game data. This chapter is written in collaboration with Riddhi Padte and Varun Sriram, based on their work in Dr. Seif El-Nasr’s game data science class at Northeastern University; Erica Kleinman, PhD student at University of California at Santa Cruz; and Andy Bryant, software engineer at GUII Lab. The chapter also includes labs where you get to experience the analysis of game data through visualization.
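As a rough illustration of the heatmap idea mentioned in this abstract, the sketch below bins 2D positions into square grid cells and counts the samples per cell. It is not taken from the chapter: the function name `bin_positions`, the sample coordinates, and the cell size are all hypothetical, and a real heatmap would also map the counts to colors.

```python
from collections import Counter


def bin_positions(positions, cell_size):
    """Count how many (x, y) samples fall into each square grid cell.

    Hypothetical helper for illustration; cells are indexed by
    (column, row) = (floor(x / cell_size), floor(y / cell_size)).
    """
    counts = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Made-up player positions in game-world units (assumption, not real data).
positions = [(1.0, 1.5), (1.2, 1.9), (8.4, 3.3), (1.7, 0.2), (8.9, 3.0)]
heat = bin_positions(positions, cell_size=2.0)

# Print the busiest cells first; these would be the "hot" areas of the map.
for cell, count in heat.most_common():
    print(cell, count)
```

Restricting `positions` to a time window before binning would give one frame of a spatio-temporal view.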


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2772 ◽  
Author(s):  
Aguinaldo Bezerra ◽  
Ivanovitch Silva ◽  
Luiz Affonso Guedes ◽  
Diego Silva ◽  
Gustavo Leitão ◽  
...  

Alarm and event logs are an immense but latent source of knowledge commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency, and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such misuse of data and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or more precisely, Industrial Big Data) have been enabling novel approaches in data analysis which can be great allies in extracting hitherto hidden information from plant operation data. In response, this work proposes the use of Exploratory Data Analysis (EDA) as a promising data-driven approach to pave the way for industrial alarm and event analysis. This approach proved to be fully able to increase industrial perception by extracting insights and valuable information from real-world industrial data without making prior assumptions.


1990 ◽  
Vol 83 (2) ◽  
pp. 90-93
Author(s):  
Richard L. Scheaffer

Recent years have witnessed a strong movement away from what might be termed classical statistics to a more empirical, data-oriented approach to statistics, sometimes termed exploratory data analysis, or EDA. This movement has been active among professional statisticians for twenty or twenty-five years but has begun permeating the area of statistical education for nonstatisticians only in the past five to ten years. At this point, there seems to be little doubt that EDA approaches to applied statistics will gain support over classical approaches in the years to come. That is not to say that classical statistics will disappear. The two approaches begin with different assumptions and have different objectives, but both are important. These differences will be outlined in this article.


2021 ◽  
Author(s):  
Annalise Aleta LaPlume

Note: This pre-print includes the accepted version of a manuscript that will be published in Sage Research Methods: Doing Research Online. In this case study, I describe methodological insights from data analysis of an online adult lifespan dataset (over 100,000 completions, ages 15-100). The data were used to study cross-sectional age differences in cognitive performance. I cover the steps of data analysis for large-scale web-based data, namely data cleaning, analysis, and visualization techniques. In each step, I describe the unique challenges that face analysis of data collected online, and potential solutions to address them, by drawing on practical lessons and examples from this study. First, I address how to identify problematic recordings such as technical issues (incomplete data, multiple completions by the same person, etc.), unreliable self-reported demographic information (age), and cognitive task outliers (accuracy, response times). I propose rigorous data cleaning as an essential first step to ensure that analytical conclusions are reliable and unbiased. Next, I demonstrate data visualization techniques that are better suited to large online datasets than more conventional techniques (e.g., density plots or locally weighted scatterplot smoothing instead of dot-plots or linear regression). Lastly, I cover the limitations of significance testing in large online datasets, and the value of complementary approaches such as data visualization, effect size estimation, and use of parsimony criteria. I also discuss more sophisticated analysis options enabled by large online datasets, such as non-linear regression, model comparison and selection, data resampling, and addition of covariates.
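The cleaning steps named in this abstract (incomplete data, multiple completions by the same person, unreliable self-reported age, response-time outliers) can be sketched as below. This is a minimal sketch under stated assumptions, not the author's procedure: the record fields, the keep-first-completion rule, and the median-absolute-deviation cutoff are hypothetical choices; only the 15-100 age range comes from the abstract.

```python
from statistics import median


def clean(records, min_age=15, max_age=100, mad_cutoff=3.0):
    """Apply four illustrative cleaning rules to a list of dict records."""
    seen = set()
    kept = []
    for r in records:
        # 1. Drop incomplete records (missing age or response time).
        if r.get("age") is None or r.get("rt_ms") is None:
            continue
        # 2. Keep only the first complete record per participant (assumed rule).
        if r["participant"] in seen:
            continue
        seen.add(r["participant"])
        # 3. Drop implausible self-reported ages.
        if not (min_age <= r["age"] <= max_age):
            continue
        kept.append(r)

    if not kept:
        return kept

    # 4. Drop response-time outliers using the median absolute deviation
    #    (MAD); the cutoff of 3 MADs is an assumption, not from the source.
    rts = [r["rt_ms"] for r in kept]
    med = median(rts)
    mad = median([abs(rt - med) for rt in rts])
    if mad == 0:
        return kept
    return [r for r in kept if abs(r["rt_ms"] - med) <= mad_cutoff * mad]


# Made-up records for illustration only.
records = [
    {"participant": "a", "age": 34, "rt_ms": 520},
    {"participant": "a", "age": 34, "rt_ms": 480},   # repeat completion
    {"participant": "b", "age": 150, "rt_ms": 510},  # implausible age
    {"participant": "c", "age": 61, "rt_ms": None},  # incomplete
    {"participant": "d", "age": 27, "rt_ms": 495},
    {"participant": "e", "age": 45, "rt_ms": 540},
    {"participant": "f", "age": 72, "rt_ms": 9000},  # response-time outlier
    {"participant": "g", "age": 19, "rt_ms": 505},
]
cleaned = clean(records)
print([r["participant"] for r in cleaned])
```

A median-based cutoff is used here because a single extreme response time would inflate a mean and standard deviation; other outlier rules are equally plausible.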


1988 ◽  
Vol 15 (1) ◽  
pp. 45-64 ◽  
Author(s):  
Miklos A. Vasarhelyi ◽  
Da Hsien Bao ◽  
Joel Berk

Contemporary Accounting Research (CAR) has expanded substantially in scope over the past two decades. This paper provides an overview of these trends using both quantitative techniques from statistics and exploratory data analysis (EDA). Articles in CAR are classified into taxonomies and the literature tracked over 22 years. Analysis focuses on four taxonomies: foundation discipline, school of thought, research method and mode of reasoning. The paper first examines journals vis-a-vis article publication frequency and dominant taxonomies. Secondly, three assertions concerning the relative posture of the Journal of Accounting Research and the literature are examined. Next the context of the literature is examined through major taxonomies and a crosstabulation of research method vs school of thought. The last part of the analysis focuses on trends within the taxonomies in the 1963–1984 period.


2018 ◽  
Vol 64 (244) ◽  
pp. 208-226 ◽  
Author(s):  
ANDREW G. WILLIAMSON ◽  
IAN C. WILLIS ◽  
NEIL S. ARNOLD ◽  
ALISON F. BANWELL

The controls on rapid surface lake drainage on the Greenland ice sheet (GrIS) remain uncertain, making it challenging to incorporate lake drainage into models of GrIS hydrology, and so to determine the ice-dynamic impact of meltwater reaching the ice-sheet bed. Here, we first use a lake area and volume tracking algorithm to identify rapidly draining lakes within West Greenland during summer 2014. Second, we derive hydrological, morphological, glaciological and surface-mass-balance data for various factors that may influence rapid lake drainage. Third, these factors are used within Exploratory Data Analysis to examine existing hypotheses for rapid lake drainage. This involves testing for statistical differences between the rapidly and non-rapidly draining lake types, as well as examining associations between lake size and the potential controlling factors. This study shows that the two lake types are statistically indistinguishable for almost all factors investigated, except lake area. Thus, we are unable to recommend an empirically supported, deterministic alternative to the fracture area threshold parameter for modelling rapid lake drainage within existing surface-hydrology models of the GrIS. However, if improved remotely sensed datasets (e.g. ice-velocity maps, climate model outputs) were included in future research, it may be possible to detect the causes of rapid drainage.


Author(s):  
Sean Kross ◽  
Roger D Peng ◽  
Brian S Caffo ◽  
Ira Gooding ◽  
Jeffrey T Leek

Over the last three decades data has become ubiquitous and cheap. This transition has accelerated over the last five years, and training in statistics, machine learning, and data analysis has struggled to keep up. In April 2014 we launched a program of nine courses, the Johns Hopkins Data Science Specialization, which has now had more than 4 million enrollments over the past three years. Here the program is described and compared to both standard and more recently developed data science curricula. We show that novel pedagogical and administrative decisions introduced in our program are now standard in online data science programs. The impact of the Data Science Specialization on data science education in the US is also discussed. Finally, we conclude with some thoughts about the future of data science education in a data-democratized world.


The past two years have seen a tremendous number of changes in the global AI landscape. There has been a stable balance with the US as the unquestioned leader in the global IT market for nearly the past 20 years and by extension the international AI industry as well, which has evolved from the data science and big data analysis sector to become the engine of the 4th industrial revolution, global economic growth, and social progress that it is today. However, when it comes to AI spending, the US is outgunned by China whose government is investing $150 billion to support its goal to become the undisputed global leader in the AI race by 2030. This chapter will offer a broad overview of the UK AI industry and share insights on its present state, near-future, and what can be done in order to optimise the industry's trajectory over the course of the next several years and to maximise the UK's potential to become a global AI leader by 2020. It is not intended to be an exhaustive study and instead demonstrates the forces at work and possible areas for future research.


Author(s):  
M. Govindarajan

This chapter provides an introduction to the field of data science. Data science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes. The term data science has emerged because of the evolution of mathematical statistics, data analysis, and big data. Data science helps to discover hidden patterns in raw data. It makes it possible to translate a business problem into a research project and then translate it back into a practical solution. The purpose of this chapter is to emphasize the integration and synthesis of concepts, techniques, applications, and tools to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modeling, descriptive modeling, data product creation, evaluation, and effective communication.

