Using visual analytics to make sense of railway Close Calls

In the big data era, large and complex data sets will exceed scientists’ capacity to make sense of them in the traditional way. New approaches in data analysis, supported by computer science, will be necessary to address the problems that emerge with the rise of big data. The analysis of the Close Call database, which is a text-based database for near-miss reporting on the GB railways, provides a test case. The traditional analysis of Close Calls is time consuming and prone to differences in interpretation. This paper investigates the use of visual analytics techniques, based on network text analysis, to conduct data analysis and extract safety knowledge from 500 randomly selected Close Call records relating to worker slips, trips and falls. The results demonstrate a straightforward, yet effective, way to identify hazardous conditions without having to read each report individually. This opens up new ways to perform data analysis in safety science.

Download Full-text

Using Hadoop Technology to Overcome Big Data Problems by Choosing Proposed Cost-efficient Scheduler Algorithm for Heterogeneous Hadoop System (BD3)

Journal of Scientific Research and Reports ◽

10.9734/jsrr/2020/v26i930310 ◽

2020 ◽

pp. 58-84

Author(s):

Abou_el_ela Abdou Hussein

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Database Management System ◽

Data Sets ◽

Complex Data ◽

Daily Data ◽

Complex Data Sets ◽

Cost Efficient ◽

Hadoop System

Day by day advanced web technologies have led to tremendous growth amount of daily data generated volumes. This mountain of huge and spread data sets leads to phenomenon that called big data which is a collection of massive, heterogeneous, unstructured, enormous and complex data sets. Big Data life cycle could be represented as, Collecting (capture), storing, distribute, manipulating, interpreting, analyzing, investigate and visualizing big data. Traditional techniques as Relational Database Management System (RDBMS) couldn’t handle big data because it has its own limitations, so Advancement in computing architecture is required to handle both the data storage requisites and the weighty processing needed to analyze huge volumes and variety of data economically. There are many technologies manipulating a big data, one of them is hadoop. Hadoop could be understand as an open source spread data processing that is one of the prominent and well known solutions to overcome handling big data problem. Apache Hadoop was based on Google File System and Map Reduce programming paradigm. Through this paper we dived to search for all big data characteristics starting from first three V's that have been extended during time through researches to be more than fifty six V's and making comparisons between researchers to reach to best representation and the precise clarification of all big data V’s characteristics. We highlight the challenges that face big data processing and how to overcome these challenges using Hadoop and its use in processing big data sets as a solution for resolving various problems in a distributed cloud based environment. This paper mainly focuses on different components of hadoop like Hive, Pig, and Hbase, etc. Also we institutes absolute description of Hadoop Pros and cons and improvements to face hadoop problems by choosing proposed Cost-efficient Scheduler Algorithm for heterogeneous Hadoop system.

Download Full-text

Anomaly Detection for Inferring Social Structure

Social Computing ◽

10.4018/978-1-60566-984-7.ch118 ◽

2010 ◽

pp. 1797-1803

Author(s):

Lisa Friedland

Keyword(s):

Data Analysis ◽

Anomaly Detection ◽

Social Structure ◽

Small Groups ◽

Analysis Data ◽

Data Sets ◽

Complex Data ◽

Detection Approach ◽

Complex Data Sets ◽

Data Points

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).

Download Full-text

Introduction to Big Data and Business Analytics

10.4018/978-1-6684-3662-2.ch004 ◽

2022 ◽

pp. 67-76

Author(s):

Dineshkumar Bhagwandas Vaghela

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Data Sets ◽

Complex Data ◽

Business Analytics ◽

Database Applications ◽

Complex Data Sets ◽

History Of ◽

Rapid Generation

The term big data has come due to rapid generation of data in various organizations. In big data, the big is the buzzword. Here the data are so large and complex that the traditional database applications are not able to process (i.e., they are inadequate to deal with such volume of data). Usually the big data are described by 5Vs (volume, velocity, variety, variability, veracity). The big data can be structured, semi-structured, or unstructured. Big data analytics is the process to uncover hidden patterns, unknown correlations, predict the future values from large and complex data sets. In this chapter, the following topics will be covered more in detail. History of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.

Download Full-text

Application of Multivariate-Rank-Based Techniques in Clustering of Big Data

Vikalpa The Journal for Decision Makers ◽

10.1177/0256090918804385 ◽

2018 ◽

Vol 43 (4) ◽

pp. 179-190

Author(s):

Pritha Guha

Keyword(s):

Big Data ◽

Dna Analysis ◽

Retail Banking ◽

Data Sets ◽

Complex Data ◽

Data Handling ◽

Credit Risk Management ◽

Statistical Tools ◽

Executive Summary ◽

Complex Data Sets

Executive Summary Very large or complex data sets, which are difficult to process or analyse using traditional data handling techniques, are usually referred to as big data. The idea of big data is characterized by the three ‘v’s which are volume, velocity, and variety ( Liu, McGree, Ge, & Xie, 2015 ) referring respectively to the volume of data, the velocity at which the data are processed and the wide varieties in which big data are available. Every single day, different sectors such as credit risk management, healthcare, media, retail, retail banking, climate prediction, DNA analysis and, sports generate petabytes of data (1 petabyte = 250 bytes). Even basic handling of big data, therefore, poses significant challenges, one of them being organizing the data in such a way that it can give better insights into analysing and decision-making. With the explosion of data in our life, it has become very important to use statistical tools to analyse them.

Download Full-text

LIVE: A Work-Centered Approach to Support Visual Analytics of Multi-Dimensional Engineering Design Data With Interactive Visualization and Data-Mining

Volume 5: 37th Design Automation Conference, Parts A and B ◽

10.1115/detc2011-48333 ◽

2011 ◽

Cited By ~ 1

Author(s):

Xin Yan ◽

Mu Qiao ◽

Timothy W. Simpson ◽

Jia Li ◽

Xiaolong Luke Zhang

Keyword(s):

Data Analysis ◽

Engineering Design ◽

Visual Analytics ◽

Interactive Visualization ◽

User Interaction ◽

Data Sets ◽

Preliminary Evaluation ◽

Complex Data ◽

Design Data ◽

Depth Analysis

During the process of trade space exploration, information overload has become a notable problem. To find the best design, designers need more efficient tools to analyze the data, explore possible hidden patterns, and identify preferable solutions. When dealing with large-scale, multi-dimensional, continuous data sets (e.g., design alternatives and potential solutions), designers can be easily overwhelmed by the volume and complexity of the data. Traditional information visualization tools have some limits to support the analysis and knowledge exploration of such data, largely because they usually emphasize the visual presentation of and user interaction with data sets, and lack the capacity to identify hidden data patterns that are critical to in-depth analysis. There is a need for the integration of user-centered visualization designs and data-oriented data analysis algorithms in support of complex data analysis. In this paper, we present a work-centered approach to support visual analytics of multi-dimensional engineering design data by combining visualization, user interaction, and computational algorithms. We describe a system, Learning-based Interactive Visualization for Engineering design (LIVE), that allows designer to interactively examine large design input data and performance output data analysis simultaneously through visualization. We expect that our approach can help designers analyze complex design data more efficiently and effectively. We report our preliminary evaluation on the use of our system in analyzing a design problem related to aircraft wing sizing.

Download Full-text

Facies classification using machine learning

The Leading Edge ◽

10.1190/tle35100906.1 ◽

2016 ◽

Vol 35 (10) ◽

pp. 906-909 ◽

Cited By ~ 47

Author(s):

Brendon Hall

Keyword(s):

Machine Learning ◽

Big Data ◽

Open Source ◽

Data Sets ◽

Complex Data ◽

Facies Classification ◽

Large Complex ◽

Complex Data Sets ◽

Software Platforms ◽

Tools And Techniques

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, much of which used to be only available in proprietary (and expensive) software platforms.

Download Full-text

Use of Big-Data Analytics with the Interactive Advertisement for Product/Service Representation towards its Customers

INTERNATIONAL JOURNAL OF ADVANCED PRODUCTION AND INDUSTRIAL ENGINEERING ◽

10.35121/ijapie201904235 ◽

2019 ◽

Vol 4 (2) ◽

Author(s):

HarshmitKaur Saluja ◽

Vinod Kumar Yadav ◽

K.M. Mohapatra

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Service Performance ◽

Data Sets ◽

Complex Data ◽

Product Service ◽

Literature Reviews ◽

Complex Data Sets ◽

The One

On the one hand, big-data analytics has brought revolution in the predictive modeler by enabling the complex data sets getting structured. On the other hand, the interactive advertisement has changed the complete scenario of the advertising sector by making advertisements content structured in such a way that it is customer-centric. The paper helps to widen the view to explore the growing urge of customization technique in advertising sector with interactive enablers. The paper further examines that how interactive advertisement and big-data has helped to represent product/service from the view of a customer and also improved the product/service performance. In order of study, exhaustive literature reviews resulting in three hypothesis are developed to take on the above-mentioned concerns.

Download Full-text

Comparison of Multivariate Data Analysis Strategies for High-Content Screening

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057110395390 ◽

2011 ◽

Vol 16 (3) ◽

pp. 338-347 ◽

Cited By ~ 19

Author(s):

Anne Kümmel ◽

Paul Selzer ◽

Martin Beibel ◽

Hanspeter Gubler ◽

Christian N. Parker ◽

...

Keyword(s):

Data Analysis ◽

High Content Screening ◽

Data Sets ◽

Complex Data ◽

Data Set ◽

Processing Strategies ◽

Analysis Strategies ◽

Complex Data Sets ◽

High Degree ◽

Cell Data

High-content screening (HCS) is increasingly used in biomedical research generating multivariate, single-cell data sets. Before scoring a treatment, the complex data sets are processed (e.g., normalized, reduced to a lower dimensionality) to help extract valuable information. However, there has been no published comparison of the performance of these methods. This study comparatively evaluates unbiased approaches to reduce dimensionality as well as to summarize cell populations. To evaluate these different data-processing strategies, the prediction accuracies and the Z′ factors of control compounds of a HCS cell cycle data set were monitored. As expected, dimension reduction led to a lower degree of discrimination between control samples. A high degree of classification accuracy was achieved when the cell population was summarized on well level using percentile values. As a conclusion, the generic data analysis pipeline described here enables a systematic review of alternative strategies to analyze multiparametric results from biological systems.

Download Full-text

Incorporating Computer Programming & Data Science into a Guided Inquiry-Based Undergraduate Ecology Lab

The American Biology Teacher ◽

10.1525/abt.2019.81.9.649 ◽

2019 ◽

Vol 81 (9) ◽

pp. 649-657 ◽

Cited By ~ 1

Author(s):

Jennifer Rahn ◽

Dana Willner ◽

James Deverick ◽

Peter Kemper ◽

Margaret Saha

Keyword(s):

Biological Sciences ◽

Data Analysis ◽

Computer Science ◽

Data Science ◽

Computer Programming ◽

Biological Diversity ◽

Data Sets ◽

Complex Data ◽

Complex Data Sets ◽

Basic Ideas

The biological sciences are becoming increasingly reliant on computer science and associated technologies to quickly and efficiently analyze and interpret complex data sets. Introducing students to data analysis techniques is a critical part of their development as well-rounded, scientifically literate citizens. As part of a collaborative effort between the Biology and Computer Science departments at William & Mary, we sought to develop laboratory exercises that would introduce basic ideas of data analysis while also exposing students to Python, a commonly used computer programming language. We accomplished this by developing exercises within the interactive Jupyter Notebook platform, an open-source application that allows Python code to be written and executed as discrete blocks in real time. Students used the developed Jupyter Notebook to analyze data collected as part of a multiweek ecology field experiment aimed at determining the effect of white-tailed deer on aspects of biological diversity. These inquiry-based laboratory exercises generated scientifically relevant data and gave students a chance to experience and participate in ongoing scientific research while demonstrating the utility of computer science in the scientific process.

Download Full-text