Meeting Big Data challenges with visual analytics

2014 ◽  
Vol 24 (2) ◽  
pp. 122-141 ◽  
Author(s):  
Victoria Louise Lemieux ◽  
Brianna Gormly ◽  
Lyse Rowledge

Purpose – This paper aims to explore the role of records management in supporting the effective use of information visualisation and visual analytics (VA) to meet the challenges associated with the analysis of Big Data.

Design/methodology/approach – This exploratory research entailed conducting and analysing interviews with a convenience sample of visual analysts and VA tool developers, affiliated with a major VA institute, to gain a deeper understanding of data-related issues that constrain or prevent effective visual analysis of large data sets or the use of VA tools, and analysing key emergent themes related to data challenges in order to map them to records management controls that may be used to address them.

Findings – The authors identify key data-related issues that constrain or prevent effective visual analysis of large data sets or the use of VA tools, and identify records management controls that may be used to address these data-related issues.

Originality/value – This paper discusses a relatively new field, VA, which has emerged in response to the challenge of analysing big, open data. It contributes a small exploratory research study aimed at helping records professionals understand the data challenges faced by visual analysts and, by extension, data scientists in the analysis of large and heterogeneous data sets. It further aims to help records professionals identify how records management controls may be used to address data issues in the context of VA.

2014 ◽  
Vol 10 (4) ◽  
pp. 94-116 ◽  
Author(s):  
Chinh Nguyen ◽  
Rosemary Stockdale ◽  
Helana Scheepers ◽  
Jason Sargent

The rapid development of technology and the interactive nature of Government 2.0 (Gov 2.0) are generating large data sets for government, resulting in a struggle to control, manage, and extract the right information. Research into these large data sets (termed Big Data) has therefore become necessary. Governments now spend significant sums on storing and processing vast amounts of information because of the proliferation and complexity of Big Data and a lack of effective records management. Electronic Records Management (ERM), by contrast, offers an established method for controlling and governing an organisation's important data. This paper investigates the challenges identified in the literature on Gov 2.0, Big Data, and ERM in order to develop a better understanding of how ERM can be applied to Big Data to extract usable information in the context of Gov 2.0. The paper suggests that ERM, with its well-established governance policies, could be a key building block in providing usable information to stakeholders. A framework is constructed to illustrate how ERM can play a role in the context of Gov 2.0. Future research is needed to address the specific constraints and expectations placed on governments in terms of data retention and use.


Author(s):  
Saranya N. ◽  
Saravana Selvam

After an era of struggling with data collection, the issue has now become how to process these vast amounts of information. Scientists and researchers consider Big Data to be one of the most important topics in computing science today. The term Big Data describes huge volumes of data that may exist in any structure, which makes it difficult for standard approaches to mine useful information from such large data sets. Classification in Big Data is a procedure for summarising data sets based on recurring patterns, and a number of classification frameworks are available for this purpose. The methods discussed in the chapter include Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The objective of this chapter is to provide a comprehensive evaluation of the classification methods in common use.
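As a small illustration of one of the methods named above (not code from the chapter itself), the following self-contained Java sketch implements a basic k-nearest-neighbour classifier; the data in `main` are invented for demonstration only.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Tiny k-nearest-neighbour classifier, included only to illustrate one of the listed methods. */
public class Knn {

    /** Returns the majority label among the k training points closest to the query. */
    public static int classify(double[][] trainX, int[] trainY, double[] query, int k) {
        Integer[] idx = new Integer[trainX.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training indices by Euclidean distance to the query point.
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(trainX[i], query)));
        // Vote among the k nearest labels.
        int maxLabel = Arrays.stream(trainY).max().getAsInt();
        int[] votes = new int[maxLabel + 1];
        for (int i = 0; i < k; i++) votes[trainY[idx[i]]]++;
        int best = 0;
        for (int label = 1; label < votes.length; label++) {
            if (votes[label] > votes[best]) best = label;
        }
        return best;
    }

    private static double distance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] x = {{1, 1}, {1, 2}, {8, 8}, {9, 8}};  // toy data, for demonstration only
        int[] y = {0, 0, 1, 1};
        System.out.println(classify(x, y, new double[]{8.5, 8.0}, 3));  // expected: 1
    }
}
```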


Author(s):  
B. K. Tripathy ◽  
Hari Seetha ◽  
M. N. Murty

Data clustering plays a very important role in data mining, machine learning, and image processing. Because modern databases have inherent uncertainties, many uncertainty-based clustering algorithms have been developed, including fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as kernelised, possibilistic, and possibilistic kernelised versions. However, for various reasons none of these algorithms is effective on big data, so researchers have spent the past few years adapting them so that they can be applied to clustering big data; such algorithms remain relatively few in comparison with those for data sets of moderate size. The aim of this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
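To make the family of algorithms concrete, here is a minimal Java sketch of the standard (textbook) fuzzy c-means update loop, with the usual membership and centre updates; it is not the chapter's own implementation, and the fuzzifier `m`, iteration count, and random initialisation are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal fuzzy c-means sketch: alternate centre updates and membership updates. */
public class FuzzyCMeans {

    /** Returns the n-by-c soft membership matrix after a fixed number of iterations. */
    public static double[][] cluster(double[][] x, int c, double m, int iters, long seed) {
        int n = x.length, d = x[0].length;
        Random rnd = new Random(seed);

        // Randomly initialise memberships so each row sums to 1.
        double[][] u = new double[n][c];
        for (int i = 0; i < n; i++) {
            double rowSum = 0;
            for (int j = 0; j < c; j++) { u[i][j] = rnd.nextDouble(); rowSum += u[i][j]; }
            for (int j = 0; j < c; j++) u[i][j] /= rowSum;
        }

        double[][] centers = new double[c][d];
        for (int it = 0; it < iters; it++) {
            // Centre update: weighted mean of the points, with weights u^m.
            for (int j = 0; j < c; j++) {
                double denom = 0;
                double[] num = new double[d];
                for (int i = 0; i < n; i++) {
                    double w = Math.pow(u[i][j], m);
                    denom += w;
                    for (int k = 0; k < d; k++) num[k] += w * x[i][k];
                }
                for (int k = 0; k < d; k++) centers[j][k] = num[k] / denom;
            }
            // Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < c; j++) {
                    double dij = dist(x[i], centers[j]);
                    double sum = 0;
                    for (int k = 0; k < c; k++) {
                        sum += Math.pow(dij / dist(x[i], centers[k]), 2.0 / (m - 1));
                    }
                    u[i][j] = 1.0 / sum;
                }
            }
        }
        return u;  // soft memberships; the argmax of each row gives a hard assignment
    }

    private static double dist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
        return Math.sqrt(s) + 1e-12;  // small epsilon avoids division by zero
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.2, 0.8}, {8, 8}, {8.2, 7.9}};  // toy data only
        for (double[] row : cluster(points, 2, 2.0, 50, 42L)) System.out.println(Arrays.toString(row));
    }
}
```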


2016 ◽  
pp. 1220-1243
Author(s):  
Ilias K. Savvas ◽  
Georgia N. Sofianidou ◽  
M-Tahar Kechadi

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets; HDFS is a distributed file system that provides high-throughput access for data-driven applications, and MapReduce is a software framework for distributed computing over large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. Theoretical and experimental results demonstrate the technique's efficiency, showing that HDFS and MapReduce can be applied to big data with very promising results.
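The core of such a distributed K-means is a repeated map step (assign each point to its nearest current centroid) and reduce step (re-average the points in each cluster). The sketch below shows one possible shape of a single iteration using the standard Hadoop MapReduce API; it is not the authors' implementation, and the convention of passing the current centroids through the job configuration key `kmeans.centroids` is an assumption made for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** One MapReduce iteration of K-means: assign points to the nearest centroid, then re-average. */
public class KMeansIteration {

    /** Mapper: emits (nearest centroid id, point) for every input line "x1,x2,...". */
    public static class AssignMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        private double[][] centroids;

        @Override
        protected void setup(Context context) {
            // Assumed convention: current centroids in the configuration as "x1,x2;y1,y2;...".
            String[] rows = context.getConfiguration().get("kmeans.centroids").split(";");
            centroids = new double[rows.length][];
            for (int i = 0; i < rows.length; i++) {
                String[] parts = rows[i].split(",");
                centroids[i] = new double[parts.length];
                for (int j = 0; j < parts.length; j++) centroids[i][j] = Double.parseDouble(parts[j]);
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");
            double[] p = new double[parts.length];
            for (int j = 0; j < parts.length; j++) p[j] = Double.parseDouble(parts[j]);

            // Find the nearest centroid by squared Euclidean distance.
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0;
                for (int j = 0; j < p.length; j++) d += (p[j] - centroids[c][j]) * (p[j] - centroids[c][j]);
                if (d < bestDist) { bestDist = d; best = c; }
            }
            context.write(new IntWritable(best), value);
        }
    }

    /** Reducer: averages all points assigned to a centroid to produce the updated centroid. */
    public static class UpdateReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double[] sum = null;
            long count = 0;
            for (Text v : values) {
                String[] parts = v.toString().split(",");
                if (sum == null) sum = new double[parts.length];
                for (int j = 0; j < parts.length; j++) sum[j] += Double.parseDouble(parts[j]);
                count++;
            }
            StringBuilder out = new StringBuilder();
            for (int j = 0; j < sum.length; j++) {
                if (j > 0) out.append(',');
                out.append(sum[j] / count);
            }
            context.write(key, new Text(out.toString()));
        }
    }
}
```

A driver would rerun this job, feeding each iteration's reducer output back in as the next iteration's centroids, until the centroids stop moving (or a fixed iteration budget is reached).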


2017 ◽  
Vol 7 (1.1) ◽  
pp. 237
Author(s):  
MD. A R Quadri ◽  
B. Sruthi ◽  
A. D. SriRam ◽  
B. Lavanya

Java is one of the finest languages for big data because of its write-once, run-anywhere nature. The Java 8 release introduced features such as lambda expressions and streams, which are helpful for parallel computing. Although these features help with extracting, sorting, and filtering data from collections and arrays, they have limitations: streams cannot adequately process very large data sets such as big data, and problems arise when executing in a distributed environment. The streams introduced in Java 8 are restricted to computation within a single system; there is no mechanism for distributed computing over multiple systems. Streams also hold their data in memory and therefore cannot support huge data sets. This paper addresses the use of Java 8 for massive data in a distributed environment by extending the programming model with distributed streams. Distributed computation over large data sets can then be accomplished by introducing distributed stream frameworks.
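For context, the following small example shows the kind of single-JVM parallel filtering and sorting that Java 8 streams already support; the limitation discussed in the paper is that such a pipeline parallelises only across the cores of one machine and keeps its data in that machine's memory.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class StreamDemo {
    public static void main(String[] args) {
        // Parallel filter/sort/collect over an in-memory range of numbers.
        // Java 8 streams distribute the work across local CPU cores only,
        // not across multiple machines.
        List<Long> evensDescending = LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .filter(n -> n % 2 == 0)
                .boxed()
                .sorted((a, b) -> Long.compare(b, a))
                .collect(Collectors.toList());
        System.out.println(evensDescending.subList(0, 5));
    }
}
```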


Plant Disease ◽  
2002 ◽  
Vol 86 (12) ◽  
pp. 1396-1401 ◽  
Author(s):  
Weikai Yan ◽  
Duane E. Falk

Effective breeding for disease resistance relies on a thorough understanding of host-by-pathogen relations. Achieving such understanding can be difficult and challenging, particularly for large data sets with complex host genotype-by-pathogen strain interactions. This paper presents a biplot approach that facilitates visual analysis of host-by-pathogen data. A biplot displays both host genotypes and pathogen isolates in a single scatter plot; each genotype or isolate is displayed as a point defined by its scores on the first two principal components derived from subjecting genotype- or strain-centered data to singular value decomposition. From a biplot, clusters of host genotypes and clusters of pathogen strains can be simultaneously visualized. Moreover, the basis for genotype and strain classifications, i.e., interactions between individual genotypes and strains, can be visualized at the same time. A biplot based on genotype-centered data and one based on strain-centered data are appropriate for visual evaluation of susceptibility/resistance of genotypes and virulence/avirulence of strains, respectively. Biplot analysis of genotype-by-strain data is illustrated with published response scores of 13 barley line groups to 8 net blotch isolate groups.
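For readers unfamiliar with the construction, the biplot described above can be written compactly. Let $X$ be the genotype-centred (or strain-centred) response matrix; a rank-2 truncated singular value decomposition, together with one common symmetric scaling (the exact scaling used in the paper may differ), gives the plotted scores:

$$
X \approx \sum_{k=1}^{2} \sigma_k \, u_k v_k^{\top},
\qquad
\text{genotype } i \mapsto \bigl(\sqrt{\sigma_1}\, u_{1i},\ \sqrt{\sigma_2}\, u_{2i}\bigr),
\qquad
\text{strain } j \mapsto \bigl(\sqrt{\sigma_1}\, v_{1j},\ \sqrt{\sigma_2}\, v_{2j}\bigr),
$$

so that the inner product of a genotype marker and a strain marker approximates the centred response $x_{ij}$, which is why clusters of genotypes, clusters of strains, and individual genotype-by-strain interactions can all be read from the same plot.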


Author(s):  
Rojalina Priyadarshini ◽  
Rabindra K. Barik ◽  
Chhabi Panigrahi ◽  
Harishchandra Dubey ◽  
Brojo Kishore Mishra

This article describes how machine learning (ML) algorithms are useful for analysing data and extracting meaningful information that can be used in various other applications. In the last few years there has been explosive growth in the dimensionality and structure of data, and conventional ML algorithms face several difficulties when dealing with such highly voluminous and unstructured big data. Modern ML tools are designed to handle these complexities. Deep learning (DL) is one such tool, commonly used to find the hidden structure and cohesion within large data sets by training on parallel platforms with intelligent optimisation techniques, so that the data can be further analysed and interpreted for prediction and classification. This article focuses on the DL tools and software used over the past couple of years in various areas, especially healthcare applications.


2017 ◽  
Vol 25 (4) ◽  
pp. 251-254 ◽  
Author(s):  
Sylwia Gierej

This article reviews selected issues related to the use of Big Data in industry. The aim is to define the potential scope and forms of use of large data sets in manufacturing companies. Through a systematic review of the scientific and professional literature, selected issues related to the use of mass data analytics in production were analysed. A definition of Big Data was presented, detailing its main attributes, and the importance of mass data processing technology in the development of the Industry 4.0 concept was highlighted. Attention was then paid to issues such as production process optimisation, decision making, and mass individualisation of production, and the potential of large volumes of data in these areas was indicated. On this basis, conclusions were drawn regarding the potential of using Big Data in industry.

