Programming big data analysis: principles and solutions

Abstract: In the age of the Internet of Things and social media platforms, huge amounts of digital data are generated by and collected from many sources, including sensors, mobile devices, wearable trackers and security cameras. This data, commonly referred to as Big Data, is challenging current storage, processing, and analysis capabilities. New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from Big Data. Most recent surveys provide a global analysis of the tools used in the main phases of Big Data management (generation, acquisition, storage, querying and visualization of data). In contrast, this work analyzes and reviews the parallel and distributed paradigms, languages and systems used today to analyze and learn from Big Data on scalable computers. In particular, we provide an in-depth analysis of the properties of the main parallel programming paradigms (MapReduce, workflow, BSP, message passing, and SQL-like) and, through programming examples, we describe the most used systems for Big Data analysis (e.g., Hadoop, Spark, and Storm). Furthermore, we discuss and compare the different systems by highlighting their main features, their diffusion (community of developers and users) and the main advantages and disadvantages of using them to implement Big Data analysis applications. The final goal of this work is to help designers and developers identify and select the most appropriate programming solution based on their skills, hardware availability, application domains and purposes, while also considering the support provided by the developer community.
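As an illustration of the MapReduce paradigm named in the abstract above, the following is a minimal single-process sketch of a word count. It is not taken from the paper and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the three conceptual stages.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially (illustrative only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data analysis", "big data systems"])
print(counts)
```

In a real distributed system the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network; here everything runs in one process so that the data flow is easy to follow.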

Carpooling: travelers’ perceptions from a big data analysis

The TQM Journal ◽

10.1108/tqm-11-2017-0156 ◽

2018 ◽

Vol 30 (5) ◽

pp. 554-571 ◽

Cited By ~ 10

Author(s):

Maria Vincenza Ciasullo ◽

Orlando Troisi ◽

Francesca Loia ◽

Gennaro Maione

Keyword(s):

Big Data ◽

Data Analysis ◽

Large Scale ◽

Big Data Analysis ◽

Web Crawler ◽

Content Type ◽

Advantages And Disadvantages ◽

Depth Analysis ◽

Research Findings ◽

One Year

Purpose: The purpose of this paper is to provide a better understanding of the reasons why people use or do not use carpooling. A further aim is to collect and analyze empirical evidence concerning the advantages and disadvantages of carpooling. Design/methodology/approach: A large-scale text analytics study was conducted: people's opinions were collected from Twitter by means of a dedicated web crawler named "Twitter4J." Once mined, the collected data were subjected to a sentiment analysis performed with "SentiWordNet." Findings: The big data analysis identified the 12 concepts about carpooling most frequently used by Twitter users: seven advantages (economic efficiency, environmental efficiency, comfort, traffic, socialization, reliability, curiosity) and five disadvantages (lack of effectiveness, lack of flexibility, lack of privacy, danger, lack of trust). Research limitations/implications: Although the sample is particularly large (10 percent of the data flow published on Twitter from all over the world in about one year), the automated collection of people's comments prevented a more in-depth analysis of users' thoughts and opinions. Practical implications: The research findings may help entrepreneurs, managers and policy makers understand which variables to leverage and which actions to take in order to exploit the potential benefits that carpooling offers. Originality/value: The work drew on skills from three different areas, i.e., business management, computing science and statistics, which were synergistically integrated to customize, implement and use two IT tools capable of automatically identifying, selecting, collecting, categorizing and analyzing people's tweets about carpooling.
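The lexicon-based sentiment step described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical hand-made dictionary with invented scores, not the real SentiWordNet resource, and the scoring rule (sum of positive minus negative scores per token) is one simple choice rather than the authors' procedure.

```python
# Hypothetical toy lexicon: word -> (positive score, negative score).
# The values are invented for illustration and are NOT SentiWordNet data.
TOY_LEXICON = {
    "cheap": (0.6, 0.0),
    "comfortable": (0.7, 0.0),
    "unsafe": (0.0, 0.8),
    "late": (0.0, 0.5),
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over all tokens found in the lexicon."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    score = 0.0
    for token in tokens:
        positive, negative = lexicon.get(token, (0.0, 0.0))
        score += positive - negative
    return score


def classify(text):
    """Map the numeric score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Carpooling is cheap and comfortable!"))
print(classify("The driver was late and it felt unsafe."))
```

Words missing from the lexicon contribute nothing to the score, so a tweet with no known words is labeled neutral.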

Machine Vision and Big Data-Driven Sports Athletes Action Training Intervention Model

Scientific Programming ◽

10.1155/2021/9956710 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hui Jiang ◽

Ping Wang ◽

Lei Peng ◽

Xiaofeng Wang

Keyword(s):

Big Data ◽

Data Analysis ◽

Machine Vision ◽

Big Data Analysis ◽

Research Field ◽

Visual Methods ◽

Motion Sensors ◽

Convolutional Network ◽

Significant Information ◽

Depth Analysis

In recent years, athlete action recognition has become an important research field. Generally speaking, athletes' movements can be recognized through a variety of modes, such as motion sensors, machine vision, and big data analysis. Among them, machine vision and big data analysis usually contain significant information which can be used for various purposes. Machine vision can be described as the recognition of the time sequence of a series of athlete actions captured by camera, so that visual methods can be used to intervene in the training of athletes. Big data contains a large amount of athletes' historical training and competition data which needs exploration. In-depth analysis and feature mining of big data will help coaching teams develop training plans and devise new suggestions. On the basis of these observations, this paper proposes a novel spatiotemporal attention map convolutional network to identify athletes' actions and, through the auxiliary analysis of big data, gives reasonable action intervention suggestions and helps coaches and decision-making teams formulate scientific training programs. The results of the study show the effectiveness of the proposed approach.

Significance of Digital Data Visualization Tools in Big Data Analysis for Business Decisions

International Journal of Computer Applications ◽

10.5120/ijca2017913858 ◽

2017 ◽

Vol 165 (5) ◽

pp. 15-18

Author(s):

Kirti Mahajan ◽

Leena Ajay

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Visualization ◽

Big Data Analysis ◽

Digital Data ◽

Business Decisions ◽

Visualization Tools

A Survey on Big Data in the Media and Entertainment Industry

ITEJ (Information Technology Engineering Journals) ◽

10.24235/itej.v4i2.50 ◽

2019 ◽

Vol 4 (2) ◽

pp. 75-88

Author(s):

Annisaa Nurhayati

Keyword(s):

Big Data ◽

Data Analysis ◽

Mobile Devices ◽

Data Streams ◽

Big Data Analysis ◽

The Internet ◽

Media Industry ◽

Data Formats ◽

The Media ◽

Many Sources

Big Data has affected all industries, including the media and entertainment industries. The popularity of mobile devices and the internet has changed the way people enjoy entertainment. This popularity also generates data streams from many sources, in various data formats and large volumes, known as big data. Carrying out big data analysis can help the media and entertainment industries achieve their goals, such as providing content that makes users happy, improving the user experience, and increasing profits. Many researchers have studied the use of big data in the media and entertainment industries. The purpose of this paper is to provide an overview of the problems, challenges and various technologies related to Big Data in the media and entertainment industries.

Broken data: Conceptualising data in an emerging world

Big Data & Society ◽

10.1177/2053951717753228 ◽

2018 ◽

Vol 5 (1) ◽

pp. 205395171775322 ◽

Cited By ~ 30

Author(s):

Sarah Pink ◽

Minna Ruckenstein ◽

Robert Willim ◽

Melisa Duque

Keyword(s):

Big Data ◽

Data Analysis ◽

Human Activity ◽

Data Cleaning ◽

Big Data Analysis ◽

Sound Art ◽

Digital Data ◽

Tracking Data ◽

Everyday Activity ◽

Music Production

In this article, we introduce and demonstrate the concept-metaphor of broken data. In doing so, we advance critical discussions of digital data by accounting for how data might be in processes of decay, making, repair, re-making and growth, which are inextricable from the ongoing forms of creativity that stem from everyday contingencies and improvisatory human activity. We build and demonstrate our argument through three examples drawn from mundane everyday activity: the incompleteness, inaccuracy and dispersed nature of personal self-tracking data; the data cleaning and repair processes of Big Data analysis; and how data can turn into noise and vice versa when they are transduced into sound within practices of music production and sound art. This, we argue, is a necessary step for considering the meaning and implications of data as it is increasingly mobilised in ways that impact society and our everyday worlds.

An Empirical Study on Interactive Flipped Classroom Model Based on Digital Micro-Video Course by Big Data Analysis and Models

Mathematical Problems in Engineering ◽

10.1155/2021/8789355 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Na Tian ◽

Sang-Bing Tsai

Keyword(s):

Big Data ◽

Data Analysis ◽

Flipped Classroom ◽

Language Use ◽

Big Data Analysis ◽

Teaching Resources ◽

Depth Analysis ◽

Data Statistics ◽

Content Design ◽

Selection Of

This paper provides an in-depth analysis and study of the interactive flipped classroom model based on digital micro-videos for a big data English course. To improve the learning efficiency of English courses and reduce the learning pressure on students, the paper applies audiovisual language to the production of specific micro-class videos, broadcasts the recorded micro-class courses to students, and then randomly distributes a purpose-designed questionnaire on audiovisual language use. Data statistics are compiled from the students' responses, and data analysis is finally conducted to summarize and verify the effects of audiovisual language use in micro-classes. The improved algorithm can effectively reduce the fluctuation in the consumption of various resources in the cluster and make the services in the cluster more stable. The new distributed interprocess communication, based on protocol and serialization technology, is more efficient than traditional communication based on protocol standards, reduces bandwidth consumption in the cluster, and improves the throughput of each node in the cluster. The content design and scripting of micro-video teaching resources are based on this. The production process of micro-video teaching resources is then explained, covering the selection of tools and the preparation, recording, editing, and generation of materials.

Key Technologies of Media Data In-Depth Analysis System Based on Artificial Intelligence-Based Big Data

Mobile Information Systems ◽

10.1155/2021/7191567 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Yi Zheng

Keyword(s):

Artificial Intelligence ◽

Big Data ◽

Data Analysis ◽

User Behavior ◽

Big Data Analysis ◽

Combination Method ◽

Depth Analysis ◽

Analysis System ◽

Data Analysis System ◽

Media Data

At present, big data technologies are developing rapidly, and major companies provide big data analysis services. However, the components of a big data analysis system formed by the combination method cannot sense one another and lack cooperation, resulting in a certain amount of wasted resources in the system. In order to find the key technology of the data analysis system and conduct in-depth analysis of media data, this paper proposes a scheduling algorithm based on artificial intelligence (AI) to implement task scheduling and logical data block migration. The experimental results show that the performance of LAS (Logistic-Block Affinity Scheduler) is improved by 23.97%, 16.11%, and 10.56%, respectively, compared with the other three algorithms. Based on real new media data, this article analyzes the content of media data and user behavior in depth through big data analysis methods. Compared with other methods, the algorithm model in this paper improves the accuracy of hot topic extraction, which has important implications for media data mining. In addition, the analysis results on emotional characteristics, audience characteristics, and hot topic communication characteristics obtained by the research also have practical value. This method improves the recall rate and F value by 5% and 4.7%, respectively, and the overall F value of emotional judgment is about 88.9%.
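For readers unfamiliar with the recall rate and F value reported in the abstract above, the following is a small sketch of how such metrics are commonly computed from classification counts. It assumes the balanced F1 variant (the abstract does not say which F measure is used), and the counts in the example are hypothetical, not results from the paper.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw counts.

    F1 is the harmonic mean of precision and recall; using it here is an
    assumption, since the source only mentions an "F value".
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts chosen only to illustrate the calculation.
precision, recall, f1 = precision_recall_f1(8, 2, 2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The guards against empty denominators return 0.0 rather than raising, which is a design choice for this sketch; other conventions exist.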
