Erratum to: ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics

2017 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiangming Sun ◽  
Nina Jeliazkova ◽  
Vladimir Chupakhin ◽  
Jose-Felipe Golib-Dzib ◽  
Ola Engkvist ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the continuing advance of information and electronic technology, particularly in physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, bringing benefits to people while also posing huge challenges. Moreover, with the advent of the big data era, data sets keep growing in scale, and traditional data analysis methods can no longer cope with them; mining the information hidden behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the SOM self-organizing neural network classification method; the subsets are then trained in parallel, and finally the training results of the working subsets are merged, so that massive data prediction and analysis problems can be handled quickly. The paper further argues that big data brings flexible scalability to quality assessment systems, so it is meaningful to address the two-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to big data analysis in e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets, and that handling dirty-data problems increases the effective rate by at least 70%.
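
As an illustration of the partition-train-merge idea described above (not the authors' code), here is a minimal Python sketch. It substitutes scikit-learn's KMeans for the SOM clustering step and a majority vote over per-subset SVMs for the merge step; both choices are assumptions about details the abstract leaves open.

```python
# Minimal sketch of the partition-train-merge idea: cluster the training
# data into working subsets, fit one SVM per subset in parallel, and
# merge predictions by majority vote. KMeans stands in for the SOM
# partitioner and voting for the merge rule -- both are assumptions.
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# 1) Partition the training samples into working subsets.
n_subsets = 4
labels = KMeans(n_clusters=n_subsets, n_init=10, random_state=0).fit_predict(X)

# 2) Train one SVM per subset, in parallel across CPU cores.
def fit_subset(k):
    mask = labels == k
    return SVC(kernel="rbf").fit(X[mask], y[mask])

models = Parallel(n_jobs=-1)(delayed(fit_subset)(k) for k in range(n_subsets))

# 3) Merge: majority vote over the per-subset models.
def predict(X_new):
    votes = np.stack([m.predict(X_new) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

print(predict(X[:5]), y[:5])
```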


2021 ◽  
Author(s):  
Mingchuan Yang ◽  
Xinye Shao ◽  
Guanchang Xue ◽  
Bingyu Xie

In order to deal with the difficulty of spectrum sensing in cognitive satellite wireless networks, a large-scale cognitive network spectrum sensing algorithm based on big data analysis theory is studied, and a new algorithm using the mean exponential eigenvalue is proposed. The new approach makes full use of all the eigenvalues of the sample covariance matrix of the sensing results to reach a decision, which effectively improves detection performance without requiring prior information from licensed users. Through simulation, the performance of various large-scale cognitive radio spectrum sensing algorithms based on big data analysis theory is compared, and the influence of satellite-to-ground channel conditions and the number of sensing nodes on the performance of the algorithm is discussed.
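
The abstract does not give the exact test statistic, so the following numpy sketch is only a plausible reading of the method: it forms the sample covariance matrix of the sensing snapshots, takes all of its eigenvalues, and compares the mean of their exponentials against a threshold. The precise statistic, the threshold value, and the toy signal model are assumptions, not the paper's definitions.

```python
# Sketch of an eigenvalue-based detector in the spirit of a
# "mean exponential eigenvalue" statistic. The exact statistic,
# threshold, and signal model are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mean_exp_eigenvalue(Y):
    """Y: (n_sensors, n_samples) matrix of sensing snapshots."""
    R = Y @ Y.conj().T / Y.shape[1]   # sample covariance matrix
    eigvals = np.linalg.eigvalsh(R)   # all eigenvalues (real, R is Hermitian)
    return np.mean(np.exp(eigvals))   # test statistic (assumed form)

n_sensors, n_samples = 8, 1000
noise = rng.normal(size=(n_sensors, n_samples))     # H0: noise only
signal = 0.5 * rng.normal(size=(1, n_samples))      # H1: common licensed-user signal
occupied = noise + np.ones((n_sensors, 1)) @ signal

threshold = 3.5  # chosen for this toy example; in practice set from a
                 # target false-alarm probability
for name, Y in [("idle", noise), ("occupied", occupied)]:
    t = mean_exp_eigenvalue(Y)
    print(f"{name}: statistic={t:.2f}, decide {'H1' if t > threshold else 'H0'}")
```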


2020 ◽  
Author(s):  
Katharina Höflich ◽  
Martin Claus ◽  
Willi Rath ◽  
Dorian Krause ◽  
Benedikt von St. Vieth ◽  
...  

Demand on high-end high performance computing (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity, and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analyses. Aiming for interactive big-data analysis on HPC will also help the scientific community utilize increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for the seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focusing on usability and the interactive use of HPC systems, based on typical use cases from the ocean science community.
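
As a concrete illustration of the kind of elastic, preemption-tolerant setup the poster describes, here is a short Python sketch using Dask with dask_jobqueue on a Slurm-managed HPC system. The queue name, resource sizes, walltime, and the toy computation are placeholders, not values from the poster.

```python
# Sketch: interactive, elastically scaled data analysis on an HPC system
# with Dask. Queue name, resources, and walltime are placeholders.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import dask.array as da

# Each Dask worker is submitted as a batch job to the HPC scheduler.
cluster = SLURMCluster(
    queue="batch",        # placeholder partition name
    cores=8,
    memory="32GB",
    walltime="01:00:00",
)

# Adaptive scaling: grow and shrink with the interactive workload.
# This also tolerates workers lost to preemption, since Dask re-runs
# the affected tasks elsewhere.
cluster.adapt(minimum=1, maximum=20)

client = Client(cluster)

# A toy large-array computation standing in for an ocean-data analysis.
x = da.random.random((100_000, 10_000), chunks=(10_000, 10_000))
print(x.mean(axis=0).std().compute())
```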


Author(s):  
Rasmus Helles ◽  
Jacob Ørmen ◽  
Klaus Bruhn Jensen ◽  
Signe Sophus Lai ◽  
Ericka Menchen-Trevino ◽  
...  

In recent years, large-scale analysis of log data from digital devices, often termed "big data analysis" (Lazer, Kennedy, King, & Vespignani, 2014), has taken hold in the field of internet research. Through Application Programming Interfaces (APIs) and commercial measurement, scholars have been able to analyze social media users (Freelon, 2014) and web audiences (Taneja, 2016) on an unprecedented scale. And by developing digital research tools, scholars have been able to track individuals across websites (Menchen-Trevino, 2013) and mobile applications (Ørmen & Thorhauge, 2015) in greater detail than ever before. Big data analysis holds unique potential for studying communication in depth and across many individuals (see e.g. Boase & Ling, 2013; Prior, 2013). At the same time, this approach introduces new methodological challenges concerning the transparency of data collection (Webster, 2014), the sampling of participants, and the validity of conclusions (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Firstly, data aggregation is typically designed for commercial rather than academic purposes; the type of data included, as well as how it is presented, depends in large part on the business interests of measurement and advertisement companies (Webster, 2014). Secondly, when relying on this kind of secondary data, it can be difficult to validate the output or the techniques used to generate the data (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Thirdly, the unit of analysis is often media-centric, taking specific websites or social network pages as the empirical basis instead of individual users (Taneja, 2016), which makes it hard to untangle the behavior of real-world users from aggregate trends. Lastly, variations in what users do may be so large that it becomes necessary to move from the aggregate to smaller groups of users to make meaningful inferences (Welles, 2014). Internet research is thus faced with a new research approach whose potentials and perils need to be discussed in combination with traditional approaches.

This panel explores the role of big data analysis in relation to the wider repertoire of methods in internet research. It comprises four presentations that each shed light on the complementarity of big data analysis with more traditional qualitative and quantitative methods. The first presentation opens the discussion with an overview of strategies for combining digital traces and commercial audience data with qualitative interviews and quantitative survey methods. The second explores the potential of trace data to improve upon the experimental method: researcher-collected data enable scholars to operate in a real-world setting, in contrast to a research lab, while obtaining informed consent from participants. The third argues that large-scale audience data provide a unique perspective on internet use; by integrating census-level information about users with detailed traces of their behavior across websites, commercial audience data combine the strengths of surveys and of digital trace data. Lastly, the fourth shows how multi-institutional collaboration makes it possible to document social media activity (on Twitter) for a whole country (Australia) in a comprehensive manner, a feat not possible at a similar scale through other methods. Through these four presentations, the panel aims to situate big data analysis in the broader repertoire of internet research methods.


2016 ◽  
Vol 4 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Sungchul Lee ◽  
Eunmin Hwang ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Due to the advancement of Information Technology (IT), the hospitality industry sees great value in gathering large amounts of diverse customer data. However, many hotels face a challenge in analyzing customer data and using it as an effective tool to understand hospitality customers better and, ultimately, to increase revenue. The authors' research attempts to resolve these challenges by utilizing big data analysis tools, especially Hadoop and R. Hadoop is a framework for processing large-scale data. Using this approach, their study demonstrates ways of aggregating and analyzing hospitality customer data to find meaningful customer information. Multiple decision trees are constructed from the customer data sets with the intention of classifying customers' needs and customer clusters. By analyzing the customer data, the study suggests three strategies to increase customers' total expenditure within the limited time of their stay.
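
The study builds its decision trees with Hadoop and R; as a rough stand-in for that step, the scikit-learn sketch below trains a decision tree on made-up guest data. All column names, the spending-segment labels, and the values themselves are hypothetical.

```python
# Illustrative stand-in for the study's decision-tree step (the authors
# used Hadoop and R; this uses scikit-learn on made-up data). All column
# names and labels are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical per-guest features and a spending-segment label.
df = pd.DataFrame({
    "nights_stayed":   [1, 3, 2, 7, 4, 2, 5, 1, 6, 3],
    "room_rate":       [120, 200, 90, 310, 180, 110, 260, 95, 300, 150],
    "spa_visits":      [0, 1, 0, 3, 1, 0, 2, 0, 2, 1],
    "restaurant_bill": [30, 120, 25, 400, 150, 40, 220, 20, 350, 90],
    "segment":         [0, 1, 0, 2, 1, 0, 2, 0, 2, 1],  # low/mid/high spender
})

X, y = df.drop(columns="segment"), df["segment"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules
print("holdout accuracy:", tree.score(X_test, y_test))
```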


2018 ◽  
Vol 30 (5) ◽  
pp. 554-571 ◽  
Author(s):  
Maria Vincenza Ciasullo ◽  
Orlando Troisi ◽  
Francesca Loia ◽  
Gennaro Maione

Purpose
The purpose of this paper is to provide a better understanding of the reasons why people use or do not use carpooling. A further aim is to collect and analyze empirical evidence concerning the advantages and disadvantages of carpooling.

Design/methodology/approach
A large-scale text analytics study has been conducted: people's opinions were collected on Twitter by means of a dedicated web crawler, "Twitter4J." After mining, the collected data were put through a sentiment analysis realized by means of "SentiWordNet."

Findings
The big data analysis identified the 12 concepts about carpooling most frequently used by Twitter users: seven advantages (economic efficiency, environmental efficiency, comfort, traffic, socialization, reliability, curiosity) and five disadvantages (lack of effectiveness, lack of flexibility, lack of privacy, danger, lack of trust).

Research limitations/implications
Although the sample is particularly large (10 percent of the data flow published on Twitter from all over the world over about one year), the automated collection of people's comments prevented a more in-depth analysis of users' thoughts and opinions.

Practical implications
The research findings may help entrepreneurs, managers and policy makers understand the variables to be leveraged, and the actions to be taken, to exploit the potential benefits that carpooling offers.

Originality/value
The work draws on skills from three different areas, i.e., business management, computing science and statistics, which were synergistically integrated to customize, implement and use two IT tools capable of automatically identifying, selecting, collecting, categorizing and analyzing people's tweets about carpooling.
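
The authors worked with Twitter4J (a Java library) for collection and SentiWordNet for scoring. As a rough illustration of SentiWordNet-style scoring only, here is a Python sketch using NLTK's sentiwordnet corpus reader; the naive whitespace tokenization, first-sense lookup, and averaging rule are simplifying assumptions, and the example tweets are invented.

```python
# Rough illustration of SentiWordNet-style scoring (the paper used
# Twitter4J to collect tweets; this sketch only mimics the scoring step
# via NLTK's SentiWordNet reader; aggregation is an assumption).
import nltk
nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

def score_text(text):
    """Average (positive - negative) score over words found in SentiWordNet."""
    scores = []
    for word in text.lower().split():
        synsets = list(swn.senti_synsets(word))
        if synsets:
            s = synsets[0]  # naive: first sense only
            scores.append(s.pos_score() - s.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

tweets = [  # invented examples, not data from the study
    "carpooling is cheap and good for the environment",
    "carpooling feels unsafe and inflexible",
]
for t in tweets:
    print(f"{score_text(t):+.2f}  {t}")
```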

