Open Source Software (OSS) for Big Data

Research Anthology on Usage and Development of Open Source Software ◽

10.4018/978-1-7998-9158-1.ch042 ◽

2021 ◽

pp. 858-875

Author(s):

Richard S. Segall

Keyword(s):

Big Data ◽

Data Processing ◽

Open Source ◽

Open Source Software ◽

Statistical Software ◽

Big Data Processing ◽

Open Source Data ◽

Source Data ◽

And Storage ◽

Aggregation Analysis

This chapter discusses Open Source Software and associated technologies for the processing of Big Data. It covers Hadoop-related projects and the current top open source data tools and frameworks, such as SMACK, an acronym for the open source technologies Spark, Mesos, Akka, Cassandra, and Kafka, which together compose the ingestion, aggregation, analysis, and storage layers for Big Data processing. Tabular summaries and categories are provided for 38 Open Source Statistical Software (OSSS) packages, each listing features and URLs for free downloads. The current challenges of Big Data and Open Source Software are also discussed.

Chinese Open Source Data Collection, Big Data, And Private Enterprise Work For State Intelligence and Security: The Case of Shenzhen Zhenhua

SSRN Electronic Journal ◽

10.2139/ssrn.3691999 ◽

2020 ◽

Author(s):

Christopher Balding

Keyword(s):

Big Data ◽

Data Collection ◽

Open Source ◽

Private Enterprise ◽

Open Source Data ◽

Source Data

Using Open-Source Data in Correlative Species Distribution Modeling of Marine Species

The American Biology Teacher ◽

10.1525/abt.2018.80.6.457 ◽

2018 ◽

Vol 80 (6) ◽

pp. 457-461

Author(s):

Carlos A. Morales-Ramirez ◽

Pearlyn Y. Pang

Keyword(s):

Open Source ◽

Species Distribution ◽

Open Source Software ◽

Species Distribution Modeling ◽

Science Research ◽

Marine Species ◽

Distribution Modeling ◽

Open Source Data ◽

Rapid Changes ◽

Source Data

Open-source data are information provided free online. They are gaining popularity in science research, especially for modeling species distribution. MaxEnt is open-source software that models species distributions using presence-only data and environmental variables. These variables can also be found online and are generally free. Using all of these open-source data and tools makes species distribution modeling (SDM) more accessible. With the rapid changes our planet is undergoing, SDM helps us understand future habitat suitability for species. Due to increasing interest in biogeographic research, the use of SDM has increased for marine species, which were previously uncommon in this kind of modeling. Here we provide examples of where to obtain the data and how the modeling can be performed and taught.

NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features in addition to conventional RDBMS features. Hence, “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases, with different features for dealing with exponentially growing data-intensive applications, are available in both open source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.
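The CAP trade-off mentioned in this abstract can be illustrated with a small sketch. This is not taken from the chapter: the `TinyStore` class, its two-replica layout, and the "CP"/"AP" mode labels are hypothetical names chosen for illustration, and the model is heavily simplified (for example, reads are never refused). It shows only that, during a network partition, a store must either reject writes (favoring consistency) or accept them and let replicas diverge (favoring availability).

```python
class TinyStore:
    """Hypothetical two-replica key-value store used to illustrate CAP.

    mode="CP": refuse writes while partitioned (consistency over availability).
    mode="AP": accept writes while partitioned (availability over consistency).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # each replica is a plain dict
        self.partitioned = False   # True when replicas cannot talk to each other

    def write(self, replica_id, key, value):
        if not self.partitioned:
            # Healthy network: replicate the write to every replica.
            for replica in self.replicas:
                replica[key] = value
            return
        if self.mode == "CP":
            # Refuse rather than let the replicas disagree.
            raise RuntimeError("write rejected: store is partitioned")
        # AP: accept the write locally; the other replica is now stale.
        self.replicas[replica_id][key] = value

    def read(self, replica_id, key):
        return self.replicas[replica_id].get(key)


if __name__ == "__main__":
    ap = TinyStore("AP")
    ap.write(0, "k", "old")
    ap.partitioned = True
    ap.write(0, "k", "new")
    print(ap.read(0, "k"), ap.read(1, "k"))  # replicas disagree under AP
```

Real systems sit at many points between these two extremes (for example, with tunable quorum sizes), so the two modes here should be read as end points rather than as a description of any particular database.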

A Consistent Approach to Building Secure Big Data Processing and Storage Systems

Automatic Control and Computer Sciences ◽

10.3103/s0146411619080273 ◽

2019 ◽

Vol 53 (8) ◽

pp. 914-921

Author(s):

M. A. Poltavtseva

Keyword(s):

Big Data ◽

Data Processing ◽

Storage Systems ◽

Big Data Processing ◽

Processing And Storage ◽

And Storage ◽

Consistent Approach

IR-TEx: An Open Source Data Integration Tool for Big Data Transcriptomics Designed for the Malaria Vector Anopheles gambiae

Journal of Visualized Experiments ◽

10.3791/60721 ◽

2020 ◽

Author(s):

Victoria A. Ingham ◽

Andrew Bennett ◽

Duo Peng ◽

Simon C. Wagstaff ◽

Hilary Ranson

Keyword(s):

Big Data ◽

Data Integration ◽

Anopheles Gambiae ◽

Open Source ◽

Malaria Vector ◽

Open Source Data ◽

Source Data

A Performance-Improved and Storage-Efficient Secondary Index for Big Data Processing

2017 IEEE International Conference on Smart Cloud (SmartCloud) ◽

10.1109/smartcloud.2017.32 ◽

2017 ◽

Cited By ~ 1

Author(s):

Han Wu ◽

Yongxin Zhu ◽

Chang Wang ◽

Junjie Hou ◽

Mengjun Li ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Big Data Processing ◽

Secondary Index ◽

And Storage ◽

A Performance

Comparative Study of Open Source Data Mining Software for Big Data

Journal of Computer & Information Technology ◽

10.22147/jucit/070301 ◽

2016 ◽

Vol 07 (03) ◽

pp. 31-33

Author(s):

ATIF AZIZ ◽

RAJEEV ARYA ◽

SANA SHAFIQUE ◽

...

Keyword(s):

Data Mining ◽

Big Data ◽

Comparative Study ◽

Open Source ◽

Open Source Data ◽

Source Data

Big data processing using Open Source Software- A Questionnaire on the data science

Scholedge International Journal of Multidisciplinary & Allied Studies ISSN 2394-336X ◽

10.19085/journal.sijmas030101 ◽

2016 ◽

Vol 3 (1) ◽

pp. 1

Author(s):

Andrew McCullum

Keyword(s):

Big Data ◽

Data Processing ◽

World Trade Organization ◽

Central Asia ◽

Open Source ◽

Open Source Software ◽

World Trade ◽

Data Science ◽

Customs Union ◽

The World

In 2015, Central Asia made some important improvements in the environment for cross-border e-commerce: Kazakhstan's accession to the World Trade Organization (WTO) will help business transparency, while the Kyrgyz Republic's membership in the Eurasian Customs Union expands its consumer base. Why e-commerce? There are two reasons. First, e-commerce reduces the cost of distance. Central Asia is the highest trade-cost region in the world: vast distances from major markets make finding buyers challenging, shipping goods slow, and export costs high. Second, e-commerce can draw in populations that are traditionally under-represented in export markets, such as women, small businesses, and rural entrepreneurs.

Indian Premier League Dataset Analytics using Hadoop-Hive

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4579.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3999-4004

Keyword(s):

Big Data ◽

Open Source ◽

Database Systems ◽

Huge Amount ◽

Premier League ◽

Open Source Data ◽

Open Source Framework ◽

Source Data ◽

Set Up ◽

Processing Techniques

Big Data is a term used to represent huge volumes of both unstructured and structured data that cannot be processed by traditional data processing techniques. This data is too large, grows exponentially, and does not fit into the structure of traditional database systems. Analyzing Big Data is a very challenging task since it involves processing huge amounts of data. As an industry or its business grows, the data related to that industry also tends to grow on a larger scale. Capable data analysis tools are required to analyze the data in order to gain value from it. Hadoop is a sought-after open source framework that uses MapReduce techniques to store and process huge datasets. However, programs written using MapReduce techniques are not flexible and also require maintenance. This problem is overcome by making use of HiveQL. The platform required to execute HiveQL queries is Hive, an open-source data warehousing set-up built on Hadoop. HiveQL queries are compiled into MapReduce jobs that are executed using Hadoop. In this paper we have analyzed the Indian Premier League dataset using HiveQL and compared its execution time with that of traditional SQL queries. It was found that HiveQL provided better performance with larger datasets, while SQL performed better with smaller datasets.
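The idea that a declarative query is turned into map and reduce steps can be sketched in plain Python. This is not the paper's code and does not use Hive or Hadoop: the rows, the column names, and all function names are hypothetical. It mimics how a grouped aggregation, conceptually something like `SELECT team, SUM(runs) ... GROUP BY team`, could be evaluated as a map phase, a shuffle, and a reduce phase.

```python
from collections import defaultdict

# Hypothetical (team, runs) rows; illustrative values, not from the paper's dataset.
ROWS = [
    ("Team A", 4),
    ("Team B", 1),
    ("Team A", 6),
    ("Team B", 2),
    ("Team A", 0),
]


def map_phase(rows):
    """Emit one (key, value) pair per input row, as a mapper would."""
    for team, runs in rows:
        yield team, runs


def shuffle(pairs):
    """Group values by key, mimicking the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum each key's values, as a reducer would for a SUM aggregate."""
    return {key: sum(values) for key, values in grouped.items()}


def total_runs_by_team(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(total_runs_by_team(ROWS))
```

On a real cluster the three phases run in parallel across machines and over files in distributed storage; the sketch only shows the data flow that makes a one-line query more convenient than hand-written MapReduce programs.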
