Big Data Analytics | ScienceGate

‘SasCsvToolkit’ - A versatile parallel `bag-of-tasks` job submission application on heterogeneous and homogeneous platforms for Big Data Analytics such as for Biomedical Informatics

DOI: 10.21203/rs.2.15498/v1; 2019

Author(s): Abhishek Singh

Keyword(s): Big Data, Data Analysis, Big Data Analytics, Large Data, Big Data Analysis, Organizational History, Corporate Groups, File Formats, The Right, Training Background

Abstract. Background: Big data analysis requires the ability to process large data sets, which are often held in formats tuned for corporate use. Only recently has big data drawn the attention of low-budget corporate groups and academia, which typically lack the money and resources to buy expensive licenses for big data analysis platforms such as SAS. Corporations continue to work with the SAS data format, largely because of organizational history and because their existing code was built on it, so data providers continue to supply data in SAS formats. An acute need has arisen from this gap: the data are in SAS format, but many coders have no SAS expertise or training background, because the economic and inertial forces that shaped these two groups of people have been different. Method: We analyze these differences and the resulting need for SasCsvToolkit, which generates a CSV file from SAS-format data so that data scientists can apply their skills in other tools that process CSV files, such as R, SPSS, or even Microsoft Excel. It also converts CSV files to SAS format. In addition, SAS database programmers often struggle to find the right method for database searches, exact matches, substring matches, except conditions, filters, unique values, table joins and data mining; for these, the toolkit provides template scripts that can be modified and run from the command line. Results: The toolkit has been implemented on the SLURM scheduler platform as a `bag-of-tasks` algorithm for parallel and distributed workflows, and a serial version is also included. Conclusion: In the age of Big Data, with many file formats, software packages and analytics environments, each with its own semantics for specific file types, data engineers should find SasCsvToolkit's functions very handy.
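
The `bag-of-tasks` idea in the abstract above can be illustrated with a minimal Python sketch. This is not SasCsvToolkit's code: the sample table, the function names and the use of a local thread pool in place of the SLURM scheduler are all assumptions made for illustration. It shows independent CSV operations (exact match, substring match, except condition, unique values) submitted as separate tasks and collected by name.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample table standing in for data already exported to CSV.
CSV_TEXT = """patient_id,diagnosis,site
1,asthma,north
2,diabetes,south
3,asthma bronchiale,north
4,hypertension,east
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def exact_match(rows, column, value):
    """Keep rows whose column equals the value exactly."""
    return [r for r in rows if r[column] == value]

def substring_match(rows, column, fragment):
    """Keep rows whose column contains the fragment."""
    return [r for r in rows if fragment in r[column]]

def except_condition(rows, column, value):
    """Keep rows whose column differs from the value."""
    return [r for r in rows if r[column] != value]

def unique_values(rows, column):
    """Return the sorted distinct values of a column."""
    return sorted({r[column] for r in rows})

def run_bag_of_tasks(rows, tasks, workers=4):
    """Run independent tasks in a pool and collect results by task name.

    The tasks share no state, so they can finish in any order;
    that independence is what makes this a bag of tasks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(fn, rows, *args)
            for name, (fn, args) in tasks.items()
        }
        return {name: fut.result() for name, fut in futures.items()}

rows = load_rows(CSV_TEXT)
tasks = {
    "exact": (exact_match, ("diagnosis", "asthma")),
    "substring": (substring_match, ("diagnosis", "asthma")),
    "except": (except_condition, ("site", "north")),
    "unique_sites": (unique_values, ("site",)),
}
results = run_bag_of_tasks(rows, tasks)
```

On a cluster, each entry in `tasks` would typically become its own scheduler job rather than a thread; the thread pool here only keeps the sketch self-contained.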

Big Data Analytics in Intra-Data Center Networks and Components of Data Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology; DOI: 10.32628/cseit206272; 2016; pp. 82-89

Author(s): Pushpa Mannava

Keyword(s): Data Mining, Big Data, Data Center, Data Analytics, Big Data Analytics, Large Data, Data Center Networks, Advanced Analytics, Scalable Design, Data Collections

Data mining is considered a vital procedure because it is used to find new, valid, useful and reasonable kinds of data. Integrating data mining methods into cloud computing gives a flexible and scalable design that can be used for efficient mining of large quantities of data from virtually integrated data sources, with the goal of producing information that is useful in decision making. The process of extracting hidden, beneficial patterns and useful information from big data is called big data analytics. This is done by applying advanced analytics techniques to large data collections. This paper provides information about big data analytics in intra-data center networks, the components of data mining, and data mining techniques.

Big Data Analytics Using Apache Hive to Analyze Health Data

DOI: 10.4018/978-1-6684-3662-2.ch046; 2022; pp. 979-992

Author(s): Pavani Konagala

Keyword(s): Big Data, Stock Exchange, Big Data Analytics, Large Data, Massive Data, Data Sets, Related Data, Health Related, Relational Database Management, Apache Hive

A large volume of data is stored electronically, and its total volume is very difficult to measure. These data come from various sources, such as stock exchanges, which may generate terabytes of data every day; Facebook, which may take about one petabyte of storage; and internet archives, which may store up to two petabytes of data. Such data are very difficult to manage with relational database management systems, and with massive data, reading from and writing to the drive takes more time. The storage and analysis of massive data have therefore become a major problem. Big data technologies address these problems by specifying methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing these types of data. It covers Hadoop characteristics, Hadoop architecture, the advantages of big data and the big data ecosystem. The chapter also includes a comprehensive study of using Apache Hive to analyze health-related data and deaths data from the U.S. government.

Research Challenges in Big Data Analytics

Advances in Business Information Systems and Analytics - Enterprise Big Data Engineering, Analytics, and Management; DOI: 10.4018/978-1-5225-0293-7.ch004; 2016; pp. 48-64

Author(s): Sivamathi Chokkalingam, Vijayarani S.

Keyword(s): Big Data, Data Analytics, Large Scale, New Technologies, Big Data Analytics, Large Data, Data Sets, Data Types, Customer Preferences, Research Challenges

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data differs from traditional technologies in three ways: the volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Because Big Data is a new and emerging field, new technologies and algorithms are needed to handle it. The main objective of this paper is to provide knowledge about the research challenges of Big Data analytics. The paper gives a brief overview of the types of Big Data analytics; for each type, it describes the process steps and tools and gives a banking application. Some research challenges of big data analytics, and possible solutions to them, are also discussed.

Big Data Analytics Using Apache Hive to Analyze Health Data

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics; DOI: 10.4018/978-1-5225-5852-1.ch015; 2019; pp. 358-372

Author(s): Pavani Konagala

Keyword(s): Big Data, Stock Exchange, Big Data Analytics, Large Data, Massive Data, Data Sets, Related Data, Health Related, Relational Database Management, Apache Hive

A large volume of data is stored electronically, and its total volume is very difficult to measure. These data come from various sources, such as stock exchanges, which may generate terabytes of data every day; Facebook, which may take about one petabyte of storage; and internet archives, which may store up to two petabytes of data. Such data are very difficult to manage with relational database management systems, and with massive data, reading from and writing to the drive takes more time. The storage and analysis of massive data have therefore become a major problem. Big data technologies address these problems by specifying methods to store and analyze large data sets. This chapter presents a brief study of big data techniques for analyzing these types of data. It covers Hadoop characteristics, Hadoop architecture, the advantages of big data and the big data ecosystem. The chapter also includes a comprehensive study of using Apache Hive to analyze health-related data and deaths data from the U.S. government.

Big Data analytics for prediction: parallel processing of the big learning base with the possibility of improving the final result of the prediction

Information Discovery and Delivery; DOI: 10.1108/idd-02-2018-0002; 2018; Vol 46 (3); pp. 147-160; Cited By ~ 2

Author(s): Laouni Djafri, Djamel Amar Bensaber, Reda Adjoudj

Keyword(s): Big Data, Data Analytics, Sampling Method, New Technologies, Predictive Analytics, Big Data Analytics, Sampling Strategy, Original Data, Data Set, Content Type

Purpose: This paper aims to solve the problems of big data analytics for prediction, including volume, veracity and velocity, by improving the prediction result to an acceptable level in the shortest possible time. Design/methodology/approach: The paper is divided into two parts. The first aims to improve the prediction result and proposes two ideas: the double pruning enhanced random forest algorithm, and the extraction of a shared learning base by stratified random sampling to obtain a learning base that is representative of all the original data. The second part proposes a distributed architecture supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the Map-Reduce algorithm. Findings: The representative learning base, obtained by integrating two learning bases (the partial base and the shared base), represents the original data set very well and gives very good results for Big Data predictive analytics. These results were supported by the improved random forests supervised learning method, which played a key role in this context. Originality/value: All companies are concerned, especially those that hold large amounts of information and want to screen it to improve their knowledge of the customer and optimize their campaigns.
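
Stratified random sampling, which the abstract above uses to build a representative learning base, can be sketched in a few lines of Python. This is a generic illustration of proportionate stratified sampling, not the authors' method: the function name, the toy data set and the 10% sampling fraction are assumptions, and the double pruning random forest and Map-Reduce parts are not shown.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Draw the same fraction from every stratum (proportionate allocation).

    records  : list of items to sample from
    label_of : function returning the stratum label of a record
    fraction : share of each stratum to keep, between 0 and 1
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the records by stratum label.
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)

    # Sample without replacement inside each stratum.
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # keep at least one record
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical imbalanced data set: 900 records labelled "no", 100 labelled "yes".
records = [(i, "yes" if i % 10 == 0 else "no") for i in range(1000)]

sample = stratified_sample(records, label_of=lambda r: r[1], fraction=0.1)
counts = Counter(label for _, label in sample)
```

Because every stratum is sampled at the same rate, the class ratio in `sample` matches the ratio in `records`, which is the sense in which such a sample is representative of the original data.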

Development of Big Data Analytics Model

ITEJ (Information Technology Engineering Journals); DOI: 10.24235/itej.v4i1.47; 2019; Vol 4 (1); pp. 14-25

Author(s): Saiful Rizal

Keyword(s): Data Mining, Information Technology, Big Data, Data Storage, Data Structures, Data Analytics, Big Data Analytics, Large Data, Complex Data, Survey Paper

The development of information technology produces very large volumes of data, with wide variation in the data and complex data structures. Traditional data storage techniques are not sufficient for storing and analyzing such volumes. Many researchers have analyzed big data using various analytics models. The purpose of this survey paper is therefore to provide an understanding of analytics models in big data for various uses, based on data mining algorithms. Preprocessing big data is the key to turning big data into big value.

Trends and Opportunities in Health Analytics as a Service and Implications for Use in Low Resource Settings: A Literature Review Abstract (Preprint)

DOI: 10.2196/preprints.15737; 2019

Author(s): Meghana Bastwadkar, Carolyn McGregor, S Balaji

Keyword(s): Data Mining, Cloud Computing, Big Data, Intensive Care, Literature Review, Health Monitoring, Data Analytics, Neonatal Intensive Care, Big Data Analytics, Healthcare Facilities

BACKGROUND: This paper presents a systematic literature review of existing remote health monitoring systems, with special reference to the neonatal intensive care unit (NICU). Articles on NICU clinical decision support systems (CDSSs) that used cloud computing and big data analytics were surveyed. OBJECTIVE: The aim of this study is to review the technologies used to provide NICU CDSSs. The review highlights the gaps within frameworks that provide the health analytics as a service (HAaaS) paradigm for big data analytics. METHODS: Literature searches were performed in Google Scholar, IEEE Digital Library, JMIR Medical Informatics, JMIR Human Factors and JMIR mHealth, and only English-language articles published in or after 2015 were included. The overall search strategy was to retrieve articles that included terms related to "health analytics" and "as a service" or "internet of things" / "IoT" and "neonatal intensive care unit" / "NICU". Titles and abstracts were reviewed to assess relevance. RESULTS: In total, 17 full papers met all criteria and were selected for full review. In most cases, bedside medical devices such as pulse oximeters were used as the sensor device. There was great diversity in the data acquisition techniques used; however, in most cases the same physiological data (heart rate, respiratory rate, blood pressure, blood oxygen saturation) were acquired. In most cases data analytics involved data mining classification techniques, fuzzy logic NICU decision support systems (DSS), etc., whereas big data analytics involving Artemis cloud data analysis used the CRISP-TDM and STDM temporal data mining techniques to support clinical research studies. In most scenarios, both real-time and retrospective analytics were performed. Most of the research was performed within small and medium-sized urban hospitals, so there is wide scope for research within rural and remote hospitals with NICU setups. Creating a HAaaS approach in which data acquisition and data analytics are not tightly coupled remains an open research area. The reviewed articles described architectures and base technologies for neonatal health monitoring with an IoT approach. CONCLUSIONS: The current work supports implementation of the expanded Artemis cloud as a commercial offering to healthcare facilities in Canada and worldwide to provide cloud computing services to critical care. However, no work to date has been completed for low-resource settings within healthcare facilities in India, which leaves scope for research. All the big data analytics frameworks reviewed in this study have tight coupling of components within the framework, so there is a need for a framework with functional decoupling of components.

The Impact of Big Data on Electronic Commerce in Profit Organisations in Saudi Arabia

Research in World Economy; DOI: 10.5430/rwe.v10n4p106; 2019; Vol 10 (4); pp. 106

Author(s): Bader A. Alyoubi

Keyword(s): Saudi Arabia, Big Data, Inventory Management, New Technologies, Predictive Analytics, Business Environment, Primary Data, Research Approach, Multiple Sources, The Impact

Big Data is rapidly gaining popularity in the e-commerce sector across the globe. There is a general consensus among experts that Saudi organisations are late in adopting new technologies. It is generally believed that the lack of research on the latest technologies specific to Saudi Arabia, which is culturally, socially and economically different from the West, is one of the key factors in this delay. Hence, to fill this gap to some extent and create awareness of Big Data technology, the primary goal of this research was to identify the impact of Big Data on e-commerce organisations in Saudi Arabia. The Internet has changed the business environment of Saudi Arabia too, and e-commerce is set to reach new heights thanks to the latest technological advancements. A qualitative research approach was used, with interviews of highly experienced professionals conducted to gather primary data. Using multiple sources of evidence, this research found that traditional databases are not capable of handling massive data. Big Data is a promising technology that can be adopted by e-commerce companies in Saudi Arabia. Big Data's predictive analytics will certainly help e-commerce companies gain better insight into consumer behaviour and thus offer customised products and services. The key finding of this research is that Big Data has a significant impact on e-commerce organisations in Saudi Arabia across verticals such as customer retention, inventory management, product customisation and fraud detection.
