Big Data Analytics

Author(s):  
Pratiyush Guleria ◽  
Manu Sood

Due to an increase in the number of digital transactions and data sources, a huge amount of unstructured data is generated by every interaction. In such a scenario, the concepts of data mining assume great significance as useful information/trends/predictions can be retrieved from this large amount of data, known as big data. Big data predictive analytics are making big inroads into the educational field because with the adoption of new technologies, new academic trends are being introduced into educational systems. This accumulation of large data of different varieties throws a new set of challenges to the learners as well as educational institutions in ensuring the quality of their education by improving strategic/operational decision-making capabilities. Therefore, the authors address this issue by proposing a support system that can guide the student to choose and to focus on the right course(s) based on their personal preferences. This chapter provides the readers with the requisite information about educational frameworks and related data mining.

2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets which contains a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is new emerging field, there is a need for development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about various research challenges of Big Data analytics. A brief overview of various types of Big Data analytics is discussed in this paper. For each analytics, the paper describes process steps and tools. A banking application is given for each analytics. Some of research challenges and possible solutions for those challenges of big data analytics are also discussed.


2019 ◽  
Author(s):  
Abhishek Singh

Abstract Background: The need for big data analysis requires being able to process large data which are being held fine-tuned for usage by corporates. It is only very recently that the need for big data has caught attention for low budget corporate groups and academia who typically do not have money and resources to buy expensive licenses of big data analysis platforms such as SAS. The corporates continue to work on SAS data format largely because of systemic organizational history and that the prior codes have been built on them. The data-providers continue to thus provide data in SAS formats. Acute sudden need has arisen because of this gap of data being in SAS format and the coders not having a SAS expertise or training background as the economic and inertial forces acting of having shaped these two class of people have been different. Method: We analyze the differences and thus the need for SasCsvToolkit which helps to generate a CSV file for a SAS format data so that the data scientist can then make use of his skills in other tools that can process CSVs such as R, SPSS, or even Microsoft Excel. At the same time, it also provides conversion of CSV files to SAS format. Apart from this, a SAS database programmer always struggles in finding the right method to do a database search, exact match, substring match, except condition, filters, unique values, table joins and data mining for which the toolbox also provides template scripts to modify and use from command line. Results: The toolkit has been implemented on SLURM scheduler platform as a `bag-of-tasks` algorithm for parallel and distributed workflow though serial version has also been incorporated. Conclusion: In the age of Big Data where there are way too many file formats and software and analytics environment each having their own semantics to deal with specific file types, SasCsvToolkit will find its functions very handy to a data engineer.


Author(s):  
Pushpa Mannava

Data mining is considered as a vital procedure as it is used for locating brand-new, legitimate, useful as well as reasonable kinds of data. The assimilation of data mining methods in cloud computing gives a versatile and also scalable design that can be made use of for reliable mining of significant quantity of data from virtually incorporated data resources with the goal of creating beneficial information which is useful in decision making. The procedure of removing concealed, beneficial patterns, as well as useful info from big data is called big data analytics. This is done via using advanced analytics techniques on large data collections. This paper provides the information about big data analytics in intra-data center networks, components of data mining and also techniques of Data mining.


2022 ◽  
pp. 979-992
Author(s):  
Pavani Konagala

A large volume of data is stored electronically. It is very difficult to measure the total volume of that data. This large amount of data is coming from various sources such as stock exchange, which may generate terabytes of data every day, Facebook, which may take about one petabyte of storage, and internet archives, which may store up to two petabytes of data, etc. So, it is very difficult to manage that data using relational database management systems. With the massive data, reading and writing from and into the drive takes more time. So, the storage and analysis of this massive data has become a big problem. Big data gives the solution for these problems. It specifies the methods to store and analyze the large data sets. This chapter specifies a brief study of big data techniques to analyze these types of data. It includes a wide study of Hadoop characteristics, Hadoop architecture, advantages of big data and big data eco system. Further, this chapter includes a comprehensive study of Apache Hive for executing health-related data and deaths data of U.S. government.


Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets which contains a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is new emerging field, there is a need for development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about various research challenges of Big Data analytics. A brief overview of various types of Big Data analytics is discussed in this paper. For each analytics, the paper describes process steps and tools. A banking application is given for each analytics. Some of research challenges and possible solutions for those challenges of big data analytics are also discussed.


Author(s):  
Pavani Konagala

A large volume of data is stored electronically. It is very difficult to measure the total volume of that data. This large amount of data is coming from various sources such as stock exchange, which may generate terabytes of data every day, Facebook, which may take about one petabyte of storage, and internet archives, which may store up to two petabytes of data, etc. So, it is very difficult to manage that data using relational database management systems. With the massive data, reading and writing from and into the drive takes more time. So, the storage and analysis of this massive data has become a big problem. Big data gives the solution for these problems. It specifies the methods to store and analyze the large data sets. This chapter specifies a brief study of big data techniques to analyze these types of data. It includes a wide study of Hadoop characteristics, Hadoop architecture, advantages of big data and big data eco system. Further, this chapter includes a comprehensive study of Apache Hive for executing health-related data and deaths data of U.S. government.


2018 ◽  
Vol 46 (3) ◽  
pp. 147-160 ◽  
Author(s):  
Laouni Djafri ◽  
Djamel Amar Bensaber ◽  
Reda Adjoudj

Purpose This paper aims to solve the problems of big data analytics for prediction including volume, veracity and velocity by improving the prediction result to an acceptable level and in the shortest possible time. Design/methodology/approach This paper is divided into two parts. The first one is to improve the result of the prediction. In this part, two ideas are proposed: the double pruning enhanced random forest algorithm and extracting a shared learning base from the stratified random sampling method to obtain a representative learning base of all original data. The second part proposes to design a distributed architecture supported by new technologies solutions, which in turn works in a coherent and efficient way with the sampling strategy under the supervision of the Map-Reduce algorithm. Findings The representative learning base obtained by the integration of two learning bases, the partial base and the shared base, presents an excellent representation of the original data set and gives very good results of the Big Data predictive analytics. Furthermore, these results were supported by the improved random forests supervised learning method, which played a key role in this context. Originality/value All companies are concerned, especially those with large amounts of information and want to screen them to improve their knowledge for the customer and optimize their campaigns.


2019 ◽  
Vol 4 (1) ◽  
pp. 14-25
Author(s):  
Saiful Rizal

The development of information technology produces very large data sizes, with various variations in data and complex data structures. Traditional data storage techniques are not sufficient for storage and analysis with very large volumes of data. Many researchers conducted their research in analyzing big data with various analytics models in big data. Therefore, the purpose of the survey paper is to provide an understanding of analytics models in big data for various uses using algorithms in data mining. Preprocessing big data is the key to turning big data into big value.


2019 ◽  
Author(s):  
Meghana Bastwadkar ◽  
Carolyn McGregor ◽  
S Balaji

BACKGROUND This paper presents a systematic literature review of existing remote health monitoring systems with special reference to neonatal intensive care (NICU). Articles on NICU clinical decision support systems (CDSSs) which used cloud computing and big data analytics were surveyed. OBJECTIVE The aim of this study is to review technologies used to provide NICU CDSS. The literature review highlights the gaps within frameworks providing HAaaS paradigm for big data analytics METHODS Literature searches were performed in Google Scholar, IEEE Digital Library, JMIR Medical Informatics, JMIR Human Factors and JMIR mHealth and only English articles published on and after 2015 were included. The overall search strategy was to retrieve articles that included terms that were related to “health analytics” and “as a service” or “internet of things” / ”IoT” and “neonatal intensive care unit” / ”NICU”. Title and abstracts were reviewed to assess relevance. RESULTS In total, 17 full papers met all criteria and were selected for full review. Results showed that in most cases bedside medical devices like pulse oximeters have been used as the sensor device. Results revealed a great diversity in data acquisition techniques used however in most cases the same physiological data (heart rate, respiratory rate, blood pressure, blood oxygen saturation) was acquired. Results obtained have shown that in most cases data analytics involved data mining classification techniques, fuzzy logic-NICU decision support systems (DSS) etc where as big data analytics involving Artemis cloud data analysis have used CRISP-TDM and STDM temporal data mining technique to support clinical research studies. In most scenarios both real-time and retrospective analytics have been performed. Results reveal that most of the research study has been performed within small and medium sized urban hospitals so there is wide scope for research within rural and remote hospitals with NICU set ups. Results have shown creating a HAaaS approach where data acquisition and data analytics are not tightly coupled remains an open research area. Reviewed articles have described architecture and base technologies for neonatal health monitoring with an IoT approach. CONCLUSIONS The current work supports implementation of the expanded Artemis cloud as a commercial offering to healthcare facilities in Canada and worldwide to provide cloud computing services to critical care. However, no work till date has been completed for low resource setting environment within healthcare facilities in India which results in scope for research. It is observed that all the big data analytics frameworks which have been reviewed in this study have tight coupling of components within the framework, so there is a need for a framework with functional decoupling of components.


2019 ◽  
Vol 10 (4) ◽  
pp. 106
Author(s):  
Bader A. Alyoubi

Big Data is gaining rapid popularity in e-commerce sector across the globe. There is a general consensus among experts that Saudi organisations are late in adopting new technologies. It is generally believed that the lack of research in latest technologies that are specific to Saudi Arabia that is culturally, socially, and economically different from the West, is one of the key factors for the delay in technology adoption in Saudi Arabia. Hence, to fill this gap to a certain extent and create awareness about Big Data technology, the primary goal of this research was to identify the impact of Big Data on e-commerce organisations in Saudi Arabia. Internet has changed the business environment of Saudi Arabia too. E-commerce is set for achieving new heights due to latest technological advancements. A qualitative research approach was used by conducting interviews with highly experienced professional to gather primary data. Using multiple sources of evidence, this research found out that traditional databases are not capable of handling massive data. Big Data is a promising technology that can be adopted by e-commerce companies in Saudi Arabia. Big Data’s predictive analytics will certainly help e-commerce companies to gain better insight of the consumer behaviour and thus offer customised products and services. The key finding of this research is that Big Data has a significant impact in e-commerce organisations in Saudi Arabia on various verticals like customer retention, inventory management, product customisation, and fraud detection.


Sign in / Sign up

Export Citation Format

Share Document