Big Data and Analytics

Author(s):  
Sheik Abdullah A. ◽  
Priyadharshini P.

The term Big Data refers to large datasets that occur in many different forms. In recent years, most organizations have been generating vast amounts of data in different forms, which gives rise to the characteristics of volume, variety, velocity, and veracity. The volume aspect of Big Data concerns maintaining datasets that are too large to be processed by a traditional database. Big Data occurs as structured, unstructured, and semi-structured data. Working with Big Data involves programming, data warehousing, computational frameworks, quantitative aptitude and statistics, and business knowledge. Within Big Data analytics, predictive analytics and social media analytics are widely used to determine patterns or trends that are about to emerge. This chapter mainly deals with the tools and techniques of big data analytics in various applications.
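
As a minimal illustration of predictive analytics detecting an emerging trend, the sketch below fits an ordinary least-squares line to a short series and extrapolates it. The function names and the example series are hypothetical and not taken from the chapter; real predictive analytics would use richer models.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of y = a + b*t for t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of time and value, and variance of time (both unnormalized).
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_next(values, steps=1):
    """Extrapolate the fitted trend line `steps` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical daily mention counts for a topic.
print(forecast_next([120, 135, 150, 165, 180]))  # [195.0]
```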

Author(s):  
Sheik Abdullah A ◽  
Selvakumar S ◽  
Ramya C

Data analytics has become one of the most challenging platforms across various domains such as telecom, health care, and social media. The most challenging and promising task in analytics is understanding the various patterns in the data. Data retrieval and analysis is a promising area that requires algorithms, techniques, and data-processing methods capable of handling large volumes of data. There are various types of analytical methods, such as predictive analytics, descriptive analytics, text analytics, social media analytics, and survival analytics. This chapter mainly focuses on descriptive analytics: its types, algorithms, and applications. Its tools and techniques include association rule mining, sequence rule mining, and data categorization through hierarchical and non-hierarchical clustering methods and their variants.
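
To make association rule mining concrete, the sketch below computes support and confidence and enumerates single-item rules by brute force over item pairs. It is a simplified illustration, not the Apriori algorithm or the chapter's own method; the function names, thresholds, and basket data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Brute-force rules x -> y over item pairs meeting both thresholds."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_support:
            continue  # the pair is too rare to yield a rule
        for x, y in ((a, b), (b, a)):
            c = confidence({x}, {y}, transactions)
            if c >= min_confidence:
                rules.append((x, y, support({x, y}, transactions), c))
    return rules


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
for x, y, s, c in pair_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Enumerating every pair is quadratic in the number of items. Apriori avoids this by pruning candidate itemsets whose subsets are already infrequent.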


Author(s):  
Jisoo Sim ◽  
Patrick Miller

To meet the needs of park users, planners and designers must know what park users want to do and how they want the park to offer different activities. Big data may help planners and designers gain this knowledge. This study examines how big data collected in an urban park could be used to identify meaningful implications for planning and design. Although big data have emerged as a new data source, they have not yet become an accepted source of data because big data analytics is not well understood. By comparing a survey, as a traditional data source, with big data, this study identifies the strengths and weaknesses of using big data analytics in park planning and design. There are two research questions: (1) what activities do park users want; and (2) how satisfied are users with different activities. The Gyeongui Line Forest Park, which was built on an abandoned railway, was selected as the study site. A total of 177 responses were collected through the onsite survey, and 3703 tweets mentioning the park were collected from Twitter. Results from the survey show that ordinary activities such as walking and taking a rest in the park were the most common, which is consistent with existing studies. The social media analytics revealed notable findings, such as positive tweets about how the railway was turned into a park and negative tweets about diseases that may occur in the park. Therefore, a survey as traditional data and social media analytics as big data can be complementary methods for the design and planning process.
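
The abstract does not say how tweets were classified as positive or negative. The sketch below shows one simple possibility, lexicon-based polarity counting. The word lists, function names, and example tweets are all hypothetical assumptions for illustration only.

```python
import re

# Hypothetical, tiny lexicons for illustration only; real studies use curated ones.
POSITIVE = {"love", "beautiful", "great", "nice", "relaxing"}
NEGATIVE = {"dirty", "crowded", "disease", "ticks", "noisy"}


def polarity(text):
    """Label a tweet by (#positive words - #negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(tweets):
    """Count how many tweets fall into each polarity class."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tweet in tweets:
        counts[polarity(tweet)] += 1
    return counts


print(tally([
    "Love how the old railway became a beautiful park",
    "Worried about ticks and disease in the grass",
    "Walked through the park today",
]))
```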


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the volume of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, and web servers, and exists in both structured and unstructured form. The term "Big data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data ranges in size from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics, and visualization. Big data is available in structured, unstructured, and semi-structured formats, and relational databases fail to store this multi-structured data. Apache Hadoop is an efficient, robust, reliable, and scalable framework to store, process, transform, and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce, and the c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
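
The sketch below shows a single-machine fuzzy c-means loop in plain Python, to illustrate what the c-means algorithm computes. It is not the paper's MapReduce implementation. In a MapReduce version, the weighted sums in the center update would typically be computed in parallel over data partitions. The function name, parameters (`c`, `m`, `iters`, `tol`, `seed`), and data are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for points given as tuples of floats.

    memberships[j][i] is the degree to which point j belongs to cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    exp = 2.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)
            ))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        shift = 0.0
        for j, p in enumerate(points):
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]  # avoid /0
            new_row = [1.0 / sum((dists[i] / dk) ** exp for dk in dists) for i in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[j])))
            u[j] = new_row
        if shift < tol:  # stop once memberships have stabilized
            break
    return centers, u


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(pts)
print(sorted(centers))
```

Unlike k-means, every point keeps a graded membership in every cluster. The exponent `m` (> 1) controls how soft those memberships are.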


2021 ◽  
Author(s):  
Steven F. Lehrer ◽  
Tian Xie

There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess if the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and value from hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, although both least squares support vector regression and recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometrics approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes. This paper was accepted by J. George Shanthikumar, big data analytics.
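
Recursive partitioning, one of the strategies compared above, builds a regression tree by repeatedly choosing the split that most reduces squared error. The sketch below shows only that core step, for one feature. It is not the authors' forecasting model. The function name and the example data (for instance, a hypothetical pre-release social media volume against opening revenue) are illustrative assumptions.

```python
def best_split(xs, ys):
    """Find the threshold on one feature that minimizes total squared error
    when each side is predicted by its own mean (one step of a regression tree).

    Returns (error, threshold); threshold is None if no split beats the overall mean.
    """
    def sse(vals):
        if not vals:
            return 0.0
        mu = sum(vals) / len(vals)
        return sum((v - mu) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = (sse(ys), None)  # baseline: no split, predict the overall mean
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        err = sse(left) + sse(right)
        if err < best[0]:
            best = (err, threshold)
    return best


# Hypothetical feature (e.g. social media volume) and outcome (e.g. revenue).
print(best_split([1, 2, 3, 10, 11, 12], [5, 6, 5, 20, 21, 20]))
```

A full tree applies this search recursively to each side and across all features. The "greatly outperform" claim above concerns such trees and the other methods, not this single step.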


2018 ◽  
Vol 7 (2.27) ◽  
pp. 1
Author(s):  
Rapinder Kaur ◽  
Vaishali Chauhan ◽  
Urvashi Mittal

An immoderate amount of data is being generated every day across the world from miscellaneous sources and fields, which creates problems for users. Because of this rapid growth, analysing big data with traditional data processing tactics has become a crucial issue. Structured data is not the only kind; unstructured and semi-structured data add further complications to handling this voluminous data. Highly valuable information is hidden in this gigantic bulk of data, and it can benefit individuals, groups, or organizations and support more sophisticated and valuable decisions. To deal with this, many new tools and techniques have been devised. These tools can analyse the large volumes of data being generated at unprecedented speed. This paper presents a comparative study of some data analytics techniques that can resolve big data analytics issues by examining the data more precisely. A comparative study of Hadoop, Hive, and Pig is presented, covering how each of these works.
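
Word count is the canonical example used to explain Hadoop MapReduce. The single-process sketch below mimics the map, shuffle, and reduce stages that Hadoop distributes across a cluster. It is an illustration only and does not use Hadoop's, Hive's, or Pig's APIs. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a final count."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data tools", "Big data at scale"]))
```

In a real cluster, each map call runs near its block of input in HDFS, and the shuffle moves grouped keys over the network to reducers.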


Author(s):  
Sanjeev Kumar Punia ◽  
Manoj Kumar ◽  
Thompson Stephan ◽  
Ganesh Gopal Deverajan ◽  
Rizwan Patan

Broadly, three types of machine learning algorithms are used to discover correlations, hidden patterns, and other useful information in the diverse data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks are used to collect unstructured data. Converting unstructured data into structured data or meaningful information is a very tedious task, and various machine learning classification algorithms are used to perform this conversion. In this paper, the authors first collect unstructured research data from a frequently used social media network (Twitter) through the Twitter application programming interface (API) stream. Second, they apply different machine learning algorithms (supervised, unsupervised, and reinforcement), such as decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN), to the collected research data set. Finally, the different machine learning classification algorithms are compared.
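
As an example of one of the listed classifiers, the sketch below implements multinomial naive Bayes with Laplace (add-one) smoothing for short texts such as tweets. It is a from-scratch illustration, not the authors' implementation. The function names, labels, and training texts are hypothetical. In practice a library such as scikit-learn would normally be used.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class frequencies and per-class word counts from labelled texts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in doc.lower().split():
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(
    ["great phone love it", "love the camera", "terrible battery hate it", "hate the screen"],
    ["pos", "pos", "neg", "neg"],
)
print(predict_nb(model, "love this phone"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.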


Author(s):  
Yannick Dufresne ◽  
Brittany I. Davidson

This chapter assesses big data. Within the social sciences, big data could refer to an emerging field of research that brings together academics from a variety of disciplines using and developing tools to widen perspective, to utilize latent data sets, as well as for the generation of new data. Another way to define big data in the social sciences refers to data corresponding to at least one of the three Vs of big data: volume, variety, or velocity. These characteristics are widely used by researchers attempting to define and distinguish new types of data from conventional ones. However, there are a number of ethical and consent issues with big data analytics. For example, many studies across the social sciences utilize big data from the web, from social media, online communities, and the darknet, where there is a question as to whether users provided consent to the reuse of their posts, profiles, or other data shared when they signed up, knowing their profiles and information would be public. This has led to a number of issues regarding algorithms making decisions that cannot be explained. The chapter then considers the opportunities and pitfalls that come along with big data.


2018 ◽  
Vol 46 (3) ◽  
pp. 147-160 ◽  
Author(s):  
Laouni Djafri ◽  
Djamel Amar Bensaber ◽  
Reda Adjoudj

Purpose: This paper aims to solve the problems of big data analytics for prediction, including volume, veracity, and velocity, by improving the prediction result to an acceptable level in the shortest possible time.
Design/methodology/approach: This paper is divided into two parts. The first part improves the prediction result. In this part, two ideas are proposed: a double-pruning enhanced random forest algorithm, and the extraction of a shared learning base using stratified random sampling to obtain a learning base representative of all the original data. The second part designs a distributed architecture supported by new technology solutions, which works coherently and efficiently with the sampling strategy under the supervision of the MapReduce algorithm.
Findings: The representative learning base obtained by integrating two learning bases, the partial base and the shared base, represents the original data set very well and gives very good results for Big Data predictive analytics. These results were further supported by the improved random forest supervised learning method, which played a key role in this context.
Originality/value: All companies are concerned, especially those that hold large amounts of information and want to screen it to improve their knowledge of customers and optimize their campaigns.
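
The shared learning base above is extracted with stratified random sampling. The sketch below shows only generic stratified sampling, in which the same fraction is drawn from each class so that the sample keeps the original class mix. It does not reproduce the paper's partial/shared base integration or its double-pruning random forest. The function name, parameters, and records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Draw the same fraction from each class so the sample keeps the class mix.

    At least one record is taken from every class, so rare classes are not lost.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[label_of(r)].append(r)
    sample = []
    for label in sorted(strata):  # deterministic class order for reproducibility
        group = strata[label]
        k = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical records: 80 "yes" and 20 "no" examples (an 80/20 class mix).
records = [(i, "yes") for i in range(80)] + [(i, "no") for i in range(80, 100)]
subset = stratified_sample(records, lambda r: r[1], fraction=0.1)
print(len(subset), sum(1 for r in subset if r[1] == "yes"))  # 10 8
```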


2021 ◽  
Vol 16 (4) ◽  
pp. 82
Author(s):  
Muhamad Hariz Muhamad Adnan ◽  
Shamsul Arrieya Ariffin ◽  
Hafizul Fahri Hanafi ◽  
Mohd Shahid Husain ◽  
Ismail Yusuf Panessai

Recently, the promotion of science, technology, engineering and mathematics (STEM) education has come into focus because of the shortage in the STEM workforce. Surprisingly, enrolment rates in STEM degrees are still low in many countries. Social media has been identified as one of the main platforms that can help increase prospective students' interest in STEM and also in Technical and Vocational Education and Training (TVET) subjects. However, very little research has examined how higher education institutions in Malaysia can leverage social media and social media analytics effectively to increase students' interest in and awareness of STEM and TVET disciplines. Therefore, this paper aims to propose a framework to increase prospective students' interest in STEM and TVET using social media and big data analytics. The objectives of this study are to explore various social media applications in education, to study how these applications can increase students' interest, and to propose a suitable framework for Malaysian higher education institutions. The framework is proposed by following the theory synthesis methodology. Four main components of the framework have been proposed, namely social media, role models or mentoring, massive open online courses, and big data analytics. Each component is significant and requires a considerable amount of time to develop. The suggested framework is anticipated to benefit higher education institutions with significant gains in student numbers, revenue, and reputation.
Keywords: Social media, Social media analytics, STEM, E-learning, Education


Nowadays, large volumes of data are generated in the form of text, voice, video, images, and sound. Handling and processing these different types of data is very challenging, and analysing big data with traditional data processing applications is a laborious process. Because the data is scattered across huge file systems, big data analysis is a difficult task, so a number of tools and techniques are required. Data mining techniques such as clustering, prediction, classification, and decision trees are used to analyse big data. Apache Hadoop, Apache Spark, Apache Storm, MongoDB, NoSQL, and HPCC are tools used to handle big data. This paper presents a review and comparative study of these tools and techniques, which are widely used for Big Data analytics. A brief summary of the tools and techniques is presented here.
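
Decision-tree learners choose splits with an impurity criterion. The sketch below computes entropy and information gain, the criterion used by ID3-style trees, for one categorical attribute. This is one common choice among several (Gini impurity is another) and is not taken from the paper. The function names and the tiny data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction from splitting rows (dicts) on a categorical attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the child partitions after the split.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0
```

A tree builder evaluates this gain for every candidate attribute, splits on the best one, and recurses into each partition.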

