Enhanced Over_Sampling Techniques for Imbalanced Big Data Set Classification

Author(s):  
Sachin Subhash Patil ◽  
Shefali Pratap Sonavane
Keyword(s):  
Big Data ◽  
Author(s):  
Yihao Tian

Big data refers to large, unstructured data sets that arrive in various formats from sources such as the internet and business organizations. Predicting consumer behavior is a core responsibility for most dealers. Market research can reveal consumer intentions, but it is a tall order even for the best-designed research project to penetrate the veil that shields real customer motivations from closer scrutiny. Work on customer behavior usually focuses on customer data mining, and each model is structured to answer one query at one stage. Customer behavior prediction is a complex and unpredictable challenge. This paper applies advanced mathematical and big data analytical (BDA) methods to predict customer behavior. Predictive behavior analytics can give modern marketers multiple insights for optimizing their strategies. The model goes beyond analyzing historical evidence and uses mathematical methods to make well-informed assumptions about what will happen in the future. Although the method is complex, it is quite straightforward for most customers. Most consumer behavior models involve many variables, and with big data they produce predictions that are usually quite accurate. This paper attempts to develop an association rule mining model to predict customers’ behavior, improve accuracy, and derive major patterns in consumer data. The findings indicate that the recommended BDA method improves the usability of big data analytics in the organization (98.2%), the risk management ratio (96.2%), operational cost (97.1%), the customer feedback ratio (98.5%), and the demand prediction ratio (95.2%).
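
The abstract above names association rule mining but gives no details of the model. As a minimal sketch of the general technique only (not the paper's method), the following computes support and confidence for rules between pairs of items. The baskets, thresholds, and function names are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pairs cannot yield a rule we keep
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical purchase baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
rules = mine_rules(baskets)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

A full miner such as Apriori would also consider larger itemsets; the pairwise version is kept short to show how the two measures filter candidate rules.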


2019 ◽  
Vol 2 ◽  
pp. 1-6
Author(s):  
Wenjuan Lu ◽  
Aiguo Liu ◽  
Chengcheng Zhang

<p><strong>Abstract.</strong> With the development of geographic information technology, the ways of acquiring geographic information are constantly evolving and the volume of spatiotemporal data is exploding, so more and more scholars have begun to work on data processing and spatiotemporal analysis. Traditional data visualization technology is popular, simple, and easy to understand: simple pie charts and histograms can reveal and analyze the characteristics of the data itself, but they still cannot be combined well with maps to display the hidden temporal and spatial information and realize its application value. How to fully explore the spatiotemporal information contained in massive data and accurately explore the spatial distribution and variation rules of geographical things and phenomena is a key research problem at present. To address this, the paper designs and builds a universal thematic data visual analysis system that supports the full set of functions: data warehousing, data management, data analysis, and data visualization. Taking Weifang city as the research area and starting from rainfall interpolation analysis and comprehensive population analysis of Weifang, among other aspects, the authors achieve fast and efficient display of a big data set and fully show the characteristics of spatial and temporal data through thematic data visualization. The study also adopts the Cassandra distributed database, which can store, manage, and analyze big data. To a certain extent, this reduces the pressure of front-end map drawing and provides good query analysis efficiency and fast processing.</p>


Large volumes of data are now collected and stored across many fields; such collections are called big data. In healthcare, big data includes a huge volume of clinical records for every patient, maintained in Electronic Health Records (EHR). More than 80% of clinical data is unstructured and stored in hundreds of forms. The challenge for data storage and analysis is handling large datasets efficiently and scalably. The Hadoop MapReduce framework can store and process any kind of data quickly. It is not only a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction on the data sets is handled by machine learning algorithms. This work focuses on the Extreme Learning Machine (ELM) algorithm, combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM) to find an optimized solution for disease risk prediction. The proposed work also considers the scalability and accuracy of big data models; the authors report that the proposed algorithm handles the computing workload well and achieves good results in both veracity and efficiency.


Author(s):  
Hena Iqbal ◽  
Sujni Paul ◽  
Khaliquzzaman Khan

Evaluation is an analytical and organized process for determining the present positive influences, favourable future prospects, existing shortcomings, and hidden complexities of any plan, program, practice, or policy. Policy evaluation is an essential process for measuring the performance or progress of a scheme. Its main purpose is to empower various stakeholders and enhance their socio-economic environment. Governments launch a large number of policies or schemes in different areas for the welfare of citizens. Although governmental policies are intended to improve people's quality of life, they may also affect their everyday lives. Saubhagya, a recent scheme launched by the Indian government in 2017, has been selected for evaluation using opinion mining techniques. The data set of public opinion about the scheme was collected from Twitter. The primary intent is to offer opinion mining as a smart city technology that harnesses user-generated big data and analyses it to offer a sustainable governance model.


2018 ◽  
Vol 36 (3) ◽  
pp. 458-481 ◽  
Author(s):  
Yezheng Liu ◽  
Lu Yang ◽  
Jianshan Sun ◽  
Yuanchun Jiang ◽  
Jinkun Wang

Purpose Academic groups are designed specifically for researchers. A group recommendation procedure is essential to support scholars’ research-based social activities. However, group recommendation methods are rarely applied in online libraries, and they often suffer from scalability problems in big data contexts. The purpose of this paper is to facilitate academic group activities in big data-based library systems by recommending satisfying articles for academic groups.
Design/methodology/approach The authors propose a collaborative matrix factorization (CoMF) mechanism and implement a parallelized CoMF under the Hadoop framework. Its rationale is to collaboratively decompose the researcher-article interaction matrix and the group-article interaction matrix. Furthermore, three extended models of CoMF are proposed.
Findings Empirical studies on the CiteULike data set demonstrate that CoMF and its three variants outperform baseline algorithms in terms of accuracy and robustness. The scalability evaluation of the parallelized CoMF shows its potential value in scholarly big data environments.
Research limitations/implications The proposed methods fill the gap in group-article recommendation in the online library domain. They enrich group recommendation methods by considering the interaction effects between groups and members. They are the first attempt to implement group recommendation methods in big data contexts.
Practical implications The proposed methods can improve group activity effectiveness and information shareability in academic groups, which benefits membership retention and enhances the service quality of online library systems. Furthermore, the proposed methods are applicable to big data contexts and make library system services more efficient.
Social implications The proposed methods have potential value for improving scientific collaboration and research innovation.
Originality/value The proposed CoMF method is a novel group recommendation method based on the collaborative decomposition of the researcher-article matrix and the group-article matrix. The process indirectly reflects the interaction between groups and members, which accords with actual library environments and provides an interpretable recommendation result.
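
The abstract does not give the CoMF objective, so the following is only a rough sketch of one way two interaction matrices can be decomposed together: a researcher-article matrix `R` and a group-article matrix `G` are factorized with shared article factors `V`, trained by plain stochastic gradient descent. The shared-factor formulation, the toy matrices, and every name and hyperparameter here are assumptions, not the authors' model, and the Hadoop parallelization is not shown.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_error(matrix, row_factors, col_factors):
    """Sum of squared reconstruction errors over all matrix cells."""
    return sum(
        (matrix[i][j] - dot(row_factors[i], col_factors[j])) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


def comf_sketch(R, G, k=2, steps=200, lr=0.05, reg=0.01, seed=0):
    """Jointly fit R ~ U V^T and G ~ W V^T with shared article factors V."""
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]

    U, W, V = init(len(R)), init(len(G)), init(len(R[0]))
    history = []
    for _ in range(steps):
        # Both matrices update the same V, which is what couples them.
        for matrix, rows in ((R, U), (G, W)):
            for i, row in enumerate(matrix):
                for j, value in enumerate(row):
                    err = value - dot(rows[i], V[j])
                    for f in range(k):
                        r_if, v_jf = rows[i][f], V[j][f]
                        rows[i][f] += lr * (err * v_jf - reg * r_if)
                        V[j][f] += lr * (err * r_if - reg * v_jf)
        history.append(squared_error(R, U, V) + squared_error(G, W, V))
    return U, W, V, history


# Hypothetical toy data: 4 researchers, 2 groups, 4 articles (1 = interaction).
R = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
G = [[1, 1, 0, 0], [0, 0, 1, 1]]
U, W, V, history = comf_sketch(R, G)
print(f"loss after first epoch: {history[0]:.3f}, after last: {history[-1]:.3f}")
```

Predicted scores for a group are then `dot(W[g], V[j])` for each article `j`. Because `V` is shared, researcher interactions influence group recommendations indirectly.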


2021 ◽  
Vol 105 ◽  
pp. 348-355
Author(s):  
Hou Xiang Liu ◽  
Sheng Han Zhou ◽  
Bang Chen ◽  
Chao Fan Wei ◽  
Wen Bing Chang ◽  
...  

This paper proposes a practice-based teaching mode built around analysis of the Didi data set. With the rapid development and wide application of big data analysis technology, more and more universities offer big data analysis courses. The theoretical knowledge of big data analysis is specialized and hard to understand, which may reduce students' interest and motivation to learn. Practice teaching plays an important role in bridging theory learning and application. The paper first introduces the theoretical teaching part of the course and the theoretical methods involved. It then briefly describes the practice teaching content of the Didi data analysis case. The study selects relevant evaluation indices to assess the teaching effect through a questionnaire survey and to verify the effectiveness of the teaching method. The results show that 78% of students think practical teaching can greatly improve their interest in learning, 89% think practical teaching helps them learn theoretical knowledge, 89% have basically mastered the big data analysis methods introduced in the course, and 90% think the proposed teaching method can greatly improve their practical ability. The teaching mode is effective: it can improve students' learning outcomes and practical ability in data analysis, and thereby the overall teaching effect.


Author(s):  
Sheik Abdullah A. ◽  
Priyadharshini P.

The term Big Data refers to large datasets that occur in many different forms. In recent years, most organizations have generated vast amounts of data in different forms, giving rise to the dimensions of volume, variety, velocity, and veracity. The volume aspect of Big Data concerns data set maintenance: the data would normally be processed in a database, but its volume cannot be handled by a traditional database. Big Data comprises structured, unstructured, and semi-structured data. Big Data involves programming, data warehousing, computational frameworks, quantitative aptitude and statistics, and business knowledge. Within Big Data analytics, predictive analytics and social media analytics are widely used to determine patterns or trends that are about to emerge. This chapter mainly deals with the tools and techniques of big data analytics across various applications.


Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

Because Big Data is a collection of structured, unstructured, and semi-structured data gathered from various sources, it is important to mine it while preserving the privacy of individual records. Differential privacy is one of the best measures, as it provides a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time to privately mine large datasets. It discusses the problem of preserving data privacy, the challenges of doing so in a big data environment, and data privacy techniques and their applications to unstructured data. Analyses of experimental results on structured and unstructured data sets are also presented.
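
The chapter's algorithm is not described in the abstract. As a hedged illustration of the general idea, the sketch below counts candidate itemsets in a map step and a reduce step, then releases the counts with Laplace noise. It assumes each count has sensitivity 1 (adding or removing one transaction changes it by at most 1) and splits the privacy budget `epsilon` evenly across the released counts. The partitions, candidates, and function names are hypothetical, and the floating-point noise sampler is for illustration rather than production use.

```python
import random
from collections import Counter


def map_count(partition, candidates):
    """Map step: count candidate itemsets within one data partition."""
    counts = Counter()
    for transaction in partition:
        for itemset in candidates:
            if itemset <= transaction:
                counts[itemset] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: sum the per-partition counters."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(partitions, candidates, epsilon, seed=None):
    """Release noisy counts for `candidates` under a total budget `epsilon`."""
    rng = random.Random(seed)
    exact = reduce_counts(map_count(p, candidates) for p in partitions)
    # Assumption: sensitivity 1 per count, budget split evenly, so each
    # count gets epsilon / len(candidates) and noise scale is its inverse.
    scale = len(candidates) / epsilon
    return {c: exact[c] + laplace_noise(scale, rng) for c in candidates}


# Hypothetical transactions split across two partitions.
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b"}],
]
candidates = [frozenset({"a"}), frozenset({"a", "b"})]
noisy = private_counts(partitions, candidates, epsilon=1.0, seed=42)
for itemset, value in noisy.items():
    print(sorted(itemset), round(value, 2))
```

Smaller values of `epsilon` give stronger privacy and noisier counts, so a support threshold applied to the noisy counts becomes less reliable as the budget shrinks.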


2020 ◽  
pp. 214-244
Author(s):  
Prithish Banerjee ◽  
Mark Vere Culp ◽  
Kenneth Jospeh Ryan ◽  
George Michailidis

This chapter presents some popular graph-based semi-supervised approaches. These techniques apply to classification and regression problems and can be extended to big data problems using recently developed anchor graph enhancements. The background necessary for understanding this chapter includes linear algebra and optimization. No prior knowledge of machine learning methods is necessary. An empirical demonstration of these techniques on real benchmark data sets is also provided.
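
The abstract does not say which graph-based methods the chapter covers. One widely used example of the family is label propagation, sketched below under that assumption: labelled nodes keep their labels, and each unlabelled node repeatedly takes the weighted average of its neighbours' scores. The chain graph and all names are hypothetical, and anchor graphs are not shown.

```python
def propagate_labels(weights, labels, iterations=100):
    """Spread 0/1 labels over a graph.

    weights: symmetric adjacency matrix (list of lists).
    labels:  0 or 1 for labelled nodes, None for unlabelled nodes.
    Returns a score in [0, 1] per node; scores above 0.5 lean towards class 1.
    """
    n = len(weights)
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        updated = scores[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # known labels stay clamped
            degree = sum(weights[i])
            if degree > 0:
                updated[i] = sum(w * s for w, s in zip(weights[i], scores)) / degree
        scores = updated
    return scores


# Hypothetical chain graph 0 - 1 - 2 - 3 - 4 with labels only at the two ends.
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = propagate_labels(chain, [1, None, None, None, 0])
print([round(s, 2) for s in scores])
```

On this chain the scores fall off linearly between the two labelled ends, which shows how the graph structure alone determines the predictions for unlabelled nodes.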

