Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management


Total documents: 10 (five years: 0)
H-index: 1 (five years: 0)

Published By IGI Global

9781522528050, 9781522528067

Author(s): Anu Singha, Phub Namgay

A tool that algorithmically traces the effectiveness of text files would help determine whether a text file has all the characteristics of important concepts. Every text source is built on key phrases, and these key phrases follow certain widely used grammatical patterns. A large amount of information can be derived from these key concepts for further analysis, such as their dispersion and the relationships among them. These relationships can be used to draw concept graphs. This chapter therefore presents detailed methodologies and technologies for evaluating the effectiveness of the information extracted from text files.


Author(s): B. K. Tripathy, Hari Seetha, M. N. Murty

Data clustering plays a very important role in data mining, machine learning, and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. Many variants improve these algorithms in different directions, such as kernelised, possibilistic, and possibilistic kernelised versions. However, none of these algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to adapt them so that they can be applied to cluster big data. Such algorithms remain relatively few in comparison to those for datasets of moderate size. This chapter presents the uncertainty-based clustering algorithms developed so far and proposes a few new algorithms that can be developed further.
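
As an illustration of the simplest algorithm named above, the following is a minimal sketch of standard fuzzy c-means in plain Python. It is not taken from the chapter: the function name, the fuzzifier default m = 2, the fixed iteration count, and the random initialisation are all assumptions made for the example, and it makes no attempt to scale to big data.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Cluster points (a list of equal-length tuples) into c fuzzy clusters.

    Returns (centers, u), where u[i][j] is the membership of point i in cluster j.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update from relative distances to every centre.
        for i in range(n):
            dist = [
                max(1e-12, sum((points[i][d] - centers[j][d]) ** 2
                               for d in range(dim)) ** 0.5)
                for j in range(c)
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return centers, u
```

Unlike hard k-means, every point keeps a degree of membership in every cluster, which is how the method represents uncertainty. Each iteration touches every point and every centre, which hints at why the plain algorithm struggles on very large datasets.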


Author(s): R. Raj Kumar, P. Viswanath, C. Shoba Bindu

A large dataset is undesirable because it increases the computational burden on the methods operating over it. Given a large dataset, a natural question is whether one can generate a smaller dataset, either a subset of the original or a set of patterns extracted from it, whose cardinality is less than that of the original dataset. The patterns in this smaller set are representatives of the patterns in the original dataset, and together they form the prototype set. Methods for forming a prototype set fall into two broad categories: 1) the prototype set is a proper subset of the original dataset; 2) the prototype set contains patterns extracted by using the patterns in the original dataset. The training set can also be reduced along its features. The authors discuss the reduction of datasets in both directions. These methods are known as data compaction techniques.
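
The two categories of prototype set can be sketched as follows. This is an illustrative example, not the authors' method: it uses a condensed-nearest-neighbour style selection for category 1 and per-class means for category 2. Both are common choices, but the abstract does not say which techniques the chapter covers, and all function names are hypothetical.

```python
def nearest_label(x, prototypes):
    """Label of the prototype closest to x (squared Euclidean distance)."""
    closest = min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))
    return closest[1]

def condensed_subset(data):
    """Category 1: a subset of data that 1-NN-classifies all of data correctly.

    data is a list of (point, label) pairs. Assumes no two identical points
    carry different labels.
    """
    prototypes = [data[0]]
    changed = True
    while changed:
        changed = False
        for x, y in data:
            # Add every pattern the current prototypes misclassify.
            if nearest_label(x, prototypes) != y and (x, y) not in prototypes:
                prototypes.append((x, y))
                changed = True
    return prototypes

def class_centroids(data):
    """Category 2: one newly extracted pattern per class, the class mean."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return [(tuple(v / counts[y] for v in acc), y) for y, acc in sums.items()]
```

The first function only ever returns patterns that already exist in the dataset. The second returns patterns that generally do not appear in it, which is the distinction the two categories draw.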


Author(s): Ravindra Babu Tallamaraju, Manas Kirti

With the falling cost of storage devices, increasing amounts of data are being stored and processed to extract intelligence. Classification and clustering have been the two major approaches to generating data abstractions. Over the last few years, text has come to dominate the types of data shared and stored. Sources of such datasets include mobile data, e-commerce, and a wide range of continuously expanding social-networking services. Within each of these sources, the nature of the data varies drastically, from formal language text to Twitter or SMS slang, which calls for different ways of processing the data to produce meaningful summaries. Such summaries could be used effectively for business advantage. Processing such data requires identifying an appropriate set of features, for both efficiency and effectiveness. In this chapter, we discuss approaches to text feature selection and present a comparative study.
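
One widely used text feature selection criterion is the chi-square statistic between a term and a class. The sketch below assumes this criterion purely for illustration; the abstract does not name the approaches the chapter compares, and the function names and the toy input format (token list plus label) are hypothetical.

```python
from collections import Counter

def chi_square_scores(docs):
    """docs: list of (tokens, label). Returns {term: max chi-square over classes}."""
    n = len(docs)
    class_count = Counter(label for _, label in docs)
    term_count = Counter()   # number of documents containing each term
    joint = Counter()        # number of documents of a class containing a term
    for tokens, label in docs:
        for term in set(tokens):
            term_count[term] += 1
            joint[(term, label)] += 1
    scores = {}
    for term, tc in term_count.items():
        best = 0.0
        for label, cc in class_count.items():
            a = joint[(term, label)]   # in class, contains term
            b = tc - a                 # outside class, contains term
            c = cc - a                 # in class, lacks term
            d = n - a - b - c          # outside class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def select_features(docs, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term that occurs in every document carries no class information and scores zero, while a term confined to one class scores highest. Dropping low-scoring terms shrinks the feature space, which is the efficiency gain the abstract refers to.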


Author(s): S Rao Chintalapudi, H. M. Krishna Prasad M

Social network analysis is an emerging research area. Using graph theory concepts, social network methods can be applied in many sectors, such as transportation networks, collaboration networks, and biological networks. The most important property of social networks is the community: a collection of nodes with dense connections inside and sparse connections to the outside. Community detection is similar to cluster analysis and has many real-world applications, such as recommendation systems and target marketing. Community detection algorithms are broadly classified into two categories: disjoint and overlapping. This chapter reviews overlapping community detection algorithms with their strengths and limitations. To evaluate these algorithms, a popular synthetic network generator, the LFR benchmark generator, and the new extended quality measures are discussed in detail.
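
The definition of a community given above (dense inside, sparse outside) and the notion of overlap can be made concrete with a small sketch. It is not one of the detection algorithms or quality measures reviewed in the chapter; it only counts edges for communities that are already given, and the function names are hypothetical.

```python
def community_stats(edges, communities):
    """Count internal and boundary edges for each community.

    edges: list of (u, v) pairs of an undirected graph.
    communities: {name: set of nodes}; the sets may overlap.
    Returns {name: (internal_edges, boundary_edges)}.
    """
    stats = {}
    for name, members in communities.items():
        internal = sum(1 for u, v in edges if u in members and v in members)
        # A boundary edge has exactly one endpoint inside the community.
        boundary = sum(1 for u, v in edges if (u in members) != (v in members))
        stats[name] = (internal, boundary)
    return stats

def overlapping_nodes(communities):
    """Nodes that belong to more than one community."""
    seen, overlap = set(), set()
    for members in communities.values():
        overlap |= seen & members
        seen |= members
    return overlap
```

A disjoint algorithm would have to assign a shared node to exactly one community, whereas an overlapping algorithm may report it in several. A good community shows a high internal count relative to its boundary count.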


Author(s): Ashok Kumar J, Abirami S, Tina Esther Trueman

Sentiment analysis is one of the most important applications in the field of text mining. It computes people's opinions, comments, posts, reviews, evaluations, and emotions expressed about products, sales, services, individuals, organizations, etc. Large amounts of structured and unstructured data are now being produced on the web, and categorizing and grouping these data has become a real-world problem. In this chapter, the authors address current research in this field, open issues, and the problem of sentiment analysis on Big Data for classification and clustering. The chapter suggests new methods, applications, algorithm extensions of classification and clustering, and software tools in the field of sentiment analysis.


Author(s): Bangaru Kamatchi Seethapathy, Parvathi R

Spatial datasets are becoming nontraditional due to the growing use of social media, sensor networks, gaming, and many other emerging technologies and applications. A wide variety of sensors is used to solve real-time problems such as natural calamities, traffic analysis, and the analysis of climatic conditions; together with the use of GPS and GPRS in mobile phones, this creates a huge amount of spatial data that exceeds the capacity of traditional spatial data analytics platforms and becomes spatial big data. Spatial big data poses new challenges in size, analysis, and exploration. This chapter discusses the analysis and descriptive manipulation of spatial data, so that one can understand how multiple variables interact with each other, along with the different visualization tools that make spatial data easier to understand.


Author(s): Vignesh U, Parvathi R

The chapter deals with big data in biology. Maintaining the largest collections of biological data paves the way for big data analytics and big data mining, because conventional database management systems are inefficient at handling noisy and voluminous data. This opens domains such as bioinformatics, image informatics, clinical informatics, and public health informatics to big data analytics, which can achieve better results with higher efficiency and accuracy in clustering, classification, and association mining. The complexity of health care data has led to EHR (Evidence-based HealthcaRe) technology for its maintenance. EHR involves major challenges such as patient details in structured and unstructured formats, medical image data mining, genome analysis, and the analysis of patient communications through sensors, biomarkers, etc. Big biological data is complicated to manage and maintain, especially since the arrival of the latest genome sequencing technology, next-generation sequencing, which produces data at zettabyte scale.


Author(s): Sushruta Mishra, Brojo Kishore Mishra, Hrudaya Kumar Tripathy, Monalisa Mishra, Bijayalaxmi Panda

Social network analysis (SNA) is the analysis of social communication through network and graph theory. This chapter explores the application of SNA in the telecommunication domain. Telecom data consist of customer data and call detail data (CDR). The proposed work treats the attributes of call detail data and customer data as different relationship types to model a multi-relational telecommunication social network. Typical work on social network analysis includes discovering groups of customers who share similar properties. A new challenge is mining hidden communities in such heterogeneous social networks in order to group customers as churners and non-churners. After analyzing the available data, we constructed a weighted multi-relational social network in which each relation carries a different weight, representing how close two customers are to one another. Centrality measures depict the intensity of customer closeness, so we can identify the customers who influence other customers to churn.
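
The idea of a weighted multi-relational network with a centrality measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's model: the relation names, the relation weights, the linear combination of relations into one tie strength, and the choice of weighted degree as the centrality measure are all hypothetical.

```python
from collections import defaultdict

def combine_relations(relations, weights):
    """Merge several relation types into one tie strength per customer pair.

    relations: {relation_name: [(customer_a, customer_b, strength), ...]}
    weights:   {relation_name: weight of that relation type}
    Returns {frozenset({a, b}): combined tie strength}.
    """
    tie = defaultdict(float)
    for name, edges in relations.items():
        for a, b, strength in edges:
            tie[frozenset((a, b))] += weights[name] * strength
    return tie

def weighted_degree(tie):
    """Weighted degree centrality: the sum of a customer's tie strengths."""
    score = defaultdict(float)
    for pair, strength in tie.items():
        for customer in pair:
            score[customer] += strength
    return dict(score)
```

For example, with a hypothetical "calls" relation weighted 0.5 and a "same_plan" relation weighted 2.0, the customer with the highest weighted degree is the one most closely tied to others, and so a candidate influencer when studying churn.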


Author(s): Chitrakala S

Analyzing social network data with Big Data tools and techniques promises to provide information that could be of use in recommendation systems, personalized services, and many other applications. Analytics of this kind include sentiment analysis, trending topic analysis, topic modeling, information diffusion modeling, provenance determination, and social influence studies. Twitter data analysis involves analyzing data obtained specifically from Twitter, both the tweets and the network topology. The analysis falls into three major classes: content-based, network-based, and hybrid. The chapter discusses trending topic analysis in the context of content-based static data analysis, and influence maximization in the context of hybrid analysis on data streams, using the power of Big Data analytics. A novel solution for trending topic analysis that generates topic-evolved, conflict-free sequential sub-summaries, and an approach to influence maximization that handles streaming data, are explained with experimental results.

