Performance Evaluation of Different Classifier for Big data in Data mining Industries

2018 ◽  
Vol 2 (1) ◽  
pp. 11-17
Author(s):  

Data mining is the set of computational techniques and methodologies aimed at extracting knowledge from large amounts of data, using sophisticated data analysis tools to reveal the information structure underlying large data sets. Data scientists and data engineers face major challenges today because of the global growth of data sets across industries and sectors. Machine learning methods are one such set of tools, supporting not only data management but also analysis and prediction. Supervised learning, a kind of machine learning methodology, uses input data and produces outputs of two types, qualitative and quantitative, which respectively describe data classes and predict data trends. Classification tasks provide qualitative responses, whereas prediction or regression tasks offer quantitative outputs. In this paper, an attempt has been made to demonstrate how big data can be analyzed, classified and predicted in industry using the Weka tool.
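
The distinction drawn above between qualitative (classification) and quantitative (regression) outputs can be illustrated with a minimal sketch. This is not the paper's Weka workflow: the toy data, the function names `fit_line` and `nearest_neighbor_label`, and the choice of a 1-nearest-neighbour classifier and a one-feature least-squares fit are all assumptions made purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def nearest_neighbor_label(train, query):
    """1-nearest-neighbour classifier over (feature, label) pairs."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]


# Hypothetical toy data: one input feature, two kinds of target.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]          # quantitative target
labels = ["fail", "fail", "pass", "pass"]  # qualitative target

# Regression: the output is a number.
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept

# Classification: the output is a class label.
predicted_label = nearest_neighbor_label(list(zip(hours, labels)), 3.6)
```

The same input feature feeds both models; only the type of the target, and therefore the type of the output, differs.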

2020 ◽  
pp. 0887302X2093119 ◽  
Author(s):  
Rachel Rose Getman ◽  
Denise Nicole Green ◽  
Kavita Bala ◽  
Utkarsh Mall ◽  
Nehal Rawat ◽  
...  

With the proliferation of digital photographs and the increasing digitization of historical imagery, fashion studies scholars must consider new methods for interpreting large data sets. Computational methods to analyze visual forms of big data have been underway in the field of computer science through computer vision, where computers are trained to “read” images through a process called machine learning. In this study, fashion historians and computer scientists collaborated to explore the practical potential of this emergent method by examining a trend related to one particular fashion item—the baseball cap—across two big data sets—the Vogue Runway database (2000–2018) and the Matzen et al. Streetstyle-27K data set (2013–2016). We illustrate one implementation of high-level concept recognition to map a fashion trend. Tracking trend frequency helps visualize larger patterns and cultural shifts while creating sociohistorical records of aesthetics, which benefits fashion scholars and industry alike.


Psychology ◽  
2020 ◽  
Author(s):  
Jeffrey Stanton

The term “data science” refers to an emerging field of research and practice that focuses on obtaining, processing, visualizing, analyzing, preserving, and re-using large collections of information. A related term, “big data,” has been used to refer to one of the important challenges faced by data scientists in many applied environments: the need to analyze large data sources, in certain cases using high-speed, real-time data analysis techniques. Data science encompasses much more than big data, however, as a result of many advancements in cognate fields such as computer science and statistics. Data science has also benefited from the widespread availability of inexpensive computing hardware—a development that has enabled “cloud-based” services for the storage and analysis of large data sets. The techniques and tools of data science have broad applicability in the sciences. Within the field of psychology, data science offers new opportunities for data collection and data analysis that have begun to streamline and augment efforts to investigate the brain and behavior. The tools of data science also enable new areas of research, such as computational neuroscience. As an example of the impact of data science, psychologists frequently use predictive analysis as an investigative tool to probe the relationships between a set of independent variables and one or more dependent variables. While predictive analysis has traditionally been accomplished with techniques such as multiple regression, recent developments in the area of machine learning have put new predictive tools in the hands of psychologists. These machine learning tools relax distributional assumptions and facilitate exploration of non-linear relationships among variables. These tools also enable the analysis of large data sets by opening options for parallel processing. 
In this article, a range of relevant areas from data science is reviewed for applicability to key research problems in psychology including large-scale data collection, exploratory data analysis, confirmatory data analysis, and visualization. This bibliography covers data mining, machine learning, deep learning, natural language processing, Bayesian data analysis, visualization, crowdsourcing, web scraping, open source software, application programming interfaces, and research resources such as journals and textbooks.


2017 ◽  
Vol 10 (3) ◽  
pp. 660-663
Author(s):  
L. Dhanapriya ◽  
Dr. S. MANJU

With recent developments in information technology, the volume of data has surpassed the zettabyte scale, and improving business efficiency by strengthening predictive ability through efficient analysis of these data has emerged as an issue in current society. The market now needs methods capable of extracting valuable information from large data sets. Big data has recently become a focus of attention, and using machine learning techniques to extract valuable information from huge, complexly structured data has become an urgent problem to resolve. The aim of this work is to provide a better understanding of machine learning techniques for discovering interesting patterns and to introduce some machine learning algorithms in order to explore the developing trend.


2019 ◽  
Vol 15 (S341) ◽  
pp. 88-98
Author(s):  
Viviana Acquaviva

Abstract This paper summarizes my thoughts, given in an invited review at the IAU symposium 341 “Challenges in Panchromatical Galaxy Modelling with Next Generation Facilities”, about how machine learning methods can help us solve some of the big data problems associated with current and upcoming large galaxy surveys.


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.


2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽  
Dr. Sunita Chaudhary

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. This paradigm takes advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a key to compromise a computer resource's credibility, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We are also working on a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm was designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.


Author(s):  
Saranya N. ◽  
Saravana Selvam

After an era of managing data collection difficulties, the issue has now become how to process these vast amounts of information. Scientists and researchers consider Big Data probably the most essential topic in computing science today. The term Big Data describes a huge volume of data that may exist in any structure, which makes it difficult for standard processing approaches to mine the best possible information from such large data sets. Classification in Big Data is a procedure for summarizing data sets based on various examples. There are different classification frameworks that help us classify data collections. The methods discussed in the chapter include Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The aim of this chapter is to provide a comprehensive evaluation of the classification methods that are commonly used in practice.

