A system for analyzing large data sets using machine learning algorithms

A lot of user-generated data is available these days from huge platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms are implemented to check the opinions from large data sets. A lot of research has been undergone in understanding machine learning approaches to analyze sentiments. Machine learning mainly depends on the data required for model building, and hence, suitable feature exactions techniques also need to be carried. In this chapter, several deep learning approaches, its challenges, and future issues will be addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze the deep-learning techniques for predicting sentiments and understanding the importance of several approaches for mining opinions and determining sentiment polarity.

Download Full-text

A Study on Machine Learning in Big Data

Oriental journal of computer science and technology ◽

10.13005/ojcst/10.03.15 ◽

2017 ◽

Vol 10 (3) ◽

pp. 660-663

Author(s):

L. Dhanapriya ◽

Dr. S. MANJU

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Data ◽

Machine Learning Algorithms ◽

Large Data Sets ◽

Machine Learning Techniques ◽

Data Sets ◽

Huge Data ◽

Learning Techniques ◽

Market Needs

In the recent development of IT technology, the capacity of data has surpassed the zettabyte, and improving the efficiency of business is done by increasing the ability of predictive through an efficient analysis on these data which has emerged as an issue in the current society. Now the market needs for methods that are capable of extracting valuable information from large data sets. Recently big data is becoming the focus of attention, and using any of the machine learning techniques to extract the valuable information from the huge data of complex structures has become a concern yet an urgent problem to resolve. The aim of this work is to provide a better understanding of this Machine Learning technique for discovering interesting patterns and introduces some machine learning algorithms to explore the developing trend.

Download Full-text

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽

10.1017/dsj.2020.31 ◽

2020 ◽

Vol 6 ◽

Author(s):

Jaime de Miguel Rodríguez ◽

Maria Eugenia Villafañe ◽

Luka Piškorec ◽

Fernando Sancho Caparrini

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Large Data ◽

Learning Model ◽

Large Data Sets ◽

Data Sets ◽

Connectivity Map ◽

Data Set ◽

3D Objects ◽

Machine Learning Model

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.

Download Full-text

Significant Impact of Improved Machine Learning Algorithm in The Processes of Large Data Sets

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206133 ◽

2020 ◽

pp. 458-467

Author(s):

Virendra Tiwari ◽

Balendra Garg ◽

Uday Prakash Sharma

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Dynamic Environment ◽

Large Data ◽

Machine Learning Algorithms ◽

Streaming Data ◽

Machine Learning Techniques ◽

Machine Learning Algorithm ◽

Learning Mechanisms

The machine learning algorithms are capable of managing multi-dimensional data under the dynamic environment. Despite its so many vital features, there are some challenges to overcome. The machine learning algorithms still requires some additional mechanisms or procedures for predicting a large number of new classes with managing privacy. The deficiencies show the reliable use of a machine learning algorithm relies on human experts because raw data may complicate the learning process which may generate inaccurate results. So the interpretation of outcomes with expertise in machine learning mechanisms is a significant challenge in the machine learning algorithm. The machine learning technique suffers from the issue of high dimensionality, adaptability, distributed computing, scalability, the streaming data, and the duplicity. The main issue of the machine learning algorithm is found its vulnerability to manage errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how can be reduced the computational complexity of machine learning algorithms by finding how to make predictions using an improved algorithm.

Download Full-text

SeisBench: A toolbox for benchmarking and applying machine learning in seismology.

10.5194/egusphere-egu21-12218 ◽

2021 ◽

Author(s):

Jack Woollam ◽

Jannes Münchmeyer ◽

Carlo Giunchi ◽

Dario Jozinovic ◽

Tobias Diehl ◽

...

Keyword(s):

Machine Learning ◽

Model Comparison ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Quality Data ◽

Data Sets ◽

Waveform Data ◽

Detection Techniques ◽

Benchmark Data

<p>Machine learning methods have seen widespread adoption within the seismological community in recent years due to their ability to effectively process large amounts of data, while equalling or surpassing the performance of human analysts or classic algorithms. In the wider machine learning world, for example in imaging applications, the open availability of extensive high-quality datasets for training, validation, and the benchmarking of competing algorithms is seen as a vital ingredient to the rapid progress observed throughout the last decade. Within seismology, vast catalogues of labelled data are readily available, but collecting the waveform data for millions of records and assessing the quality of training examples is a time-consuming, tedious process. The natural variability in source processes and seismic wave propagation also presents a critical problem during training. The performance of models trained on different regions, distance and magnitude ranges are not easily comparable. The inability to easily compare and contrast state-of-the-art machine learning-based detection techniques on varying seismic data sets is currently a barrier to further progress within this emerging field. We present SeisBench, an extensible open-source framework for training, benchmarking, and applying machine learning algorithms. SeisBench provides access to various benchmark data sets and models from literature, along with pre-trained model weights, through a unified API. Built to be extensible, and modular, SeisBench allows for the simple addition of new models and data sets, which can be easily interchanged with existing pre-trained models and benchmark data. Standardising the access of varying quality data, and metadata simplifies comparison workflows, enabling the development of more robust machine learning algorithms. We initially focus on phase detection, identification and picking, but the framework is designed to be extended for other purposes, for example direct estimation of event parameters. Users will be able to contribute their own benchmarks and (trained) models. In the future, it will thus be much easier to compare both the performance of new algorithms against published machine learning models/architectures and to check the performance of established algorithms against new data sets. We hope that the ease of validation and inter-model comparison enabled by SeisBench will serve as a catalyst for the development of the next generation of machine learning techniques within the seismological community. The SeisBench source code will be published with an open license and explicitly encourages community involvement.</p>

Download Full-text

The Problem of Organizing and Partitioning Large Data Sets in Learning Algorithms for SOM-RBF Mixed Structures - Application to the Approximation of Environmental Variables

Proceedings of the 5th International Joint Conference on Computational Intelligence ◽

10.5220/0004554604970501 ◽

2013 ◽

Keyword(s):

Environmental Variables ◽

Learning Algorithms ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Download Full-text

A Survey on Machine Learning Algorithms and finding the best out there for the considered seven Medical Data Sets Scenario

Research Journal of Pharmacy and Technology ◽

10.5958/0974-360x.2019.00518.3 ◽

2019 ◽

Vol 12 (6) ◽

pp. 3059

Author(s):

R. M. Balajee ◽

K. Venkatesh

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Medical Data ◽

Machine Learning Algorithms ◽

Data Sets

Download Full-text

Implementation of Supervised Learning towards Optimizing Queries in Database Systems

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3531.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1182-1187

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Student Loans ◽

Large Data ◽

Database Systems ◽

Large Data Sets ◽

Data Sets ◽

Human Intervention ◽

Huge Data ◽

Future Direction

Machine learning is a technology which with accumulated data provides better decisions towards future applications. It is also the scientific study of algorithms implemented efficiently to perform a specific task without using explicit instructions. It may also be viewed as a subset of artificial intelligence in which it may be linked with the ability to automatically learn and improve from experience without being explicitly programmed. Its primary intention is to allow the computers learn automatically and produce more accurate results in order to identify profitable opportunities. Combining machine learning with AI and cognitive technologies can make it even more effective in processing large volumes human intervention or assistance and adjust actions accordingly. It may enable analyzing the huge data of information. It may also be linked to algorithm driven study towards improving the performance of the tasks. In such scenario, the techniques can be applied to judge and predict large data sets. The paper concerns the mechanism of supervised learning in the database systems, which would be self driven as well as secure. Also the citation of an organization dealing with student loans has been presented. The paper ends discussion, future direction and conclusion.

Download Full-text

Machine learning in diachronic corpus phonology: mining verse data to infer trajectories in English phonotactics

Papers in Historical Phonology ◽

10.2218/pihph.3.2018.2878 ◽

2018 ◽

Vol 3 ◽

Author(s):

Andreas Baumann

Keyword(s):

Machine Learning ◽

Middle English ◽

Large Data ◽

Large Data Sets ◽

Machine Learning Techniques ◽

Data Sets ◽

Powerful Method ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Standard Techniques

Machine learning is a powerful method when working with large data sets such as diachronic corpora. However, as opposed to standard techniques from inferential statistics like regression modeling, machine learning is less commonly used among phonological corpus linguists. This paper discusses three different machine learning techniques (K nearest neighbors classifiers; Naïve Bayes classifiers; artificial neural networks) and how they can be applied to diachronic corpus data to address specific phonological questions. To illustrate the methodology, I investigate Middle English schwa deletion and when and how it potentially triggered reduction of final /mb/ clusters in English.

Download Full-text