A Framework for Implementing Machine Learning algorithms using Data sets

The rapid development of cloud computing, big data, machine learning and data mining has brought information technology and human society into a new technological era. Statistical and mathematical analysis of data has opened a new way of researching prediction and estimation from samples and data sets. Data mining is a mechanism that explores and analyzes large amounts of disorganized or disordered data to obtain potentially useful information and model it with different algorithms. Machine learning is an iterative rather than a linear process: each step may need to be revisited as more is learned about the problem. We discuss different machine learning algorithms that can manipulate data and analyze data sets, identifying the best cases for accurate results, and we design and implement a framework that incorporates these algorithms. This paper expounds the definition, model, development stages, classification and commercial applications of machine learning, and emphasizes the role of machine learning in data mining by deploying the framework. It thus summarizes and analyzes machine learning technology and discusses the use of machine learning algorithms in data mining. Finally, a mathematical analysis is given along with results and graphical analysis.

Big Data on Machine Learning – A Review

Engineering and Scientific International Journal, Vol 8 (3), 2021

DOI: 10.30726/esij/v8.i3.2021.83018

Author(s): Balasree K, Dharmarajan K

Keyword(s): Machine Learning, Big Data, Data Storage, Data Analytics, Rapid Development, Learning Algorithms, Big Data Analytics, Machine Learning Algorithms, Data Sets, Big Data Technology

Given the rapid development of Big Data technology in recent years, this paper discusses the role that Machine Learning (ML) methods and algorithms play in Big Data processing and Big Data analytics, two evolving fields of computing that complement each other. Because big data sets arrive at very high velocity and in great variety, solutions are needed that can handle them and extract knowledge and value from them. Big data analytics calls for appropriate data storage and computational frameworks, enhanced by scalable machine learning algorithms, to reveal hidden patterns and correlations in massive amounts of data. Such analytic information helps organizations and companies gain deeper knowledge, support their development and gain an advantage over the competition, and it allows accurate predictions to be made from the data. This paper presents a detailed review of state-of-the-art developments, together with an overview of the advantages and challenges of machine learning algorithms for big data analytics.
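One common way to make a learner "scalable" for high-velocity data is to update the model one example at a time, so that memory use does not depend on the size of the stream. The sketch below is our own illustration, not a method taken from the review: it assumes a stream of (feature vector, 0/1 label) pairs and uses plain stochastic gradient descent on the logistic log-loss. The class name `OnlineLogisticRegression` and the generator `synthetic_stream` are hypothetical.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple


class OnlineLogisticRegression:
    """Binary logistic regression trained by stochastic gradient descent,
    one example at a time, so memory use does not grow with the stream."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid for large positive or negative z.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        """One SGD step on the log-loss for a single example (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # derivative of log-loss w.r.t. z
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error

    def fit_stream(self, stream: Iterable[Tuple[Sequence[float], int]]) -> None:
        """Consume the stream in a single pass; nothing is stored."""
        for x, y in stream:
            self.partial_fit(x, y)


def synthetic_stream(n: int, seed: int = 0):
    """Hypothetical data source: label is 1 when the first feature exceeds the second."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        yield x, int(x[0] > x[1])


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    model.fit_stream(synthetic_stream(5000))
    held_out = list(synthetic_stream(1000, seed=1))
    accuracy = sum((model.predict_proba(x) >= 0.5) == bool(y) for x, y in held_out) / len(held_out)
    print(f"weights={model.w}, bias={model.b:.3f}, accuracy={accuracy:.3f}")
```

Because each update touches only the current example, the same loop could in principle be fed from a file, a socket or a message queue; production systems typically add mini-batching and learning-rate schedules on top of this idea.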

Data Fusion and Machine Learning in Medical Diagnosis: A Bird Eye View

Journal of Computational and Theoretical Nanoscience, Vol 16 (12), pp. 5127-5133, 2019

DOI: 10.1166/jctn.2019.8574

Cited By ~ 1

Author(s): A. Arunkumar, D. Surendran, S. Sreya

Keyword(s): Machine Learning, Data Fusion, Medical Diagnosis, Rapid Development, Learning Algorithms, Research Area, Machine Learning Algorithms, Data Sets, Verbal Reports, Efficient Data

With the advent of computer-mediated technologies, the pressing needs of medical diagnosis and surveillance systems, and the rapid development of satellite and sensor networks, there is a demand for efficient data fusion techniques, methodologies and machine learning algorithms. Expert systems and data fusion have emerged as a promising research area for medical diagnosis in the coming years. In data fusion, information may be of various kinds, ranging from measurements to verbal reports. Data fusion is a framework for analyzing data sets such that different data sets can interact and inform each other. Machine learning combined with data fusion yields results with high accuracy and predictive power. This paper presents a comparative analysis of existing expert systems for medical diagnosis that use data fusion and machine learning algorithms to diagnose various diseases.
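At the measurement end of the range described above, a classical way for two sources to "inform each other" is inverse-variance weighting: each reading is weighted by the reciprocal of its variance, so more reliable sources count for more. The sketch below assumes independent, unbiased readings of the same quantity with known variances; it is a textbook illustration, not the method of any expert system surveyed in the paper, and the function name `fuse_measurements` and the sensor values are hypothetical.

```python
from typing import Sequence, Tuple


def fuse_measurements(values: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of independent, unbiased estimates
    of the same quantity. Returns (fused_value, fused_variance)."""
    if not values or len(values) != len(variances):
        raise ValueError("need one variance per value")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # more reliable source -> larger weight
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, values)) / total
    return fused, 1.0 / total  # fused variance is smaller than any input variance


if __name__ == "__main__":
    # Hypothetical readings: a precise sensor (variance 0.25) and a noisy one (variance 1.0).
    print(fuse_measurements([37.0, 38.0], [0.25, 1.0]))  # (37.2, 0.2)
```

The fused estimate lies closer to the more precise reading, and its variance (0.2) is lower than that of either input, which is the basic payoff of fusing complementary sources.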

Big Data Mining Algorithms

Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management, pp. 768-777, 2021

DOI: 10.4018/978-1-7998-3479-3.ch052

Author(s): M. Govindarajan

Keyword(s): Machine Learning, Data Mining, Big Data, Unsupervised Learning, Supervised Learning, Learning Algorithms, Machine Learning Algorithms, Data Sets, Big Data Mining, Supervised Learning Algorithms

Big data mining involves knowledge discovery from large data sets. The purpose of this chapter is to provide an analysis of the different machine learning algorithms available for performing big data analytics. The machine learning algorithms are grouped into three key categories, namely supervised, unsupervised, and semi-supervised machine learning algorithms. Supervised learning algorithms are trained on a complete set of labeled data and are therefore used for prediction and forecasting; example algorithms include logistic regression and the back-propagation neural network. Unsupervised learning algorithms start learning from scratch, without labels, and are therefore used for clustering; example algorithms include the Apriori algorithm and K-Means. Semi-supervised learning combines supervised and unsupervised learning: the algorithms are trained on labeled data but also include learning from unlabeled data.
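To make the unsupervised case concrete, the sketch below implements K-Means in its usual Lloyd's-algorithm form: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the assignments stop changing. It is a minimal illustration under our own assumptions (plain Python, Euclidean distance, centroids initialised from the first k points rather than with k-means++ or random restarts); the function name `kmeans` and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm. Returns (centroids, cluster label per point).
    Initialises centroids with the first k points, a deliberate simplification."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(pts, k=2)
    print(labels)  # [0, 1, 0, 0, 1, 1]
    print(centroids)
```

Note that no labels are supplied anywhere: the grouping emerges purely from distances between points, which is what distinguishes this from the supervised algorithms mentioned above.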

Birds Sound Classification Based on Machine Learning Algorithms

Asian Journal of Research in Computer Science, pp. 1-11, 2021

DOI: 10.9734/ajrcos/2021/v9i430227

Author(s): Aska E. Mehyadin, Adnan Mohsin Abdulazeez, Dathar Abas Hasan, Jwan N. Saeed

Keyword(s): Machine Learning, Noise Suppression, Bird Species, Machine Learning Algorithms, Data Sets, Learning Technology, Species Classification, Data Set, Sound Classification, Mel Frequency Cepstral Coefficient

The bird classifier is a system equipped with machine learning technology that uses a machine learning method to store and classify bird calls. Bird species can be identified from a recording of the bird's sound alone, which makes the system easier to manage. The system also provides species classification resources that enable automated species detection from observations, teaching a machine to recognize and classify the species. Undesirable noise is filtered out and the recordings are sorted into data sets: each sound is passed through a noise suppression filter and a separate classification procedure so that the most useful data set can be processed easily. Mel-frequency cepstral coefficients (MFCCs) are extracted and tested with different algorithms, namely Naïve Bayes, J4.8 and the Multilayer Perceptron (MLP), to classify bird species. J4.8 performs best, with the highest accuracy (78.40%) and an elapsed time of 39.4 seconds.
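The abstract does not say which MFCC variant was used, so the following is only a sketch of the standard ingredients, under our own assumptions. A typical MFCC pipeline frames the audio, takes a power spectrum per frame, pools it through triangular filters spaced evenly on the mel scale, takes the logarithm of each filter energy, and applies a discrete cosine transform (DCT). The code covers the mel-scale conversion (using the common HTK-style formula m = 2595 log10(1 + f/700)), the filter band edges, and the final DCT step; the function names are hypothetical and the framing, FFT and filter-weighting steps are omitted.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Return n_filters + 2 frequencies (Hz) equally spaced on the mel scale.
    Consecutive triples give the left edge, centre and right edge of each
    triangular filter, so filters are narrow at low and wide at high frequencies."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def cepstral_coefficients(log_energies: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalised DCT-II of one frame's log mel-filterbank energies;
    the first n_coeffs outputs are that frame's MFCCs."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 2))  # close to 1000 mel by construction
    print([round(f) for f in mel_band_edges(10, 0.0, 8000.0)])
```

A per-frame MFCC vector produced this way (often summarised over frames) is the kind of feature vector a classifier such as Naïve Bayes, J4.8 or an MLP would then receive.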

Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (5), pp. 956-959, 2020

DOI: 10.35940/ijitee.e2643.039520

Keyword(s): Machine Learning, Data Mining, Performance Management, Student Performance, Learning Algorithms, Educational Data Mining, Machine Learning Algorithms, Machine Learning Techniques, Social Characteristics, Student Failure

Student performance management is one of the key pillars of higher education institutions, since it directly affects students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data to identify students who are more likely to fail university examinations, so that the needed interventions can be provided to improve their performance. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors drawn from their demographic and social characteristics. It compares five popular machine learning algorithms (REPTree, JRip, Random Forest, Random Tree and Naive Bayes) on overall classifier accuracy as well as on class-specific indicators, i.e. precision, recall and F-measure. The results show that the REPTree algorithm outperformed the other algorithms in classifying students who are more likely to fail the examinations.
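The two evaluation ideas in this abstract, 10-fold cross-validation and class-specific precision, recall and F-measure, can be sketched in a few lines. The code below is our own minimal illustration, not the paper's tooling: folds are contiguous and unshuffled, with no stratification (which real toolkits often add), and the "fail"/"pass" labels in the test are hypothetical. The function names `k_fold_indices` and `precision_recall_f1` are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds. Each fold serves once
    as the test set while the other k-1 folds form the training set.
    No shuffling or stratification is done here."""
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def precision_recall_f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Tuple[float, float, float]:
    """Class-specific precision, recall and F-measure for the `positive` label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10)[:2]:
        print(len(train), test)
    truth = ["fail", "pass", "fail", "pass", "fail"]
    preds = ["fail", "fail", "pass", "pass", "fail"]
    print(precision_recall_f1(truth, preds, positive="fail"))
```

Reporting precision and recall for the "at risk" class matters here because overall accuracy can look good even when most failing students are missed, which is exactly what the class-specific indicators expose.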

Lead-based virtual screening and prediction of EGFR inhibitors using PubChem’s database with data mining and machine learning algorithms

DOI: 10.1021/scimeetings.0c03836

2020

Cited By ~ 1

Author(s): Kedan He

Keyword(s): Machine Learning, Data Mining, Virtual Screening, Learning Algorithms, Machine Learning Algorithms, EGFR Inhibitors

SeisBench: A toolbox for benchmarking and applying machine learning in seismology.

DOI: 10.5194/egusphere-egu21-12218

2021

Author(s): Jack Woollam, Jannes Münchmeyer, Carlo Giunchi, Dario Jozinovic, Tobias Diehl, ...

Keyword(s): Machine Learning, Model Comparison, Learning Algorithms, Machine Learning Algorithms, Machine Learning Techniques, Quality Data, Data Sets, Waveform Data, Detection Techniques, Benchmark Data

Machine learning methods have seen widespread adoption within the seismological community in recent years due to their ability to process large amounts of data effectively, while equalling or surpassing the performance of human analysts or classic algorithms. In the wider machine learning world, for example in imaging applications, the open availability of extensive high-quality datasets for training, validation, and the benchmarking of competing algorithms is seen as a vital ingredient in the rapid progress observed throughout the last decade. Within seismology, vast catalogues of labelled data are readily available, but collecting the waveform data for millions of records and assessing the quality of training examples is a time-consuming, tedious process. The natural variability in source processes and seismic wave propagation also presents a critical problem during training: the performance of models trained on different regions, distance ranges and magnitude ranges is not easily comparable. The inability to easily compare and contrast state-of-the-art machine learning-based detection techniques on varying seismic data sets is currently a barrier to further progress within this emerging field. We present SeisBench, an extensible open-source framework for training, benchmarking, and applying machine learning algorithms. SeisBench provides access to various benchmark data sets and models from the literature, along with pre-trained model weights, through a unified API. Built to be extensible and modular, SeisBench allows for the simple addition of new models and data sets, which can be easily interchanged with existing pre-trained models and benchmark data. Standardising access to data and metadata of varying quality simplifies comparison workflows, enabling the development of more robust machine learning algorithms.

We initially focus on phase detection, identification and picking, but the framework is designed to be extended for other purposes, for example direct estimation of event parameters. Users will be able to contribute their own benchmarks and (trained) models. In the future, it will thus be much easier both to compare the performance of new algorithms against published machine learning models/architectures and to check the performance of established algorithms against new data sets. We hope that the ease of validation and inter-model comparison enabled by SeisBench will serve as a catalyst for the development of the next generation of machine learning techniques within the seismological community. The SeisBench source code will be published under an open license, and the project explicitly encourages community involvement.
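The abstract does not describe SeisBench's actual API, so the sketch below does not reproduce it. It only illustrates, under our own assumptions, the general pattern behind "interchangeable" data sets and models: both are registered by name in a common registry, and a benchmark loop evaluates every registered model on every registered data set. All names (`register_dataset`, `register_model`, `benchmark`, `toy_spikes`, and the two pickers) are hypothetical, and the synthetic spike traces stand in for real waveforms with annotated phase arrivals.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical registry, NOT the SeisBench API.
# A waveform is a list of samples; its label is the index of the annotated arrival.
Example = Tuple[Sequence[float], int]
Loader = Callable[[], List[Example]]
Model = Callable[[Sequence[float]], int]  # maps a waveform to a predicted arrival index

DATASETS: Dict[str, Loader] = {}
MODELS: Dict[str, Model] = {}


def register_dataset(name: str) -> Callable[[Loader], Loader]:
    """Decorator that adds a data-set loader to the registry under `name`."""
    def decorator(loader: Loader) -> Loader:
        DATASETS[name] = loader
        return loader
    return decorator


def register_model(name: str) -> Callable[[Model], Model]:
    """Decorator that adds a picking model to the registry under `name`."""
    def decorator(model: Model) -> Model:
        MODELS[name] = model
        return model
    return decorator


@register_dataset("toy_spikes")
def toy_spikes() -> List[Example]:
    """Synthetic traces: zeros with a single spike at the labelled index."""
    examples = []
    for onset in (3, 7, 12):
        trace = [0.0] * 20
        trace[onset] = 1.0
        examples.append((trace, onset))
    return examples


@register_model("max_amplitude")
def max_amplitude_picker(trace: Sequence[float]) -> int:
    """Pick the sample with the largest absolute amplitude."""
    return max(range(len(trace)), key=lambda i: abs(trace[i]))


@register_model("first_sample")
def first_sample_picker(trace: Sequence[float]) -> int:
    """Trivial baseline that always picks the first sample."""
    return 0


def benchmark() -> Dict[Tuple[str, str], float]:
    """Mean absolute pick error (in samples) for every model on every data set."""
    results = {}
    for d_name, loader in DATASETS.items():
        examples = loader()
        for m_name, model in MODELS.items():
            errors = [abs(model(trace) - onset) for trace, onset in examples]
            results[(d_name, m_name)] = sum(errors) / len(errors)
    return results


if __name__ == "__main__":
    for (d_name, m_name), err in benchmark().items():
        print(f"{d_name:12s} {m_name:14s} mean abs error = {err:.2f} samples")
```

Adding a new data set or model is then a single decorated function, and it is automatically compared against everything already registered, which is the kind of workflow the abstract says a unified interface makes easy.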
