Forest Cover Types Classification Based on Online Machine Learning on Distributed Cloud Computing Platforms of Storm and SAMOA

Storm is one of the most popular real-time stream processing platforms and can be used for online machine learning. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation. Much as Mahout does for Hadoop, SAMOA provides distributed algorithms for the most common machine learning tasks; SAMOA is both a platform and a library. In this paper, the Forest Cover Types dataset, a large benchmarking dataset available from the UCI KDD Archive, is used as the data stream source. Vertical Hoeffding Tree (VHT), a parallel streaming decision tree induction algorithm for distributed environments that is included in the SAMOA API, is applied on the Storm platform. This study compares the stream processing technique for predicting forest cover types from cartographic variables with traditional classic machine learning algorithms applied to the same dataset. The test-then-train method used in this system differs fundamentally from the traditional train-then-test method. The results indicate that the output of the stream processing technique is asymptotically nearly identical to that of a conventional learner, while the derived model is fully scalable, real-time, capable of dealing with evolving streams, and insensitive to stream ordering.
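
The contrast between test-then-train (prequential) evaluation and train-then-test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SAMOA's Java API: `MajorityClassLearner` is a hypothetical stand-in for an incremental learner such as VHT, and `hoeffding_bound` computes the standard bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)) that Hoeffding trees use to decide when enough examples have arrived to commit to a split.

```python
import math
from collections import Counter


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R lies
    within epsilon of the mean observed over n samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class MajorityClassLearner:
    """Hypothetical placeholder learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # no prediction is possible before any training data
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner) -> float:
    """Test-then-train: each example is used first for testing, then for training."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test on the example before learning from it
            correct += 1
        learner.learn(x, y)          # then train on the same example
        total += 1
    return correct / total if total else 0.0


stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
print(prequential_accuracy(stream, MajorityClassLearner()))  # 0.5
print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))            # 0.0898
```

Because every example is seen once and never stored, no separate hold-out set is needed, which is what makes the approach suitable for unbounded streams.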

Machine Learning Based Predictive Action on Categorical Non-Sequential Data

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190417150421 ◽

2020 ◽

Vol 13 (5) ◽

pp. 1020-1030

Author(s):

Pradeep S. ◽

Jagadish S. Kallimani

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Categorical Data ◽

Numerical Data ◽

Processing Technique ◽

Machine Learning Algorithms ◽

Sequential Data ◽

Industry Standard ◽

Robust Model ◽

Future Work

Background: With the advent of data analysis and machine learning, there is a growing impetus to analyze and build models on historical data. Data comes in numerous forms and shapes, with an abundance of challenges. The most straightforward form of data to analyze is numerical data; with the plethora of available algorithms and tools, such data is quite manageable. Another form of data is categorical, which is subdivided into ordinal (categories with a natural order) and nominal (categories with no inherent order). Such data can also be broadly classified as sequential or non-sequential; sequential data is easier to preprocess algorithmically. Objective: This paper deals with the challenge of applying machine learning algorithms to non-sequential categorical data. Methods: Applying several data analysis algorithms directly to such data yields biased results, which makes it impossible to generate a reliable predictive model. We address this problem by walking through a handful of techniques that helped us, during our research, deal with a large non-sequential categorical dataset, and we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the resulting classification accuracy is satisfactory. Conclusion: The best pre-processing technique we observed in our research is one-hot encoding, which breaks each categorical feature down into binary indicator features that can be fed into an algorithm to predict the outcome. Our example is not a toy one: it is a real-time production services dataset with many complex variations of categorical features. Our future work includes building a robust model on such data and deploying it in industry-standard applications.
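
One-hot encoding, the technique the abstract recommends, can be sketched in plain Python. The column names and values in the usage example are hypothetical, not taken from the paper's production dataset; categories are assumed to be strings, and they are sorted only to give the features a deterministic order.

```python
def one_hot_encode(rows, columns):
    """One-hot encode the given categorical columns of a list of dict records.

    Each (column, category) pair observed in the data becomes its own 0/1 feature.
    Returns (feature_names, encoded_rows).
    """
    # Collect the sorted set of categories observed for each column.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    feature_names = [f"{col}={cat}" for col in columns for cat in categories[col]]
    encoded = []
    for row in rows:
        # Exactly one 1 per column: the position of this row's category.
        vector = [1 if row[col] == cat else 0 for col in columns for cat in categories[col]]
        encoded.append(vector)
    return feature_names, encoded


# Hypothetical records with two nominal features.
rows = [
    {"service": "db", "region": "eu"},
    {"service": "web", "region": "us"},
    {"service": "db", "region": "us"},
]
names, X = one_hot_encode(rows, ["service", "region"])
print(names)  # ['service=db', 'service=web', 'region=eu', 'region=us']
print(X)      # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1]]
```

Because no order is imposed between categories, this avoids the spurious ordering that integer-coding a nominal feature would introduce; a category unseen at encoding time would need separate handling, for example mapping it to an all-zeros block.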

A content spectral-based text representation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219248 ◽

2021 ◽

pp. 1-12

Author(s):

Melesio Crespo-Sanchez ◽

Ivan Lopez-Arevalo ◽

Edwin Aldana-Bobadilla ◽

Alejandro Molina-Villegas

Keyword(s):

Machine Learning ◽

Text Analysis ◽

Question Answering ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Text Representation ◽

Feature Vectors ◽

Learning Tasks ◽

Semantic Component ◽

Vector Representations

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account could yield an abstraction of content that benefits learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be processed by both text and image learning algorithms. These components are derived from feature vectors of the text. To demonstrate the merit of our proposal, we tested it on text classification and reading complexity score prediction tasks, obtaining promising results.

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback toward an entity, which can be an item, a service, a movie, a person, or an event. Opinions are mostly expressed as remarks or reviews. With social networks, forums, and websites, these reviews have become a significant factor in a customer's decision whether or not to buy something. Today, large-scale computing environments provide sophisticated ways of carrying out data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such example is text classification, a compelling method for predicting customers' sentiment. In this paper, we focus our sentiment analysis on a movie review database. We examine sentiment expressions to grade the polarity of movie reviews on a scale from 0 (highly disliked) to 4 (highly liked), perform feature extraction and ranking, and use these features to train our multiclass classifier to assign each movie review its correct rating. This paper applies sentiment analysis using feature-based opinion mining and supervised machine learning. The main focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem space. In our study, we used six different machine learning algorithms: Naïve Bayes, logistic regression, SVM (support vector machine), RF (random forest), KNN (k-nearest neighbors), and softmax regression.
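
Naïve Bayes, the first of the six listed classifiers, can be sketched as a bag-of-words multinomial model with Laplace (add-one) smoothing. This is an illustrative sketch, not the paper's pipeline: it uses plain whitespace tokens rather than the noun, verb, and adjective opinion words the abstract describes, and the four reviews and their 0 to 4 ratings are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Bag-of-words multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for doc, label in zip(documents, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in doc.lower().split():
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log likelihood of the word given the class.
                score += math.log((self.word_counts[label][tok] + 1)
                                  / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["great wonderful film", "loved it great", "terrible boring film", "awful boring"]
ratings = [4, 4, 0, 0]
model = MultinomialNaiveBayes().fit(docs, ratings)
print(model.predict("great film"))    # 4
print(model.predict("boring awful"))  # 0
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the same class loop extends unchanged to all five rating levels.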

Machine Learning: A Quantum Perspective

10.3233/apc210214 ◽

2021 ◽

Author(s):

Aishwarya Jhanwar ◽

Manisha J. Nene

Keyword(s):

Machine Learning ◽

Quantum Mechanics ◽

Quantum Computing ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Quantum Computers ◽

Learning Tasks ◽

Current Scenario ◽

Chip Fabrication ◽

Classical Computing

Recently, the increased availability of data has led to advances in the field of machine learning. Despite this growth, the proximity of chip fabrication in classical computing to its physical limits is motivating researchers to explore the properties of quantum computing. Since quantum computers leverage the properties of quantum mechanics, they have the potential to surpass classical computers in machine learning tasks. This paper aims to help researchers understand how quantum computers can bring a paradigm shift to the field of machine learning. It addresses the concepts of quantum computing that influence machine learning in a quantum world and reports the speedups observed in different machine learning algorithms when executed on quantum computers. Finally, the paper advocates the use of quantum application software and sheds light on the challenges that quantum computers currently face.

A Survey on Data Analysis on Large-Scale Wireless Networks: Online Stream Processing, Trends, and Challenges

10.21203/rs.3.rs-17789/v2 ◽

2020 ◽

Author(s):

Dianne Scherly Varela de Medeiros ◽

Helio do Nascimento Cunha Neto ◽

Martin Andreoni Lopez ◽

Luiz Claudio Schara Magalhães ◽

Natalia Castro Fernandes ◽

...

Keyword(s):

Machine Learning ◽

Wireless Networks ◽

Big Data ◽

Wireless Network ◽

Data Stream ◽

Large Scale ◽

Stream Processing ◽

Knowledge Extraction ◽

Machine Learning Algorithms ◽

Data Stream Processing

In this paper, we focus on knowledge extraction from large-scale wireless networks through stream processing. We present the primary methods for sampling, data collection, and monitoring of wireless networks, and we characterize knowledge extraction as a machine learning problem on big data stream processing. We show the main trends in big data stream processing frameworks. Additionally, we explore the data preprocessing, feature engineering, and machine learning algorithms applied to wireless network analytics. We address challenges and present research projects in wireless network monitoring and stream processing. Finally, we discuss future perspectives, such as deep learning and reinforcement learning in stream processing.
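
As one concrete example of sampling an unbounded data stream, reservoir sampling (Vitter's Algorithm R) keeps a uniform random sample of fixed size k without knowing the stream length in advance. It is offered here as a standard, well-known technique; the abstract does not say which sampling methods the survey covers, so treat this as an illustrative assumption.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # uniform over 0..i (bounds inclusive)
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir


# e.g. sample 5 packet records out of a long capture, using O(k) memory
print(reservoir_sample(range(100_000), 5, random.Random(7)))
```

Memory use stays at k items regardless of stream length, which is why this family of methods suits high-rate network monitoring.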

Use of Machine Learning in the Pattern Finding

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1237.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 527-531

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Machine Learning Algorithms ◽

Training Set ◽

Learning Tasks ◽

Sample Data ◽

Approximate Result ◽

Generate Model ◽

Pattern Finding

We live in the era of machine learning and artificial intelligence. Machine learning is a field of scientific study that uses algorithms and statistical models to predict answers to previously unseen questions. Machine learning algorithms use large quantities of sample data to generate a model; a larger, higher-quality training set generally leads to more accurate approximate results. ML is a highly active research field and is also helpful in pattern finding, artificial intelligence, and data analysis. In this paper, we explain the basic concepts of machine learning and its various types of methods, which can be chosen according to the user's requirements. Machine learning tasks are divided into various categories; these tasks are accomplished by computer systems without being explicitly programmed.

A Survey on Data Analysis on Large-Scale Wireless Networks: Online Stream Processing, Trends, and Challenges

10.21203/rs.3.rs-17789/v1 ◽

2020 ◽

Author(s):

Dianne Scherly Varela de Medeiros ◽

Helio do Nascimento Cunha Neto ◽

Martin Andreoni Lopez ◽

Luiz Claudio Schara Magalhães ◽

Natalia Castro Fernandes ◽

...

Keyword(s):

Machine Learning ◽

Wireless Networks ◽

Big Data ◽

Wireless Network ◽

Data Stream ◽

Large Scale ◽

Stream Processing ◽

Knowledge Extraction ◽

Machine Learning Algorithms ◽

Data Stream Processing

Predicting crystallisation propensity of small molecules

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314083715 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C1628-C1628 ◽

Cited By ~ 1

Author(s):

Jerome Wicker ◽

Richard Cooper ◽

William David

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Learning Algorithms ◽

Amino Acid Sequences ◽

Machine Learning Algorithms ◽

Training Data ◽

Learning Tasks ◽

Zinc Database ◽

Training Examples

We show that suitably chosen machine learning algorithms can be used to predict the "crystallisation propensity" of classes of molecules with a promisingly low error rate, using the Cambridge Structural Database and ZINC database to provide training examples of crystalline and non-crystalline molecules. Supervised learning tasks involve using machine learning algorithms to infer a function from known training data which allows classification of unknown test data. Such algorithms have been successfully used to predict continuous properties of compounds, such as melting point[1] and solubility[2]. Similar methods have also been applied to protein crystallinity predictions based on amino acid sequences[3], but little has previously been done to attempt to classify small organic molecules as crystalline or non-crystalline due to the difficulty in finding descriptors appropriate to the problem. Our approach uses only information about the atomic types and connectivity, leaving aside the confounding effects of solvents and crystallisation conditions. The result is reinforced by a blind microcrystallisation screening of a sample of materials, which confirmed the classification accuracy of the predictive model. An analysis of the most significant descriptors used in the classification is also presented, and we show that significant predictive accuracy can be obtained using relatively few descriptors.
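
The abstract says the descriptors use only atomic types and connectivity but does not list them. As a hypothetical illustration of that kind of descriptor (not the authors' actual feature set), a molecule given as a list of element symbols plus bonded index pairs can be turned into counts of atom types and bonded atom-type pairs, then projected onto a fixed feature order for a classifier.

```python
from collections import Counter


def connectivity_descriptors(atoms, bonds):
    """Count atom types and bonded atom-type pairs for one molecule (hypothetical descriptors).

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j) index pairs for bonded atoms.
    """
    counts = Counter(f"atom:{a}" for a in atoms)
    for i, j in bonds:
        pair = "-".join(sorted((atoms[i], atoms[j])))  # order-independent bond label
        counts[f"bond:{pair}"] += 1
    return counts


def to_vector(counts, feature_names):
    """Project a descriptor Counter onto a fixed feature order (missing features -> 0)."""
    return [counts.get(name, 0) for name in feature_names]


# Heavy-atom graph of ethanol (hydrogens omitted): C-C-O
desc = connectivity_descriptors(["C", "C", "O"], [(0, 1), (1, 2)])
features = ["atom:C", "atom:O", "bond:C-C", "bond:C-O", "bond:C-N"]
print(to_vector(desc, features))  # [2, 1, 1, 1, 0]
```

Descriptors of this kind ignore solvent and crystallisation conditions by construction, consistent with the scope described in the abstract.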

Empirical Evaluation of Map Reduce Based Hybrid Approach for Problem of Imbalanced Classification in Big Data

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2019070102 ◽

2019 ◽

Vol 11 (3) ◽

pp. 23-45 ◽

Cited By ~ 2

Author(s):

Khyati Ahlawat ◽

Anuradha Chug ◽

Amit Prakash Singh

Keyword(s):

Machine Learning ◽

Big Data ◽

Hybrid Approach ◽

Empirical Evaluation ◽

Processing Technique ◽

Machine Learning Algorithms ◽

Future Research ◽

Svm Classifier ◽

Hybrid Technique ◽

Imbalanced Classification

Imbalanced datasets are those with an uneven class distribution, which degrades a classifier's performance. In this paper, an SVM classifier is combined with the K-Means clustering approach, and a hybrid approach, Hy_SVM_KM, is introduced. The performance of the proposed method is empirically evaluated using accuracy and false negative (FN) rate measures and compared with existing methods such as SMOTE. The results show that the proposed hybrid technique outperformed the traditional SVM classifier on most datasets and performed better than the well-known pre-processing technique SMOTE on all datasets. The goal of this article is to extend the capabilities of popular machine learning algorithms and adapt them to meet the challenges of imbalanced big data classification. This article can serve as a baseline study for future research on imbalanced big dataset classification; it provides an efficient mechanism, based on a modified SVM classifier, for handling imbalanced big datasets and improves the overall performance of the model.
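
The two evaluation measures named in the abstract can be computed as follows. This sketch does not implement Hy_SVM_KM; it only shows, assuming the minority class is labelled as the positive class, why the FN rate (FN / (FN + TP)) matters alongside accuracy on imbalanced data.

```python
def accuracy_and_fn_rate(y_true, y_pred, positive=1):
    """Return (accuracy, false negative rate) for binary labels.

    FN rate = FN / (FN + TP): the fraction of actual positive (here assumed to be
    minority-class) examples that the classifier misses.
    """
    tp = fn = correct = 0
    for t, p in zip(y_true, y_pred):
        correct += int(t == p)
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
    accuracy = correct / len(y_true)
    fn_rate = fn / (tp + fn) if (tp + fn) else 0.0
    return accuracy, fn_rate


# 8 majority (0) and 2 minority (1) examples; a classifier that always predicts 0
print(accuracy_and_fn_rate([0] * 8 + [1] * 2, [0] * 10))  # (0.8, 1.0)
```

The trivial classifier reaches 80% accuracy while missing every minority example, which is exactly the failure mode that imbalance-aware methods such as SMOTE or hybrid SVM approaches aim to correct.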

TRADITIONAL AND MODERN METHODS OF SATELLITE IMAGES PROCESSING FOR OPERATIONAL MAPPING OF FOREST COVER DISTURBANCES

Vestnik SSUGT (Siberian State University of Geosystems and Technologies) ◽

10.33764/2411-1759-2020-25-3-201-213 ◽

2020 ◽

Vol 25 (3) ◽

pp. 201-213

Author(s):

Andrey V. Tarasov ◽

Keyword(s):

Machine Learning ◽

Forest Management ◽

Forest Cover ◽

Forest Disturbance ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

Forest Disturbances ◽

Detection Algorithms ◽

Vegetation Indexes ◽

Images Processing

Real-time mapping of forest disturbances is important for forest management. Detecting forest stands damaged by natural or human-induced factors allows the necessary management decisions to be made immediately. Implementing such a management strategy requires operational mapping methods. With the advent of Earth remote sensing data (RSD) with high spatial and temporal resolution (PlanetScope and Sentinel-2), it has become possible to implement modern operational mapping methods for forest management operations, particularly forest disturbance detection. As the monitoring area and the number of images increase sharply, so does the need for automated image processing methods. This paper provides an overview of "traditional methods" for identifying forest cover disturbances (vegetation indexes, Tasseled Cap, multiband and single-band change detection, etc.), their basis, their limitations, and the experience of their application in Russia and worldwide. Algorithms based on machine learning methods, together with a classification of them, are also presented. The benefits and limitations of both groups of forest disturbance detection algorithms are noted. In addition, it was found that experience in applying machine learning algorithms to RSD processing is limited, which makes such research relevant.
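
Vegetation-index change detection, one of the "traditional methods" the overview lists, can be sketched with NDVI = (NIR - Red) / (NIR + Red) computed on two dates and differenced per pixel. The 0.2 drop threshold and the tiny two-pixel rasters are hypothetical; in practice thresholds are tuned per sensor and region, and real imagery would be handled with an array library rather than nested lists.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where NIR + Red == 0."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


def ndvi_change_mask(nir_before, red_before, nir_after, red_after, threshold=0.2):
    """Flag pixels whose NDVI dropped by more than `threshold` (hypothetical value) between dates."""
    before = ndvi(nir_before, red_before)
    after = ndvi(nir_after, red_after)
    return [
        [(b - a) > threshold for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


# Two-pixel rasters: the first pixel loses vegetation, the second is unchanged.
mask = ndvi_change_mask([[50, 40]], [[10, 10]], [[20, 40]], [[20, 10]])
print(mask)  # [[True, False]]
```

Simple differencing of this kind is fast and transparent, but it is sensitive to clouds, phenology, and illumination differences between dates, which is part of the motivation for the machine learning approaches the paper reviews.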