Forest Cover Types Classification Based on Online Machine Learning on Distributed Cloud Computing Platforms of Storm and SAMOA

2014 ◽  
Vol 955-959 ◽  
pp. 3803-3812
Author(s):  
Guang Di Li ◽  
Guo Yin Wang ◽  
Xue Rui Zhang ◽  
Wei Hui Deng ◽  
Fan Zhang

Storm is the most popular real-time stream processing platform and can be used for online machine learning. Similar to how Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation. SAMOA, which is both a platform and a library, includes distributed algorithms for the most common machine learning tasks, much as Mahout does for Hadoop. In this paper, Forest Cover Types, a large benchmarking dataset available at the UCI KDD Archive, is used as the data stream source. Vertical Hoeffding Tree, a parallel streaming decision tree induction algorithm for distributed environments that is incorporated in the SAMOA API, is applied on the Storm platform. This study compares the stream processing technique for predicting forest cover types from cartographic variables with traditional machine learning algorithms applied to this dataset. The test-then-train method used in this system differs from the traditional train-then-test approach: each instance is first used to test the current model and only then to update it. The results indicate that the output of the stream processing technique is asymptotically nearly identical to that of a conventional learner, while the model derived from this system is scalable, real-time, capable of dealing with evolving streams, and insensitive to stream ordering.
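The test-then-train (prequential) evaluation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the paper's Storm/SAMOA implementation: the `MajorityClassLearner` class, the `prequential_accuracy` function, and the four-instance stream are all hypothetical stand-ins, and a real system would plug in an incremental tree learner instead.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (assumed for illustration): predicts the
    most frequent label seen so far and ignores the features."""

    def __init__(self) -> None:
        self.counts = Counter()

    def predict(self, features):
        # Before any training there is nothing to predict from.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label) -> None:
        self.counts[label] += 1


def prequential_accuracy(stream, learner) -> float:
    """Test-then-train: score each instance before the model learns from it."""
    correct = 0
    total = 0
    for features, label in stream:
        if learner.predict(features) == label:  # test first
            correct += 1
        learner.learn(features, label)  # then train
        total += 1
    return correct / total if total else 0.0


# Hypothetical mini-stream of (features, cover_type) pairs, not real data.
stream = [
    ((2596, 51), "spruce"),
    ((2590, 56), "spruce"),
    ((2804, 139), "pine"),
    ((2785, 155), "spruce"),
]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
```

Because every instance is scored before it is learned from, no separate held-out test set is needed, which is what makes the scheme suitable for unbounded streams.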

2020 ◽  
Vol 13 (5) ◽  
pp. 1020-1030
Author(s):  
Pradeep S. ◽  
Jagadish S. Kallimani

Background: With the advent of data analysis and machine learning, there is growing impetus to analyze historical data and build models from it. The data comes in numerous forms and shapes and poses an abundance of challenges. The most sought-after form of data for analysis is numerical data, which is quite manageable given the plethora of algorithms and tools available. Another form of data is categorical, which is subdivided into ordinal (ordered categories) and nominal (named categories with no inherent order). This data can be broadly classified as sequential and non-sequential. Sequential data is easier to preprocess using algorithms. Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms to such data yields biased results, which makes it impossible to generate a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques that, during our research, helped us deal with large categorical data of a non-sequential nature. In subsequent sections, we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best preprocessing technique we observed in our research is one-hot encoding, which breaks the categorical features down into binary features that are fed into an algorithm to predict the outcome. The example we took is not abstract; it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
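One-hot encoding, the preprocessing technique the abstract above singles out, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the `one_hot_encode` helper and the colour values are hypothetical, and the paper does not say which tool or library the authors used.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each categorical value to a binary vector with a single 1.

    Returns the sorted category list (the column order) and one row per input.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


# Hypothetical nominal feature with three categories.
categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
```

Each category becomes its own binary column, so a learner never sees an artificial ordering among nominal values, at the cost of one extra column per distinct category.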


2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be treated by both text and image learning algorithms. These components are derived from feature vectors of the text. To demonstrate the merit of our proposal, it was tested on text classification and reading complexity score prediction tasks, obtaining promising results.


Sentiment analysis is the study of individuals' opinions and feedback towards an entity, which can be an item, service, movie, person, or event. The opinions are mostly expressed as remarks or reviews. With social networks, forums, and websites, these reviews have become a significant factor in a customer's decision whether or not to buy something. These days, vast scalable computing environments provide sophisticated ways of carrying out data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such example is text classification, a compelling method for predicting customers' sentiment. In this paper, we focus our sentiment analysis work on a movie review database. We examine the sentiment expressions to rate the polarity of the movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review to its correct rating. This paper incorporates sentiment analysis using feature-based opinion mining and supervised machine learning. The principal focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem domain. In our study, we used six distinct machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K Nearest Neighbors), and SoftMax Regression.


2021 ◽  
Author(s):  
Aishwarya Jhanwar ◽  
Manisha J. Nene

Recently, the increased availability of data has led to advances in the field of machine learning. Despite the growth in the domain of machine learning, the proximity to the physical limits of chip fabrication in classical computing is motivating researchers to explore the properties of quantum computing. Since quantum computers leverage the properties of quantum mechanics, they have the potential to surpass classical computers in machine learning tasks. The study in this paper helps researchers understand how quantum computers can bring a paradigm shift to the field of machine learning. This paper addresses the concepts of quantum computing that influence machine learning in a quantum world. It also states the speedup observed in different machine learning algorithms when executed on quantum computers. Towards the end, the paper advocates the use of quantum application software and sheds light on the challenges currently faced by quantum computers.


2020 ◽  
Author(s):  
Dianne Scherly Varela de Medeiros ◽  
Helio do Nascimento Cunha Neto ◽  
Martin Andreoni Lopez ◽  
Luiz Claudio Schara Magalhães ◽  
Natalia Castro Fernandes ◽  
...  

In this paper we focus on knowledge extraction from large-scale wireless networks through stream processing. We present the primary methods for sampling, data collection, and monitoring of wireless networks and we characterize knowledge extraction as a machine learning problem on big data stream processing. We show the main trends in big data stream processing frameworks. Additionally, we explore the data preprocessing, feature engineering, and the machine learning algorithms applied to the scenario of wireless network analytics. We address challenges and present research projects in wireless network monitoring and stream processing. Finally, future perspectives, such as deep learning and reinforcement learning in stream processing, are anticipated.


This is the era of machine learning and artificial intelligence. Machine learning is a field of scientific study that uses statistical models to predict answers to questions that have never been asked before. Machine learning algorithms use a large quantity of sample data to generate a model. A larger and higher-quality training set leads to higher accuracy in the approximated results. ML is the most popular field of research and is also helpful in pattern finding, artificial intelligence, and data analysis. In this paper, we explain the basic concepts of machine learning and its various types of methods. These methods can be used according to the user’s requirements. Machine learning tasks are divided into various categories. These tasks are accomplished by a computer system without being explicitly programmed.


2014 ◽  
Vol 70 (a1) ◽  
pp. C1628-C1628 ◽  
Author(s):  
Jerome Wicker ◽  
Richard Cooper ◽  
William David

We show that suitably chosen machine learning algorithms can be used to predict the "crystallisation propensity" of classes of molecules with a promisingly low error rate, using the Cambridge Structural Database and ZINC database to provide training examples of crystalline and non-crystalline molecules. Supervised learning tasks involve using machine learning algorithms to infer a function from known training data which allows classification of unknown test data. Such algorithms have been successfully used to predict continuous properties of compounds, such as melting point[1] and solubility[2]. Similar methods have also been applied to protein crystallinity predictions based on amino acid sequences[3], but little has previously been done to attempt to classify small organic molecules as crystalline or non-crystalline due to the difficulty in finding descriptors appropriate to the problem. Our approach uses only information about the atomic types and connectivity, leaving aside the confounding effects of solvents and crystallisation conditions. The result is reinforced by a blind microcrystallisation screening of a sample of materials, which confirmed the classification accuracy of the predictive model. An analysis of the most significant descriptors used in the classification is also presented, and we show that significant predictive accuracy can be obtained using relatively few descriptors.


2019 ◽  
Vol 11 (3) ◽  
pp. 23-45 ◽  
Author(s):  
Khyati Ahlawat ◽  
Anuradha Chug ◽  
Amit Prakash Singh

Imbalanced datasets are those with an uneven distribution of classes, which deteriorates a classifier's performance. In this paper, an SVM classifier is combined with a K-Means clustering approach, and a hybrid approach, Hy_SVM_KM, is introduced. The performance of the proposed method is empirically evaluated using the Accuracy and FN Rate measures and compared with existing methods such as SMOTE. The results show that the proposed hybrid technique outperforms the traditional machine learning classifier SVM on most datasets and performs better than the known preprocessing technique SMOTE on all datasets. The goal of this article is to extend the capabilities of popular machine learning algorithms and adapt them to meet the challenges of imbalanced big data classification. This article provides a baseline study for future research on the classification of imbalanced big datasets, and it offers an efficient mechanism for dealing with the imbalanced nature of big datasets using a modified SVM classifier that improves the overall performance of the model.
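The two evaluation measures named in the abstract above, Accuracy and FN Rate, can be computed from predictions as in the sketch below. It assumes the common definition FN rate = FN / (FN + TP) with the minority class treated as positive; the abstract does not spell out its formula, and the `accuracy_and_fn_rate` helper and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_fn_rate(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, false-negative rate) for a binary problem.

    FN rate is taken as FN / (FN + TP), an assumed definition.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = true_pos = false_neg = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if truth == positive:
            if pred == positive:
                true_pos += 1
            else:
                false_neg += 1
    positives = true_pos + false_neg
    fn_rate = false_neg / positives if positives else 0.0
    return correct / len(y_true), fn_rate


# Toy imbalanced data: 8 majority-class and 2 minority-class instances,
# scored against a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
acc, fnr = accuracy_and_fn_rate(y_true, y_pred)
```

In this toy case the always-majority classifier reaches high accuracy while missing every minority instance, which is why reporting the FN rate alongside accuracy matters for imbalanced data.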


Author(s):  
Andrey V. Tarasov ◽  

Real-time mapping of forest disturbances is important for forest management. Detecting forest stands damaged by natural or human-induced factors makes it possible to take the necessary management decisions immediately. To implement such a management strategy, operational mapping methods are required. With the advent of Earth remote sensing data (RSD) that have high spatial and temporal resolution (PlanetScope and Sentinel-2), it has become possible to implement modern operational mapping methods for forest management operations (in particular, forest disturbance detection). As the monitoring area and the number of images increase sharply, the need for automated image processing methods also rises. This paper provides an overview of “traditional methods” for identifying forest cover disturbances (vegetation indices, Tasseled Cap, multiband and single-band change detection, etc.), their basis, their limitations, and the experience of their application in Russia and worldwide. As an alternative, algorithms based on machine learning methods, together with their classification, are presented. The benefits and limitations of both groups of forest disturbance detection algorithms are noted. In addition, it was found that there is limited experience in applying machine learning algorithms to RSD processing, so this kind of research is relevant.
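As a rough illustration of the "traditional" index-based change detection surveyed in the abstract above, the sketch below flags pixels whose vegetation index drops sharply between two dates. It assumes NDVI, defined as (NIR - Red) / (NIR + Red), as the vegetation index; the abstract does not name a specific index, and the `flag_disturbance` helper, the 0.2 threshold, and the reflectance values are hypothetical.

```python
from typing import List, Sequence, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def flag_disturbance(
    before: Sequence[Tuple[float, float]],
    after: Sequence[Tuple[float, float]],
    threshold: float = 0.2,  # assumed NDVI drop that counts as disturbance
) -> List[bool]:
    """Flag each pixel whose NDVI fell by more than `threshold` between dates.

    Each pixel is a (nir, red) reflectance pair; both sequences are aligned.
    """
    return [
        ndvi(*pixel_before) - ndvi(*pixel_after) > threshold
        for pixel_before, pixel_after in zip(before, after)
    ]


# Two hypothetical pixels: the first loses vegetation, the second is unchanged.
before = [(0.5, 0.1), (0.5, 0.1)]
after = [(0.3, 0.2), (0.5, 0.1)]
flags = flag_disturbance(before, after)
```

A fixed threshold like this is simple to apply but depends on season and sensor, which is one of the limitations that motivates the machine learning alternatives the abstract discusses.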

