Ecological Interactions and the Netflix Problem

Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with other machine learning techniques. Recommenders are algorithms developed for companies like Netflix to predict whether a customer would like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. We also explore how the K nearest neighbour approach can be used with both positive and negative information, in which case the goal of the algorithm is to fill missing entries from a matrix (imputation). By removing a prey from a predator, we find that recommenders can guess the missing prey about 50% of the time on the first try, with up to 881 possibilities. Traits do not significantly improve the results of the K nearest neighbour approach, although a simple test with a supervised learning approach (random forests) shows we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.
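The held-out-prey test described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy diets, the species names, the `recommend` helper, and the choice of Jaccard similarity are hypothetical and are not taken from the paper, which may use a different similarity measure and voting scheme.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two prey sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(diets, predator, k=2):
    """Rank prey absent from `predator`'s diet using its k most similar predators."""
    target = diets[predator]
    scored = [(jaccard(target, prey), name)
              for name, prey in diets.items() if name != predator]
    neighbours = sorted(scored, reverse=True)[:k]
    votes = Counter()
    for similarity, name in neighbours:
        # Each neighbour votes for its own prey, weighted by its similarity.
        for prey in diets[name] - target:
            votes[prey] += similarity
    # Highest score first; ties are broken alphabetically for determinism.
    return [prey for prey, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical positive-only data: predator -> set of known prey.
diets = {
    "fox": {"hare", "vole", "grouse"},
    "lynx": {"hare", "grouse", "squirrel"},
    "marten": {"hare", "squirrel", "mouse"},
    "owl": {"vole", "mouse"},
}

# Remove one prey from one predator, then ask the recommender to guess it.
held_out = "hare"
masked = dict(diets, lynx=diets["lynx"] - {held_out})
ranking = recommend(masked, "lynx", k=2)
print(ranking[0] == held_out)
```

In this toy setting the first recommendation recovers the removed prey because both nearest neighbours share it; real food webs are far sparser, which is why the reported first-try success rate is around 50%.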


Ecological interactions and the Netflix problem

PeerJ ◽ 10.7717/peerj.3644 ◽ 2017 ◽ Vol 5 ◽ pp. e3644 ◽ Cited By ~ 16

Author(s): Philippe Desjardins-Proulx ◽ Idaline Laigle ◽ Timothée Poisot ◽ Dominique Gravel

Keyword(s): Machine Learning ◽ Supervised Learning ◽ Random Forests ◽ Species Interactions ◽ Similarity Measures ◽ Theoretical Models ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Nearest Neighbour ◽ Ecological Interactions

Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey about 50% of the time on the first try, with up to 881 possibilities. Traits do not significantly improve the results of the K nearest neighbour approach, although a simple test with a supervised learning approach (random forests) shows we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.


k-Nearest Neighbour Classifiers - A Tutorial

ACM Computing Surveys ◽ 10.1145/3459665 ◽ 2021 ◽ Vol 54 (6) ◽ pp. 1-25

Author(s): Pádraig Cunningham ◽ Sarah Jane Delany

Keyword(s): Machine Learning ◽ Similarity Measures ◽ Machine Learning Techniques ◽ Nearest Neighbour ◽ Computational Power ◽ Learning Techniques ◽ Nearest Neighbours ◽ Technical Report ◽ Nearest Neighbour Classifier ◽ Similarity Distance

Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance, because poor runtime performance is not such a problem these days given the computational power that is available. This article presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This article is the second edition of a paper previously published as a technical report [16]. Sections on similarity measures for time-series, retrieval speedup, and intrinsic dimensionality have been added. An Appendix is included, providing access to Python code for the key methods.
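The basic idea in this abstract, finding the nearest neighbours of a query and letting them decide its class, can be sketched in a few lines. This is a minimal illustration and not the code from the article's appendix: the `knn_classify` helper, the Euclidean distance, the unweighted majority vote, and the toy training points are all assumptions.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Sort labelled points by Euclidean distance to the query (assumed metric).
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set: (feature vector, label).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

label = knn_classify(train, (1.1, 1.0), k=3)
print(label)
```

Sorting the whole training set costs O(n log n) per query, which is acceptable for a sketch; the retrieval speedup techniques the article surveys address exactly this cost.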


Comparison of Machine Learning algorithm for COVID-19 Death Risk Prediction

10.21203/rs.3.rs-196077/v1 ◽ 2021

Author(s): Praveeen Anandhanathan ◽ Priyanka Gopalan

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Machine Learning Techniques ◽ Support Vector ◽ Nearest Neighbour ◽ Decision Tree Algorithm ◽ The Past ◽ Random Forest Method ◽ Learning Techniques ◽ The World

Coronavirus disease (COVID-19) is spreading across the world. Since it first appeared in Wuhan, China, in December 2019, it has become a serious issue across the globe. There are no accurate resources to predict and find the disease, so past patients’ records could guide clinicians in fighting the pandemic. Machine learning techniques can therefore be applied to predict a patient’s health from symptoms. We analyse only the symptoms that occur in every patient. These predictions can make it easier for clinicians to treat patients. Techniques such as SVM (Support Vector Machine), Fuzzy k-Means Clustering, the Decision Tree algorithm, the Random Forest method, ANN (Artificial Neural Network), KNN (k-Nearest Neighbour), Naïve Bayes, and the linear regression model are already used to predict many diseases. Because we have not faced this disease before, we cannot say which technique will give the maximum accuracy. We therefore compare all of these algorithms in RStudio to provide an efficient result.


Heart Disease Prediction using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.f9780.059120 ◽ 2020 ◽ Vol 9 (1) ◽ pp. 700-704

Keyword(s): Machine Learning ◽ Heart Disease ◽ Machine Learning Techniques ◽ Support Vector ◽ Disease Prediction ◽ Nearest Neighbour ◽ Decision Tree Classifier ◽ Support Vector Classifier ◽ Learning Techniques ◽ Tree Classifier

This work derives methodologies to detect heart issues at an earlier stage and to inform patients so that they can improve their health. To address this problem, we use machine learning techniques to predict the incidence of heart disease at an earlier stage. For prediction we use certain parameters such as age, sex, height, weight, case history, smoking and alcohol consumption, together with tests such as pressure, cholesterol, diabetes, ECG and ECHO. Many machine learning algorithms can be used to solve this problem, including K-Nearest Neighbour, support vector classifier, decision tree classifier, logistic regression and Random Forest classifier. Using these parameters and algorithms, we predict whether or not the patient has heart disease and recommend ways for the patient to improve his/her health.


Towards the new Similarity Measures in Application of Machine Learning Techniques on Agriculture Dataset

International Journal of Computer Applications ◽ 10.5120/ijca2016912571 ◽ 2016 ◽ Vol 156 (11) ◽ pp. 38-41

Author(s): Bhagirath Parshuram ◽ Dhaval R.

Keyword(s): Machine Learning ◽ Similarity Measures ◽ Machine Learning Techniques ◽ Learning Techniques


Supervised Machine Learning for Plants Identification Based on Images of Their Leaves

Cognitive Analytics ◽ 10.4018/978-1-7998-2460-2.ch066 ◽ 2020 ◽ pp. 1314-1330 ◽ Cited By ~ 1

Author(s): Mohamed Elhadi Rahmani ◽ Abdelmalek Amine ◽ Reda Mohamed Hamou

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Texture Feature ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Nearest Neighbour ◽ Curve Shape ◽ Distance Curve ◽ Shape Signature ◽ Learning Techniques

Botanists generally study the characteristics of leaves, such as shape and margin, to give each plant a scientific name. This paper proposes a comparison of supervised plant identification using different approaches. The identification is based on three features extracted from images of leaves: a fine-scale margin feature histogram, a Centroid Contour Distance Curve shape signature and an interior texture feature histogram. Each leaf is first represented by one feature at a time, then by two features, and finally by all three features. The authors then classified the resulting vectors using different supervised machine learning techniques: decision tree, Naïve Bayes, K-nearest neighbour, and neural network. Finally, they evaluated the classification using cross validation. The main goal of this work is to study the influence of the representation of leaves' images on the identification of plants, and to study the use of supervised machine learning algorithms for plant leaf classification.
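To make the shape-signature feature more concrete, the following sketch computes a simple centroid contour distance curve from an ordered list of contour points. It is only an illustration under stated assumptions: the `centroid_contour_distance` helper is hypothetical, the centroid is taken as the mean of the contour points, and the curve is scaled by its maximum. The chapter's actual extraction and normalisation steps may differ.

```python
import math


def centroid_contour_distance(contour):
    """Distance from the centroid to each contour point, scaled to [0, 1]."""
    # Assumption: centroid is the mean of the contour points, not the area centroid.
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    distances = [math.hypot(x - cx, y - cy) for x, y in contour]
    peak = max(distances)
    # Dividing by the largest distance makes the curve independent of leaf size.
    return [d / peak for d in distances] if peak else distances


# Hypothetical elongated outline sampled at four points, in contour order.
contour = [(3.0, 0.0), (0.0, 1.0), (-3.0, 0.0), (0.0, -1.0)]
signature = centroid_contour_distance(contour)
print(signature)
```

The resulting list can be used directly as a feature vector for a classifier such as K-nearest neighbour, provided every leaf contour is sampled at the same number of points.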


A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift

International Journal of Computational Intelligence and Applications ◽ 10.1142/s146902681950010x ◽ 2019 ◽ Vol 18 (02) ◽ pp. 1950010 ◽ Cited By ~ 2

Author(s): Niddal Imam ◽ Biju Issac ◽ Seibu Mary Jacob

Keyword(s): Machine Learning ◽ Supervised Learning ◽ Research Community ◽ Machine Learning Techniques ◽ Spam Detection ◽ Learning Approach ◽ New Approach ◽ Detection Systems ◽ Learning Techniques ◽ Over Time

Twitter has changed the way people get information by allowing them to express their opinion and comments on the daily tweets. Unfortunately, due to the high popularity of Twitter, it has become very attractive to spammers. Unlike other types of spam, Twitter spam has become a serious issue in the last few years. The large number of users and the high amount of information being shared on Twitter play an important role in accelerating the spread of spam. In order to protect the users, Twitter and the research community have been developing different spam detection systems by applying different machine-learning techniques. However, a recent study showed that the current machine learning-based detection systems are not able to detect spam accurately because spam tweet characteristics vary over time. This issue is called “Twitter Spam Drift”. In this paper, a semi-supervised learning approach (SSLA) has been proposed to tackle this. The new approach uses the unlabeled data to learn the structure of the domain. Different experiments were performed on English and Arabic datasets to test and evaluate the proposed approach and the results show that the proposed SSLA can reduce the effect of Twitter spam drift and outperform the existing techniques.


Soil Classification and Harvest Proposal Implemented using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.h6110.1091220 ◽ 2020 ◽ Vol 9 (12) ◽ pp. 19-22

Keyword(s): Machine Learning ◽ Soil Type ◽ Soil Classification ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Support Vector ◽ Nearest Neighbour ◽ The People ◽ Learning Techniques ◽ Crop Type

Agriculture is the major source of livelihood for the people of India and is considered an important part of the country's economy. India is one of the countries that suffer from natural calamities such as drought and flood, which may destroy crops and lead to heavy losses for farmers. Predicting the crop type can help them cultivate the crop best suited to a particular soil type. Soil is one major factor of agriculture, and there are several types of soil in our country. In order to classify the soil type, we need to understand the characteristics of the soil. Data mining and machine learning are emerging technologies in the field of agriculture and horticulture. Classifying the soil type and suggesting fertilizers that can improve the growth of the crop cultivated in that soil type play a major role in agriculture. To that end, several machine learning algorithms, such as support vector machine (SVM), k-Nearest Neighbour (k-NN) and logistic regression, are explored to classify the soil type.


Recognition of Soybean Diseases Using Machine Learning Techniques Based on Segmentation of Images Captured By UAVs

10.5753/wvc.2020.13476 ◽ 2020

Author(s): Gercina Da Silva ◽ Alessandro Ferreira ◽ Denilson Guilherme ◽ José Fernando Grigolli ◽ Vanessa Weber ◽ ...

Keyword(s): Machine Learning ◽ Computer Vision ◽ Computer Program ◽ Random Forests ◽ Machine Learning Techniques ◽ Target Spot ◽ Learning Techniques ◽ Image Dataset ◽ Segmentation Of Images ◽ Soybean Diseases

Soybean is an important product for the Brazilian economy; however, factors such as diseases, which are generally difficult to control, can limit its productive income. This article therefore aims to use a computer program to recognize diseases in images obtained by a UAV in a soybean plantation. The program is based on computer vision and machine learning, using the SLIC algorithm to segment the images into superpixels. After segmentation, an image dataset was created with the following classes: mildew, target spot, Asian rust, soil, straw and healthy leaves, totaling 22,140 images. Diagrammatic scales were used to assess disease severity. The disease recognition program explored four supervised learning techniques: SVM, J48, Random Forest and KNN. The techniques that obtained the best performance were SVM and Random Forest, taking into account the results obtained with all the evaluation metrics used. The program was found to be efficient at differentiating the classes of diseases treated in this article.


Application of Machine Learning Algorithms in Stock Market Prediction

Handbook of Research on Smart Technology Models for Business and Industry - Advances in Computational Intelligence and Robotics ◽ 10.4018/978-1-7998-3645-2.ch007 ◽ 2020 ◽ pp. 153-180

Author(s): Sumit Kumar ◽ Sanlap Acharya

Keyword(s): Machine Learning ◽ Unsupervised Learning ◽ Supervised Learning ◽ Stock Prices ◽ Stock Price ◽ Short Term Memory ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Learning Techniques ◽ Better Than

The prediction of stock prices has always been a very challenging problem for investors. Using machine learning techniques to predict stock prices is also one of the favourite topics for academics working in this domain. This chapter discusses five supervised learning techniques and two unsupervised learning techniques for stock price prediction and compares the performance of all the algorithms. Among the supervised learning techniques, the Long Short-Term Memory (LSTM) algorithm performed better than the others, whereas among the unsupervised learning techniques, the Restricted Boltzmann Machine (RBM) performed better. RBM was found to perform even better than LSTM.
