scholarly journals Ecological interactions and the Netflix problem

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3644 ◽  
Author(s):  
Philippe Desjardins-Proulx ◽  
Idaline Laigle ◽  
Timothée Poisot ◽  
Dominique Gravel

Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the times on the first try, with up to 881 possibilities. Traits do not improve significantly the results for the K nearest neighbour, although a simple test with a supervised learning approach (random forests) show we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.

2016 ◽  
Author(s):  
Philippe Desjardins-Proulx ◽  
Idaline Laigle ◽  
Timothée Poisot ◽  
Dominique Gravel

0AbstractSpecies interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with other machine learning techniques. Recommenders are algorithms developed for companies like Netflix to predict if a customer would like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. We also explore how the K nearest neighbour approach can be used with both positive and negative information, in which case the goal of the algorithm is to fill missing entries from a matrix (imputation). By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the times on the first try, with up to 881 possibilities. Traits do not improve significantly the results for the K nearest neighbour, although a simple test with a supervised learning approach (random forests) show we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized to ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.


2020 ◽  
pp. 1314-1330 ◽  
Author(s):  
Mohamed Elhadi Rahmani ◽  
Abdelmalek Amine ◽  
Reda Mohamed Hamou

Botanists study in general the characteristics of leaves to give to each plant a scientific name; such as shape, margin...etc. This paper proposes a comparison of supervised plant identification using different approaches. The identification is done according to three different features extracted from images of leaves: a fine-scale margin feature histogram, a Centroid Contour Distance Curve shape signature and an interior texture feature histogram. First represent each leaf by one feature at a time in, then represent leaves by two features, and each leaf was represented by the three features. After that, the authors classified the obtained vectors using different supervised machine learning techniques; the used techniques are Decision tree, Naïve Bayes, K-nearest neighbour, and neural network. Finally, they evaluated the classification using cross validation. The main goal of this work is studying the influence of representation of leaves' images on the identification of plants, and also studying the use of supervised machine learning algorithm for plant leaves classification.


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges availed with machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher level semantic information and subsequently feeds them to the random forests classifier, as well as other supervised machine learning models for comparisons. Our experimental evaluation on multiple real-world data sets comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).


2021 ◽  
Vol 54 (6) ◽  
pp. 1-25
Author(s):  
Pádraig Cunningham ◽  
Sarah Jane Delany

Perhaps the most straightforward classifier in the arsenal or Machine Learning techniques is the Nearest Neighbour Classifier—classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance, because issues of poor runtime performance is not such a problem these days with the computational power that is available. This article presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This article is the second edition of a paper previously published as a technical report [16]. Sections on similarity measures for time-series, retrieval speedup, and intrinsic dimensionality have been added. An Appendix is included, providing access to Python code for the key methods.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Wasiq Sheikh ◽  
Anshul Parulkar ◽  
Malik B Ahmed ◽  
Suleman Ilyas ◽  
Esseim Sharma ◽  
...  

Introduction: TAVR is approved for use for a range of patients with aortic stenosis. The need for a permanent pacemaker device (PPD) after TAVR varies, but can range from 2 to 51%. Risk factors for requiring PPD after TAVR appear to include being male and having baseline conduction disturbances. Being able to predict who may require PPD could identify at risk patients early and may confer cost savings. Given the advent of machine learning classifier techniques, random forests may aid in better predicting need for PPD after TAVR by using pre-operative variables. Hypothesis: Random Forests offer discriminatory ability in predicting the need for PPD after TAVR using primarily pre-operative variables. Methods: Pre-operative data from a single institution were collected patients undergoing TAVR without a history of PPD between January 2016 and December 2019. EKG data was obtained including underlying rhythm, QRS duration and any underlying conduction abnormality. Other variables included anti-arrhythmic data, comorbidities, and eGFR. Data was imported into Python and a stratified 5 fold cross validation with SMOTE oversampling running at every fold to avoid overfitting was run on the training set. The model that optimized the receiver under the operator curve was exported and applied to a test data set. Precision and recall were calculated to assess classification. Results: A total of 513 patients were identified with nearly 9% eventually requiring PPD. A total of 40 predictor variables were utilized in the modeling. A stratified split of the data resulted in a training set of 384 patients and a test set of 129 patients. A total of 500 trees were used on the training set. The final optimized model had an ROC of 0.71 with the following parameters: gini criterion, max depth of 4, and logarithm max features . When applied to the test set, the model had an ROC of 0.63. Overall accuracy was 0.78, with a precision and recall for no PPD after TAVR being 0.94 and 0.81 and a precision and recall for PPD after TAVR of 0.18 and 0.45. Conclusions: Our results show that machine learning techniques, specifically random forests have discriminatory ability in predicting PPD after TAVR. More tuning of the models are required to achieve better discrimination.


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cellcycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand- binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine- learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and Autodock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

Information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based loses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. Predictive model was developed based on supervised machine learning algorithm, support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.


Author(s):  
Augusto Cerqua ◽  
Roberta Di Stefano ◽  
Marco Letta ◽  
Sara Miccoli

AbstractEstimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, Italy being no exception. Mortality estimates at the local level are even more uncertain as they require stringent conditions, such as granularity and accuracy of the data at hand, which are rarely met. The “official” approach adopted by public institutions to estimate the “excess mortality” during the pandemic draws on a comparison between observed all-cause mortality data for 2020 and averages of mortality figures in the past years for the same period. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of the local mortality in “ordinary” years, especially in small- and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. Such estimates allow us to provide insights about the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, our dataset is freely available to the research community.


Author(s):  
Vara Vundavalli ◽  
Farhat Barsha ◽  
Mohammad Masum ◽  
Hossain Shahriar ◽  
Hisham Haddad

Sign in / Sign up

Export Citation Format

Share Document