Machine learning approaches to predict the plant-associated phenotype of Xanthomonas strains

2021 ◽  
Author(s):  
Dennie te Molder ◽  
Wasin Poncheewin ◽  
Peter Schaap ◽  
Jasper Koehorst

The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date, a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches to the genome content to predict this phenotype. Until now, such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding strain pathogenicity over many studies. Unification of this information into a single resource was therefore considered to be an essential step. Mining of 39 papers that considered both plant-associated phenotypes allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains, the corresponding genomes were available and were de novo annotated for the presence of Pfam protein domains, which were used as features to train and compare three ML classification algorithms: CART, Lasso and Random Forest. Recursive feature extraction provided further insights into the virulence-enabling factors, but also yielded domains linked to traits not present in pathogenic strains.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dennie te Molder ◽  
Wasin Poncheewin ◽  
Peter J. Schaap ◽  
Jasper J. Koehorst

Abstract Background: The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date, a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches to the genome content to predict this phenotype. Until now, such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding pathogenicity of individual strains over many studies. Unification of this information into a single resource was therefore considered to be an essential step. Results: Mining of 39 papers that considered both plant-associated phenotypes allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains, the corresponding genomes were available and were de novo annotated for the presence of Pfam protein domains, which were used as features to train and compare three ML classification algorithms: CART, Lasso and Random Forest. Conclusion: The literature resource, in combination with recursive feature extraction used in the ML classification algorithms, provided further insights into the virulence-enabling factors, but also highlighted domains linked to traits not present in pathogenic strains.
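To make the idea of using domain presence as a classification feature concrete, the following minimal sketch scores each domain by the Gini impurity decrease of a single presence/absence split, which is the criterion a CART tree typically uses to choose its first node. It is not the authors' pipeline: the function names, the placeholder domain IDs (`PF_A`, `PF_B`, `PF_C`) and the six toy strains are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_domain_split(presence, labels):
    """Return (domain, impurity decrease) for the best presence/absence split.

    presence: list of sets, one per strain, holding the domain IDs found.
    labels:   list of phenotype labels, aligned with presence.
    """
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for domain in sorted({d for row in presence for d in row}):
        with_d = [y for row, y in zip(presence, labels) if domain in row]
        without_d = [y for row, y in zip(presence, labels) if domain not in row]
        weighted = (len(with_d) * gini(with_d)
                    + len(without_d) * gini(without_d)) / n
        gain = parent - weighted
        if gain > best[1]:
            best = (domain, gain)
    return best


# Hypothetical toy data: placeholder domain IDs, not real Pfam accessions.
strains = [
    {"PF_A", "PF_B"},
    {"PF_A", "PF_C"},
    {"PF_A"},
    {"PF_B"},
    {"PF_C"},
    {"PF_B", "PF_C"},
]
phenotypes = ["pathogenic", "pathogenic", "pathogenic",
              "non-pathogenic", "non-pathogenic", "non-pathogenic"]

domain, gain = best_domain_split(strains, phenotypes)
print(domain, round(gain, 3))
```

In this toy set only `PF_A` separates the two classes perfectly, so it receives the largest impurity decrease. A full CART tree would repeat the same search recursively on each resulting subset.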


Author(s):  
Mamehgol Yousefi ◽  
Azmin Shakrine ◽  
Samsuzana bt. Abd Aziz ◽  
Syaril Azrad ◽  
Mohamed Mazmira ◽  
...  

2020 ◽  
Author(s):  
Valerio Carruba

Asteroid families are groups of asteroids that are the product of collisions or of the rotational fission of a parent object. These groups are mainly identified in domains of proper elements or proper frequencies. Because of robotic telescope surveys, the number of known asteroids has increased from about 10,000 in the early 1990s to more than 750,000 today. Traditional approaches for identifying new members of asteroid families, like the hierarchical clustering method (HCM), may struggle to keep up with the growing rate of new discoveries. Here we used machine learning classification algorithms to identify new family members based on the orbital distribution in proper elements (a, e, sin(i)) of previously known family constituents. We compared the outcome of nine classification algorithms from standalone and ensemble approaches. The Extremely Randomized Trees (ExtraTree) method had the highest precision, making it possible to retrieve up to 97% of family members identified with standard HCM.
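For readers unfamiliar with the HCM baseline mentioned above, the sketch below shows the basic idea: starting from a seed asteroid, any object closer than a cutoff to a current member is added, and the search repeats until no new members are found. The abstract does not specify the distance function. The dimensionless metric here, with weights 5/4, 2 and 2 on the relative differences in a, e and sin(i), is an assumption, as are the function names, the cutoff value and the four toy objects.

```python
import math


def distance(p, q):
    """Assumed dimensionless distance between two (a, e, sin_i) tuples."""
    a_mean = (p[0] + q[0]) / 2
    da = (p[0] - q[0]) / a_mean
    de = p[1] - q[1]
    dsin_i = p[2] - q[2]
    return math.sqrt(1.25 * da ** 2 + 2 * de ** 2 + 2 * dsin_i ** 2)


def grow_family(seed_index, elements, cutoff):
    """Single-linkage growth: add any object within cutoff of a current member."""
    members = {seed_index}
    frontier = [seed_index]
    while frontier:
        current = frontier.pop()
        for j, candidate in enumerate(elements):
            if j not in members and distance(elements[current], candidate) < cutoff:
                members.add(j)
                frontier.append(j)
    return members


# Hypothetical proper elements (a in au, e, sin i); not real asteroid data.
asteroids = [
    (2.360, 0.100, 0.110),
    (2.362, 0.101, 0.111),
    (2.365, 0.102, 0.112),
    (3.100, 0.200, 0.050),
]

family = grow_family(0, asteroids, cutoff=0.004)
print(sorted(family))
```

With this cutoff the third object is too far from the seed to be linked directly, but it joins the family through the second object. This chaining is characteristic of single-linkage clustering, and every newly discovered object has to be compared against existing members in this way.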


Software engineering is an important area that deals with the development and maintenance of software. After software has been developed, it is important to track its performance and to check whether it functions according to customer requirements. To ensure this, faulty and non-faulty modules must be identified, for which a binary classification model of faults can be used. The outputs of different techniques vary with the fault dataset used, the complexity, the classification algorithm implemented, and so on. Various machine learning techniques can be used for this purpose, but this paper deals with the best classification algorithms available to date: decision tree, random forest, naive Bayes and logistic regression (tree-based and Bayesian-based techniques). The motive behind this project is to identify the faulty modules within software before the actual software testing takes place. As a result, the time spent by testers and their workload can be reduced to some extent. This work is useful to those working in the software industry and to those carrying out research in software engineering on the software development lifecycle.


2020 ◽  
Vol 77 (4) ◽  
pp. 1267-1273
Author(s):  
Cigdem Beyan ◽  
Howard I Browman

Abstract Machine learning, a subfield of artificial intelligence, offers various methods that can be applied in marine science. It supports data-driven learning, which can result in automated decision making on de novo data. It has significant advantages compared with manual analyses, which are labour intensive and require considerable time. Machine learning approaches have great potential to improve the quality and extent of marine research by identifying latent patterns and hidden trends, particularly in large datasets that are intractable using other approaches. New sensor technology supports collection of large amounts of data from the marine environment. The rapidly developing machine learning subfield known as deep learning, which applies algorithms (artificial neural networks) inspired by the structure and function of the brain, is able to solve very complex problems by processing big datasets in a short time, sometimes achieving better performance than human experts. Given the opportunities that machine learning can provide, its integration into marine science and marine resource management is inevitable. The purpose of this themed set of articles is to provide as wide a selection as possible of case studies that demonstrate the applications, utility, and promise of machine learning in marine science. We also provide a forward look by envisioning a marine science of the future into which machine learning has been fully incorporated.

