Challenges and Opportunities in Applied Machine Learning

Machine learning research is often conducted in vitro, divorced from motivating practical applications. A researcher might develop a new method for the general task of classification, then assess its utility by comparing its performance (such as accuracy or AUC) to that of existing classification models on publicly available datasets. In terms of advancing machine learning as an academic discipline, this approach has thus far proven quite fruitful. However, it is our view that the most interesting open problems in machine learning are those that arise during its application to real-world problems. We illustrate this point by reviewing two of our interdisciplinary collaborations, both of which have posed unique machine learning problems, providing fertile ground for novel research.
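The two benchmark metrics named above, accuracy and AUC, can be illustrated with a minimal sketch. The function names, toy scores, and 0.5 threshold are illustrative assumptions, not taken from the paper; AUC is computed here by the pairwise (Mann-Whitney) formulation.

```python
def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive example receives the higher score.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (hypothetical): four examples with classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc(scores, labels))       # 0.75
print(accuracy(scores, labels))  # 0.75
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the examples, which is one reason the two can disagree on real data.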


Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review

Biology ◽

10.3390/biology9120453 ◽

2020 ◽

Vol 9 (12) ◽

pp. 453

Author(s):

Petar Tonkovic ◽

Slobodan Kalajdziski ◽

Eftim Zdravevski ◽

Petre Lameski ◽

Roberto Corizzo ◽

...

Keyword(s):

Machine Learning ◽

Language Processing ◽

Scoping Review ◽

Digital Libraries ◽

Research Field ◽

Time Interval ◽

Research Papers ◽

Data Set ◽

Practical Applications ◽

Applied Machine Learning

Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure an efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope-specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Of these subproblems, data preprocessing is the least researched and has considerable potential for improvement.


Class Imbalance Learning

10.34048/2017.1.f1 ◽

2017 ◽

Author(s):

Sudarsun Santhiappan ◽

Balaraman Ravindran

Keyword(s):

Machine Learning ◽

Real World ◽

Class Imbalance ◽

Classification Problem ◽

Classification Algorithms ◽

Challenges And Opportunities ◽

Data Points ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Real World Problems

The data classification task assigns labels to data points using a model that is learned from a collection of pre-labeled data points. The Class Imbalance Learning (CIL) problem is concerned with the performance of classification algorithms in the presence of under-represented data and severe class distribution skews. Due to the inherently complex characteristics of imbalanced datasets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. It is important to study CIL because it is rare to find a classification problem in real-world scenarios that follows balanced class distributions. In this article, we present how machine learning has become an integral part of modern life and how some real-world problems are modeled as CIL problems. We also provide a detailed survey of the fundamentals of, and solutions to, class imbalance learning. We conclude the survey by presenting some of the challenges and opportunities in class imbalance learning.
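One widely used family of remedies for class skew is resampling. The sketch below shows random oversampling of minority classes; it is an illustration chosen here, not a method attributed to the survey, and the function name, toy data, and seed are assumptions.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original examples that carry this label.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): five majority examples, one minority example.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 5 examples
```

Oversampling should be applied to the training split only; duplicating examples before splitting lets copies of the same point leak into the evaluation data.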


Identification of stress response proteins through fusion of machine learning models and statistical paradigms

Scientific Reports ◽

10.1038/s41598-021-99083-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ebraheem Alzahrani ◽

Wajdi Alghamdi ◽

Malik Zaka Ullah ◽

Yaser Daanial Khan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Stress Proteins ◽

Ex Vivo ◽

Structural Characteristics ◽

Support Vector ◽

Practical Applications ◽

Link Type

Proteins are a vital component of cells that perform physiological functions to ensure smooth operations of bodily functions. Identification of a protein's function involves a detailed understanding of the structure of proteins. Stress proteins are essential mediators of several responses to cellular stress and are categorized based on their structural characteristics. These proteins are found to be conserved across many eukaryotic and prokaryotic lineages and demonstrate varied crucial functional activities inside a cell. The in vivo, ex vivo, and in vitro identification of stress proteins is a time-consuming and costly task. This study is aimed at the identification of stress protein sequences with the aid of mathematical modelling and machine learning methods to supplement the aforementioned wet lab methods. The model developed using Random Forest showed remarkable results with 91.1% accuracy, while models based on neural network and support vector machine showed 87.7% and 47.0% accuracy, respectively. Based on the evaluation results, it was concluded that the random-forest-based classifier surpassed all other predictors and is suitable for use in practical applications for the identification of stress proteins. A live web server is available at http://biopred.org/stressprotiens, while the web server code is available at https://github.com/abdullah5naveed/SRP_WebServer.git


Machine Learning Algorithm Application in Software Quality Improvement using Metrics

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1359.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 1873-1876

Keyword(s):

Machine Learning ◽

Software Quality ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Learning Problems ◽

Fertile Ground ◽

Quality Testing ◽

Software Quality Metrics ◽

Design And Testing

Machine learning is concerned with building programs that improve their performance on a task through experience. Machine learning algorithms have proven to be of great practical value in a variety of application domains. The field of software engineering turns out to be fertile ground: many software development tasks, in which analyzing design and testing plays a major role, can be formulated as learning problems and approached in terms of learning algorithms. We discuss several metrics in each of five types of software quality metrics: product quality, in-process quality, testing quality, maintenance quality, and customer satisfaction quality.


Handling class overlapping to detect noisy instances in classification

The Knowledge Engineering Review ◽

10.1017/s0269888918000115 ◽

2018 ◽

Vol 33 ◽

Cited By ~ 2

Author(s):

Shivani Gupta ◽

Atul Gupta

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Binary Classification ◽

Vital Role ◽

Learning Problems ◽

Data Sets ◽

Class Noise ◽

Principal Contributor ◽

Real World Problems ◽

Noise Filters

Automated machine classification plays a vital role in machine learning and data mining. Each classifier is likely to work well on some data sets and not so well on others, which increases the importance of evaluation. The performance of learning models depends strongly on the characteristics of the data sets. Previous results suggest that overlap between classes and the presence of noise have the strongest impact on the performance of a learning algorithm. The class overlap problem is a critical problem in which data samples appear as valid instances of more than one class, which may be responsible for the presence of noise in data sets. The objective of this paper is to better understand the data used in machine learning problems and to analyze highly overlapped instances using newly proposed overlap measures. The proposed overlap measures are Nearest Enemy Ratio, SubConcept Ratio, Likelihood Ratio and Soft Margin Ratio. For this experiment, we created 438 binary classification data sets from real-world problems and computed the values of 12 data complexity metrics to find highly overlapped data sets. We then applied the measures to identify the overlapped instances and four noise filters to find the noisy instances. Using the four noise filters, we found that 60–80% of overlapped instances are noisy instances. We found that class overlap is a principal contributor to class noise in data sets.


Hybrid Design of Isonicotinic Acid Hydrazide Derivatives: Machine Learning Studies, Synthesis and Biological Evaluation of their Antituberculosis Activity

Current Drug Discovery Technologies ◽

10.2174/1570163816666190411110331 ◽

2020 ◽

Vol 17 (3) ◽

pp. 365-375

Author(s):

Vasyl Kovalishyn ◽

Diana Hodyna ◽

Vitaliy O. Sinenko ◽

Volodymyr Blagodatny ◽

Ivan Semenyuta ◽

...

Keyword(s):

Machine Learning ◽

Isonicotinic Acid ◽

Acid Hydrazide ◽

Predictive Ability ◽

Antitubercular Activity ◽

Biological Evaluation ◽

Starting Point ◽

Binary Classifiers ◽

Synthesis And Biological Evaluation

Background: Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) bacteria. One of the main causes of mortality from TB is the problem of Mtb resistance to known drugs. Objective: The goal of this work is to identify potent small molecule anti-TB agents by machine learning, synthesis and biological evaluation. Methods: The On-line Chemical Database and Modeling Environment (OCHEM) was used to build predictive machine learning models. Seven compounds were synthesized and tested in vitro for their antitubercular activity against H37Rv and resistant Mtb strains. Results: A set of predictive models was built with OCHEM based on a set of previously synthesized isoniazid (INH) derivatives containing a thiazole core and tested against Mtb. The predictive ability of the models was tested by 5-fold cross-validation, which resulted in balanced accuracies (BA) of 61–78% for the binary classifiers. Test set validation showed that the models could be instrumental in predicting anti-TB activity with reasonable accuracy (BA = 67–79%) within the applicability domain. Seven designed compounds were synthesized and demonstrated activity against both the H37Rv and multidrug-resistant (MDR) Mtb strains resistant to rifampicin and isoniazid. According to the acute toxicity evaluation in Daphnia magna neonates, six compounds were classified as moderately toxic (LD50 in the range of 10–100 mg/L) and one as practically harmless (LD50 in the range of 100–1000 mg/L). Conclusion: The newly identified compounds may represent a starting point for further development of therapies against Mtb. The developed models are available online at OCHEM http://ochem.eu/article/11 1066 and can be used to virtually screen for potential compounds with anti-TB activity.
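Balanced accuracy and k-fold cross-validation, the two evaluation devices reported above, can be sketched as follows. This is a generic illustration and not OCHEM's implementation; the function names and toy labels are assumptions. Balanced accuracy is taken as the mean of sensitivity and specificity for a binary classifier.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity
    (recall on class 0) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over n items.
    Real pipelines usually shuffle or stratify first; omitted for brevity."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy labels (hypothetical): 2 actives, 8 inactives, one active missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(balanced_accuracy(y_true, y_pred))  # 0.75, while plain accuracy is 0.9
for train, test in k_fold_indices(len(y_true), k=5):
    print(test)
```

Because each class contributes equally to the score, balanced accuracy is less flattering than plain accuracy when one class dominates, as it does in the toy example.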


Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.
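The distinction between feedback and feed-forward control can be made concrete with a toy simulation. Everything in this sketch is an assumption made for illustration (the one-dimensional plant x ← x + u + d, the constant disturbance, the gain, and the function names); none of it is taken from the chapter.

```python
def simulate(controller, disturbance=1.0, steps=50):
    """Run the toy plant x <- x + u + disturbance from x = 0 and
    return the final state. The target state is 0."""
    x = 0.0
    for _ in range(steps):
        u = controller(x)
        x = x + u + disturbance
    return x


def feedback(gain=0.5, target=0.0):
    """Feedback: the command depends on the measured error."""
    return lambda x: gain * (target - x)


def feed_forward(predicted_disturbance):
    """Feed-forward: the command cancels a predicted disturbance
    and ignores the measured state entirely."""
    return lambda x: -predicted_disturbance


x_fb = simulate(feedback())            # settles near 2.0 (bounded error)
x_ff_good = simulate(feed_forward(1.0))  # stays at 0.0 (perfect prediction)
x_ff_bad = simulate(feed_forward(0.5))   # drifts to 25.0 (uncorrected error)
print(x_fb, x_ff_good, x_ff_bad)
```

In this toy setting the feed-forward controller is exact when its prediction is right and drifts without bound when it is wrong, while the proportional feedback controller keeps the error bounded but never removes it completely.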


Challenges and Opportunities for Data Science and Machine Learning in IoT Systems - A Timely Debate: Part 2

IEEE Internet of Things Magazine ◽

10.1109/iotm.0022.2000002 ◽

2020 ◽

pp. 2-6

Author(s):

Sumi Helal ◽

Flavia C. Delicato ◽

Cintia B. Margi ◽

Satyajayant Misra ◽

Markus Endler

Keyword(s):

Machine Learning ◽

Data Science ◽

Challenges And Opportunities


Computational Methods for Structure-to-Function Analysis of Diet-Derived Catechins-Mediated Targeting of In Vitro Vasculogenic Mimicry

Cancer Informatics ◽

10.1177/11769351211009229 ◽

2021 ◽

Vol 20 ◽

pp. 117693512110092

Author(s):

Abicumaran Uthamacumaran ◽

Narjara Gonzalez Suarez ◽

Abdoulaye Baniré Diallo ◽

Borhane Annabi

Keyword(s):

Machine Learning ◽

Cancer Cells ◽

Structural Changes ◽

Function Analysis ◽

Vasculogenic Mimicry ◽

Machine Learning Algorithms ◽

Emergent Behavior ◽

Molecular Signature ◽

Ovarian Cancer Cells

Background: Vasculogenic mimicry (VM) is an adaptive biological phenomenon wherein cancer cells spontaneously self-organize into 3-dimensional (3D) branching network structures. This emergent behavior is considered central in conferring an invasive, metastatic, and therapy-resistant molecular signature on cancer cells. The quantitative analysis of such complex phenotypic systems could require the use of computational approaches including machine learning algorithms originating from complexity science. Procedures: In vitro 3D VM was performed with SKOV3 and ES2 ovarian cancer cells cultured on Matrigel. Disruption of VM by diet-derived catechins was monitored at 24 hours using pictures taken with an inverted microscope. Three computational algorithms for complex feature extraction relevant for 3D VM (2D wavelet analysis, fractal dimension, and percolation clustering scores) were assessed, coupled with machine learning classifiers. Results: These algorithms demonstrated the structure-to-function impact of the galloyl moiety on VM for each of the gallated catechins tested, and were shown to be applicable in quantifying the drug-mediated structural changes in VM processes. Conclusions: Our study provides evidence of how appropriate 3D VM compression and feature extractors coupled with classification/regression methods could be efficient for studying in vitro drug-induced perturbation of complex processes. Such approaches could be exploited in the development and characterization of drugs targeting VM.
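Fractal dimension, one of the three feature extractors listed above, is often estimated by box counting. The abstract does not say which estimator the authors used, so the sketch below is an assumption: it treats an image as a set of foreground pixel coordinates and fits the slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain at least one foreground pixel.

```python
import math


def box_count(pixels, box_size):
    """Number of box_size x box_size boxes containing a foreground pixel."""
    return len({(x // box_size, y // box_size) for x, y in pixels})


def box_counting_dimension(pixels, box_sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(pixels, s)) for s in box_sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes with known dimension (synthetic, not VM images).
size = 64
filled_square = {(x, y) for x in range(size) for y in range(size)}
straight_line = {(x, 0) for x in range(size)}
box_sizes = [1, 2, 4, 8, 16]

dim_square = box_counting_dimension(filled_square, box_sizes)
dim_line = box_counting_dimension(straight_line, box_sizes)
print(round(dim_square, 6), round(dim_line, 6))  # close to 2 and 1
```

A branching network such as a VM structure would be expected to fall between these two extremes, which is what makes the estimate usable as an input feature for a classifier.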


Introduce structural equation modelling to machine learning problems for building an explainable and persuasive model

SICE Journal of Control Measurement and System Integration ◽

10.1080/18824889.2021.1894040 ◽

2021 ◽

pp. 1-13

Author(s):

Jiarui Li ◽

Tetsuo Sawaragi ◽

Yukio Horiguchi

Keyword(s):

Machine Learning ◽

Structural Equation Modelling ◽

Structural Equation ◽

Learning Problems ◽

Equation Modelling
