Automated Training for Algorithms That Learn from Genomic Data

Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable.

Download Full-text

Structure label prediction using similarity-based retrieval and weakly supervised label mapping

Geophysics ◽

10.1190/geo2018-0028.1 ◽

2019 ◽

Vol 84 (1) ◽

pp. V67-V79 ◽

Cited By ~ 9

Author(s):

Yazeed Alaudah ◽

Motaz Alfarraj ◽

Ghassan AlRegib

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Seismic Interpretation ◽

Machine Learning Algorithms ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Subsurface Structures ◽

Weakly Supervised ◽

Seismic Volumes

Recently, there has been significant interest in various supervised machine learning techniques that can help reduce the time and effort consumed by manual interpretation workflows. However, most successful supervised machine learning algorithms require huge amounts of annotated training data. Obtaining these labels for large seismic volumes is a very time-consuming and laborious task. We have addressed this problem by presenting a weakly supervised approach for predicting the labels of various seismic structures. By having an interpreter select a very small number of exemplar images for every class of subsurface structures, we use a novel similarity-based retrieval technique to extract thousands of images that contain similar subsurface structures from the seismic volume. By assuming that similar images belong to the same class, we obtain thousands of image-level labels for these images; we validate this assumption. We have evaluated a novel weakly supervised algorithm for mapping these rough image-level labels into more accurate pixel-level labels that localize the different subsurface structures within the image. This approach dramatically simplifies the process of obtaining labeled data for training supervised machine learning algorithms on seismic interpretation tasks. Using our method, we generate thousands of automatically labeled images from the Netherlands Offshore F3 block with reasonably accurate pixel-level labels. We believe that this work will allow for more advances in machine learning-enabled seismic interpretation.

Download Full-text

Influence Distribution Training Data on Performance Supervised Machine Learning Algorithms

2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI) ◽

10.1109/isriti51436.2020.9315413 ◽

2020 ◽

Author(s):

Ignasius Boli Suban ◽

Andi W. R. Emanuel

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Supervised Machine Learning

Download Full-text

A feature-centric spam email detection model using diverse supervised machine learning algorithms

The Electronic Library ◽

10.1108/el-07-2019-0181 ◽

2020 ◽

Vol 38 (3) ◽

pp. 633-657

Author(s):

Ammara Zamir ◽

Hikmat Ullah Khan ◽

Waqar Mehmood ◽

Tassawar Iqbal ◽

Abubakker Usman Akram

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Classification Accuracy ◽

Research Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Content Type ◽

Detection Model ◽

Proposed Model

Purpose This research study proposes a feature-centric spam email detection model (FSEDM) based on content, sentiment, semantic, user and spam-lexicon features set. The purpose of this study is to exploit the role of sentiment features along with other proposed features to evaluate the classification accuracy of machine learning algorithms for spam email detection. Design/methodology/approach Existing studies primarily exploits content-based feature engineering approach; however, a limited number of features is considered. In this regard, this research study proposed a feature-centric framework (FSEDM) based on existing and novel features of email data set, which are extracted after pre-processing. Afterwards, diverse supervised learning techniques are applied on the proposed features in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F to rank most prominent features and classify the emails into spam or ham (not spam). Findings Analysis and experimental results indicated that the proposed model with sentiment analysis is competitive approach for spam email detection. Using the proposed model, deep neural network applied with sentiment features outperformed other classifiers in terms of classification accuracy up to 97.2%. Originality/value This research is novel in this regard that no previous research focuses on sentiment analysis in conjunction with other email features for detection of spam emails.

Download Full-text

Application of Supervised Machine Learning Algorithms for Lithofacies Classification.

10.2523/19349-ms ◽

2019 ◽

Author(s):

Subhadeep Sarkar ◽

Chandan Majumdar

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Lithofacies Classification

Download Full-text

Optimization of Diabetes Training DATA using Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.283286 ◽

2018 ◽

Vol 6 (2) ◽

pp. 283-286

Author(s):

M. Samba Siva Rao ◽

◽

M.Yaswanth . ◽

K. Raghavendra Swamy ◽

◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data

Download Full-text

A Deep Analysis and Efficient Implementation of Supervised Machine Learning Algorithms for Enhancing The Classification Ability of System

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i3.10941101 ◽

2019 ◽

Vol 7 (3) ◽

pp. 1094-1101

Author(s):

Sandeep Kumar Verma ◽

Turendar Sahu ◽

Manjit Jaiswal

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Efficient Implementation ◽

Machine Learning Algorithms ◽

Supervised Machine Learning

Download Full-text

A Comparative Study of Three Supervised Machine-Learning Algorithms for Classifying Carbonate Vuggy Facies in the Kansas Arbuckle Formation

Petrophysics – The SPWLA Journal of Formation Evaluation and Reservoir Description ◽

10.30632/pjv60n6-2019a8 ◽

2019 ◽

Vol 60 (6) ◽

pp. 838-853

Author(s):

◽

Chicheng Xu ◽

Dawn Jobe ◽

Rui Xu ◽

◽

...

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning

Download Full-text

Crop price prediction using supervised machine learning algorithms

Journal of Physics Conference Series ◽

10.1088/1742-6596/1916/1/012042 ◽

2021 ◽

Vol 1916 (1) ◽

pp. 012042

Author(s):

Ranjani Dhanapal ◽

A AjanRaj ◽

S Balavinayagapragathish ◽

J Balaji

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Price Prediction

Download Full-text

Performance Improvement of Decision Tree: A Robust Classifier Using Tabu Search Algorithm

Applied Sciences ◽

10.3390/app11156728 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6728

Author(s):

Muhammad Asfand Hafeez ◽

Muhammad Rashid ◽

Hassan Tariq ◽

Zain Ul Abideen ◽

Saud S. Alotaibi ◽

...

Keyword(s):

Machine Learning ◽

Tabu Search ◽

Decision Tree ◽

Decision Trees ◽

Search Algorithm ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Tabu Search Algorithm

Classification and regression are the major applications of machine learning algorithms which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed, however, it is still evolving over time. This paper presents a novel and robust classifier based on a decision tree and tabu search algorithms, respectively. In the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the used tabu search algorithm is responsible to balance the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in sci-kit learn library in Python. The extensive analysis for the performance comparison was presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, the performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under receiver operating characteristic (AUROC) for proposed method of 0.95 reveals that the proposed classifier algorithm is convenient for large datasets.

Download Full-text

Spatial Roadway Condition-Assessment Mapping Utilizing Smartphones and Machine Learning Algorithms

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211006105 ◽

2021 ◽

pp. 036119812110061

Author(s):

Charalambos Kyriakou ◽

Symeon E. Christodoulou ◽

Loukas Dimitriou

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Condition Assessment ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Geographical Information ◽

Related Field ◽

Pavement Surface ◽

Automated Method ◽

Smartphone Technology

The paper presents a data-driven framework and related field studies on the use of supervised machine learning and smartphone technology for the spatial condition-assessment mapping of roadway pavement surface anomalies. The study explores the use of data, collected by sensors from a smartphone and a vehicle’s onboard diagnostic device while the vehicle is in movement, for the detection of roadway anomalies. The research proposes a low-cost and automated method to obtain up-to-date information on roadway pavement surface anomalies with the use of smartphone technology, artificial neural networks, robust regression analysis, and supervised machine learning algorithms for multiclass problems. The technology for the suggested system is readily available and accurate and can be utilized in pavement monitoring systems and geographical information system applications. Further, the proposed methodology has been field-tested, exhibiting accuracy levels higher than 90%, and it is currently expanded to include larger datasets and a bigger number of common roadway pavement surface defect types. The proposed system is of practical importance since it provides continuous information on roadway pavement surface conditions, which can be valuable for pavement engineers and public safety.

Download Full-text