Benchmarking machine learning models for the analysis of genetic data using FRESA.CAD Binary Classification Benchmarking

Background: Machine learning models have proven to be useful tools for the analysis of genetic data. However, with a wide variety of such methods available, model selection has become increasingly difficult from both a human and a computational perspective.

Results: We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons among a collection of representative machine learning methods for solving binary classification problems on genetic datasets.

Conclusions: FRESA.CAD Binary Benchmarking proves to be a useful tool across a variety of binary classification problems involving genetic data, showing both quantitative and qualitative advantages over similar packages.
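
FRESA.CAD is an R package, and its interface is not reproduced here. The Python sketch below only illustrates the general idea of a systematic benchmark: every method is refit and scored on the same cross-validation folds so that the comparison is fair. The two classifiers (a nearest-centroid model and a majority-class baseline), the fold scheme, and the toy data are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from statistics import mean

def fit_nearest_centroid(train):
    """Fit a nearest-centroid classifier on (features, label) pairs."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        # Assign x to the class whose centroid is closest (squared Euclidean distance).
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict

def fit_majority(train):
    """Baseline that always predicts the most frequent training label."""
    top = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: top

def benchmark(data, fitters, k=5, seed=0):
    """Mean held-out accuracy of each fitter over the same k cross-validation folds."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = {name: [] for name in fitters}
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        for name, fit in fitters.items():
            predict = fit(train)  # refit from scratch on every fold
            scores[name].append(mean(float(predict(x) == y) for x, y in test))
    return {name: mean(vals) for name, vals in scores.items()}

# Hypothetical toy data: two well-separated classes in two dimensions.
rng = random.Random(1)
data = [([rng.gauss(0, 0.5), rng.gauss(0, 0.5)], 0) for _ in range(50)] + \
       [([rng.gauss(3, 0.5), rng.gauss(3, 0.5)], 1) for _ in range(50)]

results = benchmark(data, {"nearest_centroid": fit_nearest_centroid, "majority": fit_majority})
print(results)
```

A real benchmark would typically add stratified and repeated folds and report further metrics such as ROC-AUC; the point here is only that all methods see identical splits.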

Utilizing Blockchain Technology in Social Media Bot Identification

DOI: 10.36227/techrxiv.12049374; 2020

Author(s): Shreya Reddy, Lisa Ewen, Pankti Patel, Prerak Patel, Ankit Kundal, ...

Keyword(s): Machine Learning, Social Media, Gold Standard, The Internet, Learning Models, Current Time, Machine Learning Methods, Blockchain Technology, Modern Age, Machine Learning Models

As bots become more prevalent and more sophisticated on the modern internet, it is increasingly important to identify and remove them. Recent research has established machine learning methods as accurate and as the gold standard for bot identification on social media. Unfortunately, machine learning models have drawbacks, such as lengthy training times, difficult feature selection, and burdensome preprocessing. To overcome these difficulties, we propose a blockchain framework for bot identification. It is not yet known how this method will perform, but it serves to demonstrate a substantial gap in research in this area.

Application of Machine Learning Methods in the Task of Identifying User Accounts in Two Social Networks

Computer Tools in Education; DOI: 10.32603/2071-2340-2019-3-29-43; 2019; pp. 29-43

Author(s): Anastasiya A. Korepanova, Valerii D. Oliseenko, Maxim V. Abramov, Alexander L. Tulupyev, ...

Keyword(s): Machine Learning, Social Networks, Information System, New Combination, Practical Significance, User Profiles, Learning Models, Machine Learning Methods, The Social, Machine Learning Models

The article describes an approach to comparing user profiles across different social networks and identifying those that belong to the same person. The proposed method compares the social environment and the values of account profile attributes in two different social networks. The results of applying various machine learning models to this problem are compared. The novelty of the approach lies in a new combination of methods and its application to new social networks. The practical significance of the study is the automation of determining which profiles in different social networks belong to a single user. These results can be applied to constructing a meta-profile of an information system user, from which a profile of that user's vulnerabilities can subsequently be built, as well as in other studies of social networks.
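
The abstract does not list the exact features, so the following Python sketch is only an illustration under stated assumptions: the "social environment" is approximated by the overlap of two friend lists (assumed to be already expressed as comparable identifiers), the name is compared by fuzzy string similarity, and a few other attributes are compared for exact equality. The attribute names (`friends`, `name`, `city`, `birth_year`) are hypothetical.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(profile_a, profile_b, exact_attrs=("city", "birth_year")):
    """Hypothetical feature vector for a candidate pair of accounts.

    - social environment: overlap of the two friend lists (assumes friends are
      already expressed as comparable identifiers across both networks);
    - name: fuzzy string similarity in [0, 1];
    - remaining attributes: 1.0 on an exact match, else 0.0 (missing counts as mismatch).
    """
    friends = jaccard(set(profile_a.get("friends", ())), set(profile_b.get("friends", ())))
    name = SequenceMatcher(None, profile_a.get("name", "").lower(),
                           profile_b.get("name", "").lower()).ratio()
    exact = [1.0 if profile_a.get(k) is not None and profile_a.get(k) == profile_b.get(k) else 0.0
             for k in exact_attrs]
    return [friends, name] + exact

a = {"friends": ["ann", "bob", "cat"], "name": "Ivan Petrov", "city": "Saint Petersburg", "birth_year": 1990}
b = {"friends": ["bob", "cat", "dan"], "name": "ivan petrov", "city": "Saint Petersburg", "birth_year": 1990}
print(pair_features(a, b))  # [0.5, 1.0, 1.0, 1.0]
```

Vectors like these, labeled as same-person or different-person pairs, could then be passed to any of the machine learning models being compared.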

Automated Retraining of Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue; DOI: 10.35940/ijitee.l3322.1081219; 2019; Vol 8 (12); pp. 445-452

Keyword(s): Machine Learning, Input Data, Research Work, Learning Models, Machine Learning Methods, Machine Learning Model, Crucial Component, Conventional Machine, Over Time, Machine Learning Models

Data is the most crucial component of a successful ML system. Once a machine learning model is developed, it becomes obsolete over time as new input data is generated every second. To keep predictions accurate, models must be kept up to date. Our research work involves finding a mechanism that retrains the model with new data automatically, and it also explores the possibilities of automating machine learning processes. We began by training and testing our model using conventional machine learning methods, and then compared the outcome with that of experiments conducted using AutoML methods such as TPOT. This helped us find an efficient technique for retraining our models. These techniques can be used in areas where people do not deal with the inner workings of an ML model but only require the outputs of ML processes.
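
The abstract does not describe the retraining mechanism itself, so the following Python sketch shows one common, simple policy as an assumption, not the authors' method and not TPOT: keep a sliding window of the most recent labeled samples and refit automatically after a fixed number of new samples arrive. The one-variable least-squares model and the class name `SlidingWindowRetrainer` are hypothetical.

```python
from collections import deque
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # must be non-zero (xs not all equal)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b

class SlidingWindowRetrainer:
    """Hypothetical policy: keep only the newest `window` samples and
    refit the model automatically after every `every` new samples."""

    def __init__(self, xs, ys, window=50, every=25):
        self.data = deque(zip(xs, ys), maxlen=window)  # oldest samples fall out
        self.every = every
        self.pending = 0
        self.retrains = 0
        self._refit()

    def _refit(self):
        xs, ys = zip(*self.data)
        self.model = fit_line(xs, ys)

    def observe(self, x, y):
        """Record one new labeled sample; retrain once enough have accumulated."""
        self.data.append((x, y))
        self.pending += 1
        if self.pending >= self.every:
            self._refit()
            self.retrains += 1
            self.pending = 0

    def predict(self, x):
        return self.model(x)

# Initial data follow y = 2x; new data later drift to y = -x + 100.
r = SlidingWindowRetrainer(range(10), [2 * x for x in range(10)])
before = r.predict(10)  # 20.0
for x in range(50):
    r.observe(x, -x + 100)
after = r.predict(10)   # 90.0: the window now holds only post-drift data
```

The window size trades responsiveness to new data against stability; an AutoML tool could replace `fit_line` in the refit step without changing the surrounding loop.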

Topical Issues of Application of Machine Learning Methods in Economy

Innovative Aspects of the Development of Science and Technology. Collection of Articles of the VIII International Scientific and Practical Conference [network-distributed electronic edition], ed. N.V. Emelyanov. Moscow: KDU, Dobrosvet, 2021. 149 pp.; DOI: 10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33; 2021

Author(s): Natalia Pavlovna Persteneva, Darya Dmitrievn Skryleva

Keyword(s): Machine Learning, Unsupervised Learning, Supervised Learning, Learning Model, Learning Models, Learning Methods, Machine Learning Methods, Machine Learning Model, Popular Classes, Machine Learning Models

The article discusses machine learning methods, using two popular classes as examples: supervised learning and unsupervised learning. Variants of the main types of machine learning models are presented for each class, and a generalized algorithm for building any machine learning model is formulated.

chemmodlab: a cheminformatics modeling laboratory R package for fitting and assessing machine learning models

Journal of Cheminformatics; DOI: 10.1186/s13321-018-0309-4; 2018; Vol 10 (1); cited by ~1

Author(s): Jeremy R. Ash, Jacqueline M. Hughes-Oliver

Keyword(s): Machine Learning, R Package, Learning Models, Machine Learning Models

A pitfall for machine learning methods aiming to predict across cell types

Genome Biology; DOI: 10.1186/s13059-020-02177-y; 2020; Vol 21 (1)

Author(s): Jacob Schreiber, Ritambhara Singh, Jeffrey Bilmes, William Stafford Noble

Keyword(s): Gene Expression, Machine Learning, Cell Types, Chromatin Domain, Learning Models, Machine Learning Methods, Domain Boundaries, Average Activity, Test Sets, Machine Learning Models

Machine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.
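
The following Python sketch illustrates the pitfall on synthetic data: when test loci also appear in training, simply memorizing each locus's average activity across the training cell types already correlates strongly with activity in an unseen cell type, without learning anything cell-type-specific. The data-generating model, the 0.3 noise scale, and the held-out-loci split at the end are assumptions for illustration, not the authors' code.

```python
import random
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

rng = random.Random(0)
n_loci, n_cells = 200, 6
# Hypothetical data: activity = strong per-locus effect shared by all cell types
# + weaker cell-type-specific variation (the signal a model should really learn).
locus_effect = [rng.gauss(0, 1) for _ in range(n_loci)]
activity = [[locus_effect[i] + 0.3 * rng.gauss(0, 1) for _ in range(n_cells)]
            for i in range(n_loci)]

# Cross-cell-type split that REUSES the same loci: train on cell types 0-4, test on 5.
train_cells, test_cell = range(n_cells - 1), n_cells - 1

# "Average activity" baseline: memorize each locus's mean over the training cell types.
baseline = [mean(activity[i][c] for c in train_cells) for i in range(n_loci)]
observed = [activity[i][test_cell] for i in range(n_loci)]
r_shared = pearson(baseline, observed)
print(f"baseline r on shared loci: {r_shared:.2f}")  # high, yet nothing cell-type-specific was learned

# One way to avoid the leak (an assumption here): also hold out loci, so that no
# per-locus average from training is available for the test positions.
loci = list(range(n_loci))
rng.shuffle(loci)
train_loci, test_loci = set(loci[: n_loci // 2]), set(loci[n_loci // 2:])
```

Any model evaluated with shared loci should at least beat this baseline before it is credited with cross-cell-type predictive power.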

Sex estimation in cranial remains: A comparison of machine learning and discriminant analysis in Italian populations

DOI: 10.1101/2020.04.30.071597; 2020

Author(s): A Pozzi, C Raffone, MG Belcastro, TL Camilleri-Carter

Keyword(s): Machine Learning, Discriminant Analysis, Linear Discriminant Analysis, Foramen Magnum, Sex Estimation, Learning Models, Linear Discriminant, Machine Learning Methods, Cranial Measurements, Machine Learning Models

Objectives: Using cranial measurements from two Italian populations, we compare machine learning methods with the more traditional method of linear discriminant analysis for estimating sex. We use the cranium for sex estimation because it is especially useful when remains are fragmented or displaced, and it may be the only element recovered.

Materials and Methods: Using the machine learning methods of decision tree learning, support vector machines, the k-nearest neighbor algorithm, and ensemble methods, we estimate sex in two populations: samples from Bologna and samples from the island of Sardinia. We used two datasets, one containing 17 cranial measurements and one containing measurements of the foramen magnum.

Results and Discussion: Our results indicate that machine learning models produce results similar to those of linear discriminant analysis, but in some cases machine learning yields more consistent accuracy between the sexes. Our study shows that sex can be accurately predicted (> 80%) in Italian populations using the cranial measurements we gathered, except for the foramen magnum, which reaches an accuracy of ∼70%, on par with previous geometric morphometric studies of sex estimation from crania. We also find that our trained machine learning models produce population-specific results: Italian crania are sexually dimorphic, but the features that contribute to this dimorphism differ between the populations.
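
k-nearest neighbors is one of the methods named above. The Python sketch below standardizes measurements using training statistics only (so test crania do not leak into the scaling), classifies with k-NN, and reports accuracy separately for each sex, which is how consistency between the sexes can be checked. The two measurements and all values are invented for illustration; the study itself used 17 cranial measurements.

```python
from collections import Counter
from statistics import mean, pstdev

def standardize(train, test):
    """z-score each measurement using training-set statistics only."""
    cols = list(zip(*train))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, mu, sd)]
    return [z(r) for r in train], [z(r) for r in test]

def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k nearest training crania (Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class, e.g. 'F' and 'M'."""
    return {label: mean(float(p == label) for t, p in zip(y_true, y_pred) if t == label)
            for label in sorted(set(y_true))}

# Hypothetical measurements in mm (two features instead of the study's 17).
train_x = [(170, 135), (172, 137), (168, 134), (175, 138),
           (185, 142), (188, 145), (183, 141), (190, 146)]
train_y = ["F", "F", "F", "F", "M", "M", "M", "M"]
test_x, test_y = [(171, 136), (186, 143)], ["F", "M"]

ztrain, ztest = standardize(train_x, test_x)
pred = [knn_predict(ztrain, train_y, x) for x in ztest]
print(per_class_accuracy(test_y, pred))  # {'F': 1.0, 'M': 1.0}
```

Reporting per-sex accuracy rather than a single overall figure exposes models that do well on one sex at the expense of the other.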

Predicting Anesthetic Infusion Events Using Machine Learning

DOI: 10.21203/rs.3.rs-783161/v1; 2021

Author(s): Naoki Miyaguchi, Koh Takeuchi, Hisashi Kashima, Mizuki Morita, Hiroshi Morimatsu

Keyword(s): Machine Learning, Flow Rate, Short Term Memory, Binary Classification, Classification Problem, Clinical Findings, Support Vector, Learning Models, Continuous Administration, Machine Learning Models

Recently, research has been conducted on automatically controlling anesthesia with machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point during continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments evaluated prediction performance for six machine learning models (logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM)) using data from 210 cases collected during actual surgeries. When predicting an increase in the remifentanil flow rate 1 min ahead, the LSTM model achieved a sensitivity of 0.659, a specificity of 0.732, and an ROC-AUC of 0.753, demonstrating the potential of machine learning to predict the decisions made by anesthesiologists. Furthermore, we examined the importance and contribution of each model's features using SHapley Additive exPlanations (SHAP), a method for interpreting the predictions of machine learning models. The trends indicated by the results were partially consistent with known clinical findings.
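
The Python sketch below shows, under stated assumptions, how such a task can be framed and scored: a hypothetical `make_labels` marks a time point as positive when the infusion rate `horizon` steps later is higher (with a sampling interval chosen so that `horizon` steps equal 1 min), and the three reported metrics are computed from labels, thresholded predictions, and scores. The labeling rule, the 0.5 threshold, and all numbers are illustrative assumptions, not the study's data or code.

```python
def make_labels(rates, horizon):
    """Hypothetical framing: label time t as 1 if the infusion rate `horizon`
    steps later is higher than the rate at t, else 0."""
    return [int(rates[t + horizon] > rates[t]) for t in range(len(rates) - horizon)]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive is scored above a
    random negative (ties count one half); O(P*N), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = make_labels([0.1, 0.1, 0.15, 0.15, 0.15, 0.2], horizon=1)  # [0, 1, 0, 0, 1]

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]              # hypothetical model outputs
y_pred = [int(s >= 0.5) for s in scores]              # threshold at 0.5
sens, spec = sensitivity_specificity(y_true, y_pred)  # (2/3, 2/3)
auc = roc_auc(y_true, scores)                         # 8/9
```

Sensitivity and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality over all thresholds, which is why the study reports both kinds of score.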

Improved Interpretability of Machine Learning Model Using Unsupervised Clustering: Predicting Time to First Treatment in Chronic Lymphocytic Leukemia

JCO Clinical Cancer Informatics; DOI: 10.1200/cci.18.00137; 2019; pp. 1-11; cited by ~4

Author(s): David Chen, Gaurav Goyal, Ronald S. Go, Sameer A. Parikh, Che G. Ngufor

Keyword(s): Machine Learning, Risk Stratification, Unsupervised Clustering, Support Vector, Learning Models, Learning Methods, Machine Learning Methods, Continuous Output, Time To First Treatment, Machine Learning Models

Purpose: Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic leukemia (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many are criticized as inadequate because of their limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL.

Patients and Methods: A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared the predictive abilities of two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). The predicted probability of treatment was then stratified.

Results: Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, the automated risk stratification provided by clustering better differentiated patients at risk of treatment within 1 year than models developed using standard survival analysis techniques.

Conclusion: Clustering the posterior probabilities of machine learning models provides a way to interpret these models more easily.
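
The abstract does not name the clustering algorithm, so the Python sketch below uses one-dimensional k-means as an assumption to turn continuous predicted probabilities into discrete risk strata. The choice of three strata (low, intermediate, high) and the probability values are also hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values (k >= 2); centroids start at evenly
    spaced order statistics and are returned in ascending order."""
    xs = sorted(values)
    centroids = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep a centroid in place if its cluster became empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def risk_strata(probs, k=3):
    """Map each predicted probability to a stratum: 0 = lowest risk ... k-1 = highest."""
    centroids = kmeans_1d(probs, k)
    return [min(range(k), key=lambda c: abs(p - centroids[c])) for p in probs]

probs = [0.05, 0.08, 0.10, 0.45, 0.50, 0.55, 0.90, 0.92, 0.95]  # hypothetical model outputs
print(risk_strata(probs))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because the strata come from the data rather than from a fixed cut-off such as 0.5, the groups adapt to how a particular model's probabilities are distributed.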

Semantic similarity and machine learning with ontologies

Briefings in Bioinformatics; DOI: 10.1093/bib/bbaa199; 2020

Author(s): Maxat Kulmanov, Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf

Keyword(s): Machine Learning, Semantic Similarity, Domain Knowledge, Life Sciences, Similarity Measures, Background Knowledge, Biological Database, Learning Models, Machine Learning Methods, Machine Learning Models

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are used in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and under active development. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
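
As one concrete example of a semantic similarity measure, the Python sketch below computes Resnik similarity: the information content (IC) of the most informative common ancestor of two ontology terms, where IC is estimated from how often a term or its descendants are used in annotations. The review covers many measures and embeddings; Resnik is chosen here only as a classic illustration, and the toy ontology and annotation counts are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct is-a parents.
PARENTS = {
    "entity": set(),
    "process": {"entity"},
    "metabolism": {"process"},
    "signaling": {"process"},
    "glucose_metabolism": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
}

def ancestors(term):
    """The term itself plus all of its transitive ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(annotations):
    """IC(t) = -log p(t) = log(N / count(t)), where count(t) includes annotations to descendants."""
    counts = dict.fromkeys(PARENTS, 0)
    for term in annotations:
        for a in ancestors(term):  # an annotation to a term implies all of its ancestors
            counts[a] += 1
    return {t: math.log(len(annotations) / c) for t, c in counts.items() if c > 0}

def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor.
    Assumes both terms (hence all their ancestors) occur in the annotation corpus."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))

ic = information_content(["glucose_metabolism", "glucose_metabolism",
                          "lipid_metabolism", "signaling"])
print(resnik("glucose_metabolism", "lipid_metabolism", ic))  # log(4/3) ≈ 0.288
print(resnik("glucose_metabolism", "signaling", ic))         # 0.0: shared ancestors cover every annotation
```

The same ancestor structure that drives this measure is what ontology embeddings and ontology-derived constraints exploit in the machine learning methods surveyed.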
