scholarly journals Benchmarking machine learning models for the analysis of genetic data using FRESA.CAD Binary Classification Benchmarking

2019 ◽  
Author(s):  
Javier de Velasco Oriol ◽  
Antonio Martinez-Torteya ◽  
Victor Trevino ◽  
Israel Alanis ◽  
Edgar E. Vallejo ◽  
...  

AbstractBackgroundMachine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, both from the human and computational perspective.ResultsWe present the R package FRESA.CAD Binary Classification Benchmarking that performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets.ConclusionsFRESA.CAD Binary Benchmarking demonstrates to be a useful tool over a variety of binary classification problems comprising the analysis of genetic data showing both quantitative and qualitative advantages over similar packages.

2020 ◽  
Author(s):  
Shreya Reddy ◽  
Lisa Ewen ◽  
Pankti Patel ◽  
Prerak Patel ◽  
Ankit Kundal ◽  
...  

<p>As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has dictated that machine learning methods are accurate and the gold standard of bot identification on social media. Unfortunately, machine learning models do not come without their negative aspects such as lengthy training times, difficult feature selection, and overwhelming pre-processing tasks. To overcome these difficulties, we are proposing a blockchain framework for bot identification. At the current time, it is unknown how this method will perform, but it serves to prove the existence of an overwhelming gap of research under this area.<i></i></p>


2019 ◽  
pp. 29-43
Author(s):  
Anastasiya A. Korepanova ◽  
◽  
Valerii D. Oliseenko ◽  
Maxim V. Abramov ◽  
Alexander L. Tulupyev ◽  
...  

The article describes the approach to solving the problem of comparing user profiles of different social networks and identifying those that belong to one person. An appropriate method is proposed based on a comparison of the social environment and the values of account profile attributes in two different social networks. The results of applying various machine learning models to solving this problem are compared. The novelty of the approach lies in the proposed new combination of various methods and application to new social networks. The practical significance of the study is to automate the process of determining the ownership of profiles in various social networks to one user. These results can be applied in the task of constructing a meta-profile of a user of an information system for the subsequent construction of a profile of his vulnerabilities, as well as in other studies devoted to social networks.


Data is the most crucial component of a successful ML system. Once a machine learning model is developed, it gets obsolete over time due to presence of new input data being generated every second. In order to keep our predictions accurate we need to find a way to keep our models up to date. Our research work involves finding a mechanism which can retrain the model with new data automatically. This research also involves exploring the possibilities of automating machine learning processes. We started this project by training and testing our model using conventional machine learning methods. The outcome was then compared with the outcome of those experiments conducted using the AutoML methods like TPOT. This helped us in finding an efficient technique to retrain our models. These techniques can be used in areas where people do not deal with the actual working of a ML model but only require the outputs of ML processes


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jacob Schreiber ◽  
Ritambhara Singh ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

AbstractMachine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.


2020 ◽  
Author(s):  
A Pozzi ◽  
C Raffone ◽  
MG Belcastro ◽  
TL Camilleri-Carter

ABSTRACTObjectivesUsing cranial measurements in two Italian populations, we compare machine learning methods to the more traditional method of linear discriminant analysis in estimating sex. We use crania in sex estimation because it is useful especially when remains are fragmented or displaced, and the cranium may be the only remains found.Materials and MethodsUsing the machine learning methods of decision tree learning, support-vector machines, k-nearest neighbor algorithm, and ensemble methods we estimate the sex of two populations: Samples from Bologna and samples from the island of Sardinia. We used two datasets, one containing 17 cranial measurements, and one measuring the foramen magnum.Results and DiscussionOur results indicate that machine learning models produce similar results to linear discriminant analysis, but in some cases machine learning produces more consistent accuracy between the sexes. Our study shows that sex can be accurately predicted (> 80%) in Italian populations using the cranial measurements we gathered, except for the foramen magnum, which shows a level of accuracy of ∼70% accurate which is on par with previous geometric morphometrics studies using crania in sex estimation. We also find that our trained machine learning models produce population-specific results; we see that Italian crania are sexually dimorphic, but the features that are important to this dimorphism differ between the populations.


2021 ◽  
Author(s):  
Naoki Miyaguchi ◽  
Koh Takeuchi ◽  
Hisashi Kashima ◽  
Mizuki Morita ◽  
Hiroshi Morimatsu

Abstract Recently, research has been conducted to automatically control anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate a decision making problem by increasing the flow rate at each time point in the continuous administration of analgesic remifentanil as a supervised binary classification problem. The experiments were conducted to evaluate the prediction performance using six machine learning models: logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM), using 210 case data collected during actual surgeries. The results demonstrated that when predicting the future increase in flow rate of remifentanil after 1 min, the model using LSTM was able to predict with scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using shapley additive explanations—a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.


2019 ◽  
pp. 1-11 ◽  
Author(s):  
David Chen ◽  
Gaurav Goyal ◽  
Ronald S. Go ◽  
Sameer A. Parikh ◽  
Che G. Ngufor

PURPOSE Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic lymphoma (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many methods are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL. PATIENTS AND METHODS A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared predictive abilities for two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). Probability of treatment was then stratified. RESULTS Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering was able to better differentiate patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques. CONCLUSION Clustering the posterior probabilities of machine learning models provides a way to better interpret machine learning models.


Author(s):  
Maxat Kulmanov ◽  
Fatima Zohra Smaili ◽  
Xin Gao ◽  
Robert Hoehndorf

Abstract Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.


Sign in / Sign up

Export Citation Format

Share Document