scholarly journals Minimum Viable Model Estimates for Machine Learning Projects

2020 ◽  
Author(s):  
John Hawkins

Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented into the open source application MinViME (Minimum Viable Model Estimator) which can be installed via the PyPI python package management system, or downloaded directly from the GitHub repository. Available at https://github.com/john-hawkins/MinViME.

Author(s):  
Wei Hao Khoong

In this paper, we introduce deboost, a Python library devoted to weighted distance ensembling of predictions for regression and classification tasks. Its backbone resides on the scikit-learn library for default models and data preprocessing functions. It offers flexible choices of models for the ensemble as long as they contain the predict method, like the models available from scikit-learn. deboost is released under the MIT open-source license and can be downloaded from the Python Package Index (PyPI) at https://pypi.org/project/deboost. The source scripts are also available on a GitHub repository at https://github.com/weihao94/DEBoost.


2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

AbstractAs a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.


2018 ◽  
Author(s):  
Masayuki Sato ◽  
Hirotaka Suetake ◽  
Masaaki Kotera

AbstractMotivationIn silico methodologies to assess pharmaceutical activity and toxicity are increasingly important in QSAR, and many chemical fingerprints have been developed to tackle this problem. Among them, KEGG Chemical Function and Substructure (KCF-S) has been shown to perform well in some pharmaceutical and metabolic studies. However, the software that generates KCF-S fingerprints has limited usability: the input file must be Molfile or SDF format, and the output is only a text file.ResultsWe established a new Python package, KCF-Convoy, to generate KCF format and KCF-S fingerprints from Molfile, SDF, SMILES, and InChI seamlessly. The obtained KCF-S was used in a number of supervised machine-learning methods to distinguish herbicides from other pesticides, and to find characteristic substructures in taxonomy groups.AvailabilityKCF-Convoy is implemented as a Python package freely available at https://github.com/KCF-Convoy and the user can use the package management system “pip” and also the Docker [email protected]


2021 ◽  
Vol 11 (19) ◽  
pp. 8861
Author(s):  
Philipp Ruf ◽  
Manav Madan ◽  
Christoph Reich ◽  
Djaffar Ould-Abdeslam

Nowadays, machine learning projects have become more and more relevant to various real-world use cases. The success of complex Neural Network models depends upon many factors, as the requirement for structured and machine learning-centric project development management arises. Due to the multitude of tools available for different operational phases, responsibilities and requirements become more and more unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as involved roles, are examined and clearly defined. With the focus on the inter-connectivity of specific tools and comparison by well-selected requirements of MLOps, model performance, input data, and system quality metrics are briefly discussed. By identifying aspects of machine learning, which can be reused from project to project, open-source tools which help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given. Deep learning has revolutionized the field of Image processing, and building an automated machine learning workflow for object detection is of great interest for many organizations. For this, a simple MLOps workflow for object detection with images is portrayed.


2020 ◽  
Vol 167 ◽  
pp. 1950-1959 ◽  
Author(s):  
Sonali Dubey ◽  
Pushpa Singh ◽  
Piyush Yadav ◽  
Krishna Kant Singh

2020 ◽  
Vol 53 (5) ◽  
pp. 704-709
Author(s):  
Yan Liu ◽  
Zhijing Ling ◽  
Boyu Huo ◽  
Boqian Wang ◽  
Tianen Chen ◽  
...  
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document