Minimum Viable Model Estimates for Machine Learning Projects

Mapping Intimacies ◽

10.5121/csit.2020.101803 ◽

2020 ◽

Author(s):

John Hawkins

Keyword(s):

Machine Learning ◽

Open Source ◽

Predictive Model ◽

Management System ◽

Technical Difficulty ◽

Business Case ◽

Performance Characteristics ◽

Learning Projects ◽

Viable Model ◽

Python Package

Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique enables robust, objective comparisons between potential projects. The resulting estimates allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling is done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed from PyPI, the Python Package Index, or downloaded directly from the GitHub repository. Available at https://github.com/john-hawkins/MinViME.
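
As a rough illustration of the idea (not MinViME's actual method or API), the sketch below computes a break-even precision for a hypothetical use case in which each true positive earns a fixed benefit and each false positive incurs a fixed cost. The function name and both parameters are assumptions made for this example.

```python
def min_viable_precision(benefit_tp: float, cost_fp: float) -> float:
    """Smallest precision p at which acting on a positive prediction breaks even.

    Assumes (hypothetically) a fixed benefit per true positive and a fixed
    cost per false positive, so that p * benefit_tp - (1 - p) * cost_fp >= 0.
    Solving for p gives p >= cost_fp / (benefit_tp + cost_fp).
    """
    if benefit_tp <= 0 or cost_fp < 0:
        raise ValueError("benefit_tp must be positive and cost_fp non-negative")
    return cost_fp / (benefit_tp + cost_fp)


# Example: a hit is worth 90 units, a false alarm costs 10 units.
threshold = min_viable_precision(benefit_tp=90.0, cost_fp=10.0)
print(f"Model needs a precision of at least {threshold:.2f}")
```

A project whose achievable precision is unlikely to reach this threshold could be deprioritized before any modelling starts, which is the kind of comparison the abstract describes.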

DEBoost: A Python Library for Weighted Distance Ensembling in Machine Learning

10.20944/preprints202005.0354.v1 ◽

2020 ◽

Author(s):

Wei Hao Khoong

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Preprocessing ◽

Weighted Distance ◽

Open Source License ◽

Classification Tasks ◽

Python Package

In this paper, we introduce deboost, a Python library devoted to weighted distance ensembling of predictions for regression and classification tasks. It builds on the scikit-learn library for default models and data preprocessing functions. It offers a flexible choice of models for the ensemble, as long as they provide a predict method, like the models available from scikit-learn. deboost is released under the MIT open-source license and can be downloaded from the Python Package Index (PyPI) at https://pypi.org/project/deboost. The source scripts are also available in a GitHub repository at https://github.com/weihao94/DEBoost.
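
The following is a minimal sketch of one way distance-based weighting of predictions could work. It is not deboost's API or its exact algorithm. It assumes each model exposes a predict method and weights each prediction by the inverse of its absolute distance from the per-sample median prediction. The names ConstantModel and distance_weighted_ensemble are hypothetical.

```python
import numpy as np


class ConstantModel:
    """Hypothetical stand-in for any estimator that exposes predict()."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value, dtype=float)


def distance_weighted_ensemble(models, X, eps=1e-9):
    """Combine predictions, down-weighting models far from the median.

    Assumption for this sketch: weight = 1 / (|prediction - median| + eps),
    normalized so the weights for each sample sum to 1.
    """
    # Shape (n_samples, n_models): one column of predictions per model.
    preds = np.column_stack([m.predict(X) for m in models])
    center = np.median(preds, axis=1, keepdims=True)
    weights = 1.0 / (np.abs(preds - center) + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * preds).sum(axis=1)


# Two models agree on 1.0 and one outlier predicts 10.0.
X = np.zeros((4, 2))
models = [ConstantModel(1.0), ConstantModel(1.0), ConstantModel(10.0)]
combined = distance_weighted_ensemble(models, X)
print(combined)
```

With this weighting the outlying model contributes almost nothing, so the combined prediction stays close to the value the other two models agree on.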

Glycowork: A Python package for glycan data science and machine learning

10.1101/2021.04.22.440981 ◽

2021 ◽

Author(s):

Luc Thomès ◽

Rebekka Burkholz ◽

Daniel Bojar

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Science ◽

Biological Processes ◽

Biological Sequence ◽

Learning Models ◽

Related Data ◽

Strong Focus ◽

Python Package ◽

Machine Learning Models

As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.

KCF-Convoy: efficient Python package to convert KEGG Chemical Function and Substructure fingerprints

10.1101/452383 ◽

2018 ◽

Author(s):

Masayuki Sato ◽

Hirotaka Suetake ◽

Masaaki Kotera

Keyword(s):

Machine Learning ◽

In Silico ◽

Management System ◽

Supervised Machine Learning ◽

Input File ◽

Text File ◽

Machine Learning Methods ◽

Chemical Fingerprints ◽

Pharmaceutical Activity ◽

Python Package

Motivation: In silico methodologies to assess pharmaceutical activity and toxicity are increasingly important in QSAR, and many chemical fingerprints have been developed to tackle this problem. Among them, KEGG Chemical Function and Substructure (KCF-S) has been shown to perform well in some pharmaceutical and metabolic studies. However, the software that generates KCF-S fingerprints has limited usability: the input file must be in Molfile or SDF format, and the output is only a text file. Results: We established a new Python package, KCF-Convoy, to generate KCF format and KCF-S fingerprints from Molfile, SDF, SMILES, and InChI seamlessly. The obtained KCF-S was used in a number of supervised machine-learning methods to distinguish herbicides from other pesticides, and to find characteristic substructures in taxonomy groups. Availability: KCF-Convoy is implemented as a Python package freely available at https://github.com/KCF-Convoy and the user can use the package management system “pip” and also Docker.

Demystifying MLOps and Presenting a Recipe for the Selection of Open-Source Tools

Applied Sciences ◽

10.3390/app11198861 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8861

Author(s):

Philipp Ruf ◽

Manav Madan ◽

Christoph Reich ◽

Djaffar Ould-Abdeslam

Keyword(s):

Machine Learning ◽

Object Detection ◽

Open Source ◽

Model Performance ◽

Network Models ◽

System Quality ◽

Neural Network Models ◽

Automated Machine Learning ◽

Learning Projects ◽

Selection Of

Machine learning projects have become increasingly relevant to real-world use cases. The success of complex neural network models depends on many factors, which creates a need for structured, machine learning-centric project development management. Because of the multitude of tools available for different operational phases, responsibilities and requirements become increasingly unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as the roles involved, are examined and clearly defined. With a focus on the inter-connectivity of specific tools and their comparison against well-selected requirements of MLOps, model performance, input data, and system quality metrics are briefly discussed. By identifying aspects of machine learning that can be reused from project to project, open-source tools that help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given. Deep learning has revolutionized the field of image processing, and building an automated machine learning workflow for object detection is of great interest to many organizations. To this end, a simple MLOps workflow for object detection with images is presented.

An IoT based smart irrigation management system using Machine learning and open source technologies

Computers and Electronics in Agriculture ◽

10.1016/j.compag.2018.09.040 ◽

2018 ◽

Vol 155 ◽

pp. 41-49 ◽

Cited By ~ 68

Author(s):

Amarendra Goap ◽

Deepak Sharma ◽

A.K. Shukla ◽

C. Rama Krishna

Keyword(s):

Machine Learning ◽

Open Source ◽

Management System ◽

Irrigation Management

Exploration of Predictive Model for Learning Outcomes of Students in the E-learning Environment by Using Machine Learning

Korean Association For Learner-Centered Curriculum And Instruction ◽

10.22251/jlcci.2018.18.21.553 ◽

2018 ◽

Vol 18 (21) ◽

pp. 553-572

Author(s):

Hunkoog Jho

Keyword(s):

Machine Learning ◽

Learning Environment ◽

Predictive Model ◽

Learning Outcomes ◽

E Learning

DEVELOPMENT OF AN OPEN SOURCE, MACHINE LEARNING BASED TOOLSET FOR THE IDENTIFICATION OF DIKES IN SATELLITE IMAGES THROUGH SEMANTIC SEGMENTATION

10.1130/abs/2020am-357672 ◽

2020 ◽

Author(s):

Ryan Gray ◽

Tushar Mittal

Keyword(s):

Machine Learning ◽

Open Source ◽

Satellite Images ◽

Semantic Segmentation

Toward an Open-source Toolkit for Machine Learning Education

Proceedings of the 51st ACM Technical Symposium on Computer Science Education ◽

10.1145/3328778.3372531 ◽

2020 ◽

Author(s):

N. Rich Nguyen

Keyword(s):

Machine Learning ◽

Open Source

Household Waste Management System Using IoT and Machine Learning

Procedia Computer Science ◽

10.1016/j.procs.2020.03.222 ◽

2020 ◽

Vol 167 ◽

pp. 1950-1959 ◽

Cited By ~ 3

Author(s):

Sonali Dubey ◽

Pushpa Singh ◽

Piyush Yadav ◽

Krishna Kant Singh

Keyword(s):

Machine Learning ◽

Waste Management ◽

Management System ◽

Household Waste ◽

Waste Management System ◽

Household Waste Management

Building A Platform for Machine Learning Operations from Open Source Frameworks

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2021.04.161 ◽

2020 ◽

Vol 53 (5) ◽

pp. 704-709

Author(s):

Yan Liu ◽

Zhijing Ling ◽

Boyu Huo ◽

Boqian Wang ◽

Tianen Chen ◽

...

Keyword(s):

Machine Learning ◽

Open Source
