Arangopipe, a tool for machine learning meta-data management

Experimenting with different models, documenting results and findings, and repeating these tasks are day-to-day activities for machine learning engineers and data scientists. There is a need to keep control of the machine-learning pipeline and its metadata, so that users can iterate quickly through experiments and retrieve key findings and observations from historical activity. This is the need that Arangopipe serves. Arangopipe is an open-source tool that provides a data model capturing the essential components of any machine learning life cycle. It also provides an application programming interface that permits machine-learning engineers to record the details of the salient steps in building their machine learning models. The components of the data model and an overview of the application programming interface are provided, along with illustrative examples of basic and advanced machine learning workflows. Arangopipe is useful not only to users developing machine learning models but also to those deploying and maintaining them.


End-To-End Computer Vision Framework: An Open-Source Platform for Research and Education

Sensors, 2021, Vol 21 (11), pp. 3691. DOI: 10.3390/s21113691

Author(s): Ciprian Orhei, Silviu Vert, Muguras Mocofan, Radu Vasiu

Keyword(s): Machine Learning, Image Processing, Computer Vision, Open Source, Visual Processing, Research Field, Learning Models, Research Activity, End To End, Machine Learning Models

Computer Vision is a cross-research field whose main purpose is to understand the surrounding environment as closely as possible to human perception. Image processing systems are continuously growing and expanding into more complex systems, usually tailored to the specific needs or applications they serve. To better serve this purpose, research on the architecture and design of such systems is also important. We present the End-to-End Computer Vision Framework (EECVF), an open-source solution that aims to support researchers and teachers within the vast field of image processing. The framework incorporates Computer Vision features and Machine Learning models that researchers can use. Given the continuous need to add new Computer Vision algorithms in day-to-day research activity, the proposed framework has the advantage of a configurable and scalable architecture. Although the main focus of the framework is the Computer Vision processing pipeline, it also offers ways to incorporate more complex activities, such as training Machine Learning models. EECVF aims to become a useful tool for learning activities in the Computer Vision field, as it allows the learner and the teacher to handle only the topics at hand, and not the interconnections necessary for the visual processing flow.


Machine Learning Boosted Docking (HASTEN): An Open-Source Tool To Accelerate Structure-based Virtual Screening Campaigns

2021. DOI: 10.26434/chemrxiv.14345849

Author(s): Tuomo Kalliokoski

Keyword(s): Machine Learning, Virtual Screening, Open Source, Learning Models, Open Source Tool, The Mean, Machine Learning Models

The software macHine leArning booSTed dockiNg (HASTEN) was developed to accelerate structure-based virtual screening using machine learning models. It has been validated using datasets both from the literature (12 datasets, each containing three million molecules docked with FRED) and from in-house sources (one dataset of four million compounds docked with Glide). On the literature data, HASTEN showed reasonable performance, with a mean recall of 0.78 for the top one percent of scoring molecules after docking 10% of the dataset, whereas an excellent recall of 0.95 was achieved for the in-house data. The program can be used with any docking and machine learning methodology, and is freely available from https://github.com/TuomoKalliokoski/HASTEN.
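
The recall figure quoted in this abstract can be read as the share of the truly best-scoring molecules that end up in the subset actually docked. The following is only a minimal sketch of that metric, not code from HASTEN: the function name `top_fraction_recall`, its parameters, and the convention that a lower docking score is better are all assumptions made for illustration.

```python
def top_fraction_recall(true_scores, selected_indices, top_fraction=0.01):
    """Share of the true top-scoring molecules found in the selected subset.

    Assumptions (hypothetical, for illustration only):
    - true_scores[i] is the full-docking score of molecule i
    - a lower score means a better-ranked molecule
    - selected_indices are the molecules that were actually docked
    """
    n = len(true_scores)
    # Size of the "top" set, e.g. 1% of the library (at least one molecule).
    n_top = max(1, round(n * top_fraction))
    # Rank all molecules from best (lowest score) to worst.
    ranked = sorted(range(n), key=lambda i: true_scores[i])
    top_set = set(ranked[:n_top])
    # Count how many of the true top molecules were selected.
    found = top_set & set(selected_indices)
    return len(found) / n_top


# Toy library of 200 molecules: molecule i has score -i, so 199 and 198 are best.
scores = [-float(i) for i in range(200)]
docked = [199, 5, 7]  # hypothetical subset chosen for docking
recall = top_fraction_recall(scores, docked, top_fraction=0.01)
```

In this toy case the top one percent contains two molecules and only one of them was docked, so the recall is one half.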


Improving Logging Prediction on Imbalanced Datasets

International Journal of Open Source Software and Processes, 2016, Vol 7 (2), pp. 43-71. DOI: 10.4018/ijossp.2016040103. Cited by ~3

Author(s): Sangeeta Lal, Neetu Sardana, Ashish Sureka

Keyword(s): Machine Learning, Open Source, Class Imbalance, Learning Model, Learning Models, Class Imbalance Problem, Imbalanced Datasets, Imbalance Problem, Machine Learning Model, Machine Learning Models

Logging is an important yet tough decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem, since the number of logged code constructs is small compared to the number of non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performance of J48, RF, and SVM classifiers for predicting logged catch-block and if-block code constructs on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, the LogIm model improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13% for if-block logging prediction.


A framework of developing machine learning models for facility life-cycle cost analysis

Building Research & Information, 2019, Vol 48 (5), pp. 501-525. DOI: 10.1080/09613218.2019.1691488. Cited by ~3

Author(s): Xinghua Gao, Pardis Pishdad-Bozorgi

Keyword(s): Machine Learning, Life Cycle, Cost Analysis, Life Cycle Cost, Life Cycle Cost Analysis, Learning Models, Machine Learning Models


Estimate ecotoxicity characterization factors for chemicals in life cycle assessment using machine learning models

Environment International, 2020, Vol 135, pp. 105393. DOI: 10.1016/j.envint.2019.105393. Cited by ~3

Author(s): Ping Hou, Olivier Jolliet, Ji Zhu, Ming Xu

Keyword(s): Machine Learning, Life Cycle Assessment, Life Cycle, Characterization Factors, Learning Models, Machine Learning Models


Saga: An Open Source Platform for Training Machine Learning Models and Community-driven Sharing of Techniques

2019 International Conference on Content-Based Multimedia Indexing (CBMI), 2019. DOI: 10.1109/cbmi.2019.8877455

Author(s): Rune Johan Borgli, Hakon Kvale Stensland, Pal Halvorsen, Michael Alexander Riegler

Keyword(s): Machine Learning, Open Source, Learning Models, Machine Learning Models


Python Earth Engine API as a new open-source ecosphere for characterizing offshore hydrocarbon seeps and spills

The Leading Edge, 2021, Vol 40 (1), pp. 35-44. DOI: 10.1190/tle40010035.1

Author(s): Whitney Trainor-Guitton, Leo Turon, Dominique Dubucq

Keyword(s): Open Source, Application Programming Interface, Google Earth, Hydrocarbon Detection, Detection Algorithms, Hydrocarbon Seeps, Oil Seeps, Application Programming, Calculated Probability, Programming Interface

The Python Earth Engine application programming interface (API) provides a new open-source ecosphere for testing hydrocarbon detection algorithms on large volumes of images curated with the Google Earth Engine. We specifically demonstrate the Python Earth Engine API by calculating three hydrocarbon indices: fluorescence, rotation absorption, and normalized fluorescence. The Python Earth Engine API provides an ideal environment for testing these indices with varied oil seeps and spills by (1) removing barriers of proprietary software formats and (2) providing an extensive library of data analysis tools (e.g., Pandas and Seaborn) and classification algorithms (e.g., Scikit-learn and TensorFlow). Our results demonstrate end-member cases in which fluorescence and normalized fluorescence indices of seawater and oil are statistically similar and different. As expected, predictive classification is more effective and the calculated probability of oil is more accurate for scenarios in which seawater and oil are well separated in the fluorescence space.


The Brand Effect: A Case Study in Taiwan Second-Hand Smartphone Market

Journal of Social and Economic Statistics, 2021, Vol 10 (1-2), pp. 30-42. DOI: 10.2478/jses-2021-0003

Author(s): Guan-Yuan Wang

Keyword(s): Machine Learning, Life Cycle, Purchase Intention, Brand Preference, Learning Models, Consumer Purchase Intention, Purchasing Behaviour, Brand Effect, Machine Learning Models

Since the smartphone market has an oligopolistic structure, consumer purchase intention is usually driven by brand preference. This research analyses the customer-to-customer market of second-hand smartphones, pointing out how the brand factor affects consumers’ purchasing behaviour. It is found that the recovery value of Apple smartphones is higher, and their life cycle longer, than those of other brands. Moreover, the recovery value of other brands' smartphones is significantly driven by the debut date of the Apple smartphones, implicitly forming a consumption cycle. In addition, through machine learning models, the predictability of the recovery value reaches 93.55%.


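Pengembangan Sistem Informasi Registrasi Mahasiswa Baru dengan Metode Analisis Gugus Kendali Mutu [Development of a New Student Registration Information System Using the Quality Control Circle Analysis Method]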

Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI), 2021, Vol 10 (3), pp. 191. DOI: 10.23887/janapati.v10i3.41763

Author(s): Santo Wijaya, Marta H.R.S.R. Sari, Adian Wihariono Putera

Keyword(s): Open Source, Application Programming Interface, Web Based, Application Programming, Programming Interface

Education, as a knowledge- and skills-based industry of products and services, faces increasingly intense competition from the many domestic and foreign institutions operating in Indonesia. To improve competitiveness, the use of information technology, particularly in the era of the Industrial Revolution 4.0, is key. This research aims to develop a New Student Registration Information System (SIRMB) using an open-source web-based application framework and to integrate it with the Application Programming Interface (API) technology of Bank BNI, resulting in automated administrative services. The process from problem identification to the design of the SIRMB solution uses quality control circle (QCC) analysis with the Plan-Do-Check-Action (PDCA) approach, thereby ensuring continuous improvement. This research contributes a 76.9% improvement in the work process by eliminating the manual work steps of new student registration, thereby improving service quality and overall productivity.


Glycowork: A Python package for glycan data science and machine learning

2021. DOI: 10.1101/2021.04.22.440981

Author(s): Luc Thomès, Rebekka Burkholz, Daniel Bojar

Keyword(s): Machine Learning, Open Source, Data Science, Biological Processes, Biological Sequence, Learning Models, Related Data, Strong Focus, Python Package, Machine Learning Models

As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
