Arangopipe, a tool for machine learning meta-data management

Data Science ◽  
2021 ◽  
pp. 1-15
Author(s):  
Jörg Schad ◽  
Rajiv Sambasivan ◽  
Christopher Woodward

Experimenting with different models, documenting results and findings, and repeating these tasks are day-to-day activities for machine learning engineers and data scientists. There is a need to keep control of the machine-learning pipeline and its metadata. This allows users to iterate quickly through experiments and retrieve key findings and observations from historical activity. This is the need that Arangopipe serves. Arangopipe is an open-source tool that provides a data model that captures the essential components of any machine learning life cycle. Arangopipe provides an application programming interface that permits machine-learning engineers to record the details of the salient steps in building their machine learning models. The components of the data model and an overview of the application programming interface are provided. Illustrative examples of basic and advanced machine learning workflows are provided. Arangopipe is useful not only for users involved in developing machine learning models but also for users deploying and maintaining them.

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3691
Author(s):  
Ciprian Orhei ◽  
Silviu Vert ◽  
Muguras Mocofan ◽  
Radu Vasiu

Computer Vision is a cross-research field with the main purpose of understanding the surrounding environment as closely as possible to human perception. Image processing systems are continuously growing and expanding into more complex systems, usually tailored to the specific needs or applications they serve. To better serve this purpose, research on the architecture and design of such systems is also important. We present the End-to-End Computer Vision Framework (EECVF), an open-source solution that aims to support researchers and teachers within the vast field of image processing. The framework incorporates Computer Vision features and Machine Learning models that researchers can use. Given the continuous need to add new Computer Vision algorithms in day-to-day research activity, the proposed framework has the advantage of a configurable and scalable architecture. Although the main focus of the framework is the Computer Vision processing pipeline, it also offers ways to incorporate more complex activities, such as training Machine Learning models. EECVF aims to become a useful tool for learning activities in the Computer Vision field, as it allows the learner and the teacher to handle only the topics at hand, rather than the interconnections needed for the visual processing flow.


2021 ◽  
Author(s):  
Tuomo Kalliokoski

The software macHine leArning booSTed dockiNg (HASTEN) was developed to accelerate structure-based virtual screening using machine learning models. It has been validated using datasets both from the literature (12 datasets, each containing three million molecules docked with FRED) and from in-house sources (one dataset of four million compounds docked with Glide). HASTEN showed reasonable performance on the literature data, with a mean recall of 0.78 for the top one percent of scoring molecules after docking 10% of the dataset, whereas an excellent recall of 0.95 was achieved for the in-house data. The program can be used with any docking and machine learning methodology, and is freely available from https://github.com/TuomoKalliokoski/HASTEN.
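
One way to read the recall figure quoted above is as the share of the truly best-scoring molecules that end up among the molecules actually docked. The sketch below illustrates that reading only; it is not taken from HASTEN. The function name `top_fraction_recall`, its parameters, and the assumption that a lower docking score is better are all hypothetical choices made for this example.

```python
def top_fraction_recall(true_scores, docked_ids, top_fraction=0.01, lower_is_better=True):
    """Return the fraction of the true top-scoring molecules found in docked_ids.

    true_scores: dict mapping molecule id -> docking score for the full dataset
    docked_ids:  iterable of molecule ids that were actually docked
    Assumption (hypothetical): lower scores are better unless lower_is_better=False.
    """
    if not true_scores:
        raise ValueError("true_scores must not be empty")
    # Size of the reference set, e.g. the top one percent of the dataset.
    n_top = max(1, round(len(true_scores) * top_fraction))
    # Rank molecule ids from best to worst score.
    ranked = sorted(true_scores, key=true_scores.get, reverse=not lower_is_better)
    top_ids = set(ranked[:n_top])
    # Recall: how many of the true top molecules were recovered.
    return len(top_ids & set(docked_ids)) / n_top


# Toy data: 1000 molecules whose score equals their index (lower is better).
scores = {f"mol{i}": float(i) for i in range(1000)}
# Suppose 100 molecules were docked, including 8 of the 10 true top scorers.
docked = [f"mol{i}" for i in range(8)] + [f"mol{i}" for i in range(500, 592)]
recall = top_fraction_recall(scores, docked)
```

In this toy setup the reference set is the ten best molecules, so the value of `recall` is the number of those ten that appear in `docked`, divided by ten.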


2016 ◽  
Vol 7 (2) ◽  
pp. 43-71 ◽  
Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Logging is an important yet tough decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem, since the number of logged code constructs is small compared to the number of non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performance of J48, RF, and SVM classifiers for predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, the LogIm model improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13% for if-block logging prediction.
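
As a rough illustration of what an ensemble and threshold-based classifier can look like on imbalanced data, the sketch below averages the minority-class probabilities produced by several classifiers and applies an adjustable decision threshold. This is a generic technique, not the LogIm algorithm itself, whose details the abstract does not give. The function name `ensemble_predict` and the probability values are hypothetical.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-classifier probabilities and apply a decision threshold.

    prob_lists: one list per classifier, each holding the predicted probability
                of the minority class (e.g. "logged") for every sample
    threshold:  samples whose averaged probability is >= threshold are labeled 1
    """
    if not prob_lists:
        raise ValueError("at least one classifier is required")
    n_samples = len(prob_lists[0])
    if any(len(probs) != n_samples for probs in prob_lists):
        raise ValueError("all classifiers must score the same samples")
    # Soft voting: mean probability across classifiers for each sample.
    averaged = [sum(column) / len(prob_lists) for column in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


# Hypothetical minority-class probabilities from three classifiers for three samples.
probabilities = [
    [0.2, 0.6, 0.10],
    [0.4, 0.8, 0.20],
    [0.3, 0.7, 0.15],
]
default_labels = ensemble_predict(probabilities)           # threshold 0.5
lenient_labels = ensemble_predict(probabilities, 0.25)     # favors the minority class
```

Lowering the threshold trades precision for recall on the rare class, which is why threshold choice matters when one class is heavily outnumbered.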


2021 ◽  
Vol 40 (1) ◽  
pp. 35-44
Author(s):  
Whitney Trainor-Guitton ◽  
Leo Turon ◽  
Dominique Dubucq

The Python Earth Engine application programming interface (API) provides a new open-source ecosphere for testing hydrocarbon detection algorithms on large volumes of images curated with the Google Earth Engine. We specifically demonstrate the Python Earth Engine API by calculating three hydrocarbon indices: fluorescence, rotation absorption, and normalized fluorescence. The Python Earth Engine API provides an ideal environment for testing these indices with varied oil seeps and spills by (1) removing barriers of proprietary software formats and (2) providing an extensive library of data analysis tools (e.g., Pandas and Seaborn) and classification algorithms (e.g., Scikit-learn and TensorFlow). Our results demonstrate end-member cases in which fluorescence and normalized fluorescence indices of seawater and oil are statistically similar and different. As expected, predictive classification is more effective and the calculated probability of oil is more accurate for scenarios in which seawater and oil are well separated in the fluorescence space.


2021 ◽  
Vol 10 (1-2) ◽  
pp. 30-42
Author(s):  
Guan-Yuan Wang

Since the smartphone market has an oligopolistic structure, consumer purchase intention is usually driven by brand preference. This research analyses the customer-to-customer market for second-hand smartphones and shows how the brand factor affects consumers' purchasing behaviour. It finds that Apple smartphones have a higher recovery value and a longer life cycle than those of other brands. Moreover, the recovery value of other brands' smartphones is significantly driven by the debut date of Apple smartphones, implicitly forming a consumption cycle. In addition, machine learning models are able to predict the recovery value with a predictability of up to 93.55%.


Author(s):  
Santo Wijaya ◽  
Marta H.R.S.R. Sari ◽  
Adian Wihariono Putera

Education, as an industry of products and services based on knowledge and skills, faces increasingly intense competition from the many domestic and foreign institutions operating in Indonesia. To improve competitiveness, the use of information technology, particularly in the era of the fourth industrial revolution (Industry 4.0), is a key factor. This study aims to develop a New Student Registration Information System (Sistem Informasi Registrasi Mahasiswa Baru, SIRMB) using an open-source web-based application framework, integrated with the Bank BNI Application Programming Interface (API), to provide an automated administrative service. The process from problem identification through the design of the SIRMB solution uses quality control circle (QCC) analysis with the Plan-Do-Check-Action (PDCA) method, thereby ensuring continuous improvement. This study contributes a 76.9% improvement in the work process by eliminating the manual steps in new student registration, which improves service quality and overall productivity.


2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.

