Towards scalable online machine learning collaborations with OpenML

Is massively collaborative machine learning possible? Can we share and organize our collective knowledge of machine learning to solve ever more challenging problems? In a way, yes: as a community, we are already very successful at developing high-quality open-source machine learning libraries, thanks to frictionless collaboration platforms for software development. However, code is only one aspect. The answer is much less clear when we also consider the data that goes into these algorithms and the exact models that are produced. A tremendous amount of work and experience goes into the collection, cleaning, and preprocessing of data and the design, evaluation, and finetuning of models, yet very little of this is shared and organized in a way so that others can easily build on it. Suppose one had a global platform for sharing machine learning datasets, models, and reproducible experiments in a frictionless way so that anybody could chip in at any time to share a good model, add or improve data, or suggest an idea. OpenML is an open-source initiative to create such a platform. It allows anyone to share datasets, machine learning pipelines, and full experiments, organizes all of it online with rich metadata, and enables anyone to reuse and build on them in novel and unexpected ways. All data is open and accessible through APIs, and it is readily integrated into popular machine learning tools to allow easy sharing of models and experiments. This openness also allows a budding ecosystem of automated processes to scale up machine learning further, such as discovering similar datasets, creating systematic benchmarks, or learning from all collected results how to build the best machine learning models and even automatically doing so for any new dataset. We welcome all of you to become a part of it.

Latest Tools for Data Mining and Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.i1003.0789s19 ◽ 2019 ◽ Vol 8 (9S) ◽ pp. 18-23 ◽ Cited By ~ 2

Keyword(s): Machine Learning ◽ Data Mining ◽ Decision Making ◽ Feature Selection ◽ Open Source ◽ Predictive Analysis ◽ Learning Tools ◽ Pros And Cons ◽ Selection For ◽ Extract Information

Data Mining is now widely used to extract information from data and, in turn, to acquire knowledge for decision making. It analyzes patterns that are then used to extract information and knowledge for making decisions. Many open-source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for Data Mining and predictive analysis. This paper discusses the tools available for Data Mining and Machine Learning, describing each tool along with its pros and cons. It details the techniques these tools support, including classification, regression, characterization, discretization, clustering, visualization, and feature selection. It is intended to support efficient decision making by suggesting which tool suits a given requirement.

Side-Channel Countermeasures’ Dissection and the Limits of Closed Source Security Evaluations

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽ 10.46586/tches.v2020.i2.1-25 ◽ 2020 ◽ pp. 1-25 ◽ Cited By ~ 1

Author(s): Olivier Bronchain ◽ François-Xavier Standaert

Keyword(s): Machine Learning ◽ Open Source ◽ Side Channel ◽ Learning Tools ◽ Side Channel Attacks ◽ Data Dependencies ◽ Security Evaluations ◽ Systèmes D’Information ◽ Closed Source ◽ Leakage Assessment

We take advantage of a recently published open source implementation of the AES protected with a mix of countermeasures against side-channel attacks to discuss both the challenges in protecting COTS devices against such attacks and the limitations of closed source security evaluations. The target implementation has been proposed by the French ANSSI (Agence Nationale de la Sécurité des Systèmes d’Information) to stimulate research on the design and evaluation of side-channel secure implementations. It combines additive and multiplicative secret sharings into an affine masking scheme that is additionally mixed with a shuffled execution. Its preliminary leakage assessment did not detect data dependencies with up to 100,000 measurements. We first exhibit the gap between such a preliminary leakage assessment and advanced attacks by demonstrating how a countermeasures’ dissection exploiting a mix of dimensionality reduction, multivariate information extraction and key enumeration can recover the full key with fewer than 2,000 measurements. We then discuss the relevance of open source evaluations to analyze such implementations efficiently, by pointing out that certain steps of the attack are hard to automate without implementation knowledge (even with machine learning tools), while performing them manually is straightforward. Our findings stem not from design flaws but from the general difficulty of preventing side-channel attacks in COTS devices with limited noise. We anticipate that high security on such devices requires significantly more shares.
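The affine masking idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ANSSI implementation: it assumes one multiplicative mask `r_mul` and one additive mask `r_add` shared across a 16-byte state, arithmetic in the AES field GF(2^8), and only the AddRoundKey step processed in shuffled order. All function names are hypothetical.

```python
import random

AES_POLY = 0x11B  # AES reduction polynomial x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Invert a nonzero byte in GF(2^8) by computing a^254."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def affine_mask(state, r_mul, r_add):
    """Affine masking: each byte x becomes (r_mul * x) XOR r_add."""
    return [gf_mul(r_mul, x) ^ r_add for x in state]


def affine_unmask(masked, r_mul, r_add):
    """Undo the affine mask: x = r_mul^-1 * (y XOR r_add)."""
    inv = gf_inv(r_mul)
    return [gf_mul(inv, y ^ r_add) for y in masked]


def shuffled_add_round_key(masked, round_key, r_mul, rng):
    """AddRoundKey on masked bytes, visiting the bytes in random order.

    (r_mul*x ^ r_add) ^ (r_mul*k) equals r_mul*(x ^ k) ^ r_add, so the
    result stays under the same affine mask.
    """
    order = list(range(len(masked)))
    rng.shuffle(order)  # randomizes the order in which bytes are processed
    out = list(masked)
    for i in order:
        out[i] = masked[i] ^ gf_mul(r_mul, round_key[i])
    return out


if __name__ == "__main__":
    rng = random.Random()            # illustration only, not a secure RNG
    r_mul = rng.randrange(1, 256)    # multiplicative mask must be nonzero
    r_add = rng.randrange(256)       # additive mask
    state = list(range(16))
    key = [0xA5] * 16
    masked = affine_mask(state, r_mul, r_add)
    masked = shuffled_add_round_key(masked, key, r_mul, rng)
    print(affine_unmask(masked, r_mul, r_add) == [s ^ k for s, k in zip(state, key)])
```

In plain Python the shuffled order does not change the result; on a device it would only change the time at which each byte is processed, which is the property the countermeasure relies on.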

A Comparative Study of Different Machine Learning Tools

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v7i4.184190 ◽ 2019 ◽ Vol 7 (4) ◽ pp. 184-190

Author(s): Himani Maheshwari ◽ Pooja Goswami ◽ Isha Rana

Keyword(s): Machine Learning ◽ Comparative Study ◽ Learning Tools

DEVELOPMENT OF AN OPEN SOURCE, MACHINE LEARNING BASED TOOLSET FOR THE IDENTIFICATION OF DIKES IN SATELLITE IMAGES THROUGH SEMANTIC SEGMENTATION

10.1130/abs/2020am-357672 ◽ 2020

Author(s): Ryan Gray ◽ Tushar Mittal

Keyword(s): Machine Learning ◽ Open Source ◽ Satellite Images ◽ Semantic Segmentation

Toward an Open-source Toolkit for Machine Learning Education

Proceedings of the 51st ACM Technical Symposium on Computer Science Education ◽ 10.1145/3328778.3372531 ◽ 2020

Author(s): N. Rich Nguyen

Keyword(s): Machine Learning ◽ Open Source

Improved nutrient management in cereals using Nutrient Expert and machine learning tools: Productivity, profitability and nutrient use efficiency

Agricultural Systems ◽ 10.1016/j.agsy.2021.103181 ◽ 2021 ◽ Vol 192 ◽ pp. 103181

Author(s): Jagadish Timsina ◽ Sudarshan Dutta ◽ Krishna Prasad Devkota ◽ Somsubhra Chakraborty ◽ Ram Krishna Neupane ◽ ...

Keyword(s): Machine Learning ◽ Nutrient Management ◽ Nutrient Use Efficiency ◽ Learning Tools ◽ Nutrient Use ◽ Use Efficiency

Paper2Wire – A Case Study of User-Centred Development of Machine Learning Tools for UX Designers

i-com ◽ 10.1515/icom-2021-0002 ◽ 2021 ◽ Vol 20 (1) ◽ pp. 19-32

Author(s): Daniel Buschek ◽ Charlotte Anlauff ◽ Florian Lachner

Keyword(s): Machine Learning ◽ Development Process ◽ User Study ◽ Concept Development ◽ Lessons Learned ◽ Design Tool ◽ Learning Tools ◽ Interface Elements ◽ Industry Partner

This paper reflects on a case study of a user-centred concept development process for a Machine Learning (ML) based design tool, conducted at an industry partner. The resulting concept uses ML to match graphical user interface elements in sketches on paper to their digital counterparts to create consistent wireframes. A user study (N=20) with a working prototype shows that designers prefer this concept to the previous manual procedure. Reflecting on our process and findings, we discuss lessons learned for developing ML tools that respect practitioners’ needs and practices.
