Practices and Infrastructures for ML Systems – An Interview Study

2021 ◽  
Author(s):  
Dennis Muiruri ◽  
Lucy Ellen Lwakatare ◽  
Jukka K. Nurminen ◽  
Tommi Mikkonen

The best practices and infrastructures for developing and maintaining machine learning (ML) enabled software systems are often reported by large and experienced data-driven organizations. However, little is known about the state of practice across other organizations. Using interviews, we investigated practices and tool-chains for ML-enabled systems from 16 organizations in various domains. Our study makes three broad observations, related to data management practices, monitoring practices, and automation practices in ML model training and serving workflows. These areas have a limited number of generic practices and tools applicable across organizations in different domains.


2020 ◽  
Author(s):  
Anthony Wang ◽  
Ryan Murdock ◽  
Steven Kauwe ◽  
Anton Oliynyk ◽  
Aleksander Gurlo ◽  
...  

This Editorial is intended for materials scientists interested in performing machine learning-centered research.

We cover broad guidelines and best practices regarding the obtaining and treatment of data, feature engineering, model training, validation, evaluation and comparison, popular repositories for materials data and benchmarking datasets, model and architecture sharing, and finally publication. In addition, we include interactive Jupyter notebooks with example Python code to demonstrate some of the concepts, workflows, and best practices discussed.

Overall, the data-driven methods and machine learning workflows and considerations are presented in a simple way, allowing interested readers to more intelligently guide their machine learning research using the suggested references, best practices, and their own materials domain expertise.


2020 ◽  
Vol 2 (3) ◽  
pp. 161-170 ◽  
Author(s):  
Man-Fai Ng ◽  
Jin Zhao ◽  
Qingyu Yan ◽  
Gareth J. Conduit ◽  
Zhi Wei Seh

Science ◽  
2019 ◽  
Vol 363 (6433) ◽  
pp. eaau0323 ◽  
Author(s):  
Karianne J. Bergen ◽  
Paul A. Johnson ◽  
Maarten V. de Hoop ◽  
Gregory C. Beroza

Understanding the behavior of Earth through the diverse fields of the solid Earth geosciences is an increasingly important task. It is made challenging by the complex, interacting, and multiscale processes needed to understand Earth’s behavior and by the inaccessibility of nearly all of Earth’s subsurface to direct observation. Substantial increases in data availability and in the increasingly realistic character of computer simulations hold promise for accelerating progress, but developing a deeper understanding based on these capabilities is itself challenging. Machine learning will play a key role in this effort. We review the state of the field and make recommendations for how progress might be broadened and accelerated.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Antti Salonen ◽  
Maheshwaran Gopalakrishnan

Purpose: The purpose of this study was to assess the readiness of the Swedish manufacturing industry to implement dynamic, data-driven preventive maintenance (PM) by identifying the gap between the state of the art and the state of practice.

Design/methodology/approach: An embedded multiple case study was performed in which some of the largest companies in the discrete manufacturing industry, that is, mechanical engineering, were surveyed regarding the design of their PM programmes.

Findings: The studied manufacturing companies make limited use of the existing scientific state of the art when designing their PM programmes. They seem to be aware of the possibilities for improvement, but they also see obstacles to changing their practices according to future requirements.

Practical implications: The results of this study will benefit both industry professionals and academicians, setting the initial stage for the development of data-driven, diversified and dynamic PM programmes.

Originality/Value: First and foremost, this study maps the current state and practice in PM planning among some of the larger automotive manufacturing industries in Sweden. This work reveals a gap between the state of the art and the state of practice in the design of PM programmes. Insights regarding this gap show large improvement potentials which may prove important for academics as well as practitioners.


Author(s):  
Cristina Palomares ◽  
Xavier Franch ◽  
Carme Quer ◽  
Panagiota Chatzipetrou ◽  
Lidia López ◽  
...  

Author(s):  
José A. León Borges ◽  
Roger Ismael Noh Balam ◽  
Lino Rangel Gómez ◽  
Michael Philip Strand ◽  
...  

This research article presents an analysis and comparison of three algorithms: (A) the K-means clustering method, (B) Expectation-Maximization (EM) with a convergence criterion, and (C) the LAMDA classification methodology. Two classification software packages, Weka and SALSA, are used as an aid for predicting future elections in the state of Quintana Roo. The electoral data are classified both qualitatively and quantitatively, so that by the end of the article the reader has the elements needed to decide which software performs better for this kind of classification learning. The main reason for this work is to demonstrate the efficiency of the algorithms with different data types. In the end, the reader can decide which algorithm has the better performance in data management.


2021 ◽  
Vol 14 (6) ◽  
pp. 957-969 ◽  
Author(s):  
Jinfei Liu ◽  
Jian Lou ◽  
Junxu Liu ◽  
Li Xiong ◽  
Jian Pei ◽  
...  

Data-driven machine learning has become ubiquitous. A marketplace for machine learning models connects data owners and model buyers, and can dramatically facilitate data-driven machine learning applications. In this paper, we take a formal data marketplace perspective and propose the first end-to-end model marketplace with differential privacy (Dealer) towards answering the following questions: How to formulate data owners' compensation functions and model buyers' price functions? How can the broker determine prices for a set of models to maximize the revenue with an arbitrage-free guarantee, and train a set of models with maximum Shapley coverage given a manufacturing budget to remain competitive? For the former, we propose a compensation function for each data owner based on Shapley value and privacy sensitivity, and a price function for each model buyer based on Shapley coverage sensitivity and noise sensitivity. Both privacy sensitivity and noise sensitivity are measured by the level of differential privacy. For the latter, we formulate two optimization problems for model pricing and model training, and propose efficient dynamic programming algorithms. Experimental results on the real chess dataset and synthetic datasets justify the design of Dealer and verify the efficiency and effectiveness of the proposed algorithms.

