Model-based machine learning

Author(s): Christopher M. Bishop

Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
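
A minimal sketch of the model-based workflow described above, using only the Python standard library: the user writes a generative model, and a single generic inference routine (simple likelihood weighting) is reused unchanged across models. This is not Infer.NET code; the function names are made up for illustration.

```python
# Toy illustration of the model-based idea: the model is declared once, and a
# generic inference routine is applied to it automatically.
import random

def coin_model():
    """Generative model: an unknown coin bias plus a likelihood for observed flips."""
    bias = random.random()                  # prior: bias ~ Uniform(0, 1)
    def likelihood(flips):
        p = 1.0
        for flip in flips:
            p *= bias if flip else (1.0 - bias)
        return p
    return bias, likelihood

def infer_posterior_mean(model, data, num_samples=20000):
    """Generic inference: likelihood-weighted posterior mean of the latent variable."""
    total_weight, weighted_sum = 0.0, 0.0
    for _ in range(num_samples):
        latent, likelihood = model()
        w = likelihood(data)
        total_weight += w
        weighted_sum += w * latent
    return weighted_sum / total_weight

if __name__ == "__main__":
    flips = [1, 1, 0, 1, 1, 1, 0, 1]        # six heads out of eight
    print("posterior mean bias:", round(infer_posterior_mean(coin_model, flips), 3))
```

With a uniform prior and six heads in eight flips, the exact posterior is Beta(7, 3), so the printed estimate should be close to 0.7.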

2014, Vol. 513-517, pp. 1092-1095
Author(s): Bo Wu, Yan Peng Feng, Hong Yan Zheng

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponential growth in the number of learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. First, we exploit a factored representation of the states to reduce the number of learning parameters, and adopt a Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way to improve learning efficiency in large-scale state spaces.
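
A minimal sketch, using only the Python standard library, of how a factored representation keeps Bayesian model learning compact: each state variable keeps Dirichlet counts conditioned only on its parent variables rather than on the full joint state. The class and its interface are hypothetical; the paper's approach additionally learns the structure and couples this with online point-based value iteration, which is not shown here.

```python
# Hypothetical sketch: Dirichlet-multinomial posterior over a factored transition
# model. Parameters grow with the local parent scopes, not with the
# exponentially large joint state space.
from collections import defaultdict

class FactoredBayesModel:
    def __init__(self, parents, num_values, prior=1.0):
        self.parents = parents          # variable -> indices of its parent variables
        self.num_values = num_values    # variable -> size of its value domain
        self.prior = prior              # symmetric Dirichlet pseudo-count
        self.counts = {v: defaultdict(float) for v in parents}      # (ctx, value) -> count
        self.ctx_totals = {v: defaultdict(float) for v in parents}  # ctx -> total count

    def update(self, state, next_state):
        """Bayesian update from one observed transition."""
        for var, par in self.parents.items():
            ctx = tuple(state[i] for i in par)
            self.counts[var][(ctx, next_state[var])] += 1.0
            self.ctx_totals[var][ctx] += 1.0

    def prob(self, state, next_state):
        """Posterior-mean transition probability, factored over state variables."""
        p = 1.0
        for var, par in self.parents.items():
            ctx = tuple(state[i] for i in par)
            num = self.counts[var][(ctx, next_state[var])] + self.prior
            den = self.ctx_totals[var][ctx] + self.prior * self.num_values[var]
            p *= num / den
        return p

# Three binary variables, each with at most two parents: far fewer Dirichlet
# parameters than a full 8 x 8 joint transition table would require.
model = FactoredBayesModel(parents={0: (0,), 1: (0, 1), 2: (2,)},
                           num_values={0: 2, 1: 2, 2: 2})
model.update(state=(0, 1, 0), next_state={0: 1, 1: 1, 2: 0})
print(model.prob((0, 1, 0), {0: 1, 1: 1, 2: 0}))
```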


Author(s): Somdeb Sarkhel, Deepak Venugopal, Nicholas Ruozzi, Vibhav Gogate

We address the problem of scaling up local-search or sampling-based inference in Markov logic networks (MLNs) that have large shared sub-structures but no (or few) tied weights. Such untied MLNs are ubiquitous in practical applications. However, they have very few symmetries, and as a result lifted inference algorithms, the dominant approach for scaling up inference, perform poorly on them. The key idea in our approach is to reduce the hard, time-consuming sub-task in sampling algorithms, namely computing the sum of the weights of the features satisfied by a full assignment, to the problem of computing a set of partition functions of graphical models, each defined over the logical variables in a first-order formula. The importance of this reduction is that when the treewidth of all the graphical models is small, it yields an order-of-magnitude speedup. When the treewidth is large, we propose an over-symmetric approximation and experimentally demonstrate that it is both fast and accurate.
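
The reduction pays off because the partition function of a low-treewidth graphical model can be computed exactly and cheaply. The toy Python sketch below (illustrative, not the authors' implementation) computes the partition function of a chain-structured binary model, treewidth one, by variable elimination and checks it against brute-force enumeration.

```python
# Partition function Z of a chain-structured model: variable elimination is
# linear in the number of variables, versus exponential-time brute force.
import itertools

def partition_function_chain(num_vars, domain, pair_potential):
    """Z for a chain with a potential on each pair of consecutive variables."""
    # message[v] = total weight of all prefixes ending with value v
    message = {v: 1.0 for v in domain}
    for _ in range(num_vars - 1):
        message = {v_next: sum(message[v] * pair_potential(v, v_next) for v in domain)
                   for v_next in domain}
    return sum(message.values())

def partition_function_brute_force(num_vars, domain, pair_potential):
    """Exponential-time check: sum the product of potentials over all assignments."""
    total = 0.0
    for assignment in itertools.product(domain, repeat=num_vars):
        weight = 1.0
        for a, b in zip(assignment, assignment[1:]):
            weight *= pair_potential(a, b)
        total += weight
    return total

# Ising-style agreement potential on a 10-variable binary chain.
phi = lambda a, b: 2.0 if a == b else 1.0
print(partition_function_chain(10, (0, 1), phi))        # O(n * |domain|^2)
print(partition_function_brute_force(10, (0, 1), phi))  # O(|domain|^n), same value
```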


2021
Author(s): Sven Hilbert, Stefan Coors, Elisabeth Barbara Kraus, Bernd Bischl, Mario Frei, et al.

Classical statistical methods are limited in the analysis of high-dimensional datasets. Machine learning (ML) provides a powerful framework for prediction by exploiting the complex relationships often encountered in modern data, which feature large numbers of variables and cases and potentially non-linear effects. ML has become one of the most influential analytical approaches of this millennium and has recently gained popularity in the behavioral and social sciences. The impact of ML methods on research and practical applications in the educational sciences is still limited, but it grows continuously as larger and more complex datasets become available through massive open online courses (MOOCs) and large-scale investigations. The educational sciences are at a crucial pivot point because of the anticipated impact ML methods hold for the field. Here, we review the opportunities and challenges of ML for the educational sciences, show how a look at related disciplines can help the field learn from their experiences, and argue for a philosophical shift in model evaluation. We demonstrate how the overall quality of data analysis in educational research can benefit from these methods and show how ML can play a decisive role in the validation of empirical models. In this review, we (1) provide an overview of the types of data suitable for ML, (2) give practical advice for the application of ML methods, and (3) show how ML-based tools and applications can be used to enhance the quality of education. Additionally, we provide practical R code with exemplary analyses, available at https://osf.io/ntre9/?view_only=d29ae7cf59d34e8293f4c6bbde3e4ab2.
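
The authors' exemplary analyses are provided in R at the OSF link above. Purely to illustrate the argued shift toward out-of-sample model evaluation, here is a Python/scikit-learn sketch (the synthetic dataset and model choice are assumptions, not taken from the paper) contrasting in-sample fit with cross-validated prediction on wide data.

```python
# Illustrative only: in-sample fit is optimistic on wide data, while
# cross-validated prediction gives an honest estimate of generalization.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic "wide" data: many predictors, modest sample size, few true signals.
X, y = make_regression(n_samples=200, n_features=150, n_informative=10,
                       noise=10.0, random_state=0)

model = LinearRegression()
in_sample_r2 = model.fit(X, y).score(X, y)                  # fit and score on the same data
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")    # score on held-out folds

print(f"in-sample R^2:        {in_sample_r2:.2f}")
print(f"5-fold CV R^2 (mean): {cv_r2.mean():.2f}")
```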


2021, Vol. 4 (1)
Author(s): Fredrik Ronquist, Jan Kudlicka, Viktor Senderov, Johannes Borgström, Nicolas Lartillot, et al.

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.
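
The SMC algorithms in the paper are generated automatically from PPL model descriptions; the sketch below is only a generic bootstrap particle filter in plain Python, included to show the property the abstract exploits: SMC returns an estimate of the marginal likelihood as a by-product, which is exactly what Bayes-factor model comparison needs. The toy birth-death-flavoured state-space model, its parameters, and the data are invented for illustration.

```python
# Generic bootstrap particle filter (sequential Monte Carlo). The running sum of
# log average weights estimates the log marginal likelihood of the model, the
# quantity needed to compute Bayes factors between competing models.
import math
import random

def smc_log_marginal(observations, step, log_obs_density, init, num_particles=1000):
    particles = [init() for _ in range(num_particles)]
    log_ml = 0.0
    for obs in observations:
        particles = [step(p) for p in particles]                         # propagate
        weights = [math.exp(log_obs_density(obs, p)) for p in particles]
        log_ml += math.log(sum(weights) / num_particles)                 # accumulate evidence
        particles = random.choices(particles, weights=weights, k=num_particles)  # resample
    return log_ml

# Toy "birth-death"-flavoured latent process: a count that grows or shrinks by
# one per step, observed with Poisson noise. Two growth rates are compared.
def make_model(birth_prob):
    step = lambda n: max(1, n + (1 if random.random() < birth_prob else -1))
    log_obs = lambda obs, n: obs * math.log(n) - n - math.lgamma(obs + 1)  # Poisson(n) log-pmf
    return step, log_obs

random.seed(0)
data = [3, 4, 4, 6, 7, 9, 10, 12]
for p in (0.7, 0.5):
    step, log_obs = make_model(p)
    estimate = smc_log_marginal(data, step, log_obs, init=lambda: 3)
    print(f"birth_prob={p}: log marginal likelihood estimate = {estimate:.2f}")
```

The difference between the two printed values is a log Bayes factor between the two fixed-parameter toy models.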


2008, Vol. 34 (3), pp. 429-448
Author(s): Ezra Daya, Dan Roth, Shuly Wintner

Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive “slots” into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analysis of Semitic languages, and information on roots is important for linguistics research as well as for practical applications. We present a machine learning approach, augmented by limited linguistic knowledge, to the problem of identifying the roots of Semitic words. Although programs exist which can extract the root of words in Arabic and Hebrew, they are all dependent on labor-intensive construction of large-scale lexicons which are components of full-scale morphological analyzers. The advantage of our method is an automation of this process, avoiding the bottleneck of having to laboriously list the root and pattern of each lexeme in the language. To the best of our knowledge, this is the first application of machine learning to this problem, and one of the few attempts to directly address non-concatenative morphology using machine learning. More generally, our results shed light on the problem of combining classifiers under (linguistically motivated) constraints.
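
As a toy illustration (not the authors' trained system) of combining classifier scores under a linguistic constraint, the sketch below selects the highest-scoring triple of consonants that occurs in order within the word; the per-consonant scores are hard-coded stand-ins for learned classifier outputs.

```python
# Hypothetical sketch: per-consonant scores stand in for trained classifiers,
# and the combination step enforces the constraint that the three root
# consonants must occur, in order, within the word.
from itertools import combinations

def best_root(word, root_score, root_length=3):
    """Return the highest-scoring ordered tuple of candidate root consonants."""
    positions = [i for i, ch in enumerate(word) if ch in root_score]
    best, best_score = None, float("-inf")
    for combo in combinations(positions, root_length):   # positions kept in order
        candidate = tuple(word[i] for i in combo)
        score = sum(root_score[c] for c in candidate)    # independent classifier scores
        if score > best_score:
            best, best_score = candidate, score
    return best

# Hard-coded, made-up scores: higher means "more likely to be a root consonant".
scores = {"k": 2.0, "t": 1.8, "b": 1.9, "m": 0.2, "h": 0.1}
print(best_root("hiktabtem", scores))   # picks ('k', 't', 'b')
```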


2020, Vol. 140 (4), pp. 272-280
Author(s): Wataru Ohnishi, Hiroshi Fujimoto, Koichi Sakata
