How to Shift Bias: Lessons from the Baldwin Effect

1996 · Vol 4 (3) · pp. 271-295
Author(s): Peter Turney

An inductive learning algorithm takes a set of data as input and generates a hypothesis as output. A set of data is typically consistent with an infinite number of hypotheses; therefore, there must be factors other than the data that determine the output of the learning algorithm. In machine learning, these other factors are called the bias of the learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently developed learning algorithms dynamically adjust their bias as they search for a hypothesis. Algorithms that shift bias in this manner are not as well understood as classical algorithms. In this paper, we show that the Baldwin effect has implications for the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in 1896 to explain how phenomena that might appear to require Lamarckian evolution (inheritance of acquired characteristics) can arise from purely Darwinian evolution. Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We explore a variation on their model, which we constructed explicitly to illustrate the lessons that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that a good strategy for shifting bias in a learning algorithm appears to be to begin with a weak bias and gradually shift to a strong bias.
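
The Hinton and Nowlan model that this variation builds on is compact enough to sketch in code. The following is a minimal, illustrative reconstruction rather than the authors' exact variant: each locus is fixed to 0, fixed to 1, or plastic ('?'), and fitness rewards genomes that can reach a fixed target string within a budget of random learning trials. The population size, trial budget, and mutation rate are assumptions. Over generations the fraction of plastic loci typically falls as correct fixed alleles are assimilated, which mirrors the weak-to-strong shift of bias described above.

```python
import random

TARGET_LEN = 20     # length of the target bit string (illustrative)
TRIALS = 1000       # learning trials per individual (illustrative)

def random_genome():
    # Each locus is fixed to 0, fixed to 1, or plastic ('?') and set by learning.
    return [random.choice([0, 1, '?']) for _ in range(TARGET_LEN)]

def fitness(genome, target, trials=TRIALS):
    """Baldwinian fitness: fixed alleles must not contradict the target, and
    plastic loci must be guessed correctly within the trial budget; earlier
    success earns higher fitness."""
    if any(g != '?' and g != t for g, t in zip(genome, target)):
        return 1.0  # a wrong fixed allele can never be repaired by learning
    plastic = [i for i, g in enumerate(genome) if g == '?']
    for t in range(trials):
        if all(random.random() < 0.5 for _ in plastic):  # all plastic loci guessed right
            return 1.0 + (TARGET_LEN - 1) * (trials - t) / trials
    return 1.0

def evolve(pop_size=1000, generations=50, mutation_rate=0.01):
    target = [1] * TARGET_LEN
    pop = [random_genome() for _ in range(pop_size)]
    for gen in range(generations):
        scores = [fitness(g, target) for g in pop]
        # Fitness-proportionate selection with light mutation (no crossover, for brevity).
        pop = [[random.choice([0, 1, '?']) if random.random() < mutation_rate else locus
                for locus in random.choices(pop, weights=scores)[0]]
               for _ in range(pop_size)]
        plastic_frac = sum(g.count('?') for g in pop) / (pop_size * TARGET_LEN)
        print(f"gen {gen:3d}  mean fitness {sum(scores)/pop_size:6.2f}  plastic loci {plastic_frac:.2f}")

if __name__ == "__main__":
    evolve()
```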

1993 · Vol 1 (3) · pp. 213-233
Author(s): Frédéric Gruau, Darrell Whitley

A grammar tree is used to encode a cellular developmental process that can generate whole families of Boolean neural networks for computing parity and symmetry. The development process resembles biological cell division. A genetic algorithm is used to find a grammar tree that yields both architecture and weights specifying a particular neural network for solving specific Boolean functions. The current study particularly focuses on the addition of learning to the development process and the evolution of grammar trees. Three ways of adding learning to the development process are explored. Two of these exploit the Baldwin effect by changing the fitness landscape without using Lamarckian evolution. The third strategy is Lamarckian in nature. Results for these three modes of combining learning with genetic search are compared against genetic search without learning. Our results suggest that merely using learning to change the fitness landscape can be as effective as Lamarckian strategies at improving search.
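
The difference between the Baldwinian and Lamarckian strategies compared above comes down to what the fitness evaluation hands back to the genetic algorithm. The sketch below is a generic illustration of that distinction; the `decode`, `learn`, `evaluate`, and `encode` callables are hypothetical placeholders for the grammar-tree development, local learning, and network evaluation steps.

```python
def baldwinian_fitness(genome, decode, learn, evaluate):
    """Learning improves the developed phenotype and therefore the fitness
    credited to the genome, but the genome itself is passed on unchanged,
    so learning only reshapes the fitness landscape (the Baldwin effect)."""
    phenotype = decode(genome)        # e.g. grammar tree -> network architecture and weights
    improved = learn(phenotype)       # a few steps of local learning
    return evaluate(improved), genome

def lamarckian_fitness(genome, decode, learn, evaluate, encode):
    """Learning is written back into the genome, so offspring inherit the
    acquired weights directly (Lamarckian inheritance)."""
    phenotype = decode(genome)
    improved = learn(phenotype)
    return evaluate(improved), encode(improved)
```

In both variants selection sees the post-learning fitness; only the second lets acquired changes propagate genetically, which is the contrast the experiments above evaluate.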


1999 · Vol 5 (4) · pp. 319-342
Author(s): Vladimir Kvasnicka, Jiri Pospichal

The purpose of this article is to demonstrate that coordinated communication spontaneously emerges in a population composed of agents that are capable of specific cognitive activities. Internal states of agents are characterized by meaning vectors. Simple neural networks composed of one layer of hidden neurons perform the cognitive activities of the agents. An elementary communication act consists of the following: (a) two agents are selected, where one of them is declared the speaker and the other the listener; (b) the speaker codes a selected meaning vector onto a sequence of symbols and sends it to the listener as a message; and finally, (c) the listener decodes this message into a meaning vector and adapts its neural network such that the difference between the speaker's and listener's meaning vectors is decreased. A Darwinian evolution enlarged by ideas from the Baldwin effect and Dawkins' memes is simulated by a simple version of an evolutionary algorithm without crossover. Agent fitness is determined by the success of mutual pairwise communications. It is demonstrated that, over the course of evolution, agents gradually become better at decoding received messages (their decoded vectors lie closer to the speakers' meaning vectors) and all agents gradually converge on the same vocabulary for their common communication. Moreover, if agent meaning vectors contain regularities, then these regularities are also manifested in the messages created by speakers; that is, similar parts of meaning vectors are coded by similar symbol substrings. This observation is considered a manifestation of the emergence of a grammar system in the common coordinated communication.
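
A single communication act of the kind outlined in steps (a)-(c) can be sketched as follows. The network shapes, symbol alphabet, quantization scheme, and learning rate below are illustrative assumptions, not the authors' exact model; the point is only that the speaker encodes a meaning vector into discrete symbols and the listener decodes it and takes one adaptation step toward the speaker's meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
MEANING_DIM, MSG_LEN, ALPHABET = 6, 4, 8   # illustrative sizes

def new_agent(hidden=10):
    # One hidden layer each for the encoding (speaker) and decoding (listener) roles.
    return {"enc": [rng.normal(0, 0.5, (MEANING_DIM, hidden)), rng.normal(0, 0.5, (hidden, MSG_LEN))],
            "dec": [rng.normal(0, 0.5, (MSG_LEN, hidden)), rng.normal(0, 0.5, (hidden, MEANING_DIM))]}

def forward(layers, x):
    h = np.tanh(x @ layers[0])
    return np.tanh(h @ layers[1]), h

def speak(agent, meaning):
    out, _ = forward(agent["enc"], meaning)
    # Quantize the continuous outputs into symbols of a small discrete alphabet.
    return np.floor((out + 1) / 2 * (ALPHABET - 1)).astype(int)

def listen_and_adapt(agent, message, speaker_meaning, lr=0.05):
    x = message / (ALPHABET - 1)                  # scale symbols back to [0, 1]
    decoded, h = forward(agent["dec"], x)
    err = decoded - speaker_meaning               # difference the listener tries to reduce
    d_out = err * (1 - decoded ** 2)              # one gradient step on the squared error
    d_h = (d_out @ agent["dec"][1].T) * (1 - h ** 2)
    agent["dec"][1] -= lr * np.outer(h, d_out)
    agent["dec"][0] -= lr * np.outer(x, d_h)
    return float((err ** 2).mean())

speaker, listener = new_agent(), new_agent()
meaning = rng.uniform(0, 1, MEANING_DIM)          # the speaker's internal state
error = listen_and_adapt(listener, speak(speaker, meaning), meaning)
```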


Author(s): D T Pham, S Bigot, S S Dimov

Current inductive learning algorithms have difficulties handling attributes with numerical values. This paper presents RULES-F, a new fuzzy inductive learning algorithm in the RULES family, which integrates the capabilities and performance of a good inductive learning algorithm for classification applications with the ability to create accurate and compact fuzzy models for the generation of numerical outputs. The performance of RULES-F in two simulated control applications involving numerical output parameters is demonstrated and compared with that of the well-known fuzzy rule induction algorithm by Wang and Mendel.
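
RULES-F itself is not specified in the abstract, but the comparison baseline it is measured against, the Wang and Mendel procedure, is simple enough to sketch: partition each variable into overlapping triangular fuzzy sets, generate one rule per training example from the best-matching regions, resolve conflicting rules by keeping the one with the highest degree, and defuzzify by a weighted average. The partition counts and triangular membership shape below are standard textbook choices, not parameters taken from the paper.

```python
import numpy as np

def triangular_partition(lo, hi, n):
    """n evenly spaced triangular fuzzy sets covering [lo, hi]."""
    centers = np.linspace(lo, hi, n)
    half_width = (hi - lo) / (n - 1)
    return centers, lambda x: np.maximum(0.0, 1.0 - np.abs(x - centers) / half_width)

def wang_mendel(X, y, n_in=5, n_out=5):
    """One rule per example; when several examples produce the same antecedent,
    keep the rule with the highest degree (product of memberships)."""
    in_parts = [triangular_partition(X[:, j].min(), X[:, j].max(), n_in) for j in range(X.shape[1])]
    out_centers, out_member = triangular_partition(y.min(), y.max(), n_out)
    rules = {}
    for xi, yi in zip(X, y):
        memberships = [m(v) for (_, m), v in zip(in_parts, xi)]
        antecedent = tuple(int(np.argmax(mu)) for mu in memberships)
        consequent = int(np.argmax(out_member(yi)))
        degree = float(np.prod([mu.max() for mu in memberships]) * out_member(yi).max())
        if antecedent not in rules or rules[antecedent][1] < degree:
            rules[antecedent] = (consequent, degree)
    return rules, in_parts, out_centers

def predict(x, rules, in_parts, out_centers):
    """Weighted-average defuzzification over all rules that fire."""
    num = den = 0.0
    for antecedent, (consequent, _) in rules.items():
        strength = float(np.prod([m(v)[a] for (_, m), v, a in zip(in_parts, x, antecedent)]))
        num += strength * out_centers[consequent]
        den += strength
    return num / den if den > 0 else float(np.mean(out_centers))
```

A fuzzy model for a numerical output would then be built with `wang_mendel(X_train, y_train)` and queried with `predict`, which is the kind of baseline RULES-F is compared against.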


Author(s): Ivan Bruha, Frantisek Franek

Simple inductive learning algorithms assume that all attribute values are available. Quinlan's well-known paper [1] discusses several routines for the processing of unknown attribute values in the TDIDT family and analyzes seven of them. This paper introduces five routines for the processing of unknown attribute values that have been designed for the CN4 learning algorithm, a substantial extension of the well-known CN2. Both CN2 and CN4 induce lists of decision rules from examples using the covering paradigm. CN2 offers two ways of processing unknown attribute values. CN4's five routines differ in the way they match complexes against examples (objects) that involve unknown attribute values. The definition of matching is discussed in detail in the paper. For each routine, the strategy for processing unknown values is described for both the learning and classification phases. The results of experiments with various percentages of unknown attribute values on real-world (mostly medical) data are presented, and the performances of all five routines are compared.
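
The abstract does not enumerate the five routines, but the design space they occupy is easy to illustrate. The sketch below shows three generic strategies for matching a selector against an example whose attribute value is unknown: treat it as a mismatch, treat it as a match, or credit a fractional match weighted by the value distribution observed in the training data. These are illustrative strategies from the broader literature, not necessarily the exact CN4 routines.

```python
UNKNOWN = None  # marker for an unknown attribute value

def match_strict(value, allowed, value_freq=None):
    """Treat an unknown value as a mismatch (the most conservative option)."""
    if value is UNKNOWN:
        return 0.0
    return 1.0 if value in allowed else 0.0

def match_optimistic(value, allowed, value_freq=None):
    """Treat an unknown value as if it always satisfied the selector."""
    if value is UNKNOWN:
        return 1.0
    return 1.0 if value in allowed else 0.0

def match_fractional(value, allowed, value_freq):
    """Split an unknown value across the attribute's values in proportion to
    their training-set frequencies, yielding a partial match degree."""
    if value is UNKNOWN:
        return sum(value_freq.get(v, 0.0) for v in allowed)
    return 1.0 if value in allowed else 0.0

def complex_match(example, selectors, match_fn, freqs):
    """A complex (conjunction of selectors) matches with the product of the
    individual selector degrees."""
    degree = 1.0
    for attr, allowed in selectors.items():
        degree *= match_fn(example.get(attr, UNKNOWN), allowed, freqs.get(attr, {}))
    return degree

# Example: 'colour' is unknown; the fractional routine credits a 0.6 match.
freqs = {"colour": {"red": 0.6, "blue": 0.4}}
rule = {"colour": {"red"}, "size": {"small", "medium"}}
print(complex_match({"colour": UNKNOWN, "size": "small"}, rule, match_fractional, freqs))
```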


2020 · pp. 1-11
Author(s): Jie Liu, Lin Lin, Xiufang Liang

The online English teaching system places specific requirements on intelligent scoring, and the most difficult stage of intelligent scoring in English testing is scoring compositions with an intelligent model. To make English composition scoring more intelligent, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements of the task, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are fed into the model as constraints. The results show that the proposed algorithm is practically effective and can be applied to English assessment systems and online homework evaluation systems.
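
The abstract names the two image-processing components but gives no implementation details. A minimal sketch of such a pipeline, using OpenCV's stock MSER detector for candidate character regions and a placeholder predicate standing in for the paper's convolutional filter, might look like the following; the area and aspect-ratio thresholds, patch size, and file name are all assumptions.

```python
import cv2

def candidate_character_regions(gray, min_area=30, max_area=5000):
    """Extract candidate character regions with MSER (maximally stable
    extremal regions) and keep roughly character-shaped bounding boxes."""
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)
    return [(x, y, w, h) for (x, y, w, h) in bboxes
            if min_area <= w * h <= max_area and 0.1 <= w / float(h) <= 10.0]

def filter_pseudo_characters(gray, boxes, is_character):
    """Keep only the boxes that a classifier accepts as genuine characters;
    `is_character` stands in for the CNN-based pseudo-character filter."""
    kept = []
    for (x, y, w, h) in boxes:
        patch = cv2.resize(gray[y:y + h, x:x + w], (32, 32))
        if is_character(patch):
            kept.append((x, y, w, h))
    return kept

if __name__ == "__main__":
    image = cv2.imread("composition.png", cv2.IMREAD_GRAYSCALE)  # hypothetical scanned page
    boxes = candidate_character_regions(image)
    # Naive stand-in classifier: accept patches containing enough dark (ink) pixels.
    characters = filter_pseudo_characters(image, boxes, lambda p: (p < 128).mean() > 0.05)
    print(f"{len(boxes)} candidates, {len(characters)} kept")
```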


1993 · Vol 18 (2-4) · pp. 209-220
Author(s): Michael Hadjimichael, Anita Wasilewska

We present here an application of Rough Set formalism to Machine Learning. The resulting Inductive Learning algorithm is described, and its application to a set of real data is examined. The data consists of a survey of voter preferences taken during the 1988 presidential election in the U.S.A. Results include an analysis of the predictive accuracy of the generated rules, and an analysis of the semantic content of the rules.
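
The rough-set constructs that such rule induction rests on, the lower and upper approximations of a decision concept with respect to a set of condition attributes, can be computed directly from a decision table. The sketch below uses a small made-up survey-style table (the attribute names and rows are illustrative, not the authors' data): equivalence classes entirely inside the concept yield certain rules, while classes that merely overlap it yield possible rules.

```python
from collections import defaultdict

def approximations(table, condition_attrs, decision_attr, decision_value):
    """Group rows into equivalence classes of identical condition values, then
    collect the classes entirely inside the concept (lower approximation) and
    the classes that intersect it (upper approximation)."""
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[a] for a in condition_attrs)].append(row)
    lower, upper = [], []
    for key, rows in classes.items():
        in_concept = [row[decision_attr] == decision_value for row in rows]
        if all(in_concept):
            lower.append(key)
        if any(in_concept):
            upper.append(key)
    return lower, upper

# Hypothetical survey-style decision table.
table = [
    {"age": "young", "party": "dem", "vote": "dukakis"},
    {"age": "young", "party": "rep", "vote": "bush"},
    {"age": "old",   "party": "rep", "vote": "bush"},
    {"age": "old",   "party": "rep", "vote": "dukakis"},
]
lower, upper = approximations(table, ["age", "party"], "vote", "bush")
print("certain rules from:", lower)    # [('young', 'rep')]
print("possible rules from:", upper)   # [('young', 'rep'), ('old', 'rep')]
```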


2002 · Vol 8 (4) · pp. 311-339
Author(s): Steve Munroe, Angelo Cangelosi

The Baldwin effect has been explicitly used by Pinker and Bloom as an explanation of the origins of language and the evolution of a language acquisition device. This article presents new simulations of an artificial life model for the evolution of compositional languages. It specifically addresses the role of cultural variation and of learning costs in the Baldwin effect for the evolution of language. Results show that when a high cost is associated with language learning, agents gradually assimilate in their genome some explicit features (e.g., lexical properties) of the specific language they are exposed to. When the structure of the language is allowed to vary through cultural transmission, Baldwinian processes cause, instead, the assimilation of a predisposition to learn, rather than any structural properties associated with a specific language. The analysis of the mechanisms underlying such a predisposition in terms of categorical perception supports Deacon's hypothesis regarding the Baldwinian inheritance of general underlying cognitive capabilities that serve language acquisition. This is in opposition to the thesis that argues for assimilation of structural properties needed for the specification of a full-blown language acquisition device.
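
The trade-off driving these results can be stated as a one-line fitness function. The sketch below is only a schematic of the mechanism described above, with hypothetical parameters: when learning trials are expensive, genotypes that already encode the features of a stable language need fewer trials and are favoured, whereas under cultural variation the features keep drifting and the only heritable advantage left is a lower per-feature learning cost, i.e. a predisposition to learn.

```python
def learning_trials_needed(genotype, language, trials_per_feature):
    """Features already present innately cost nothing to acquire; each missing
    feature costs `trials_per_feature`, the agent's heritable learning rate."""
    mismatches = sum(1 for g, l in zip(genotype, language) if g != l)
    return mismatches * trials_per_feature

def fitness(genotype, language, trials_per_feature, trial_cost):
    """Communicative success (assumed perfect after learning) minus the cost
    of the learning that was needed to reach it."""
    return 1.0 - trial_cost * learning_trials_needed(genotype, language, trials_per_feature)
```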

