Improving the Scalability of XCS-Based Learning Classifier Systems

2021 ◽  
Author(s):  
Muhammad Iqbal

Using evolutionary intelligence and machine learning techniques, a broad range of intelligent machines have been designed to perform different tasks. An intelligent machine learns by perceiving its environmental status and taking an action that maximizes its chances of success. Human beings have the ability to apply knowledge learned from a smaller problem to more complex, large-scale problems of the same or a related domain, but currently the vast majority of evolutionary machine learning techniques lack this ability. This inability to reuse the already-learned knowledge of a domain means that more than the necessary resources and time are consumed in solving its complex, large-scale problems. As the problem increases in size, it becomes difficult and sometimes impractical (if not impossible) to solve due to the needed resources and time. Therefore, in order to scale in a problem domain, a system is needed that can reuse the learned knowledge of the domain and/or encapsulate the underlying patterns in the domain. To extract and reuse building blocks of knowledge or to encapsulate the underlying patterns in a problem domain, a rich encoding is needed, but the search space could then expand undesirably and cause bloat, e.g. as in some forms of genetic programming (GP). Learning classifier systems (LCSs) are a well-structured, evolutionary-computation-based learning technique with pressures that implicitly avoid bloat, such as fitness sharing through niche-based reproduction. The proposed thesis is that an LCS can scale to complex problems in a domain by reusing the knowledge learnt from simpler problems of the domain and/or encapsulating the underlying patterns in the domain. Wilson's XCS, a well-tested, online-learning, accuracy-based LCS model, is used to implement and test the proposed systems. To extract the reusable building blocks of knowledge, GP-tree-like code fragments are introduced, which are more than simply another representation (e.g. ternary or real-valued alphabets). The thesis is extended to capture the underlying patterns in a problem using a cyclic representation. Hard problems are used to test the newly developed scalable systems and compare them with benchmark techniques.

Specifically, this work develops four systems to improve the scalability of XCS-based classifier systems. (1) Building blocks of knowledge are extracted from smaller problems of a Boolean domain and reused in learning more complex, large-scale problems in the domain, for the first time. By utilizing the knowledge learnt from small-scale problems, the developed XCSCFC (i.e. XCS with Code-Fragment Conditions) system readily solves problems of a scale that existing LCS and GP approaches cannot, e.g. the 135-bit MUX problem. (2) The introduction of code fragments in classifier actions in XCSCFA (i.e. XCS with Code-Fragment Actions) enables the rich representation of GP which, when coupled with the divide-and-conquer approach of LCS, successfully solves various complex, overlapping and niche-imbalanced Boolean problems that are difficult to solve using numeric-action-based XCS. (3) The underlying patterns in a problem domain are encapsulated in classifier rules encoded by a cyclic representation. The developed XCSSMA system produces general solutions of any scale n for a number of important Boolean problems, e.g. parity problems, for the first time in the field of LCS. (4) Optimal solutions for various real-valued problems are evolved by extending the existing real-valued XCSR system with code-fragment actions to XCSRCFA. Exploiting the combined power of GP and LCS techniques, XCSRCFA successfully learns various continuous-action and function-approximation problems that are difficult to learn using the base techniques.

This research has shown that LCSs can scale to complex, large-scale problems through reusing learnt knowledge. The messy nature, the disassociation of message order from condition order, masking, feature construction, and the reuse of extracted knowledge add further abilities to the XCS family of LCSs. The ability to use a rich encoding in antecedent GP-like code fragments or a consequent cyclic representation leads to the evolution of accurate, maximally general and compact solutions for various complex Boolean as well as real-valued problems. Effectively exploiting the combined power of GP and LCS techniques, various continuous-action and function-approximation problems are solved in a simple and straightforward manner. The analysis of the evolved rules reveals, for the first time in XCS, that no matter how specific or general the initial classifiers are, all the optimal classifiers converge through the mechanism 'be specific then generalize' near the final stages of evolution. It also shows that standard XCS does not use all available information or all available genetic operators to evolve optimal rules, whereas the developed code-fragment-action-based systems effectively use figure and ground information during the training process. This work has created a platform to explore the reuse of learnt functionality, not just terminal knowledge as at present, which is needed to replicate human capabilities.
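To make the central representational idea concrete, here is a minimal sketch, not the thesis implementation, of a GP-tree-like code fragment used as a classifier condition: a rule matches an input when its fragment evaluates to 1, and fragments learned on small problems can be reused as new terminals in larger ones. The operator set and nested-tuple encoding are illustrative assumptions.

```python
# A minimal sketch (not the thesis implementation) of a GP-tree-like code
# fragment serving as a classifier condition: the rule matches an input
# when the fragment evaluates to 1 over the input bits D0..Dn-1.
import operator

OPS = {'AND': operator.and_, 'OR': operator.or_,
       'NAND': lambda a, b: 1 - (a & b), 'NOR': lambda a, b: 1 - (a | b)}

def evaluate(fragment, bits):
    """Evaluate a nested-tuple expression tree, e.g. ('AND', 'D0', ('OR', 'D1', 'D2'))."""
    if isinstance(fragment, str):            # terminal: input bit 'Dk'
        return bits[int(fragment[1:])]
    op, left, right = fragment
    return OPS[op](evaluate(left, bits), evaluate(right, bits))

# A fragment evolved on a small problem (e.g. the 6-bit MUX) can later be
# reused as a new terminal when learning a larger problem of the domain.
fragment = ('AND', 'D0', ('OR', 'D1', 'D2'))
print(evaluate(fragment, [1, 0, 1, 1, 0, 0]))  # -> 1
```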

2017 ◽  
Vol 25 (2) ◽  
pp. 173-204 ◽  
Author(s):  
Muhammad Iqbal ◽  
Will N. Browne ◽  
Mengjie Zhang

A main research direction in the field of evolutionary machine learning is to develop a scalable classifier system to solve high-dimensional problems. Recently, work has begun on autonomously reusing learned building blocks of knowledge to scale from low-dimensional problems to high-dimensional ones. An XCS-based classifier system, known as XCSCFC, has been shown to be scalable, through the addition of expression-tree-like code fragments, to a limit beyond standard learning classifier systems. XCSCFC is especially beneficial if the target problem can be divided into a hierarchy of subproblems, each of which is solvable in a bottom-up fashion. However, if the hierarchy of subproblems is too deep, then XCSCFC becomes impractical because of the computational time needed and thus eventually hits a limit in problem size. A limitation of this technique is the lack of a cyclic representation, which is inherent in finite state machines (FSMs). However, the evolution of FSMs is a hard task owing to the combinatorially large number of possible states, connections, and interactions. Usually this requires supervised learning to minimize inappropriate FSMs, which for high-dimensional problems necessitates subsampling or incremental testing. To avoid these constraints, this work introduces a state-machine-based encoding scheme into XCS for the first time, termed XCSSMA. The proposed system has been tested on six complex Boolean problem domains: multiplexer, majority-on, carry, even-parity, count ones, and digital design verification problems. The proposed approach outperforms XCSCFA (an XCS that computes actions) and XCSF (an XCS that computes predictions) in three of the six problem domains, while its performance in the others is similar. In addition, XCSSMA evolved, for the first time, compact and human-readable general classifiers (i.e., solving any n-bit problem) for the even-parity and carry problem domains, demonstrating its ability to produce scalable solutions using a cyclic representation.
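The appeal of a cyclic representation can be seen with a minimal sketch (not the XCSSMA encoding itself): a fixed two-state machine computes even parity for inputs of any length n, which is exactly the scale-free behaviour a ternary or tree-structured condition cannot express compactly.

```python
# A minimal sketch (not the XCSSMA encoding) of why a cyclic, state-machine
# representation scales to any n: a 2-state Moore machine computes even
# parity of an n-bit input regardless of n.

def even_parity_fsm(bits):
    """Return 1 if the number of 1s in `bits` is even, else 0."""
    # transition[state][input_bit] -> next state; states: 0 = even, 1 = odd
    transition = {0: {0: 0, 1: 1},
                  1: {0: 1, 1: 0}}
    state = 0  # start in the "even" state
    for b in bits:
        state = transition[state][b]
    return 1 if state == 0 else 0

# The same 2-state machine handles 3-bit and 71-bit inputs alike.
assert even_parity_fsm([1, 0, 1]) == 1
assert even_parity_fsm([1] * 70 + [0]) == 1
```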


2021 ◽  
Vol 1 (3) ◽  
pp. 1-38
Author(s):  
Yi Liu ◽  
Will N. Browne ◽  
Bing Xue

Learning Classifier Systems (LCSs) are a paradigm of rule-based evolutionary computation (EC). LCSs excel in data-mining tasks by helping humans to understand the explored problem, often through visualizing the discovered patterns linking features to classes. Due to the stochastic nature of EC, LCSs unavoidably produce and keep redundant rules, which obscure the patterns. Thus, rule compaction methods are invoked to produce a better population by removing problematic rules. Previously, compaction methods had neither been tested on large-scale problems nor assessed on their ability to capture patterns. We review and test the most popular compaction algorithms, finding that across multiple LCSs' populations for the same task, although the redundant rules can differ, the accurate rules are common. Furthermore, the patterns contained consistently refer to the nature of the explored domain, e.g., the data distribution or the importance of features for determining actions. This extends the [O] set hypothesis proposed by Butz et al. [1], in which an LCS is expected to evolve a minimal number of non-overlapping rules to represent an addressed domain. Two new compaction algorithms are introduced that search at the rule level and the population level by compacting multiple LCSs' populations. Two visualization methods are employed to verify the interpretability of these populations. Successful compaction is demonstrated on complex and real problems with clean datasets, e.g., the 11-bit Majority-On problem, whose optimal solution requires 924 different interacting rules to be uniquely identified to enable correct visualization. For the first time, the patterns contained in learned models for the large-scale 70-bit Multiplexer problem are visualized successfully.
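As an illustration of what rule compaction does, the following is a minimal sketch of greedy, coverage-based compaction in the classic style; it is not one of the two new algorithms introduced in the paper, and the accuracy threshold is an assumed parameter. Rules use the ternary alphabet {'0', '1', '#'}, where '#' matches either bit.

```python
# A minimal sketch of greedy, coverage-based rule compaction (in the spirit
# of classic LCS compaction; not the two new algorithms proposed here).
# A rule is (condition, action, accuracy).

def matches(condition, instance):
    return all(c == '#' or c == b for c, b in zip(condition, instance))

def compact(rules, instances, acc_threshold=0.99):
    # 1. Drop inaccurate rules outright.
    accurate = [r for r in rules if r[2] >= acc_threshold]
    # 2. Greedily keep rules covering the most still-uncovered instances.
    uncovered, kept = set(instances), []
    while uncovered:
        best = max(accurate, key=lambda r: sum(matches(r[0], x) for x in uncovered))
        covered = {x for x in uncovered if matches(best[0], x)}
        if not covered:
            break  # no accurate rule covers the remaining instances
        kept.append(best)
        uncovered -= covered
    return kept

rules = [('1#', 1, 1.0), ('10', 1, 1.0), ('0#', 0, 1.0)]  # '10' is redundant
print(compact(rules, ['00', '01', '10', '11']))
# -> [('1#', 1, 1.0), ('0#', 0, 1.0)]
```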


Metabolites ◽  
2019 ◽  
Vol 9 (7) ◽  
pp. 122 ◽  
Author(s):  
Karsten Suhre ◽  
Irena Trbojević-Akmačić ◽  
Ivo Ugrina ◽  
Dennis Mook-Kanamori ◽  
Tim Spector ◽  
...  

Most human proteins are glycosylated. The attachment of complex oligosaccharides to the polypeptide part of these proteins is an integral part of their structure and function and plays a central role in many complex disorders. One approach towards deciphering this human glycan code is to study natural variation in experimentally well-characterized samples and cohorts. High-throughput, large-scale methods that allow the comprehensive determination of blood-circulating proteins and their glycans have recently been developed, but so far no study has investigated the link between the two traits. Here we map for the first time the blood plasma proteome to its matching N-glycome by correlating the levels of 1116 blood-circulating proteins with 113 N-glycan traits, determined in 344 samples from individuals of Arab, South Asian, and Filipino descent, and then replicate our findings in 46 subjects of European ancestry. We report protein-specific N-glycosylation patterns, including a correlation of core-fucosylated structures with immunoglobulin G (IgG) levels, and of trisialylated, trigalactosylated, and triantennary structures with heparin cofactor 2 (SERPIND2). Our study reveals a detailed picture of protein N-glycosylation and suggests new avenues for investigating its role and function in the associated complex disorders.
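The analysis described is, at its core, an all-against-all correlation screen. The sketch below illustrates that shape under assumptions (random placeholder data, Spearman correlation, Bonferroni correction) rather than reproducing the authors' pipeline.

```python
# A minimal sketch (placeholder data, not the authors' pipeline) of an
# all-against-all protein/glycan correlation screen with the shapes
# reported in the study: 344 samples, 1116 proteins, 113 N-glycan traits.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
proteins = rng.normal(size=(344, 1116))  # samples x blood-circulating proteins
glycans = rng.normal(size=(344, 113))    # samples x N-glycan traits

n_tests = proteins.shape[1] * glycans.shape[1]
hits = []
for i in range(proteins.shape[1]):
    for j in range(glycans.shape[1]):
        rho, p = spearmanr(proteins[:, i], glycans[:, j])
        if p * n_tests < 0.05:  # Bonferroni correction over all pairs
            hits.append((i, j, rho))
print(f"{len(hits)} of {n_tests} protein-glycan pairs pass correction")
```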


2019 ◽  
Vol 21 (2) ◽  
pp. 676-686 ◽  
Author(s):  
Siyuan Chen ◽  
Chengzhi Ren ◽  
Jingjing Zhai ◽  
Jiantao Yu ◽  
Xuyang Zhao ◽  
...  

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the deficiencies of specially designed analytical systems, short reads that do not map to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that facilitates the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled from unmapped reads in mixed-species samples. CAFU also represents an innovation in that it provides a comprehensive collection of functions required for transcript confidence evaluation, coding-potential calculation, sequence and expression characterization, and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.
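The abstract does not detail the classifier CAFU uses, so the following is only an assumed illustration of the species-origin step: classifying assembled transcripts by k-mer composition with an off-the-shelf model. The sequences, features, and model choice are all hypothetical.

```python
# An assumed illustration (not CAFU's actual model) of the species-origin
# step: classify assembled transcripts by 3-mer composition. Sequences,
# features and the 1-nearest-neighbour model are all hypothetical.
from itertools import product
from sklearn.neighbors import KNeighborsClassifier

KMERS = [''.join(p) for p in product('ACGT', repeat=3)]  # all 64 3-mers

def kmer_freqs(seq):
    """Sliding-window 3-mer frequency vector for a transcript sequence."""
    counts = {k: 0 for k in KMERS}
    for i in range(len(seq) - 2):
        counts[seq[i:i + 3]] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

# Toy reference transcripts with known species labels.
train = [('ATGGCGT' * 20, 'maize'), ('TTATATAT' * 20, 'wheat')]
clf = KNeighborsClassifier(n_neighbors=1).fit(
    [kmer_freqs(s) for s, _ in train], [lab for _, lab in train])

# Assign a species of origin to a transcript assembled from unmapped reads.
print(clf.predict([kmer_freqs('ATGGCGTATGGCGT' * 10)]))  # -> ['maize']
```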


2014 ◽  
Vol 24 (7) ◽  
pp. 1537-1544 ◽  
Author(s):  
Rainald Löhner ◽  
Joseph D. Baum

Purpose – Prompted by the empirical evidence that achievable flow-solver speeds for large problems are limited by what appears to be a time of the order of O(0.1) sec/timestep regardless of the number of cores used, the purpose of this paper is to identify why this phenomenon occurs.

Design/methodology/approach – A series of timing studies, as well as an in-depth analysis of memory and inter-processor transfer requirements, were carried out for a typical field solver. The results were analyzed and compared to the expected performance.

Findings – The analysis shows that at present, flow speeds per core are already limited by the achievable transfer rate to RAM. For smaller domains/larger numbers of processors, the limiting speed of CFD solvers is set by the MPI communication network.

Research limitations/implications – This implies that at present there is a "limiting useful size" for domains, and that there is a lower limit on the time it takes to update a flowfield.

Practical implications – For practical calculations this implies that the time required for running large-scale problems will not decrease markedly once these applications migrate to machines with hundreds of thousands of cores.

Originality/value – This is the first time such a finding has been reported in this context.
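A back-of-the-envelope calculation shows how a RAM-bandwidth bound of this order can arise; every number below is an assumption chosen for illustration, not a measurement from the paper.

```python
# Back-of-the-envelope arithmetic for the RAM-bandwidth bound; all numbers
# here are illustrative assumptions, not the paper's measurements.
points_per_core = 1_000_000   # grid points handled by one core
bytes_per_point = 1_000       # unknowns, fluxes and gradients touched per step
sweeps_per_step = 2           # read + write traffic over the field arrays
bandwidth = 20e9              # sustained bytes/s from RAM per core (assumed)

traffic = points_per_core * bytes_per_point * sweeps_per_step
print(f"bandwidth-limited time per step: {traffic / bandwidth:.3f} s")
# -> 0.100 s per timestep: once transfer to RAM rather than arithmetic is
#    the bottleneck, faster cores alone cannot push below this figure.
```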


Author(s):  
Seán Damer

This book seeks to explain how the Corporation of Glasgow, in its large-scale council house-building programme in the inter- and post-war years, came to reproduce a hierarchical Victorian class structure. The three tiers of housing scheme it constructed – Ordinary, Intermediate, and Slum-Clearance – effectively signified First, Second and Third Class. This came about because the Corporation uncritically reproduced the offensive and patriarchal attitudes of the Victorian bourgeoisie towards the working class. The book shows how this worked out on the ground in Glasgow, and describes the attitudes of both authoritarian housing officials and council tenants. This is the first time the voice of Glasgow's council tenants has been heard. The conclusion is that local council housing policy was driven by unapologetic considerations of social class.


2014 ◽  
Vol 155 (26) ◽  
pp. 1011-1018 ◽  
Author(s):  
György Végvári ◽  
Edina Vidéki

Plants seem rather defenceless: unlike animals, they cannot move and have no nervous or immune system. Plants do, however, have hormones, although these substances are not produced in glands. Despite appearing to lag behind animals in complexity, plant organisms show large-scale integration in their structure and function. In higher plants, as in animals, intercellular communication is carried out by chemical messengers. In plants these specific compounds are called phytohormones or, in a wide sense, bioregulators. Even small quantities of these endogenous organic compounds are able to regulate the operation, growth and development of higher plants, and maintain the connection between cells and tissues and the synergy between organs. Since plants have no nervous or immune system, phytohormones play an essential role in their life. Orv. Hetil., 2014, 155(26), 1011–1018.


2018 ◽  
Author(s):  
Sherif Tawfik ◽  
Olexandr Isayev ◽  
Catherine Stampfl ◽  
Joseph Shapter ◽  
David Winkler ◽  
...  

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice-mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques to rapidly determine the interlayer distance between two different 2D materials stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.
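The workflow the authors describe, regressing bilayer properties on monolayer descriptors, can be sketched as follows; the features, synthetic targets, and plain linear model are illustrative assumptions standing in for their actual descriptors and ML models.

```python
# A minimal sketch (illustrative features and synthetic targets, not the
# authors' descriptors or models) of regressing bilayer properties on
# properties of the two constituent monolayers.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# One row per candidate bilayer: [gap_A, gap_B, lattice_A, lattice_B]
X = rng.uniform(size=(200, 4))
# Two targets: interlayer distance (Angstrom) and bilayer band gap (eV),
# synthesized here only so the example runs end to end.
y = np.column_stack([3.0 + 0.5 * X[:, 2] + 0.5 * X[:, 3],
                     0.4 * (X[:, 0] + X[:, 1])])
y += rng.normal(scale=0.02, size=y.shape)  # mimic scatter in DFT reference data

model = LinearRegression().fit(X[:150], y[:150])       # train on 150 bilayers
print("held-out R^2:", model.score(X[150:], y[150:]))  # screen the remainder
```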

