Identification of biological mechanisms by semantic classifier systems

2018 ◽  
Author(s):  
Ludwig Lausser ◽  
Florian Schmid ◽  
Lea Siegle ◽  
Rolf Hühne ◽  
Malte Buchholz ◽  
...  

Abstract. The interpretability of a classification model is one of its most essential characteristics. It allows for the generation of new hypotheses on the molecular background of a disease. However, it is questionable whether more complex molecular regulations can be reconstructed from such limited sets of data. To bridge the gap between complexity and interpretability, we replace the de novo reconstruction of these processes with a hybrid classification approach partially based on existing domain knowledge. Using semantic building blocks that reflect real biological processes, these models were able to construct hypotheses on the underlying genetic configuration of the analysed phenotypes. Like the building blocks themselves, these hypotheses are composed of high-level biology-based terms. The semantic information we utilise from the Gene Ontology is a vocabulary comprising the essential processes and components of a biological system. The constructed semantic multi-classifier system consists of expert base classifiers, each of which selects the most suitable term for characterising its assigned problem. Our experiments, conducted on datasets from three distinct research fields, revealed terms with well-known associations to the analysed context. Furthermore, some of the chosen terms are not obviously related to the issue and thus lead to new hypotheses to pursue.

Author summary. Data mining strategies are designed for an unbiased de novo analysis of large sample collections and aim at the detection of frequent patterns or relationships. The gained information can later be used to characterise diagnostically relevant classes and to provide hints about the underlying mechanisms that may cause a specific phenotype or disease. However, the practical use of data mining techniques can be restricted by the available resources and might not correctly reconstruct complex relationships such as signalling pathways. To counteract this, we devised a semantic approach: a multi-classifier system which incorporates existing biological knowledge and returns interpretable models based on these high-level semantic terms. As a novel feature, these models also allow for qualitative analysis and hypothesis generation on the molecular processes and their relationships leading to different phenotypes or diseases.
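The term-selection step performed by each expert base classifier can be pictured as a small search over Gene Ontology terms. Below is a minimal sketch of that idea, not the authors' implementation; it assumes scikit-learn, an expression matrix X (samples x genes), class labels y, and a hypothetical go_term_genes mapping from GO term IDs to the column indices of annotated genes.

```python
# Minimal sketch (not the authors' code) of one "expert" base classifier
# selecting the Gene Ontology term whose annotated genes best separate
# the phenotype classes. Assumes: X is a samples-by-genes expression
# matrix, y holds class labels, and `go_term_genes` is a hypothetical
# dict mapping GO term IDs to column indices of their annotated genes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_best_go_term(X, y, go_term_genes, cv=5):
    """Return the GO term giving the best cross-validated accuracy."""
    best_term, best_score = None, -np.inf
    for term, gene_idx in go_term_genes.items():
        scores = cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, gene_idx], y, cv=cv)
        if scores.mean() > best_score:
            best_term, best_score = term, scores.mean()
    return best_term, best_score
```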

2017 ◽  
Vol 25 (2) ◽  
pp. 173-204 ◽  
Author(s):  
Muhammad Iqbal ◽  
Will N. Browne ◽  
Mengjie Zhang

A main research direction in the field of evolutionary machine learning is to develop a scalable classifier system to solve high-dimensional problems. Recently, work has begun on autonomously reusing learned building blocks of knowledge to scale from low-dimensional problems to high-dimensional ones. An XCS-based classifier system, known as XCSCFC, has been shown to be scalable, through the addition of expression-tree-like code fragments, to a limit beyond standard learning classifier systems. XCSCFC is especially beneficial if the target problem can be divided into a hierarchy of subproblems, each of which is solvable in a bottom-up fashion. However, if the hierarchy of subproblems is too deep, then XCSCFC becomes impractical because of the computational time needed and thus eventually hits a limit in problem size. A limitation of this technique is the lack of a cyclic representation, which is inherent in finite state machines (FSMs). However, the evolution of FSMs is a hard task owing to the combinatorially large number of possible states, connections, and interactions. Usually this requires supervised learning to minimize inappropriate FSMs, which for high-dimensional problems necessitates subsampling or incremental testing. To avoid these constraints, this work introduces a state-machine-based encoding scheme into XCS for the first time, termed XCSSMA. The proposed system has been tested on six complex Boolean problem domains: multiplexer, majority-on, carry, even-parity, count ones, and digital design verification problems. The proposed approach outperforms XCSCFA (an XCS that computes actions) and XCSF (an XCS that computes predictions) in three of the six problem domains, while its performance in the others is similar. In addition, XCSSMA evolved, for the first time, compact and human-readable general classifiers (i.e., classifiers solving any n-bit instance) for the even-parity and carry problem domains, demonstrating its ability to produce scalable solutions using a cyclic representation.
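To see why a cyclic representation generalizes where fixed-depth encodings struggle, consider that a two-state finite state machine decides even parity for inputs of any length n. The following is an illustrative sketch of that machine, not XCSSMA's actual encoding:

```python
# Illustrative sketch (not XCSSMA's actual encoding): a two-state finite
# state machine decides n-bit even parity for any n. State 0 means an
# even number of 1s has been seen so far; state 1 means an odd number.
TRANSITIONS = {
    (0, '0'): 0, (0, '1'): 1,
    (1, '0'): 1, (1, '1'): 0,
}

def even_parity(bits: str) -> bool:
    """True if `bits` contains an even number of 1s, for any length."""
    state = 0
    for b in bits:
        state = TRANSITIONS[(state, b)]
    return state == 0

assert even_parity('1001') and not even_parity('1011')
```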


2018 ◽  
Author(s):  
Adam M. Forte ◽  
Kelin X. Whipple

Abstract. Quantitative analysis of digital topographic data is an increasingly important part of many studies in the geosciences. Initially, performing these analyses was a niche endeavor requiring detailed domain knowledge and programming skills, but increasingly broad and flexible open-source code bases have been developed to democratize topographic analysis. However, many of these analyses still require specific computing environments and/or moderate knowledge of both the relevant programming language and the correct way to take these fundamental building blocks and conduct an efficient and effective topographic analysis. To partially address this, we have written the Topographic Analysis Kit (TAK), which leverages the power of one of these open-source libraries, TopoToolbox, to build a series of high-level tools for a variety of common topographic analyses, including the generation of maps of normalized channel steepness or χ and the selection and statistical analysis of populations of watersheds. No programming skills or advanced MATLAB capability are required for effective use of TAK. In addition, to expand the utility of TAK, along with the primary functions, which like the underlying TopoToolbox functions require MATLAB and several proprietary toolboxes to run, we provide compiled versions of these functions that use the free MATLAB Runtime Environment for users who do not have institutional access to MATLAB or all of the required toolboxes.


2021 ◽  
Author(s):  
Trung Nguyen

A key goal of Artificial Intelligence (AI) is to replicate different aspects of biological intelligence. Human intelligence can accumulate progressively complicated knowledge by reusing simpler concepts and tasks to represent more complex concepts and solve more difficult tasks. Humans and animals with biological intelligence also have an autonomy that helps sustain them over long periods.

Young humans need a long period to acquire simple concepts and master basic skills. These learnt basics, however, form foundation knowledge that is highly reusable and can therefore be efficiently exploited to learn new knowledge. By relating unseen tasks to learnt knowledge, humans can learn new knowledge or solve new problems effectively. AI researchers therefore aim to replicate this ability to reuse learnt knowledge when solving novel tasks in a continual manner.

Initial attempts to implement this knowledge-transfer ability have been through layered learning and multitask learning. Layered learning learns a complex target task by first learning a sequence of easier tasks that provide supportive knowledge. This paradigm requires human knowledge that may be biased, costly, or unavailable in a particular domain. Multitask learning generally learns multiple related tasks with individual goals together, in the hope that they provide supportive signals to each other. However, multitask learning is commonly applied to optimisation tasks that must start simultaneously.

This thesis is concerned with transferring building blocks of learnt knowledge to solve complex problems. A complex problem is one whose solution cannot simply be enumerated in the time and computation available, often because there are multiple interacting patterns of input features or the data are high-dimensional. A strategy for solving complex problems is to discover high-level patterns in the data, i.e. complex combinations of the original input features (the underlying building blocks) that describe the desired output. However, as the complexity of the building blocks grows with the problem complexity, the search spaces for solutions and for the optimal building blocks also grow. This makes discovering optimal building blocks challenging.

Learning Classifier Systems (LCSs) are evolutionary rule-based algorithms inspired by cognitive science. LCSs are of interest because their niching nature enables solving problems heterogeneously and learning them progressively, from simpler subproblems to more complex (sub)problems. LCSs also lend themselves to transferring subproblem building blocks among tasks. Recent work has extended LCSs with various flexible representations. Among them, Code Fragments (CFs), Genetic Programming (GP)-like trees, are a rich representation that can encode complex patterns in a small and concise format. CF-based LCSs are particularly suitable for addressing complex problems. For example, XCSCF*, which was based on Wilson's XCS (an accuracy-based online learning LCS), can learn a generalised solution to the n-bit Multiplexer problem. These techniques markedly improved the scalability of CF-based LCSs, but such systems remain limited compared with human intelligence, notably in their autonomy, e.g. the requirement of an appropriate learning order (as in layered learning) to enable learning progress. Humans can learn multiple tasks in a parallel, ad hoc manner, whereas AI cannot yet do this autonomously.

The proposition of this thesis is that systems of parallel learning agents can solve multiple problems concurrently, enabling multitask learning and eventually the ability to learn continually. Here, each agent is a CF-based XCS, and the problems are Boolean in nature to aid interpretability. The overall goal of this thesis is to develop novel CF-based XCSs that enable continual learning with the least human support.

The contributions of this thesis are three specific systems that provide a pathway to continual learning by reducing the requirement for human guidance without degrading learning performance. (1) The evolution of CFs is nested within, and interacts with, the evolution of rules, guided by a newly introduced fitness measure for CFs (CF-fitness). CF evolution lets the complexity of CFs grow without a depth limit to address hierarchical features. The resulting system is the first XCS with CFs in rule conditions that can learn complex problems that were previously intractable without transfer learning, and it allows appropriate latent building blocks that address subproblems to be grouped together and flexibly reused. (2) A new multitask learning system is developed based on estimating the relatedness among tasks. A new dynamic parameter helps automate feature transfer among multiple tasks, improving learning performance on supportive tasks and reducing negative influence between unrelated tasks. (3) A system of parallel learning agents, each an XCS with CF-actions, is developed to remove the requirement of a human-biased learning order. The system provides a clear learning order and a highly interpretable network of knowledge, which enables it to accumulate knowledge hierarchically and focus only on the novel aspects of any new task.

This research has shown that CF-based LCSs can solve hierarchical and large-scale problems autonomously, without extensive human guidance, and that the learnt knowledge represented by CFs is highly interpretable. The work also lays a foundation for systems that can learn continually. Ultimately, this thesis is a step towards general learners and problem solvers.
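As an illustration of the representation the thesis builds on, a Code Fragment can be pictured as a small GP-like expression tree over binary inputs. The node layout and operator set below are illustrative assumptions, not the thesis's exact encoding:

```python
# Hedged sketch of a Code Fragment (CF): a small GP-like expression
# tree over binary inputs, evaluated against an instance. The node
# layout and operator set are illustrative, not the thesis's encoding.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CFNode:
    op: str                        # 'AND', 'OR', 'NOT', or 'IN' (input)
    index: int = 0                 # input bit index when op == 'IN'
    left: Optional['CFNode'] = None
    right: Optional['CFNode'] = None

def evaluate(node: CFNode, bits: str) -> bool:
    if node.op == 'IN':
        return bits[node.index] == '1'
    if node.op == 'NOT':
        return not evaluate(node.left, bits)
    a, b = evaluate(node.left, bits), evaluate(node.right, bits)
    return (a and b) if node.op == 'AND' else (a or b)

# Example: the CF (x0 AND NOT x1), reusable as a building block inside
# the rule conditions of other, more complex tasks.
cf = CFNode('AND', left=CFNode('IN', 0),
            right=CFNode('NOT', left=CFNode('IN', 1)))
assert evaluate(cf, '10') and not evaluate(cf, '11')
```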


2019 ◽  
Vol 7 (1) ◽  
pp. 87-95 ◽  
Author(s):  
Adam M. Forte ◽  
Kelin X. Whipple

Abstract. Quantitative analysis of digital topographic data is an increasingly important part of many studies in the geosciences. Initially, performing these analyses was a niche endeavor requiring detailed domain knowledge and programming skills, but increasingly broad and flexible open-source code bases have been developed to democratize topographic analysis. However, many of these analyses still require specific computing environments and/or moderate levels of knowledge of both the relevant programming language and the correct way to take these fundamental building blocks and conduct an efficient and effective topographic analysis. To partially address this, we have written the Topographic Analysis Kit (TAK), which leverages the power of one of these open code bases, TopoToolbox, to build a series of high-level topographic analysis tools to perform a variety of common topographic analyses. These analyses include the generation of maps of normalized channel steepness, or χ, and the selection and statistical analysis of populations of watersheds. No programming skills or advanced mastery of MATLAB is required for effective use of TAK. In addition, to expand the utility of TAK, we provide compiled versions of the primary functions that use the free MATLAB Runtime Environment, for users who do not have institutional access to MATLAB or all of the required toolboxes; the primary functions themselves, like the underlying TopoToolbox functions, require MATLAB and several proprietary toolboxes to run.
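For readers unfamiliar with χ: it is an integral transformation of the along-channel distance coordinate, accumulated upstream from the outlet. TAK itself is MATLAB-based; the sketch below merely illustrates the standard formula in Python, with x and A as hypothetical distance and drainage-area arrays and with commonly used reference values for A0 and the concavity m/n.

```python
# Illustrative Python sketch of the standard chi definition,
# chi(x) = integral from the outlet to x of (A0 / A(x'))**(m/n) dx'.
# TAK itself is MATLAB-based; here x and A are hypothetical arrays of
# along-channel distance (m) and drainage area (m^2), ordered from the
# outlet upstream, and A0 and m/n take commonly used reference values.
import numpy as np

def chi(x, A, A0=1e6, mn=0.45):
    """Cumulative chi along a channel profile."""
    x, A = np.asarray(x, float), np.asarray(A, float)
    integrand = (A0 / A) ** mn
    dx = np.diff(x, prepend=x[0])   # first increment is zero at the outlet
    return np.cumsum(integrand * dx)
```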


1988 ◽  
Vol 3 (3) ◽  
pp. 183-210 ◽  
Author(s):  
B. Chandrasekaran

Abstract. The level of abstraction of much of the work in knowledge-based systems (the rule, frame, or logic level) is too low to provide a rich enough vocabulary for knowledge and control. I provide an overview of a framework called the Generic Task approach, which proposes that knowledge systems should be built out of building blocks, each of which is appropriate for a basic type of problem solving. Each generic task uses forms of knowledge and control strategies that are characteristic of it and are in general conceptually closer to domain knowledge. This facilitates knowledge acquisition and can produce a more perspicuous explanation of problem solving. The relationship of the constructs at the generic task level to the rule-frame level is analogous to that between high-level programming languages and assembly languages in computer science. I describe a set of generic tasks that have been found particularly useful in constructing diagnostic, design, and planning systems. In particular, I describe two tools, CSRL and DSPL, that are useful for building classification-based diagnostic systems and skeletal planning systems, respectively, and a high-level toolbox under construction called the Generic Task toolbox.
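A hedged sketch of the central idea, pairing each building block with its own knowledge form and control strategy, might look as follows; the class and method names are illustrative, not CSRL's or DSPL's actual interfaces.

```python
# Hedged sketch of the Generic Task idea: each building block couples a
# characteristic knowledge form with a characteristic control strategy.
# Class and method names are illustrative, not CSRL's or DSPL's APIs.
from abc import ABC, abstractmethod

class GenericTask(ABC):
    @abstractmethod
    def solve(self, problem):
        """Run this task's own control strategy over its own knowledge."""

class HierarchicalClassification(GenericTask):
    """CSRL-style task: establish-refine over a concept hierarchy."""
    def __init__(self, hierarchy):
        self.hierarchy = hierarchy   # concept -> list of subconcepts

    def solve(self, problem):
        matches, frontier = [], ['root']
        while frontier:
            concept = frontier.pop()
            if problem['evidence'].get(concept, False):   # "establish"
                matches.append(concept)
                frontier.extend(self.hierarchy.get(concept, []))  # "refine"
        return matches
```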


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important part of the diet, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a branch of computer science widely used in research; one of its techniques is clustering, which groups data and performs better when more data are available. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The results of this study group the data into two milk-producing clusters: high-producing and low-producing regions. Of the 27 records of fresh milk production in Indonesia, two provinces, West Java and East Java, form the high-production cluster; the remaining 25 records, together with 7 provinces that could not be included in the K-Means Clustering calculation, fall into the low-production cluster.
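The grouping step is standard K-Means with k = 2. A minimal sketch, using hypothetical production figures rather than the BPS data:

```python
# Minimal sketch of the study's grouping step: K-Means with k = 2
# separating high and low fresh-milk producers. The numbers below are
# hypothetical placeholders, not the BPS production figures.
import numpy as np
from sklearn.cluster import KMeans

production = np.array([[812.0], [498.0], [3.1], [7.9], [1.2]])  # per province
provinces = ['West Java', 'East Java', 'Prov A', 'Prov B', 'Prov C']

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(production)
for name, cluster in zip(provinces, labels):
    print(f'{name}: cluster {cluster}')
```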


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

Remote sensing extraction of large-area arecanut (Areca catechu L.) plantings plays an important role in investigating the distribution of the arecanut planting area and in the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as by the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and Kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification models; the Kappa coefficients also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
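The feature-space optimization step can be sketched as ranking the initial spectral and textural variables by random forest importance and keeping a top-ranked subset before training the final classifier. The threshold below is illustrative, not the paper's setting:

```python
# Hedged sketch of the feature-space optimization: rank the initial
# spectral/textural variables by random forest importance and keep a
# top subset for the final model. `keep` is an illustrative threshold,
# not the paper's setting.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def optimize_features(X, y, keep=20):
    """Return indices of the `keep` most important feature columns."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1][:keep]

# idx = optimize_features(X_train, y_train)
# final_rf = RandomForestClassifier(n_estimators=500).fit(X_train[:, idx], y_train)
```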


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ha Min Son ◽  
Wooho Jeon ◽  
Jinhyun Kim ◽  
Chan Yeong Heo ◽  
Hye Jin Yoon ◽  
...  

Abstract. Although computer-aided diagnosis (CAD) is used to improve the quality of diagnosis in various medical fields such as mammography and colonography, it is not used in dermatology, where noninvasive screening tests are performed only with the naked eye, and avoidable inaccuracies may exist. This study shows that CAD may also be a viable option in dermatology by presenting a novel method to sequentially combine accurate segmentation and classification models. Given an image of the skin, we decompose the image to normalize and extract high-level features. Using a neural network-based segmentation model to create a segmented map of the image, we then cluster sections of abnormal skin and pass this information to a classification model. We classify each cluster into different common skin diseases using another neural network model. Our segmentation model achieves better performance compared to previous studies, and also achieves a near-perfect sensitivity score in unfavorable conditions. Our classification model is more accurate than a baseline model trained without segmentation, while also being able to classify multiple diseases within a single image. This improved performance may be sufficient to use CAD in the field of dermatology.
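The sequential combination described above can be sketched as: segment, cluster connected abnormal regions, then classify each region crop. The models below are stand-ins for the paper's trained networks, and the threshold values are assumptions:

```python
# Hedged sketch of the sequential pipeline: segment abnormal skin,
# cluster connected abnormal regions, classify each region crop.
# `segment_fn` and `classify_fn` are stand-ins for the paper's trained
# networks; the 0.5 and min_pixels thresholds are assumptions.
from scipy import ndimage

def diagnose(image, segment_fn, classify_fn, min_pixels=64):
    mask = segment_fn(image) > 0.5               # abnormal-skin map
    labels, _ = ndimage.label(mask)              # cluster abnormal regions
    findings = []
    for lab, box in enumerate(ndimage.find_objects(labels), start=1):
        if box is None or (labels[box] == lab).sum() < min_pixels:
            continue                             # skip tiny, spurious regions
        findings.append(classify_fn(image[box])) # disease per cluster
    return findings
```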


2021 ◽  
Vol 9 (6) ◽  
pp. 1290
Author(s):  
Natalia Alvarez-Santullano ◽  
Pamela Villegas ◽  
Mario Sepúlveda Mardones ◽  
Roberto E. Durán ◽  
Raúl Donoso ◽  
...  

Burkholderia sensu lato (s.l.) species have a versatile metabolism. The aims of this review are the genomic reconstruction of the metabolic pathways involved in the synthesis of polyhydroxyalkanoates (PHAs) by Burkholderia s.l. genera, and the characterization of the PHA synthases and the organization of the pha genes. Reports of PHA synthesis from different substrates by Burkholderia s.l. strains were reviewed. Genome-guided metabolic reconstruction of the conversion of sugars and fatty acids into PHAs by 37 Burkholderia s.l. species was performed. Sugars are metabolized via the Entner–Doudoroff (ED), pentose-phosphate (PP), and lower Embden–Meyerhof–Parnas (EMP) pathways, which produce reducing power through NAD(P)H synthesis and PHA precursors. Fatty acid substrates are metabolized via β-oxidation and de novo synthesis of fatty acids into PHAs. The analysis of 194 Burkholderia s.l. genomes revealed that all strains have the phaC, phaA, and phaB genes for PHA synthesis, wherein the phaC gene is generally present in ≥2 copies. PHA synthases were classified into four phylogenetic groups belonging to class I, II, and III PHA synthases and one outlier group. The reconstruction of PHA synthesis revealed a high level of gene redundancy, probably reflecting complex regulatory layers that provide fine-tuning according to diverse substrates and physiological conditions.
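The genome screen behind the copy-number observation can be sketched as a simple count over an annotation table. The column names below are assumptions about the data layout, not the authors' pipeline:

```python
# Hedged sketch of the genome screen: count pha gene copies per genome
# from an annotation table. The 'genome'/'gene' column names are
# assumptions about the data layout, not the authors' pipeline.
import pandas as pd

def pha_gene_copies(annotations: pd.DataFrame) -> pd.DataFrame:
    """One row per annotated gene in; per-genome pha copy counts out."""
    pha = annotations[annotations['gene'].isin(['phaC', 'phaA', 'phaB'])]
    return pha.pivot_table(index='genome', columns='gene',
                           aggfunc='size', fill_value=0)

# e.g. counts = pha_gene_copies(df); (counts['phaC'] >= 2) flags the
# genomes where phaC is present in two or more copies.
```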

