Incremental Search Space Construction for Machine Learning Pipeline Synthesis

This chapter proposes an approach that combines natural language processing (NLP) techniques with supervised machine learning algorithms to detect external plagiarism. The emphasis is on constructing a framework that detects plagiarism in monolingual texts using an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using simple NLP techniques. Filter metrics are then applied to select the most relevant features, and a supervised classification algorithm classifies the documents into four levels of plagiarism. A confusion matrix is built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity, sentence similarity, and search space reduction methods.
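
The n-gram frequency comparison mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which n-gram size or similarity measure the chapter uses, so the word trigrams, the whitespace tokenizer, and the containment score below are assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n=3):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    This is an assumed stand-in for the chapter's comparison step, not the
    authors' actual measure.
    """
    susp = ngram_counts(suspicious, n)
    src = ngram_counts(source, n)
    total = sum(susp.values())
    if total == 0:
        return 0.0  # text shorter than n tokens: nothing to compare
    # Count each shared n-gram at most as often as it appears in both texts.
    shared = sum(min(count, src[gram]) for gram, count in susp.items())
    return shared / total
```

A score near 1.0 means nearly all n-grams of the suspicious text appear in the source. In a framework like the one described, such scores would be features passed to a classifier, not a final verdict.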

Role of machine learning and artificial intelligence algorithms for teaching reform of linguistics

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189365 ◽

2020 ◽

pp. 1-12

Author(s):

Wang Li

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Search Space ◽

Teaching Experiment ◽

Machine Learning Algorithms ◽

Teaching Process ◽

Root Cause ◽

Pruning Strategy ◽

Intelligence Models ◽

Artificial Intelligence Models

The teaching of linguistics is constrained by various factors, which leads to poor teaching outcomes and makes the teaching process difficult to evaluate. To improve the efficiency of linguistics teaching, this paper uses improved machine learning algorithms to construct an artificial intelligence teaching model for linguistics. To meet the teaching needs of linguistics, the model improves the efficiency of the teaching process and performs teaching evaluation, and the root cause analysis algorithm based on MCTS is optimized. Moreover, drawing on the frequent item set algorithm from data mining, a layered pruning strategy is proposed to further reduce the search space and improve the efficiency of the model. In addition, this study uses a comparative teaching experiment to assess the efficiency of artificial intelligence models in linguistics teaching. The statistical results show that the model proposed in this paper has a certain effect.

Predicting disease-causing variant combinations

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1815601116 ◽

2019 ◽

pp. 201815601 ◽

Cited By ~ 6

Author(s):

Sofia Papadimitriou ◽

Andrea Gazzo ◽

Nassim Versbraegen ◽

Charlotte Nachtegael ◽

Jan Aerts ◽

...

Keyword(s):

Machine Learning ◽

Patient Care ◽

Rare Diseases ◽

Search Space ◽

Pathogenic Variant ◽

Genetic Models ◽

Biological Information ◽

Learning Approach ◽

Clinical Knowledge ◽

Machine Learning Approach

Notwithstanding important advances in the context of single-variant pathogenicity identification, novel breakthroughs in discerning the origins of many rare diseases require methods able to identify more complex genetic models. We present here the Variant Combinations Pathogenicity Predictor (VarCoPP), a machine-learning approach that identifies pathogenic variant combinations in gene pairs (called digenic or bilocus variant combinations). We show that the results produced by this method are highly accurate and precise, an efficacy that is endorsed when validating the method on recently published independent disease-causing data. Confidence labels of 95% and 99% are identified, representing the probability of a bilocus combination being a true pathogenic result, providing geneticists with rational markers to evaluate the most relevant pathogenic combinations and limit the search space and time. Finally, the VarCoPP has been designed to act as an interpretable method that can provide explanations on why a bilocus combination is predicted as pathogenic and which biological information is important for that prediction. This work provides an important step toward the genetic understanding of rare diseases, paving the way to clinical knowledge and improved patient care.

Enzymatic Weight Update Algorithm for DNA-Based Molecular Learning

Molecules ◽

10.3390/molecules24071409 ◽

2019 ◽

Vol 24 (7) ◽

pp. 1409 ◽

Cited By ~ 1

Author(s):

Christina Baek ◽

Sang-Woo Lee ◽

Beom-Jin Lee ◽

Dong-Hyun Kwak ◽

Byoung-Tak Zhang

Keyword(s):

Machine Learning ◽

Dna Sequences ◽

Dna Nanotechnology ◽

Search Space ◽

Molecular Computing ◽

Training Data ◽

Molecular Systems ◽

Novel Approach ◽

Biological Substrates

Recent research in DNA nanotechnology has demonstrated that biological substrates can be used for computing at a molecular level. However, in vitro demonstrations of DNA computations use preprogrammed, rule-based methods which lack the adaptability that may be essential in developing molecular systems that function in dynamic environments. Here, we introduce an in vitro molecular algorithm that ‘learns’ molecular models from training data, opening the possibility of ‘machine learning’ in wet molecular systems. Our algorithm enables enzymatic weight update by targeting internal loop structures in DNA and ensemble learning, based on the hypernetwork model. This novel approach allows massively parallel processing of DNA with enzymes for specific structural selection for learning in an iterative manner. We also introduce an intuitive method of DNA data construction to dramatically reduce the number of unique DNA sequences needed to cover the large search space of feature sets. By combining molecular computing and machine learning, the proposed algorithm takes a step closer to developing molecular computing technologies for future access to more intelligent molecular systems.

Determining Solvability in the Birds of a Feather Card Game

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019627 ◽

2019 ◽

Vol 33 ◽

pp. 9627-9634

Author(s):

Shuto Araki ◽

Juan Pablo Arenas Uribe ◽

Zach Wilkerson ◽

Steven Bogaerts ◽

Chad Byers

Keyword(s):

Machine Learning ◽

Graph Theory ◽

Search Space ◽

Card Game

Birds of a Feather is a single-player card game in which cards are arranged in a grid. The player attempts to combine stacks of cards under certain rules, with the goal being to combine all cards into a single stack. This paper highlights several approaches for efficiently classifying whether a randomly chosen state has a single-stack solution. These approaches use graph theory and machine learning concepts to prune a state’s search space, resulting in significant reductions in runtime relative to a baseline search.

Machine learning based search space optimisation for drug discovery

2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽

10.1109/cibcb.2013.6595390 ◽

2013 ◽

Cited By ~ 5

Author(s):

Upul Senanayake ◽

Rahal Prabuddha ◽

Roshan Ragel

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Search Space

A Survey on Evolutionary Computation Approaches to Feature Selection

10.26686/wgtn.14214497 ◽

2021 ◽

Author(s):

Bing Xue ◽

Mengjie Zhang ◽

William Browne ◽

X Yao

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Evolutionary Computation ◽

State Of The Art ◽

Search Space ◽

Mining Machine ◽

Future Research ◽

Selection Problems ◽

Personal Use ◽

Index Terms

Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on evolutionary computation for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research. Index Terms: Evolutionary computation, feature selection, classification, data mining, machine learning. © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
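
To make the "large search space" of feature selection concrete: each of n features is either kept or dropped, so there are 2^n candidate subsets. The sketch below is only an illustration of that count and of exhaustive enumeration for a tiny feature list; the function names are hypothetical and it does not reproduce any algorithm from the survey.

```python
from itertools import combinations


def count_feature_subsets(n_features):
    """Number of possible feature subsets, including the empty set."""
    return 2 ** n_features


def enumerate_subsets(features):
    """Yield every subset of a small feature list (exhaustive search)."""
    for size in range(len(features) + 1):
        yield from combinations(features, size)


if __name__ == "__main__":
    # Exhaustive enumeration is only practical for very small feature sets.
    for subset in enumerate_subsets(["age", "height", "weight"]):
        print(subset)
    print(count_feature_subsets(30))
```

With only 30 features the count already exceeds one billion subsets, which is why heuristic search methods such as evolutionary computation are applied instead of enumeration.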

Self-assembling Peptide Discovery: Overcoming Human Bias With Machine Learning

10.21203/rs.3.rs-505801/v1 ◽

2021 ◽

Author(s):

Rohit Batra ◽

Troy Loeffler ◽

Henry Chan ◽

Srilok Sriniva ◽

Honggang Cui ◽

...

Keyword(s):

Machine Learning ◽

Self Assembly ◽

Search Space ◽

Sequence Length ◽

Monte Carlo Tree Search ◽

Self Assembling ◽

Naturally Occurring ◽

Peptide Materials ◽

Dynamics Simulations ◽

Better Than

Peptide materials have a wide array of functions, from tissue engineering and surface coatings to catalysis and sensing. This class of biopolymer is composed of a sequence of the 20 naturally occurring amino acids, whose arrangement dictates the peptide functionality. While it is highly desirable to tailor the amino acid sequence, a small increase in sequence length leads to a dramatic increase in the number of possible candidates (e.g., from tripeptides = 20^3 or 8,000 peptides to pentapeptides = 20^5 or 3.2 M). Traditionally, peptide design is guided by the use of structural propensity tables, hydrophobicity scales, or other desired properties and typically yields <10 peptides per study, barely scratching the surface of the search space. These approaches, driven by human expertise and intuition, are not easily scalable and are riddled with human bias. Here, we introduce a machine learning workflow that combines Monte Carlo tree search and random forest with molecular dynamics simulations to develop a fully autonomous computational search engine (named AI-expert) to discover peptide sequences with high potential for self-assembly (as a representative target functionality). We demonstrate the efficacy of the AI-expert in searching large spaces of tripeptides and pentapeptides. Subsequent experiments on the proposed peptide sequences are performed to compare the predictive ability of the AI-expert with that of human experts. The AI performs on par with or better than human experts and suggests several non-intuitive sequences with high self-assembly propensity, outlining its potential to overcome human bias and accelerate peptide discovery.
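
The growth in candidate sequences quoted in this abstract follows from counting: with 20 amino acids available at each position, a peptide of length L has 20^L possible sequences. A small sketch of that arithmetic follows; the function name is hypothetical.

```python
AMINO_ACIDS = 20  # naturally occurring amino acids, as stated in the abstract


def peptide_count(length, alphabet_size=AMINO_ACIDS):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    for length in (3, 5):
        print(length, peptide_count(length))
```

Going from tripeptides to pentapeptides multiplies the space by 20^2 = 400, from 8,000 to 3,200,000 sequences.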
