Enzymatic Weight Update Algorithm for DNA-Based Molecular Learning

Molecules ◽  
2019 ◽  
Vol 24 (7) ◽  
pp. 1409 ◽  
Author(s):  
Christina Baek ◽  
Sang-Woo Lee ◽  
Beom-Jin Lee ◽  
Dong-Hyun Kwak ◽  
Byoung-Tak Zhang

Recent research in DNA nanotechnology has demonstrated that biological substrates can be used for computing at a molecular level. However, in vitro demonstrations of DNA computations use preprogrammed, rule-based methods that lack the adaptability essential for molecular systems that must function in dynamic environments. Here, we introduce an in vitro molecular algorithm that ‘learns’ molecular models from training data, opening the possibility of ‘machine learning’ in wet molecular systems. Our algorithm enables enzymatic weight update by targeting internal loop structures in DNA, together with ensemble learning based on the hypernetwork model. This novel approach allows massively parallel processing of DNA with enzymes for specific structural selection, enabling learning in an iterative manner. We also introduce an intuitive method of DNA data construction that dramatically reduces the number of unique DNA sequences needed to cover the large search space of feature sets. By combining molecular computing and machine learning, the proposed algorithm takes a step closer to molecular computing technologies and, ultimately, more intelligent molecular systems.
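The enzymatic weight update is easiest to picture as an in silico hypernetwork: hyperedges carry weights (analogous to molecular copy numbers) that are amplified or attenuated depending on whether they match a training example. Below is a minimal toy sketch of that idea; the Hyperedge class, the matching rule, and the reward/penalty factors are illustrative assumptions, not the authors' wet-lab protocol.

```python
# A minimal in-silico sketch of the hypernetwork weight-update idea.
# Hyperedge, matches(), update() and the reward/penalty rule are
# illustrative assumptions, not the paper's enzymatic protocol.
import random

class Hyperedge:
    def __init__(self, features, label):
        self.features = features  # subset of (index, value) feature pairs
        self.label = label        # class label carried by this hyperedge
        self.weight = 1.0         # analogous to molecular copy number

def matches(edge, example):
    """A hyperedge 'hybridizes' if all its feature pairs occur in the example."""
    return all(example[i] == v for i, v in edge.features)

def predict(edges, example):
    """Weighted vote of all matching hyperedges (ensemble readout)."""
    votes = {}
    for e in edges:
        if matches(e, example):
            votes[e.label] = votes.get(e.label, 0.0) + e.weight
    return max(votes, key=votes.get) if votes else None

def update(edges, example, label, reward=1.2, penalty=0.8):
    """Amplify hyperedges that matched with the correct label, attenuate
    the rest -- loosely mirroring enzymatic selection of loop structures."""
    for e in edges:
        if matches(e, example):
            e.weight *= reward if e.label == label else penalty

# Toy run: 4 binary features, order-2 hyperedges, one training pass.
random.seed(0)
data = [([0, 1, 1, 0], 1), ([1, 0, 0, 1], 0)] * 10
edges = [Hyperedge(tuple((i, random.randint(0, 1))
                         for i in random.sample(range(4), 2)),
                   random.randint(0, 1)) for _ in range(200)]
for x, y in data:
    update(edges, x, y)
print(predict(edges, [0, 1, 1, 0]))  # expected: 1 after training
```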


2019 ◽  
Author(s):  
Ge Liu ◽  
Haoyang Zeng ◽  
Jonas Mueller ◽  
Brandon Carter ◽  
Ziheng Wang ◽  
...  

The precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown, and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Here we present a machine learning method that can design human Immunoglobulin G (IgG) antibodies with target affinities superior to candidates from phage display panning experiments within a limited design budget. We also demonstrate that machine learning can improve target specificity through the modular composition of models from different experimental campaigns, enabling a new integrative approach. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data.

Significance: Antibody-based therapeutics must meet both affinity and specificity metrics, and existing in vitro methods for meeting these metrics are based upon randomization and empirical testing. We demonstrate that, with sufficient target-specific training data, machine learning can suggest novel antibody variable domain sequences that are superior to those observed during training. Our machine learning method does not require any target structural information. We further show that data from disparate antibody campaigns can be combined by machine learning to improve antibody specificity.
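As a rough illustration of model-guided sequence design under a budget, the sketch below trains a surrogate affinity model on synthetic "panning" data and runs a greedy single-point-mutation search. The one-hot featurisation, the gradient-boosting surrogate, and the hidden-motif affinity function are all assumptions; the paper's models are differentiable networks trained on real phage-display measurements.

```python
# A hedged sketch of sequence-to-affinity modelling plus greedy in-silico
# mutation search under a design budget. The random "panning" data and the
# scoring model are stand-ins for the paper's neural models and assay data.
import random
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

AA = "ACDEFGHIKLMNPQRSTVWY"
L = 10  # toy variable-domain length

def one_hot(seq):
    x = np.zeros(len(seq) * len(AA))
    for i, a in enumerate(seq):
        x[i * len(AA) + AA.index(a)] = 1.0
    return x

random.seed(1)
train_seqs = ["".join(random.choices(AA, k=L)) for _ in range(500)]
# Synthetic "affinity": reward a hidden motif (placeholder for assay data).
affinity = lambda s: sum(a == b for a, b in zip(s, "ACDEFGHIKL")) + random.gauss(0, 0.3)
y = np.array([affinity(s) for s in train_seqs])
X = np.stack([one_hot(s) for s in train_seqs])

model = GradientBoostingRegressor().fit(X, y)

def greedy_design(seed_seq, budget=50):
    """Propose single-point mutants, keep the best predicted binder."""
    best, best_score = seed_seq, model.predict([one_hot(seed_seq)])[0]
    for _ in range(budget):
        i, a = random.randrange(L), random.choice(AA)
        cand = best[:i] + a + best[i + 1:]
        s = model.predict([one_hot(cand)])[0]
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

print(greedy_design(train_seqs[0]))
```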


2021 ◽  
Author(s):  
Enzo Losi ◽  
Mauro Venturini ◽  
Lucrezia Manservigi ◽  
Giuseppe Fabio Ceschini ◽  
Giovanni Bechini ◽  
...  

A gas turbine trip is an unplanned shutdown whose most relevant consequences are business interruption and a reduction of the equipment's remaining useful life. Thus, understanding the underlying causes of gas turbine trips would allow predicting their occurrence, maximizing gas turbine profitability and improving availability. In the ever-competitive Oil & Gas sector, data mining and machine learning are increasingly employed to support deeper insight into, and improved operation of, gas turbines. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology that exploits information embedded in the data and develops Random Forest models for predicting gas turbine trips from a timeframe of historical data acquired from multiple sensors. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is proved on two real-world case studies, involving field data taken during three years of operation of two fleets of Siemens gas turbines located in different regions. The novel methodology achieves Precision, Recall, and Accuracy values in the range of 75–85%, demonstrating the industrial feasibility of the predictive methodology.
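The segmentation idea can be sketched as follows: each multi-sensor history is sliced into overlapping windows, so that one trip/no-trip sequence yields many training rows for a Random Forest. The window length, stride, and per-window summary statistics below are assumptions standing in for the authors' separate feature-engineering methodology.

```python
# A minimal sketch of time series segmentation feeding a Random Forest.
# Window size, stride and the summary statistics are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def segment(series, label, win=60, stride=10):
    """series: (T, n_sensors) array; returns per-window feature rows."""
    rows = []
    for start in range(0, len(series) - win + 1, stride):
        w = series[start:start + win]
        rows.append(np.hstack([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.array(rows), np.full(len(rows), label)

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (600, 8))        # healthy operation, 8 sensors
pre_trip = rng.normal(0.5, 1.5, (600, 8))  # drifting sensors before a trip
Xn, yn = segment(normal, 0)
Xt, yt = segment(pre_trip, 1)
X, y = np.vstack([Xn, Xt]), np.hstack([yn, yt])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"accuracy on unseen windows: {clf.score(X_te, y_te):.2f}")
```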


Author(s):  
Nikhil Krishnaswamy ◽  
Scott Friedman ◽  
James Pustejovsky

Many modern machine learning approaches require vast amounts of training data to learn new concepts; conversely, human learning often requires few examples, sometimes only one, from which the learner can abstract structural concepts. We present a novel approach to introducing new spatial structures to an AI agent, combining deep learning over qualitative spatial relations with various heuristic search algorithms. The agent extracts spatial relations from a sparse set of noisy examples of block-based structures and trains convolutional and sequential models of those relation sets. To create novel examples of similar structures, the agent begins placing blocks on a virtual table, uses a CNN to predict the most similar complete example structure after each placement, uses an LSTM to predict the most likely set of remaining moves needed to complete it, and uses heuristic search to recommend the next move. We verify that the agent learned the concept by observing its virtual block-building activities, wherein it ranks each potential subsequent action toward building its learned concept. We empirically assess this approach with human participants' ratings of the block structures. Initial results and qualitative evaluations of structures generated by the trained agent show where it has generalized concepts from the training data, which heuristics perform best within the search space, and how we might improve learning and execution.
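A simplified, non-neural sketch of the agent's inner loop is given below: after each placement, candidate block positions are ranked by how closely the resulting set of qualitative spatial relations matches a stored example structure, and the best is chosen greedily. The relation extractor and Jaccard scoring stand in for the paper's CNN and LSTM models.

```python
# A simplified sketch of greedy, relation-based block placement.
# The relation extractor and Jaccard similarity are stand-ins for the
# paper's learned CNN/LSTM models.
from itertools import product

def relations(blocks):
    """Qualitative relations between placed blocks: left-of / above / touching."""
    rels = set()
    for (n1, (x1, y1)), (n2, (x2, y2)) in product(blocks.items(), repeat=2):
        if n1 >= n2:
            continue
        if x1 < x2: rels.add((n1, "left-of", n2))
        if y1 > y2: rels.add((n1, "above", n2))
        if abs(x1 - x2) + abs(y1 - y2) == 1: rels.add((n1, "touching", n2))
    return rels

def similarity(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

# Target concept: a 3-block "staircase" example the agent has abstracted.
target = relations({"a": (0, 0), "b": (1, 1), "c": (2, 2)})

state = {"a": (0, 0)}
candidates = [(x, y) for x in range(4) for y in range(4)]
for name in ["b", "c"]:
    # Rank every placement by similarity of the resulting relation set.
    scored = sorted(candidates, key=lambda p: -similarity(
        relations({**state, name: p}), target))
    state[name] = scored[0]
print(state)  # greedy reconstruction of the staircase concept
```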


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
G. Ramos ◽  
J. R. Vaz ◽  
G. V. Mendonça ◽  
P. Pezarat-Correia ◽  
J. Rodrigues ◽  
...  

Research in physiology and sports science has shown that fatigue, a complex psychophysiological phenomenon, has a relevant impact on performance and on the correct functioning of the motor system, and can potentially cause damage to the human organism. Fatigue can be seen as a subjective or objective phenomenon. Subjective fatigue corresponds to a mental and cognitive event, while objective fatigue is a physical phenomenon. Although subjective fatigue is often undervalued, only a physically and mentally healthy athlete is able to achieve top performance in a discipline. Therefore, we argue that physical training programs should address the preventive assessment of both subjective and objective fatigue mechanisms in order to minimize the risk of injuries. In this context, our paper presents a machine-learning system capable of extracting individual fatigue descriptors (IFDs) from electromyographic (EMG) and heart rate variability (HRV) measurements. Our novel approach, using two types of biosignals so that a global (mental and physical) fatigue assessment is taken into account, reflects the onset of fatigue by combining a dimensionless (0-1) global fatigue descriptor (GFD) with a support vector machine (SVM) classifier. The system, based on 9 main combined features, achieves a fatigue regime classification performance of 0.82±0.24, ensuring a successful preventive assessment when dangerous fatigue levels are reached. Training data were acquired in a constant work rate test (executed by 14 subjects on a cycle ergometer), during which the variable under study (fatigue) gradually increased until the volunteer reached an objective exhaustion state.
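The classification stage can be sketched as follows: combined EMG/HRV features feed an SVM that flags the fatigue regime. The nine synthetic features and the threshold-derived labels below are placeholders for the paper's individual fatigue descriptors and cycloergometer protocol.

```python
# A hedged sketch of the SVM fatigue-regime classifier. The 9 synthetic
# features and threshold-based labels are placeholders for the paper's
# EMG/HRV-derived individual fatigue descriptors.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 300
X = rng.normal(0, 1, (n, 9))               # 9 combined EMG/HRV features
gfd = 1 / (1 + np.exp(-X[:, :3].sum(1)))   # dimensionless 0-1 global fatigue descriptor
y = (gfd > 0.5).astype(int)                # fatigued vs. non-fatigued regime

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"fatigue-regime classification: {scores.mean():.2f} +/- {scores.std():.2f}")
```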


2011 ◽  
Vol 48-49 ◽  
pp. 1275-1281 ◽  
Author(s):  
Yu Jen Hu ◽  
Yuh Hua Hu ◽  
Jyh Bin Ke

We propose a method for categorizing DNA sequence matrices using fuzzy c-means (FCM) clustering. FCM avoids the errors caused by dimensionality reduction and thereby enables comprehensive machine learning. In our experiment, the training data comprise 40 artificial samples, and we verify the proposed method on 182 natural DNA sequences. The results show that the proposed method raised the accuracy of gene classification from 76% to 93%.
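For readers unfamiliar with FCM, the sketch below implements a minimal fuzzy c-means loop on one-hot encoded DNA sequences; the encoding, cluster count, and fuzzifier m = 2 are assumptions. The point is simply that FCM clusters the full sequence matrix without an explicit dimensionality-reduction step.

```python
# A minimal fuzzy c-means (FCM) sketch on one-hot encoded DNA sequences.
# Encoding, cluster count and fuzzifier m=2 are assumptions.
import numpy as np

def one_hot(seq, alphabet="ACGT"):
    return np.eye(len(alphabet))[[alphabet.index(b) for b in seq]].ravel()

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))   # fuzzy memberships, rows sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]  # membership-weighted centroids
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
        p = 2 / (m - 1)
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(1, keepdims=True))
    return U, centers

seqs = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCA"]
X = np.stack([one_hot(s) for s in seqs])
U, _ = fcm(X)
print(U.argmax(1))  # expected: first two sequences cluster together, last two together
```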


Energies ◽  
2020 ◽  
Vol 13 (18) ◽  
pp. 4868
Author(s):  
Raghuram Kalyanam ◽  
Sabine Hoffmann

Solar radiation data is essential for the development of many solar energy applications, ranging from thermal collectors to building simulation tools, but its availability is limited, especially for the diffuse radiation component. Several studies have aimed at predicting this value, but very few cover the generalizability of such models across varying climates. Our study investigates how well these models generalize and also shows how to enhance their generalizability across different climates. Since machine learning approaches are known to generalize well, we apply them to examine how well they perform on climates different from those on which they were originally trained. We therefore trained them on datasets from the U.S. and tested them on several European climates. The machine learning model developed for U.S. climates not only showed a low mean absolute error (MAE) of 23 W/m2, but also generalized very well to European climates, with MAE in the range of 20 to 27 W/m2. Further investigation into the factors influencing generalizability revealed that careful selection of the training data can improve the results significantly.
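The cross-climate protocol amounts to: fit on one climate's measurements, report MAE on another. The sketch below reproduces that shape with synthetic data; the random forest and the made-up features stand in for the U.S. and European station data and the paper's model.

```python
# A sketch of the cross-climate evaluation protocol: train on one climate,
# report MAE on another. Data and model are illustrative stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def synth_climate(n, bias, seed):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (n, 5))  # e.g. clearness index, solar angle, ...
    y = 300 * X[:, 0] * (1 - X[:, 1]) + bias + rng.normal(0, 10, n)  # diffuse W/m2
    return X, y

X_us, y_us = synth_climate(2000, bias=0, seed=1)  # "training" climate
X_eu, y_eu = synth_climate(500, bias=5, seed=2)   # "unseen" climate

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_us, y_us)
print(f"MAE on unseen climate: {mean_absolute_error(y_eu, model.predict(X_eu)):.1f} W/m2")
```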


Author(s):  
Enzo Losi ◽  
Mauro Venturini ◽  
Lucrezia Manservigi ◽  
Giuseppe Fabio Ceschini ◽  
Giovanni Bechini ◽  
...  

A gas turbine trip is an unplanned shutdown whose most relevant consequences are business interruption and a reduction of the equipment's remaining useful life. Thus, understanding the underlying causes of gas turbine trips would allow predicting their occurrence, maximizing gas turbine profitability and improving availability. In the ever-competitive Oil & Gas sector, data mining and machine learning are increasingly employed to support deeper insight into, and improved operation of, gas turbines. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology that exploits information embedded in the data and develops Random Forest models for predicting gas turbine trips from a timeframe of historical data acquired from multiple sensors. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is proved on two real-world case studies, involving field data taken during three years of operation of two fleets of gas turbines located in different regions. The novel methodology achieves Precision, Recall, and Accuracy values in the range of 75–85%, demonstrating the industrial feasibility of the predictive methodology.
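Complementing the segmentation sketch given for the related entry above, the reported metrics can be computed from held-out predictions as in the short sketch below; the trip/no-trip labels and predictions are illustrative stand-ins for the fleet data.

```python
# A short sketch of the reported evaluation: Precision, Recall and Accuracy
# on held-out trip predictions. Labels/predictions are illustrative only.
from sklearn.metrics import precision_score, recall_score, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # 1 = trip, 0 = normal operation
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]  # classifier output on unseen observations

print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
```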


2010 ◽  
Vol 36 (2) ◽  
pp. 203-227 ◽  
Author(s):  
Kumiko Tanaka-Ishii ◽  
Satoshi Tezuka ◽  
Hiroshi Terada

This article presents a novel approach to readability assessment through sorting. A comparator that judges the relative readability of two texts is generated through machine learning, and a given set of texts is sorted by this comparator. Our proposal is advantageous because it mitigates the lack of training data: constructing the comparator requires only training data annotated with two reading levels. The proposed method is compared with regression methods and a state-of-the-art classification method. Moreover, we present our application, called Terrace, which retrieves texts with readability similar to that of a given input text.
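Sorting by a learned comparator is compact to sketch: a binary classifier trained on pairs of texts with known relative reading levels decides which of two texts is harder, and functools.cmp_to_key turns that decision into a sort order. The two surface features below are placeholders for the paper's linguistic features.

```python
# A hedged sketch of sorting-by-comparator for readability assessment.
# The two surface features and toy texts are illustrative assumptions.
from functools import cmp_to_key
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(text):
    words = text.split()
    return [np.mean([len(w) for w in words]), len(words)]

easy = ["the cat sat on the mat", "we like to play outside"]
hard = ["epistemological ramifications notwithstanding",
        "the heterogeneous covariance structure complicates estimation"]

# Train on ordered pairs of feature differences: label 1 = "first text is harder".
X, y = [], []
for e in easy:
    for h in hard:
        X.append(np.subtract(features(h), features(e))); y.append(1)
        X.append(np.subtract(features(e), features(h))); y.append(0)
clf = LogisticRegression().fit(X, y)

def cmp(a, b):
    """Learned comparator: positive if a reads harder than b."""
    return 1 if clf.predict([np.subtract(features(a), features(b))])[0] else -1

texts = [hard[0], easy[1], hard[1], easy[0]]
print(sorted(texts, key=cmp_to_key(cmp)))  # easiest first
```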


Author(s):  
Matteo Lorenzini ◽  
Marco Rospocher ◽  
Sara Tonelli

Metadata are fundamental for the indexing, browsing and retrieval of cultural heritage resources in repositories, digital libraries and catalogues. In order to be effectively exploited, metadata information has to meet some quality standards, typically defined in the collection usage guidelines. As manually checking the quality of metadata in a repository may not be affordable, especially in large collections, in this paper we specifically address the problem of automatically assessing the quality of metadata, focusing in particular on textual descriptions of cultural heritage items. We describe a novel approach based on machine learning that tackles this problem by framing it as a binary text classification task aimed at evaluating the accuracy of textual descriptions. We report our assessment of different classifiers using a new dataset that we developed, containing more than 100K descriptions. The dataset was extracted from different collections and domains from the Italian digital library “Cultura Italia” and was annotated with accuracy information in terms of compliance with the cataloguing guidelines. The results empirically confirm that our proposed approach can effectively support curators (F1 ≈ 0.85) in assessing the quality of the textual descriptions of the records in their collections and provide some insights into how training data, specifically their size and domain, can affect classification performance.
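Framing the task as binary text classification is straightforward to sketch: TF-IDF features plus a linear classifier decide whether a description complies with the guidelines. The toy descriptions below stand in for the 100K annotated "Cultura Italia" records.

```python
# A minimal sketch of metadata quality assessment as binary text
# classification. The toy descriptions/labels are illustrative stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

descs = ["Oil on canvas, 17th century, depicting a Tuscan landscape.",
         "img_0042.jpg",
         "Marble bust of a Roman senator, restored in 1954.",
         "no description",
         "Illuminated manuscript page with gilded initials.",
         "tbd"] * 20
labels = [1, 0, 1, 0, 1, 0] * 20  # 1 = compliant with cataloguing guidelines

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(descs[:90], labels[:90])
print(f"F1 on held-out descriptions: {f1_score(labels[90:], clf.predict(descs[90:])):.2f}")
```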

