adabmDCA: adaptive Boltzmann machine learning for biological sequences

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Anna Paola Muntoni ◽  
Andrea Pagnani ◽  
Martin Weigt ◽  
Francesco Zamponi

Abstract
Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms modeling epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate in silico functional sequences.
Results: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and accommodates several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.
Conclusions: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
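The pairwise model described above is a Potts model: each sequence is scored by local fields and pairwise couplings, and contacts are typically predicted from the coupling strengths. A minimal, generic sketch in Python is shown below; the array shapes, function names, and the average-product correction are common DCA conventions assumed here, not the adabmDCA implementation itself.

```python
# Illustrative sketch (not the adabmDCA code): Potts-model energy and a
# Frobenius-norm contact score, as commonly used in DCA-style Boltzmann machines.
# Assumed shapes: h[i, a] local biases, J[i, j, a, b] pairwise couplings,
# for L positions and q symbols (q = 21 for proteins, 5 for RNA).
import numpy as np

def potts_energy(seq, h, J):
    """E(a_1..a_L) = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j)."""
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def contact_scores(J):
    """Frobenius norm of each coupling block, with average-product correction (APC)."""
    L = J.shape[0]
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))
    F[np.arange(L), np.arange(L)] = 0.0
    apc = np.outer(F.mean(axis=1), F.mean(axis=0)) / F.mean()
    return F - apc
```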

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
O. Obulesu ◽  
Suresh Kallam ◽  
Gaurav Dhiman ◽  
Rizwan Patan ◽  
Ramana Kadiyala ◽  
...  

Cancer is a complicated worldwide health issue with an increasing death rate in recent years. With the rapid growth of high-throughput technology and the many machine learning methods that have emerged in recent years, progress in cancer diagnosis has been made based on subset features, enabling efficient and precise disease diagnosis. Hence, machine learning techniques that can reliably differentiate lung cancer patients from healthy persons are of great interest. This paper proposes a novel Wilcoxon Signed-Rank Gain Preprocessing combined with Generative Deep Learning, called the Wilcoxon Signed Generative Deep Learning (WS-GDL) method, for lung cancer diagnosis. First, significance testing and information gain eliminate redundant and irrelevant attributes and extract many informative and significant attributes. Then, using a generator function, the Generative Deep Learning method learns the deep features. Finally, a minimax game (i.e., minimizing error with maximum accuracy) is used to diagnose the disease. Numerical experiments on the Thoracic Surgery Data Set are used to assess the WS-GDL method's diagnostic performance. The WS-GDL approach can create relevant and significant attributes and adaptively diagnose the disease by selecting optimal learning-model parameters. Quantitative experimental results show that the WS-GDL method achieves better diagnostic performance and higher computing efficiency in terms of computational time, computational complexity, and false-positive rate compared to state-of-the-art approaches.
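The preprocessing stage combines a rank-based significance test with information gain. A hedged sketch of that idea is given below using SciPy and scikit-learn; the thresholds are arbitrary, and a rank-sum test stands in for the signed-rank variant (which strictly applies to paired samples), so this illustrates the filtering concept rather than the authors' exact WS-GDL pipeline.

```python
# Illustrative feature-filtering step: keep attributes that are both statistically
# significant between classes and informative about the label. Thresholds and the
# choice of rank-sum test are assumptions, not the WS-GDL authors' settings.
import numpy as np
from scipy.stats import ranksums
from sklearn.feature_selection import mutual_info_classif

def select_features(X, y, p_threshold=0.05, min_gain=0.01):
    keep = []
    for j in range(X.shape[1]):
        _, p = ranksums(X[y == 0, j], X[y == 1, j])                  # significance test
        gain = mutual_info_classif(X[:, [j]], y, random_state=0)[0]  # information-gain proxy
        if p < p_threshold and gain > min_gain:
            keep.append(j)
    return keep
```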


Mathematics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 17
Author(s):  
Fazlollah Soleymani ◽  
Houman Masnavi ◽  
Stanford Shateyi

Bankruptcy prediction has been broadly investigated using financial-ratio methodologies. One relevant factor is the quality of the portfolio of loans that is granted. Hence, having a model to classify/predict the position of each loan candidate based on several features is important. In this work, an application of a machine learning approach in mathematical finance and banking is discussed. It is shown how lending portfolios of banks can be classified under several features, such as rating categories and various maturities. Dynamic updates of the portfolio are also given, along with the top probabilities, showing how financial data of this type can be classified. The discussions and results reveal that a good algorithm for performing such a classification on large economic data of this type is the k-nearest neighbors (KNN) method with k=1 combined with parallelization, which is preferred even over the support vector machine, random forest, and artificial neural network techniques in order to save as much computational time as possible.
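A minimal sketch of this classification set-up is shown below: a 1-nearest-neighbour classifier on encoded loan features, run with parallel neighbour queries. The feature columns and labels are synthetic placeholders, not the authors' lending data.

```python
# Illustrative 1-NN classification of loan records with parallel queries (n_jobs=-1).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))                    # stand-in for encoded loan features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # stand-in for loan class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1, n_jobs=-1)  # k = 1, parallelized
knn.fit(X_tr, y_tr)
print("held-out accuracy:", knn.score(X_te, y_te))
```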


Geophysics ◽  
2012 ◽  
Vol 77 (5) ◽  
pp. E379-E389 ◽  
Author(s):  
A. Abubakar ◽  
T. M. Habashy ◽  
Y. Lin ◽  
M. Li

We have developed a model-compression scheme for improving the efficiency of the regularized Gauss-Newton inversion algorithm for marine controlled-source electromagnetic applications. In this scheme, the unknown model parameters (the conductivity/resistivity distribution) are represented in terms of a basis such as Fourier or wavelet (Haar and Daubechies). By applying a truncation criterion, the model may then be approximated by a reduced number of basis functions, which is usually much smaller than the number of model parameters. Further, because controlled-source electromagnetic measurements have low resolution, it is sufficient for inversion to keep only the low-spatial-frequency part of the image. This model-compression scheme reduces the computational time and also the memory usage of the Gauss-Newton method. We are able to significantly reduce the algorithm's computational complexity without compromising the quality of the inverted models.
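The compression step amounts to expanding the model in a chosen basis and discarding small coefficients. The sketch below illustrates this with a Haar wavelet transform of a one-dimensional resistivity profile using PyWavelets; the toy profile, the 10% truncation ratio, and the error metric are illustrative choices, not the authors' inversion code.

```python
# Illustrative model compression: keep only the largest wavelet coefficients of a
# 1-D profile, so an inversion would update far fewer unknowns than model cells.
import numpy as np
import pywt

def compress(model, wavelet="haar", keep_fraction=0.1):
    coeffs = pywt.wavedec(model, wavelet)                 # forward wavelet transform
    flat, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(keep_fraction * flat.size))
    thresh = np.sort(np.abs(flat))[-k]                    # k-th largest magnitude
    flat[np.abs(flat) < thresh] = 0.0                     # truncation criterion
    return pywt.waverec(pywt.array_to_coeffs(flat, slices, output_format="wavedec"), wavelet)

profile = np.cumsum(np.random.default_rng(1).normal(size=256))  # toy resistivity log
approx = compress(profile)
print("relative error:", np.linalg.norm(profile - approx) / np.linalg.norm(profile))
```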


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Ilias Politis ◽  
Asimakis Lykourgiotis ◽  
Tasos Dagiuklas

The delivery of three-dimensional immersive media to individual users remains a highly challenging problem due to the large amount of data involved, diverse network characteristics and user terminal requirements, as well as the user's context. This paper proposes a framework for quality-of-experience-aware delivery of three-dimensional video across heterogeneous wireless networks. The proposed architecture combines a Media-Aware Proxy (an application-layer filter), an enhanced version of the IEEE 802.21 protocol for monitoring key performance parameters from different entities and multiple layers, and a QoE controller with a machine learning-based decision engine capable of modelling the perceived video quality. The proposed architecture is fully integrated with Long Term Evolution Enhanced Packet Core networks. The paper investigates machine learning-based techniques for producing an objective QoE model based on parameters from the physical, data-link, and network layers. Extensive test-bed experiments and statistical analysis indicate that the proposed framework is capable of accurately modelling the impact of network impairments on the perceptual quality experienced by the three-dimensional video user.
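As a rough illustration of the decision-engine idea, an objective QoE score can be regressed on cross-layer measurements. The sketch below uses a random-forest regressor on synthetic packet-loss/delay/jitter/SNR features; both the learner and the data are stand-ins, not the model or test-bed data of the paper.

```python
# Illustrative cross-layer QoE model: predict a mean-opinion-score-like target from
# physical/data-link/network-layer parameters (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(2_000, 4))   # columns: packet loss, delay, jitter, SNR (normalized)
mos = 5.0 - 3.0 * X[:, 0] - 1.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.2, size=2_000)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("cross-validated R^2:", cross_val_score(model, X, mos, cv=5).mean())
```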


Author(s):  
Lorenzo Rocutto ◽  
Enrico Prati

Boltzmann Machines constitute a paramount class of neural networks for unsupervised learning and recommendation systems. Their bipartite version, the Restricted Boltzmann Machine (RBM), is the most developed because of its satisfactory trade-off between computability on classical computers and computational power. However, the diffusion of RBMs remains quite limited, as their training is hard. Recently, renewed interest has emerged thanks to Adiabatic Quantum Computers (AQCs), which promise a potential increase in training speed with respect to conventional hardware. Due to the limited number of connections among the qubits forming the graph of existing hardware, associating one qubit per node of the neural network results in an incomplete graph. Thanks to embedding techniques, we developed a complete graph whose nodes consist of virtual qubits. The complete graph outperforms previous implementations based on incomplete graphs. Although the learning rate per epoch is still slower than on a classical machine, an advantage is expected as the number of nodes increases, which affects the classical computational time but not the quantum-hardware-based computation.
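For orientation, the classical training step that the annealer is meant to accelerate is the Gibbs-sampling phase of contrastive divergence. A minimal CD-1 update (biases omitted for brevity, sizes arbitrary) is sketched below; it is a generic classical reference, not the authors' quantum-embedded implementation.

```python
# Minimal classical RBM update via contrastive divergence (CD-1); the sampled
# reconstruction step is the part an adiabatic quantum annealer could replace.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 64, 32, 0.05
W = rng.normal(scale=0.01, size=(n_vis, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 gradient estimate for the weights, given a batch of visible vectors."""
    h0 = sigmoid(v0 @ W)                                                  # hidden activations
    v1 = (sigmoid(h0 @ W.T) > rng.uniform(size=v0.shape)).astype(float)  # sampled reconstruction
    h1 = sigmoid(v1 @ W)
    return lr * (v0.T @ h0 - v1.T @ h1) / len(v0)

batch = (rng.uniform(size=(16, n_vis)) > 0.5).astype(float)
W += cd1_step(batch)
```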


Author(s):  
Diana Popova ◽  
Denis Popov ◽  
Nikita Samoylenko ◽  
...  

Mathematical modeling of aerodynamic processes is carried out using numerical methods. The current level of development of numerical methods for three-dimensional gas-dynamic modeling of processes in turbomachinery makes it possible to determine the main characteristics of units with high accuracy at the design stage, significantly reducing the time and cost of production. This article proposes a methodology for constructing and improving the mathematical and grid models of a high-pressure turbine (HPT) rotor blade in order to improve the quality of three-dimensional modeling. Mathematical modeling of aerodynamic processes in the blade rows of aircraft turbojet engines is carried out using numerical methods. The grid-model settings and the turbulence model significantly affect the qualitative characteristics of the results and the duration of the calculations. The article proposes a methodology for constructing the grid model based on local refinement in regions of intense vortex formation and flow mixing. The influence of the grid and turbulence model parameters on the amount of kinetic energy losses and on the structure of secondary flows is estimated. The design model includes building the geometric model, preparing the grid model, and describing the turbulence model. The influence of the grid and of the BSL and SST turbulence models on the results of turbine blade aerodynamic calculations is considered. Basic recommendations for the construction of mathematical and grid models in ANSYS for uncooled rotor blades have been developed.


2021 ◽  
Author(s):  
Laura Marie Helleckes ◽  
Michael Osthege ◽  
Wolfgang Wiechert ◽  
Eric von Lieres ◽  
Marco Oldiges

High-throughput experimentation has revolutionized data-driven experimental sciences and opened the door to the application of machine learning techniques. Nevertheless, the quality of any data analysis strongly depends on the quality of the data and specifically on the degree to which random effects in the experimental data-generating process are quantified and accounted for. Accordingly, calibration, i.e. the quantitative association between observed quantities and measurement responses, is a core element of many workflows in experimental sciences. Particularly in the life sciences, univariate calibration, often involving non-linear saturation effects, must be performed to extract quantitative information from measured data. At the same time, the estimation of uncertainty is inseparably connected to quantitative experimentation. Adequate calibration models are required that describe not only the input/output relationship in a measurement system, but also its inherent measurement noise. Due to its mathematical nature, statistically robust calibration modeling remains a challenge for many practitioners, while at the same time being extremely beneficial for machine learning applications. In this work, we present a bottom-up conceptual and computational approach that solves many problems of understanding and implementing non-linear, empirical calibration modeling for the quantification of analytes and for process modeling. The methodology is first applied to the optical measurement of biomass concentrations in a high-throughput cultivation system, and then to the quantification of glucose by an automated enzymatic assay. We implemented the conceptual framework in two Python packages, with which we demonstrate how it makes uncertainty quantification for various calibration tasks more accessible. Our software packages enable more reproducible and automatable data analysis routines than commonly observed workflows in the life sciences. Subsequently, we combine the previously established calibration models with a hierarchical Monod-like differential equation model of microbial growth to describe multiple replicates of Corynebacterium glutamicum batch microbioreactor cultures. Key process model parameters are learned by both maximum likelihood estimation and Bayesian inference, highlighting the flexibility of the statistical and computational framework.
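As a small illustration of non-linear, saturating calibration, the sketch below fits a four-parameter logistic curve to synthetic concentration/signal pairs with SciPy and reads off rough parameter uncertainties from the covariance matrix. It shows the general idea only; it is not the authors' Python packages, and the synthetic multiplicative noise is an assumption.

```python
# Illustrative univariate calibration: four-parameter logistic fit with curve_fit.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, lower, upper, slope, x50):
    """Common saturating calibration curve."""
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (x - x50)))

rng = np.random.default_rng(0)
conc = np.linspace(0, 10, 30)
signal = logistic(conc, 0.1, 2.0, 0.8, 5.0) * (1 + rng.normal(scale=0.03, size=conc.size))

popt, pcov = curve_fit(logistic, conc, signal, p0=[0.0, 2.5, 1.0, 4.0])
perr = np.sqrt(np.diag(pcov))   # rough standard errors of the calibration parameters
print(dict(zip(["lower", "upper", "slope", "x50"], popt)), perr)
```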


2020 ◽  
pp. 108-111
Author(s):  
I.V. Ponomarev

Typically, a researcher faces computational difficulties in estimating the parameters of a certain model when modeling the relationship between different indicators of an analysis. A suitable model is generally obtained by a sequential refinement of the features included in its composition and, therefore, by performing multiple repetitions of computational algorithms. At the same time, the computational complexity of these algorithms begins to play a significant role in modeling. A certain set of indicators is used to reduce the number of iterations. These indicators are responsible for the quality of the constructed model and are capable of "signaling" the need to adjust the model. In regression modeling, the model parameters and the value of the quality functional are such indicators. They are able to answer the question of the appropriateness of building a particular model and are indicators of the quality of the resulting functional dependence. In this paper, we study methods and algorithms for constructing and evaluating the main indicators of L∞-regression: a quality indicator and the model parameters. The first part of the paper describes the most efficient computational procedures for determining the parameters in the case of a three-dimensional uniform regression model, indicates the complexity of these algorithms, and gives a geometric interpretation. The second part presents a series of theorems on estimating the values of the parameters of three-dimensional L∞-regression and provides a formula for calculating an indicator of sample uniformity.
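For context, L∞ (Chebyshev) regression minimizes the maximum absolute residual, which can be posed as a linear program by introducing a bound t. The sketch below solves a small three-dimensional instance with SciPy; it illustrates the problem class discussed above, not the specific algorithms or uniformity indicator of the paper.

```python
# Illustrative L-infinity regression: minimize t subject to |y_i - x_i^T beta| <= t.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(size=50), rng.uniform(size=50)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.uniform(-0.1, 0.1, size=50)

n, p = X.shape
c = np.r_[np.zeros(p), 1.0]                      # variables: [beta (p), t]; objective: t
A_ub = np.block([[ X, -np.ones((n, 1))],         #  X beta - t <=  y
                 [-X, -np.ones((n, 1))]])        # -X beta - t <= -y
b_ub = np.r_[y, -y]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * p + [(0, None)])
beta, t = res.x[:p], res.x[p]
print("coefficients:", beta, "maximum residual:", t)
```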


Author(s):  
Andrew McDonald ◽  

Decades of subsurface exploration and characterization have led to the collation and storage of large volumes of well-related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine-learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data are of poor quality, the impact on the precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field. This study focuses on well-log data, which can be highly multidimensional, diverse, and stored in a variety of file formats. Well-log data exhibit key characteristics of big data: volume, variety, velocity, veracity, and value. Well data can include numeric values, text values, waveform data, image arrays, maps, and volumes, all of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering data and quality checking it prior to carrying out petrophysical interpretations and applying machine-learning models. Well-log data can be affected by numerous issues causing a degradation in data quality. These include missing data ranging from single data points to entire curves, noisy data from tool-related issues, borehole washout, processing issues, incorrect environmental corrections, and mislabeled data. Having vast quantities of data does not mean it can all be passed into a machine-learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data are passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, it also reduces computational time and can provide a better understanding of how the models reach their conclusions. This paper reviews data quality issues typically faced by petrophysicists when working with well-log data and deploying machine-learning models. This is achieved by first providing an overview of machine learning and big data within the petrophysical domain, followed by a review of common well-log data issues, their impact on machine-learning algorithms, and methods for mitigating their influence.
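The sketch below illustrates a few of the quality checks discussed above: reporting missing data per curve, dropping washout-affected samples via the caliper/bit-size difference, and ranking curves by mutual information before modelling. The curve mnemonics (GR, RHOB, CALI, BS, DT), thresholds, and synthetic data are illustrative assumptions, not recommendations from the paper.

```python
# Illustrative well-log quality checks and feature ranking on synthetic curves.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def quality_check(logs: pd.DataFrame, target: str = "DT"):
    """Report missing data, drop washout-affected samples, rank remaining curves."""
    missing = logs.isna().mean().rename("fraction_missing")    # per-curve missing fraction
    washout = (logs["CALI"] - logs["BS"]) > 2.0                 # caliper far above bit size
    clean = logs.loc[~washout].dropna()
    X, y = clean.drop(columns=[target, "BS"]), clean[target]    # bit size is metadata, not a log
    mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)
    return missing, mi.sort_values(ascending=False)

rng = np.random.default_rng(0)
n = 500
logs = pd.DataFrame({"GR": rng.normal(80, 20, n), "RHOB": rng.normal(2.4, 0.1, n),
                     "CALI": rng.normal(8.6, 0.5, n), "BS": np.full(n, 8.5)})
logs["DT"] = 200 - 30 * (logs["RHOB"] - 2.0) + rng.normal(0, 2, n)
missing_report, ranking = quality_check(logs)
print(missing_report, ranking, sep="\n")
```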


2021 ◽  
Author(s):  
Dario Barsi ◽  
Andrea Perrone ◽  
Luca Ratto ◽  
Gianluca Ricci ◽  
Marco Sanguineti

Abstract: This paper presents an enhanced method for the multi-disciplinary design and optimization of centrifugal compressors based on Machine Learning (ML) algorithms. The typical approach involves a preliminary design, geometry parameterization, the generation of aero-mechanical databases, and a surrogate-model-based optimization. This procedure is able to provide excellent results, but it is time consuming and has to be repeated for each new design. The aim of the proposed procedure is to actively exploit the simulations performed in the past for subsequent designs, thanks to the predictive capabilities of the ML surrogate model. A commercial three-dimensional (3D) computational fluid dynamics (CFD) solver for the aerodynamic computations and a commercial finite element code for the mechanical integrity calculations, coupled with scripting modules, have been adopted. Two different compressors, with different geometries and operating conditions, have been designed, and two aero-mechanical databases have been developed. These two databases have then been joined and used for the training and validation of the surrogate model. To assess the performance of this approach, two new compressors have been designed: case 1 with operating conditions between those of the databases used for training and validation, and case 2 with operating conditions far above them. The use of an optimizer coupled with the prediction of the surrogate model has made it possible to define the “best set” of model parameters, in compliance with aero-mechanical objectives and constraints. The accuracy of the ML forecast has been evaluated through CFD and FEM simulations carried out iteratively on the optimal samples, with new simulations added to the database for further training of the surrogate model. The results are presented with reference to cases 1 and 2 and highlight the benefits of the proposed approach.
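The surrogate-assisted loop can be pictured with a very small stand-in: fit a Gaussian-process surrogate to an existing design database, let an optimizer search it, and verify the proposed optimum with the high-fidelity solvers. In the sketch below the CFD/FEM evaluation is replaced by a toy objective, and all names, bounds, and kernel choices are placeholder assumptions rather than the authors' set-up.

```python
# Illustrative surrogate-model-based optimization over a stored design database.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
X_db = rng.uniform(size=(200, 3))                        # stored (normalized) design parameters
eta_db = 0.9 - ((X_db - 0.5) ** 2).sum(axis=1)            # toy stand-in for computed efficiency

surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_db, eta_db)

# Search the surrogate for the most promising design; in practice the optimum would
# then be re-evaluated with CFD/FEM and fed back into the database for retraining.
result = differential_evolution(lambda x: -surrogate.predict(x.reshape(1, -1))[0],
                                bounds=[(0, 1)] * 3, seed=0)
print("predicted best design:", result.x, "predicted efficiency:", -result.fun)
```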

