A Machine Learning-based Pipeline for the Classification of CTX-M in Metagenomics Samples

Processes ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. 235 ◽  
Author(s):  
Diego Ceballos ◽  
Diana López-Álvarez ◽  
Gustavo Isaza ◽  
Reinel Tabares-Soto ◽  
Simón Orozco-Arias ◽  
...  

Bacterial infections are a major global concern because they can lead to public health problems. Bioinformatics helps address this issue through the analysis and interpretation of in silico data, making it possible to genetically characterize different individuals or strains, such as bacterial strains. However, the growing volume of metagenomic data requires new infrastructure, technologies, and methodologies that support the analysis and prediction of this information from a clinical point of view, as intended in this work. Distributed computational environments allow these large volumes of data to be managed, thanks to significant advances in processing architectures such as multicore CPUs (Central Processing Units) and GPGPUs (General-Purpose Graphics Processing Units). For this purpose, we developed a bioinformatics workflow based on metagenomic data filtered with the Duk tool. Data formatting was done with the EMBOSS software and a prototype workflow. A machine learning-based pipeline was also designed and implemented as a bash script. Further, the Python 3 programming language was used to normalize the training data of the artificial neural network, which was implemented in the TensorFlow framework, and its behavior was visualized in TensorBoard. Finally, the values from the initial bioinformatics process and the data generated during the parameterization and optimization of the artificial neural network are presented and validated based on the best result for the identification of the CTX-M gene group.
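The normalization step mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which normalization was applied, so min-max scaling to [0, 1] is an assumption here, and the function name `min_max_normalize` is hypothetical. Only the Python standard library is used.

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` to the range [0, 1].

    Assumption: min-max scaling; the abstract only says the training
    data were normalized in Python 3.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized
```

Scaling per column keeps features with large numeric ranges from dominating the network's weight updates, which is the usual motivation for normalizing before training.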

Author(s):  
James A. Tallman ◽  
Michal Osusky ◽  
Nick Magina ◽  
Evan Sewall

This paper provides an assessment of three different machine learning techniques for accurately reproducing a distributed temperature prediction of a high-pressure turbine airfoil. A three-dimensional Finite Element Analysis (FEA) thermal model of a cooled turbine airfoil was solved repeatedly (200 instances) for various operating point settings of the corresponding gas turbine engine. The response surface created by the repeated solutions was fed into three machine learning algorithms and surrogate model representations of the FEA model’s response were generated. The machine learning algorithms investigated were a Gaussian Process, a Boosted Decision Tree, and an Artificial Neural Network. Additionally, a simple Linear Regression surrogate model was created for comparative purposes. The Artificial Neural Network model proved to be the most successful at reproducing the FEA model over the range of operating points. The mean and standard deviation differences between the FEA and the Neural Network models were 15% and 14% of a desired accuracy threshold, respectively. The Digital Thread for Design (DT4D) was used to expedite all model execution and machine learning training. A description of DT4D is also provided.


2019 ◽  
Author(s):  
Blerta Rahmani ◽  
Hiqmet Kamberaj

In this study, we employed a novel method for predicting (macro)molecular properties using a swarm artificial neural network as a machine learning approach. In this method, a (macro)molecular structure is represented by a so-called description vector, which is then the input to a so-called bootstrapping swarm artificial neural network (BSANN) for training the neural network. We aim to develop an efficient approach for training an artificial neural network using either experimental or quantum mechanics data. In particular, we aim to create different user-friendly, online-accessible databases of well-selected experimental (or quantum mechanics) results that can be used as proof of concept. Furthermore, with the artificial neural network optimized using the training data served as input for BSANN, we can predict properties of new molecules and their statistical errors using the plugins provided by that web service. Four databases are accessible through the web-based service: a database of 642 small organic molecules with known experimental hydration free energies, a database of 1475 experimental pKa values of ionizable groups in 192 proteins, a database of 2693 mutants in 14 proteins with experimental values of changes in the Gibbs free energy, and a database of 7101 quantum mechanics heat of formation calculations. All the data are prepared and optimized in advance using the AMBER force field in the CHARMM macromolecular computer simulation program. BSANN is code written in the Python programming language for performing the optimization and prediction. The descriptor vectors of the small molecules are based on the Coulomb matrix and sum over bonds properties, and for the macromolecular systems, they take into account the chemical-physical fingerprints of the region in the vicinity of each amino acid.
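As an illustration of the Coulomb matrix descriptor named in this abstract, the sketch below uses the definition commonly given in the literature: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|, with distances in atomic units. The abstract does not state which variant BSANN uses, so this form, and the function name `coulomb_matrix`, are assumptions.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and 3D coordinates.

    Assumption: the commonly used definition, with coordinates in
    atomic units (Bohr). Not necessarily the variant used by BSANN.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: a fit to the energy of the free atom.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Coulomb repulsion between nuclei.
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix
```

For example, `coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])` describes two hydrogen nuclei two Bohr apart. The matrix is symmetric by construction, and it is typically flattened or sorted before being passed to a network as a descriptor vector.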


2021 ◽  
pp. 146808742110323
Author(s):  
Mohammad Hossein Moradi ◽  
Alexander Heinz ◽  
Uwe Wagner ◽  
Thomas Koch

To optimize an internal combustion engine in terms of emissions and efficiency, a highly accurate and, if possible, real-time-capable model of transient operation must first be provided. In this work, NOx and HC raw emissions (before exhaust aftertreatment systems) of a six-cylinder gasoline engine under highly transient operation were modeled using machine learning approaches. Three different machine learning methods, namely Artificial Neural Network, Long Short-Term Memory, and Random Forest, were used, and the results of these models were compared with each other. In general, the results show a significant improvement in accuracy compared to other studies that have modeled transient operations. Furthermore, a shortcoming of the Artificial Neural Network in predicting HC emissions during transient operation is observed. The coefficient of determination (R²) of the best model for NOx prediction is 0.98 for the training data and 0.97 for the test data. For the best HC prediction model, these values are 0.9 and 0.89, respectively.
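The coefficient of determination reported above can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows this standard definition; the function name `r_squared` is hypothetical, and the abstract does not say which software the authors used to compute it.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the observations scores 0, and a perfect model scores 1, so values such as 0.98 indicate that almost all of the variance in the measured signal is reproduced.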


2020 ◽  
Vol 15 ◽  
Author(s):  
Elham Shamsara ◽  
Sara Saffar Soflaei ◽  
Mohammad Tajfard ◽  
Ivan Yamshchikov ◽  
Habibollah Esmaili ◽  
...  

Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: The early prediction of CAD would be valuable in identifying individuals at risk and in focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD by using three ANN approaches (pattern recognition-ANN, LVQ-ANN, and competitive ANN). Methods: One promising method for the early prediction of disease based on risk factors is machine learning. Among different machine learning algorithms, artificial neural network (ANN) algorithms have been applied widely in medicine and in a variety of real-world classification problems. An ANN is a non-linear computational model, inspired by the human brain, for analyzing and processing complex datasets. Results: The ANN methods investigated in this paper indicate that in both the pattern recognition-ANN and LVQ-ANN methods, the predictions of the Angiography+ class have high accuracy. Moreover, in CNN the correlation between the individuals in cluster "c" and the Angiography+ class is very high. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison of the weights chosen by these three methods in separating the control class from Angiography+ shows that hs-CRP, FSG, and WBC are the most substantial excitatory weights in recognizing Angiography+ individuals, whereas HDL-C and MCH are determined to be inhibitory weights. Furthermore, the effect of decomposing the multi-class problem into a set of binary classes, and of random sampling, on the accuracy of the diagnostic model is investigated. Conclusion: This study confirms that pattern recognition-ANN had the highest accuracy among the different ANN methods. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes. The results of binarization show that decomposing the multi-class set into binary sets could achieve higher accuracy.
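The binarization mentioned in the conclusion can be sketched as follows. The abstract does not say which decomposition scheme was used, so one-vs-rest is an assumption here, and the function name `one_vs_rest` and the example labels are hypothetical.

```python
def one_vs_rest(labels):
    """Turn multi-class labels into one binary label vector per class.

    Assumption: one-vs-rest decomposition; the abstract only says the
    multi-class set was decomposed into binary sets.
    """
    classes = sorted(set(labels))
    return {
        # 1 marks samples of the target class, 0 marks all the others.
        target: [1 if label == target else 0 for label in labels]
        for target in classes
    }
```

Each binary vector can then be used to train a separate two-class model, for instance one that separates Angiography+ individuals from everyone else.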


2020 ◽  
Vol 8 (10) ◽  
pp. 766
Author(s):  
Dohan Oh ◽  
Julia Race ◽  
Selda Oterkus ◽  
Bonguk Koo

Mechanical damage is recognized as a problem that reduces the performance of oil and gas pipelines and has been the subject of continuous research. Artificial neural networks, which have recently been in the spotlight, are expected to offer another way to solve pipeline-related problems. This study applies a deep neural network, a machine learning method based on the artificial neural network algorithm. The applicability of machine learning techniques such as deep neural networks to the prediction of burst pressure has been investigated for dented API 5L X-grade pipelines. To this end, supervised learning is employed; the deep neural network model has four layers with three hidden layers and uses fully connected layers. The burst pressure computed by the deep neural network model has been compared with the results of a parametric study based on finite element analysis and with the burst pressure calculated from experimental results. The comparison showed good agreement. Therefore, it is concluded that deep neural networks can be another solution for predicting the burst pressure of dented API 5L X-grade pipelines.


2018 ◽  
Vol 215 ◽  
pp. 01011
Author(s):  
Sitti Amalia

This research proposes the design and implementation of a system for recognizing voice patterns of spoken numbers with offline pronunciation. Artificial intelligence with the backpropagation algorithm was used in the simulation test. The test was carried out on 100 voice files obtained from 10 people, each speaking 10 different numbers, 0 to 9. The trial varied artificial neural network parameters such as the tolerance value and the number of neurons. The best result was obtained when the tolerance value was varied and the number of neurons was fixed. With the optimal architecture and network parameters, the recognition rates were 82.2% for the training data and 53.3% for new data. When the tolerance value was fixed and the number of neurons was varied, the rates were 82.2% for the training data and 54.4% for new data.


2021 ◽  
Vol 2021 ◽  
pp. 1-20
Author(s):  
Tuan Vu Dinh ◽  
Hieu Nguyen ◽  
Xuan-Linh Tran ◽  
Nhat-Duc Hoang

Soil erosion induced by rainfall is a critical problem in many regions of the world, particularly in tropical areas where the annual rainfall often exceeds 2000 mm. Predicting soil erosion is a challenging task, subject to variation in soil characteristics, slope, vegetation cover, land management, and weather conditions. Conventional models based on the mechanism of soil erosion processes generally provide good results but are time-consuming due to calibration and validation. The goal of this study is to develop a machine learning model based on the support vector machine (SVM) for soil erosion prediction. The SVM serves as the main prediction machinery, establishing a nonlinear function that maps the considered influencing factors to accurate predictions. In addition, to improve the accuracy of the model, the history-based adaptive differential evolution with linear population size reduction and population-wide inertia term (L-SHADE-PWI) is employed to find an optimal set of parameters for the SVM. Thus, the proposed method, named L-SHADE-PWI-SVM, is an integration of machine learning and metaheuristic optimization. For the purpose of training and testing the method, a dataset consisting of 236 samples of soil erosion in Northwest Vietnam is collected with 10 influencing factors. The training set includes 90% of the original dataset; the rest of the dataset is reserved for assessing the generalization capability of the model. The experimental results indicate that the newly developed L-SHADE-PWI-SVM method is a competitive soil erosion predictor with superior performance statistics. Most importantly, L-SHADE-PWI-SVM can achieve a high classification accuracy rate of 92%, which is much better than that of the backpropagation artificial neural network (87%) and the radial basis function artificial neural network (78%).
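The 90/10 partition described in this abstract can be sketched as below. The abstract does not say how the samples were selected or how the 90% share was rounded, so the seeded random shuffle, the rounding down, and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(samples, train_fraction=0.9, seed=0):
    """Shuffle `samples` and split them into training and test sets.

    Assumptions: random selection and rounding down of the training
    share; the abstract only gives the 90% figure.
    """
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Under these assumptions, 236 samples yield 212 training samples and 24 held-out samples for assessing generalization.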


Author(s):  
Hadjira Maouz ◽  
Asma Adda ◽  
Salah Hanini ◽  
...  

The concentration of carbonyl is one of the most important properties for detecting the thermal aging of the polymer ethylene propylene diene monomer (EPDM). In this publication, an artificial neural network (ANN) model was developed to predict the concentration of carbonyl during the thermal aging of EPDM using a database consisting of seven input variables. The best fit to the training data was obtained with an architecture of 7 input neurons, 10 hidden neurons, and 1 output neuron. A Levenberg-Marquardt (LM) learning algorithm and a hyperbolic tangent transfer function were used at the hidden and output layer, respectively. The optimal ANN was obtained with a high correlation coefficient, R = 0.995, and a very low root mean square error, RMSE = 0.0148 mol/l, during the generalization phase. The comparison between the experimental and calculated results shows that the ANN model is able to predict the concentration of carbonyl during the thermal aging of ethylene propylene diene monomer.
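The 7-10-1 architecture and the RMSE metric from this abstract can be sketched as follows. The abstract names only the hyperbolic tangent transfer function, so applying it to the hidden layer and using an identity output is an assumption, as are the random weight initialization and all function names. Training with Levenberg-Marquardt is not shown.

```python
import math
import random


def init_network(n_in=7, n_hidden=10, n_out=1, seed=0):
    """Create untrained weights for a 7-10-1 network (random init assumed)."""
    rng = random.Random(seed)

    def layer(rows, cols):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        return {"w": weights, "b": [0.0] * rows}

    return [layer(n_hidden, n_in), layer(n_out, n_hidden)]


def forward(network, inputs):
    """Forward pass: tanh hidden layer, identity output (an assumption)."""
    hidden_layer, output_layer = network
    hidden = [
        math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(hidden_layer["w"], hidden_layer["b"])
    ]
    return [
        sum(w * h for w, h in zip(ws, hidden)) + b
        for ws, b in zip(output_layer["w"], output_layer["b"])
    ]


def count_parameters(network):
    """Count all weights and biases in the network."""
    return sum(len(l["b"]) + sum(len(row) for row in l["w"]) for l in network)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))
```

With this layout the network has 7 * 10 + 10 parameters in the hidden layer and 10 * 1 + 1 in the output layer, 91 in total, which is small enough for a second-order method such as Levenberg-Marquardt to be practical.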

