scholarly journals Iterative training set refinement enables reactive molecular dynamics via machine learned forces

RSC Advances ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 4293-4299 ◽  
Author(s):  
Lei Chen ◽  
Ivan Sukuba ◽  
Michael Probst ◽  
Alexander Kaiser

Reactive self-sputtering from a Be surface is simulated using neural network trained forces with high accuracy. The key in machine learning from DFT calculations is a well-balanced and complete training set of energies and forces obtained by iterative refinement.

2021 ◽  
Author(s):  
Wael Alnahari

Abstract In this paper, I proposed an iris recognition system by using deep learning via neural networks (CNN). Although CNN is used for machine learning, the recognition is achieved by building a non-trained CNN network with multiple layers. The main objective of the code the test pictures’ category (aka person name) with a high accuracy rate after having extracted enough features from training pictures of the same category which are obtained from a that I added to the code. I used IITD iris which included 10 iris pictures for 223 people.


2019 ◽  
Vol 8 (6) ◽  
pp. 799 ◽  
Author(s):  
Cheng-Shyuan Rau ◽  
Shao-Chun Wu ◽  
Jung-Fang Chuang ◽  
Chun-Ying Huang ◽  
Hang-Tsung Liu ◽  
...  

Background: We aimed to build a model using machine learning for the prediction of survival in trauma patients and compared these model predictions to those predicted by the most commonly used algorithm, the Trauma and Injury Severity Score (TRISS). Methods: Enrolled hospitalized trauma patients from 2009 to 2016 were divided into a training dataset (70% of the original data set) for generation of a plausible model under supervised classification, and a test dataset (30% of the original data set) to test the performance of the model. The training and test datasets comprised 13,208 (12,871 survival and 337 mortality) and 5603 (5473 survival and 130 mortality) patients, respectively. With the provision of additional information such as pre-existing comorbidity status or laboratory data, logistic regression (LR), support vector machine (SVM), and neural network (NN) (with the Stuttgart Neural Network Simulator (RSNNS)) were used to build models of survival prediction and compared to the predictive performance of TRISS. Predictive performance was evaluated by accuracy, sensitivity, and specificity, as well as by area under the curve (AUC) measures of receiver operating characteristic curves. Results: In the validation dataset, NN and the TRISS presented the highest score (82.0%) for balanced accuracy, followed by SVM (75.2%) and LR (71.8%) models. In the test dataset, NN had the highest balanced accuracy (75.1%), followed by the TRISS (70.2%), SVM (70.6%), and LR (68.9%) models. All four models (LR, SVM, NN, and TRISS) exhibited a high accuracy of more than 97.5% and a sensitivity of more than 98.6%. However, NN exhibited the highest specificity (51.5%), followed by the TRISS (41.5%), SVM (40.8%), and LR (38.5%) models. Conclusions: These four models (LR, SVM, NN, and TRISS) exhibited a similar high accuracy and sensitivity in predicting the survival of the trauma patients. In the test dataset, the NN model had the highest balanced accuracy and predictive specificity.


Author(s):  
Kamlesh A. Waghmare ◽  
Sheetal K. Bhala

Tourist reviews are the source of data that is going to be used for the travelers around the world to find the hotels for their stay according to their comfort. In this the hotels are ranked over the parameters or aspects considered keeping travelers in mind. This computation of data sets is done with the help of the machine learning algorithms and the neural network. The knowledge processing done over the reviews generates the sentiment score for each hotel with respect to the aspects defined. Here, the explicit , implicit and co-referential aspects are identified by suppressing the noise. This paper proposes the method that can be best used for the detection of the sentiments with the high accuracy.


Satellite images are important for developing and protected environmental resources that can be used for flood detection. The satellite image of before-flooding and after-flooding to be segmented and feature with integration of deeply LRNN and CNN networks for giving high accuracy. It is also important for learning LRNN and CNN is able to find the feature of flooding regions sufficiently and, it will influence the effectiveness of flood relief. The CNNs and LRNNs consists of two set are training set and testing set. The before flooding and after flooding of satellite images to be extract and segment formed by testing and training phase of data patches. All patches are trained by LRNN where changes occur or any misdetection of flooded region to extract accurately without delay. This proposed method obtain accuracy of system is 99% of flood region detections.


2021 ◽  
Author(s):  
Siddharth Ghule ◽  
Sayan Bagchi ◽  
Kumar Vanka

<div>Electricity generation is a major contributing factor for greenhouse gas emissions. Energy storage systems available today have a combined capacity to store less than 1% of the electricity being consumed worldwide. Redox Flow Batteries (RFBs) are promising candidates for green and efficient energy storage systems. RFBs are being used in renewable energy systems, but their widespread adoption is limited due to high production costs and toxicity associated with the transition-metal-based redox-active species. Therefore, cheaper and greener alternative organic redox-active species are being investigated. Recent reports have shown organic molecules based on phenazine are promising candidates for redox-active species in RFBs. However, the large number of available organic compounds makes the conventional experimental and DFT methods impractical to screen thousands of molecules in a reasonable amount of time. In contrast, machine-learning models have low development time, short prediction time, and high accuracy; thus, are being heavily investigated for virtual screening applications. In this work, we developed machine-learning models to predict the redox potential of phenazine derivatives in DME solvent using a small dataset of 185 molecules. 2D, 3D, and Molecular Fingerprint features were computed using readily available and easy-to-use python libraries, making our approach easily adaptable to similar work. Twenty linear and non-linear machine-learning models were investigated in this work. These models achieved excellent performance on the unseen data (i.e., R<sup>2</sup> > 0.98, MSE < 0.008 V2 and MAE < 0.07 V). Model performance was assessed in a consistent manner using the training and evaluation pipeline developed in this work. We showed that 2D molecular features are most informative and achieve the best prediction accuracy among four feature sets. We also showed that often less preferred but relatively faster linear models could perform better than non-linear models when the feature set contains different types of features (i.e., 2D, 3D, and Molecular Fingerprints). Further investigations revealed that it is possible to reduce the training and inference time without sacrificing prediction accuracy by using a small subset of features. Moreover, models were able to predict the previously reported promising redox-active compounds with high accuracy. Also, significantly low prediction errors were observed for the functional groups. Although some functional groups had only one compound in the training set, best-performing models could achieve errors (MAPE) less than 10%. The major source of error was a lack of data near-zero and in the positive region. Therefore, this work shows that it is possible to develop accurate machine-learning models that could potentially screen millions of compounds in a short amount of time with a small training set and limited number of easy to compute features. Thus, results obtained in this report would help in the adoption of green energy by accelerating the field of materials discovery for energy storage applications.</div>


2021 ◽  
Vol 13 (16) ◽  
pp. 3176
Author(s):  
Beata Hejmanowska ◽  
Piotr Kramarczyk ◽  
Ewa Głowienka ◽  
Sławomir Mikrut

The study presents the analysis of the possible use of limited number of the Sentinel-2 and Sentinel-1 to check if crop declarations that the EU farmers submit to receive subsidies are true. The declarations used in the research were randomly divided into two independent sets (training and test). Based on the training set, supervised classification of both single images and their combinations was performed using random forest algorithm in SNAP (ESA) and our own Python scripts. A comparative accuracy analysis was performed on the basis of two forms of confusion matrix (full confusion matrix commonly used in remote sensing and binary confusion matrix used in machine learning) and various accuracy metrics (overall accuracy, accuracy, specificity, sensitivity, etc.). The highest overall accuracy (81%) was obtained in the simultaneous classification of multitemporal images (three Sentinel-2 and one Sentinel-1). An unexpectedly high accuracy (79%) was achieved in the classification of one Sentinel-2 image at the end of May 2018. Noteworthy is the fact that the accuracy of the random forest method trained on the entire training set is equal 80% while using the sampling method ca. 50%. Based on the analysis of various accuracy metrics, it can be concluded that the metrics used in machine learning, for example: specificity and accuracy, are always higher then the overall accuracy. These metrics should be used with caution, because unlike the overall accuracy, to calculate these metrics, not only true positives but also false positives are used as positive results, giving the impression of higher accuracy. Correct calculation of overall accuracy values is essential for comparative analyzes. Reporting the mean accuracy value for the classes as overall accuracy gives a false impression of high accuracy. In our case, the difference was 10–16% for the validation data, and 25–45% for the test data.


2020 ◽  
Vol 1461 ◽  
pp. 012182
Author(s):  
Aravind Krishnamoorthy ◽  
Pankaj Rajak ◽  
Sungwook Hong ◽  
Ken-ichi Nomura ◽  
Subodh Tiwari ◽  
...  

2020 ◽  
Vol 2 (1) ◽  
pp. 17-31
Author(s):  
Szde Yu

The present study compared three methods aimed at predicting the writer's gender based on writing features manifested in electronic discourse. The compared methods included qualitative content analysis, statistical analysis, and machine learning. These methods were further combined to create a mixed methods model. The findings showed that the machine learning model combined with qualitative content analysis produced the best prediction accuracy. Including qualitative content analysis was able to improve accuracy rates even when the training set for machine learning was relatively small. Thus, this study presented a concise model that can be fairly reliable in predicting gender based on electronic discourse with high accuracy rates and such accuracy was consistently found when the model was tested by two separate samples.


Sign in / Sign up

Export Citation Format

Share Document