Machine Learning of Optical Properties of Materials - Predicting Spectra from Images and Images from Spectra

Author(s):  
Helge S. Stein ◽  
Dan Guevarra ◽  
Paul F Newhouse ◽  
Edwin Soedarmadji ◽  
John Gregoire

As the materials science community seeks to capitalize on recent advancements in computer science, the sparsity of well-labelled experimental data and limited throughput by which it can be generated have inhibited deployment of machine learning algorithms to date. Several successful examples in computational chemistry have inspired further adoption of machine learning algorithms, and in the present work we present autoencoding algorithms for measured optical properties of metal oxides, which can serve as an exemplar for the breadth and depth of data required for modern algorithms to learn the underlying structure of experimental materials science data. Our set of 180,902 distinct materials samples spans 78 distinct composition spaces, includes 45 elements, and contains more than 80,000 unique quinary oxide and 67,000 unique quaternary oxide compositions, making it the largest and most diverse experimental materials set utilized in machine learning studies. The extensive dataset enabled training and validation of 3 distinct models for mapping between sample images and absorption spectra, including a conditional variational autoencoder that generates images of hypothetical materials with tailored absorption properties. The absorption patterns auto-generated from sample images capture the salient features of ground truth spectra, and direct band gap energies extracted from these auto-generated patterns are quite accurate with a mean absolute error of 240 meV, which is the approximate uncertainty from traditional extraction of the band gap energy from measurements of the full transmission and reflection spectra. Optical properties of materials are not only ubiquitous in materials applications but also emblematic of the confluence of underlying physical phenomena that yield the type of complex data relationships that merit and benefit from neural network-type modelling.
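The accuracy figure quoted above is a mean absolute error (MAE) between band gap energies extracted from generated spectra and those from measured spectra. A minimal sketch of that metric follows, assuming band gaps are held in electronvolts and reported in meV. The function name and the three sample values are hypothetical illustrations, not data from the paper.

```python
def mean_absolute_error_mev(predicted_ev, reference_ev):
    """Return the mean absolute error in meV for two equal-length lists of energies in eV."""
    if not predicted_ev or len(predicted_ev) != len(reference_ev):
        raise ValueError("inputs must be non-empty and of equal length")
    total_ev = sum(abs(p - r) for p, r in zip(predicted_ev, reference_ev))
    return 1000.0 * total_ev / len(predicted_ev)  # 1 eV = 1000 meV


# Made-up band gap energies (eV), for illustration only.
predicted = [2.10, 3.05, 1.80]   # e.g. extracted from auto-generated spectra
reference = [2.30, 2.85, 2.10]   # e.g. extracted from measured spectra

mae = mean_absolute_error_mev(predicted, reference)
print(f"MAE = {mae:.1f} meV")
```

Reporting the error in the same unit as the measurement uncertainty makes the comparison drawn in the abstract (240 meV against the uncertainty of traditional extraction) direct.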


2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but across the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.


2021 ◽  
Author(s):  
Dillon Kessy ◽  
Jose Ignacio Sierra Castro ◽  
Jose Chirinos ◽  
Giorgio De Paola ◽  
Maria Jose Lopez Perez-Valiente

The application of Artificial Intelligence for planning has received increased attention in the energy industry in the past few years, particularly given increased production efficiency requirements and environmental standards. The objective of this paper is to show the successful integration of production, completion, subsurface and spatial data using machine-learning algorithms to predict production performance for future development wells. The internal Marcellus Business Unit (MBU) well database, populated with data from 500+ historical wells, has been used in this study. Production data, treated as time series, has been processed using functional Principal Component Analysis (PCA) to allow removal of outliers and mode detection. Utilizing this data, a suite of machine-learning algorithms has been applied to reconstruct gas production from available and target well data. Uncertainty quantification has been provided for production curves to identify the quality of prediction. During the study, sensitivity analysis on input variables has been performed iteratively to screen and rank the most important variables for prediction. The workflow, Unconventional Reservoir Assistant (URA), has been implemented in a proprietary cloud-based platform providing the necessary means for data upload, integration, pre-processing, and finally model training and deployment. This allows the user to focus on the evaluation of model output quality, data filtering and workspace generation for continuous model testing and improvement. The full well dataset, split into training and test data, has been used for prediction as a preliminary guide to where the most prolific areas of development are located. Results were ranked based on production expected by pad and based on normalized performance. The information was then compared with type curves and the original development order. In parallel, economic evaluation of break-even was performed to rank all future pads.
Consequently, integration of the model prediction and breakeven ranking were used to generate the final development order for the MBU. The URA tool has shown preliminary success in predicting production performance for the pilot development area. Multiple case studies have been run achieving blind test results with high accuracy for historical prediction. Results show some dependency of predictor variable ranking on the field development area, providing insight on how subsurface may affect well dynamic behavior. This paper describes how the integration of URA can complement the development workflow for unconventional reservoirs and be used to predict performance based on complex data integration. The methodology used is superior to standard machine learning models providing only production indicators, as it gives the user the possibility to evaluate economics and completion design sensitivity for future well activities. The methodology can be further extended as a proxy model for well schedule optimization in planning or for better insight into well refrac selection.


2019 ◽  
Vol 10 (1) ◽  
pp. 47-55 ◽  
Author(s):  
Helge S. Stein ◽  
Dan Guevarra ◽  
Paul F. Newhouse ◽  
Edwin Soedarmadji ◽  
John M. Gregoire

Assembling the world's largest materials image and spectroscopy dataset enables training of machine learning models that learn hidden relationships in materials data, providing a key example of the data requirements to capitalize on recent advancements in computer science.


2021 ◽  
Vol 27 (4) ◽  
pp. 195-202
Author(s):  
Andrii Trostianchyn ◽  
Zoia Duriagina ◽  
Ivan Izonin ◽  
Roman Tkachenko ◽  
Volodymyr Kulyk ◽  
...  

The use of machine learning tools in modern materials science can significantly reduce the duration and cost of developing new materials and improving the properties of existing ones. This is especially true when studying expensive and strategically important materials such as alloys of rare earth metals, which are used to manufacture high-energy permanent magnets. At the same time, single machine learning algorithms do not always provide the accuracy required to solve a particular applied task. Therefore, the current paper aimed to develop an ensemble model for predicting the magnetic properties of Sm-Co system alloys with high accuracy. Based on literature data, we have collected a dataset relating phase composition, sample state, crystallographic orientation, and microstructure to magnetic properties. We have compared different machine learning algorithms. A stacking ensemble model was designed based on high-precision machine learning algorithms: Neural Networks, AdaBoost, Gradient Boosting, and Random Forest. The proposed ensemble scheme showed a significant increase in accuracy when predicting the magnetic properties of Sm-Co alloys, using coercivity as an example.
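The stacking idea mentioned in this abstract can be sketched in a few lines: base models are trained on part of the data, their out-of-fold predictions become inputs to a meta-learner, and the meta-learner decides how to combine them. The sketch below is an illustration under loud assumptions. The two base learners (a least-squares line and a nearest-neighbour lookup) are simple stand-ins chosen to keep the example dependency-free, not the Neural Network, AdaBoost, Gradient Boosting, and Random Forest models used by the authors, and the meta-learner is a single blend weight found by grid search. All function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Stand-in base learner 1: one-dimensional least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Stand-in base learner 2: return the target of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


BASE_FITTERS = [fit_line, fit_nearest]


def out_of_fold_predictions(xs, ys, n_folds=3):
    """Predict each sample with base models that never saw it during training."""
    n = len(xs)
    oof = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(n_folds):
        held = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in kept]
        train_y = [ys[i] for i in kept]
        for j, fit in enumerate(BASE_FITTERS):
            model = fit(train_x, train_y)
            for i in held:
                oof[i][j] = model(xs[i])
    return oof


def fit_stack(xs, ys, n_folds=3):
    """Meta-learner: pick the blend weight w in [0, 1] that minimises squared
    error on the out-of-fold predictions, then refit base models on all data."""
    oof = out_of_fold_predictions(xs, ys, n_folds)

    def squared_error(w):
        return sum((w * p[0] + (1 - w) * p[1] - y) ** 2 for p, y in zip(oof, ys))

    best_w = min((k / 100 for k in range(101)), key=squared_error)
    line, nearest = (fit(xs, ys) for fit in BASE_FITTERS)
    return lambda x: best_w * line(x) + (1 - best_w) * nearest(x)


# Toy data (hypothetical): an exactly linear relationship y = 2x + 1.
xs = list(range(9))
ys = [2 * x + 1 for x in xs]
predict = fit_stack(xs, ys)
print(predict(10))
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample ones is the design choice that matters here: a base model that merely memorises its training data (as the nearest-neighbour stand-in does) would otherwise look perfect and receive all the weight.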


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with models previously developed in the literature.
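Discretizing a continuous target such as attendance into classes turns a regression problem into a classification one. The abstract does not state how the class boundaries were chosen, so the sketch below assumes equal-frequency (quantile) binning purely for illustration. The function names and the attendance figures are hypothetical.

```python
def quantile_edges(values, n_classes):
    """Return the n_classes - 1 cut points that split values into equal-frequency bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def discretize(value, edges):
    """Map a value to a class index: the number of cut points it meets or exceeds."""
    return sum(value >= edge for edge in edges)


# Made-up attendance counts, for illustration only.
attendance = [100, 200, 300, 400, 500, 600]

edges = quantile_edges(attendance, n_classes=3)
classes = [discretize(a, edges) for a in attendance]
print(edges, classes)
```

Calling `quantile_edges` with `n_classes=5` or `n_classes=8` gives the finer class schemes the study moves to later, under the same binning assumption.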


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is scoring the English composition with an intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the proposed algorithm has a certain practical effect and that it can be applied to English assessment systems and to online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
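One common reading of a checkerboard pattern is a two-class labelling in which the class alternates between adjacent grid cells. The abstract does not give the paper's exact construction, so the sketch below is an assumption made for illustration, with hypothetical names throughout. It also shows that the two classes are balanced, so always guessing one class scores 50% on such a grid.

```python
import math


def checkerboard_label(x, y, cell=1.0):
    """Return 0 or 1 depending on the parity of the grid cell containing (x, y)."""
    return (math.floor(x / cell) + math.floor(y / cell)) % 2


# Label the centre of every cell in a 4 x 4 board.
labels = [
    checkerboard_label(col + 0.5, row + 0.5)
    for row in range(4)
    for col in range(4)
]

# Accuracy of always predicting the more frequent class.
majority_baseline = max(labels.count(0), labels.count(1)) / len(labels)
print(labels[:4], majority_baseline)
```

Because moving one cell in either direction flips the label, no single threshold on `x` or `y` alone separates the classes, which is part of what makes the pattern awkward for simple learners.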

