Structure prediction of multi-principal element alloys using ensemble learning

2019 ◽  
Vol 37 (3) ◽  
pp. 1003-1022 ◽  
Author(s):  
Amitava Choudhury ◽  
Tanmay Konnur ◽  
P.P. Chattopadhyay ◽  
Snehanshu Pal

Purpose The purpose of this paper is to predict the phases and crystal structures formed by multi-component alloys. The concepts and strategies behind the development of multi-principal element alloys (MPEAs) have significantly increased the number of potential candidate alloy systems, which demands proper screening of a large number of alloy systems based on the nature of their phase and structure. Experimentally obtained data linking elemental properties to the resulting phases of MPEAs is abundant; hence, there is strong scope for categorization/classification of MPEAs based on structural features of the resultant phase, along with distinctive connections between elemental properties and phases. Design/methodology/approach In this paper, several machine-learning algorithms have been used to recognize the underlying patterns in MPEA design data sets and to classify alloys based on the structural features of their resultant phase, namely single-phase solid solution, amorphous and intermetallic compounds. MPEAs forming a single-phase solid solution are further classified by crystal structure using an ensemble-based machine-learning method known as the random-forest algorithm. Findings The model developed by implementing the random-forest algorithm achieved an accuracy of 91 per cent for phase prediction and 93 per cent for crystal structure prediction within the single-phase solid solution class of MPEAs. Five input parameters are used in the prediction model, namely, valence electron concentration, difference in Pauling electronegativity, atomic size difference, mixing enthalpy and mixing entropy. Valence electron concentration was found to be the most important feature for phase prediction. To avoid overfitting, fivefold cross-validation was performed. To compare performance, other algorithms such as k-nearest neighbors, support vector machine, logistic regression, naïve Bayes, decision tree and neural network were also applied to the data set. Originality/value In this paper, the authors describe the phase selection and crystal structure prediction mechanism for an MPEA data set and achieve better accuracy using machine learning.
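
As an illustration only, the following Python sketch mirrors the setup described above: a random-forest classifier over the five alloy descriptors, evaluated with fivefold cross-validation. The file name, column names and hyperparameters are assumptions, not the authors' exact configuration.

```python
# Sketch of the phase-classification setup, assuming a CSV with the five
# input features and a phase label; file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["VEC",          # valence electron concentration
            "delta_chi",    # difference in Pauling electronegativity
            "delta_r",      # atomic size difference
            "H_mix",        # mixing enthalpy
            "S_mix"]        # mixing entropy

df = pd.read_csv("mpea_dataset.csv")         # hypothetical data file
X, y = df[FEATURES], df["phase"]             # phase: solid solution / amorphous / intermetallic

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)    # fivefold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")

# Feature importances indicate which descriptor dominates (the paper
# reports valence electron concentration as the most important feature).
clf.fit(X, y)
for name, imp in zip(FEATURES, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```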

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Taewon Jin ◽  
Ina Park ◽  
Taesu Park ◽  
Jaesik Park ◽  
Ji Hoon Shim

Abstract Properties of solid-state materials depend on their crystal structures. In a solid-solution high-entropy alloy (HEA), mechanical properties such as strength and ductility depend on the phase. Therefore, crystal structure prediction should precede the search for new functional materials. Recently, machine learning-based approaches have been successfully applied to the prediction of structural phases. However, since about 80% of a data set is typically used for training, preparing a training data set of multi-element alloys is known to be very costly. In this work, we develop an efficient approach to predicting the structural phases of multi-element alloys without preparing a large training data set. We demonstrate that our method, trained only on a binary alloy data set, can be applied to crystal structure prediction for multi-element alloys by designing a transformation module from raw features to an expandable form. Surprisingly, without involving multi-element alloys in the training process, we obtain an accuracy of 80.56% for the phases of multi-element alloys and 84.20% for the phases of HEAs, comparable with previous machine-learning results. Moreover, by employing expandable features, our approach saves at least three orders of magnitude in computational cost for HEAs. We suggest that this accelerated approach can be applied to predicting various structural properties of multi-element alloys that do not exist in current structural databases.
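
A minimal sketch of one plausible reading of the "expandable form" idea: raw elemental properties are collapsed into composition-weighted statistics whose length does not depend on the number of elements, so a model trained on binary alloys can score multi-element compositions. The property table and the statistics below are assumptions, not the authors' actual transformation module.

```python
# Expandable feature transform: per-element descriptors are reduced to
# composition-weighted mean and variance, giving a fixed-length vector
# for any number of constituent elements.
import numpy as np

ELEMENT_PROPS = {            # hypothetical per-element descriptors
    "Fe": [1.83, 1.26], "Co": [1.88, 1.25], "Ni": [1.91, 1.24],
    "Cr": [1.66, 1.28], "Mn": [1.55, 1.27],
}

def expandable_features(composition):
    """composition: dict element -> atomic fraction (sums to 1)."""
    fracs = np.array(list(composition.values()))
    props = np.array([ELEMENT_PROPS[el] for el in composition])
    mean = fracs @ props                      # composition-weighted mean
    var = fracs @ (props - mean) ** 2         # composition-weighted variance
    return np.concatenate([mean, var])        # fixed-length feature vector

# Same feature length for a binary and a quinary alloy:
print(expandable_features({"Fe": 0.5, "Ni": 0.5}))
print(expandable_features({e: 0.2 for e in ["Fe", "Co", "Ni", "Cr", "Mn"]}))
```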


2017 ◽  
Vol 45 (2) ◽  
pp. 66-74
Author(s):  
Yufeng Ma ◽  
Long Xia ◽  
Wenqi Shen ◽  
Mi Zhou ◽  
Weiguo Fan

Purpose The purpose of this paper is the automatic classification of TV series reviews based on generic categories. Design/methodology/approach The authors' main approach is to replace specific role and actor names in reviews with surrogate tags to make the reviews more generic. In addition, feature selection techniques and different kinds of classifiers are incorporated. Findings With role and actor names replaced by generic tags, the experimental results show that the model generalizes well to unseen TV series compared with reviews that keep the original names. Research limitations/implications The model presented in this paper must be built on top of an existing knowledge base such as Baidu Encyclopedia, and building such a database takes substantial work. Practical implications In a digital information supply chain, if reviews are part of the information to be transported or exchanged, the model presented in this paper can help automatically categorize individual reviews according to different requirements and support information sharing. Originality/value One contribution is the surrogate-based approach that makes reviews more generic. The authors also built a review data set of popular Chinese TV series, which includes eight generic category labels for each review.
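
A minimal sketch of the surrogate idea: role and actor names are replaced with generic tags before feature extraction, so the classifier does not overfit to any one series. The name lists below stand in for a knowledge base such as Baidu Encyclopedia.

```python
# Replace series-specific names with generic surrogate tags.
import re

ROLE_NAMES = {"Zhen Huan", "Sheldon"}        # hypothetical knowledge-base entries
ACTOR_NAMES = {"Sun Li", "Jim Parsons"}

def to_generic(review: str) -> str:
    for name in ROLE_NAMES:
        review = re.sub(re.escape(name), "<ROLE>", review)
    for name in ACTOR_NAMES:
        review = re.sub(re.escape(name), "<ACTOR>", review)
    return review

print(to_generic("Sun Li's portrayal of Zhen Huan is outstanding."))
# -> "<ACTOR>'s portrayal of <ROLE> is outstanding."
```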


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rajit Nair ◽  
Santosh Vishwakarma ◽  
Mukesh Soni ◽  
Tejas Patel ◽  
Shubham Joshi

Purpose The novel coronavirus disease 2019 (COVID-19), which first appeared in December 2019 in the city of Wuhan, China, rapidly spread around the world and became a pandemic. It has had a devastating impact on daily lives, public health and the global economy. Positive cases must be identified as soon as possible to avoid further dissemination of the disease and to provide swift care to affected patients. The need for supportive diagnostic instruments has increased, as no specific automated toolkits are available. The latest results from radiology imaging indicate that these images provide valuable information on COVID-19. Advanced artificial intelligence (AI) technologies combined with radiological imagery can help diagnose this condition accurately and compensate for the lack of specialist doctors in isolated areas. In this research, a new model for automatic detection of COVID-19 from raw chest X-ray images is presented. The proposed model, DarkCovidNet, is designed to provide accurate diagnostics for binary classification (COVID vs. no findings) and multi-class classification (COVID vs. no findings vs. pneumonia). The implemented model achieved an average precision of 98.46% and 91.352% for binary and multi-class classification, respectively, and an average accuracy of 98.97% and 87.868%. The DarkNet model, originally the classifier of the "you only look once" (YOLO) real-time object detection method, was used as the basis of this research. A total of 17 convolutional layers with different filters in each layer were implemented. This platform can be used by radiologists to verify their initial screening and can also be used to screen patients through the cloud. Design/methodology/approach This study uses the CNN-based Darknet-19 model, the classifier platform of the YOLO real-time object detection system, whose architecture is designed to detect objects in real time. The DarkCovidNet model was developed from the Darknet architecture with fewer layers and filters. Typically, the DarkNet architecture consists of 19 convolution layers and 5 max-pooling layers. Findings The work discussed in this paper is used to diagnose various radiology images and to develop a model that can accurately predict or classify the disease. The data set used in this work consists of COVID-19 and non-COVID-19 chest X-ray images taken from various sources. The deep learning model DarkCovidNet is applied to the data set and shows significant performance for both binary and multi-class classification. The model achieved an average accuracy of 98.97% for the binary detection of COVID-19, and an average accuracy of 87.868% for the multi-class classification of COVID-19, no findings and pneumonia. Research limitations/implications One significant limitation of this work is that a limited number of chest X-ray images was used, while the number of patients with COVID-19 is increasing rapidly. In the future, the model will be trained and evaluated on larger data sets generated from local hospitals.
Originality/value Deep learning technology has made significant advances in the field of AI, especially in pattern recognition. A typical CNN structure has a convolution layer that extracts features from the input with the filters it applies, a pooling layer that reduces the feature size for computational efficiency and a fully connected layer, which is a neural network. A CNN model is created by combining one or more such layers, and its internal parameters are adjusted to accomplish a particular task, such as classification or object recognition.
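
For illustration, a small PyTorch sketch of the DarkNet-style building block described above (convolution, batch normalization, LeakyReLU, with max-pooling between blocks). The layer counts and channel widths here are illustrative, not the exact DarkCovidNet configuration.

```python
# DarkNet-style conv block and a toy classification head for single-channel
# chest X-rays; depth and widths are illustrative only.
import torch
import torch.nn as nn

def dn_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )

model = nn.Sequential(
    dn_block(1, 8), nn.MaxPool2d(2),     # X-rays are single-channel
    dn_block(8, 16), nn.MaxPool2d(2),
    dn_block(16, 32), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),                    # COVID / no findings / pneumonia
)

logits = model(torch.randn(1, 1, 256, 256))  # dummy 256x256 X-ray
print(logits.shape)                           # torch.Size([1, 3])
```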


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose The primary aim of this study is to review studies of novel data imputation approaches, particularly in the machine learning (ML) area, along dimensions including type of method, experimentation setup and evaluation metrics. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness are addressed. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010-2020? (2) How were the experimentation setup, data set characteristics and missingness employed in these studies? (3) What metrics were used to evaluate the imputation methods? Design/methodology/approach The review went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned 2,883 papers, most of which were not MVI techniques relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped, leaving 155 research papers for full-text review. Of these, 117 papers were used to assess the review questions. Findings This study shows that clustering- and instance-based algorithms are the most commonly proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are the most used evaluation metrics. For experimentation, the majority of the studies sourced their data sets from publicly available repositories. A common approach is to treat a complete data set as the baseline and evaluate the effectiveness of imputation on test data sets with artificially induced missingness. Data set size and missingness ratio varied across the experiments, while the missing data type and mechanism pertain to the capability of the imputation. Computational expense is a concern, and experimentation on large data sets appears to be a challenge. Originality/value It is understood from the review that there is no single universal solution to the missing data problem. Variants of ML approaches work well with particular kinds of missingness depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability; another concern is the complexity of formulating and implementing the algorithms. Imputation based on k-nearest neighbors (kNN) and clustering algorithms, being simple and easy to implement, remains popular across various domains.
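
A minimal sketch of the common evaluation protocol identified in the review: treat a complete data set as ground truth, artificially induce missingness, impute with a simple ML-based method (here, scikit-learn's kNN imputer) and score the imputed entries by RMSE. The data and missingness ratio below are arbitrary.

```python
# kNN imputation evaluated against a complete baseline with induced missingness.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))        # complete baseline data set

X_miss = X_true.copy()
mask = rng.random(X_true.shape) < 0.10    # 10% missingness, induced at random
X_miss[mask] = np.nan

X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss)

rmse = np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2))
print(f"RMSE on imputed entries: {rmse:.3f}")
```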


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lam Hoang Viet Le ◽  
Toan Luu Duc Huynh ◽  
Bryan S. Weber ◽  
Bao Khac Quoc Nguyen

Purpose This paper aims to identify the disproportionate impacts of the COVID-19 pandemic on labor markets. Design/methodology/approach The authors conduct a large-scale survey of 16,000 firms from 82 industries in Ho Chi Minh City, Vietnam, and analyze the data set using different machine-learning methods. Findings First, job losses and labor reductions in state-owned enterprises have been significantly larger than in other types of organizations. Second, employees of foreign direct investment enterprises suffer significantly lower labor income than other groups. Third, the adverse effects of the COVID-19 pandemic on the labor market are heterogeneous across industries and geographies. Finally, firms with high revenue in 2019 are more likely to adopt preventive measures, including the reduction of labor forces. The authors also find a significant correlation between firms' revenue and labor reduction, as both traditional econometrics and machine-learning techniques suggest. Originality/value This study has two main policy implications. First, although government support through taxes has been provided, the authors highlight evidence that there may be additional benefit from targeting firms with characteristics associated with layoffs or other negative labor responses. Second, the authors show which firm characteristics are associated with particular labor market responses such as layoffs, which may help target stimulus packages. Although the COVID-19 pandemic affects most industries and occupations, heterogeneous firm responses suggest several varieties of targeted policies: targeting firms that are likely to reduce labor forces or that are likely to face reduced revenue. The authors outline several industries and firm characteristics which appear to be more directly reducing employee counts or having negative labor responses, which may lead to more cost-effective stimulus.


Author(s):  
A. Hanel ◽  
H. Klöden ◽  
L. Hoegner ◽  
U. Stilla

Today, cameras mounted in vehicles are used to observe the driver as well as the objects around the vehicle. In this article, an outline of a concept for image-based recognition of dynamic traffic situations is presented. A dynamic traffic situation will be described by road users and their intentions. Images will be taken by a vehicle fleet and aggregated on a server. On these images, new strategies for machine learning will be applied iteratively as new data arrives on the server. The results of the learning process will be models describing the traffic situation, which will be transmitted back to the recording vehicles. The recognition will be performed as a standalone function in the vehicles using the received models. It can be expected that this method will make the detection and classification of objects around the vehicles more reliable. In addition, the prediction of their actions for the next seconds should become possible. As one example of how this concept can be used, a method to recognize the illumination situation of a traffic scene is described. This makes it possible to handle the different appearances of objects depending on the illumination of the scene. Different illumination classes are defined to distinguish illumination situations. Intensity-based features are extracted from the images and used by a classifier to assign an image to an illumination class. This method is tested on a real data set of daytime and nighttime images. It can be shown that the illumination class is classified correctly for more than 80% of the images.
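
A minimal sketch of the illumination-classification step, assuming intensity histogram statistics as features and a generic classifier; the feature set and classifier choice are assumptions, and the synthetic bright/dark images merely stand in for real daytime and nighttime frames.

```python
# Intensity-based features feed a classifier that assigns each image
# to an illumination class such as "day" or "night".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def intensity_features(gray_image):
    """gray_image: 2-D array of intensities in [0, 255]."""
    hist, _ = np.histogram(gray_image, bins=16, range=(0, 255), density=True)
    return np.concatenate([hist, [gray_image.mean(), gray_image.std()]])

# Hypothetical training data: bright images labeled "day", dark ones "night".
rng = np.random.default_rng(1)
day = [rng.normal(170, 30, (64, 64)).clip(0, 255) for _ in range(50)]
night = [rng.normal(60, 25, (64, 64)).clip(0, 255) for _ in range(50)]
X = np.array([intensity_features(im) for im in day + night])
y = np.array(["day"] * 50 + ["night"] * 50)

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict([intensity_features(rng.normal(160, 30, (64, 64)))]))
```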


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sandeepkumar Hegde ◽  
Monica R. Mundada

Purpose According to the World Health Organization, by 2025 chronic diseases are expected to account for 73% of all deaths and 60% of the global burden of disease. These diseases persist for long durations, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases that increase risk among adults as they get older, with CKD a major disease among them. Overall, 10% of the world's population is affected by CKD, and this figure is likely to double by 2030. This paper aims to propose a novel feature selection approach combined with a machine-learning algorithm that can predict chronic disease early with utmost accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with a hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease. Design/methodology/approach A novel feature selection algorithm, APDFS, is proposed, which explicitly handles the features associated with the class label through relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify the relationships between the distant features of the chronic disease data set. The data sets used in the experiments are obtained from several medical labs and hospitals in Karkala taluk, India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves competitive results compared to existing work in most cases. Findings The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. Predictive performance is analyzed on data sets for various chronic diseases such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its results with existing algorithms. The experimental figures illustrate that the proposed framework performed exceptionally well in the early prediction of CKD, with an accuracy of 91.6 per cent. Originality/value The capability of machine-learning algorithms depends on feature selection (FS) algorithms to identify the relevant traits in a data set, which affect the predictive result. FS is the process of choosing relevant features from a data set by removing redundant and irrelevant ones. Although many approaches have already been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting features. The proposed APDFS algorithm handles feature selection in two separate indices; hence, the computational complexity of the algorithm is reduced to O(nk+1).
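
Since the APDFS formulation itself is not given in the abstract, the following is a generic stand-in for relevance/redundancy-based feature selection, using mutual information as the divergence measure: rank features by relevance to the class label, then skip features that are highly redundant with one already selected.

```python
# Generic relevance/redundancy feature selection (not the authors' APDFS).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def discretize(x, bins=4):
    # Quantile-bin a continuous feature so mutual information can be estimated.
    return np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))

def select_features(X, y, k=5, redundancy_thresh=0.5):
    relevance = mutual_info_classif(X, y, random_state=0)  # feature-label relevance
    selected = []
    for j in np.argsort(relevance)[::-1]:                  # most relevant first
        redundant = any(mutual_info_score(discretize(X[:, j]),
                                          discretize(X[:, s])) > redundancy_thresh
                        for s in selected)
        if not redundant:
            selected.append(j)
        if len(selected) == k:
            break
    return selected

# Demo on a synthetic data set:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)
print(select_features(X, y, k=4))
```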


Author(s):  
Alexander M. Zolotarev ◽  
Brian J. Hansen ◽  
Ekaterina A. Ivanova ◽  
Katelynn M. Helfrich ◽  
Ning Li ◽  
...  

Background: Atrial fibrillation (AF) can be maintained by localized intramural reentrant drivers. However, AF driver detection by clinical surface-only multielectrode mapping (MEM) has relied on subjective interpretation of activation maps. We hypothesized that applying machine learning to electrogram frequency spectra may accurately automate driver detection by MEM and add objectivity to the interpretation of MEM findings. Methods: Temporally and spatially stable single AF drivers were mapped simultaneously in explanted human atria (n=11) by subsurface near-infrared optical mapping (NIOM; 0.3 mm² resolution) and 64-electrode MEM (higher density or lower density, with 3 and 9 mm² resolution, respectively). Unipolar MEM and NIOM recordings were processed by Fourier transform analysis into 28,407 total Fourier spectra. Thirty-five machine-learning features were extracted from each Fourier spectrum. Results: Targeted driver ablation and NIOM activation maps efficiently defined the center and periphery of AF driver preferential tracks and provided validated annotations for driver versus nondriver electrodes in MEM arrays. Compared with analysis of single-electrogram frequency features, averaging the features from each of the 8 neighboring electrodes significantly improved classification of AF driver electrograms. The classification metrics increased when less strict annotations, including driver periphery electrodes, were added to the driver center annotation. Notably, the F1-score for binary classification of the higher-density catheter data set was significantly higher than that of the lower-density catheter (0.81±0.02 versus 0.66±0.04, P<0.05). The trained algorithm correctly highlighted 86% of driver regions with higher-density MEM arrays but only 80% with lower-density arrays (81% for lower-density and higher-density arrays together). Conclusions: The machine-learning model pretrained on Fourier spectrum features allows efficient classification of electrogram recordings as AF driver or nondriver compared with the NIOM gold standard. Future application of this NIOM-validated machine-learning approach may improve the accuracy of AF driver detection for targeted ablation treatment in patients.
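
A minimal sketch of the described feature pipeline: each unipolar electrogram is reduced to a Fourier amplitude spectrum, spectral features are extracted, and features are averaged over an electrode's 3x3 neighborhood on the 64-electrode (8x8) array. The sampling rate and the three features shown are assumptions, not the paper's 35 features.

```python
# Fourier spectrum features per electrogram, then neighbor averaging
# across the 8x8 electrode grid.
import numpy as np

FS = 1000.0                                    # assumed sampling rate, Hz

def spectral_features(egm):
    spec = np.abs(np.fft.rfft(egm - egm.mean()))
    freqs = np.fft.rfftfreq(len(egm), d=1 / FS)
    dominant = freqs[np.argmax(spec)]          # dominant frequency
    centroid = (freqs * spec).sum() / spec.sum()
    return np.array([dominant, centroid, spec.max() / spec.mean()])

def neighbor_averaged(features_grid, i, j):
    """Average features over electrode (i, j) and its neighbors.

    features_grid: (8, 8, n_feat) array for a 64-electrode array."""
    block = features_grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
    return block.reshape(-1, block.shape[-1]).mean(axis=0)

egm = np.sin(2 * np.pi * 7 * np.arange(0, 4, 1 / FS))  # toy 7 Hz electrogram
print(spectral_features(egm))                           # dominant freq ~7 Hz
```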

