Efficient construction of linear models in materials modeling and applications to force constant expansions

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Erik Fransson ◽  
Fredrik Eriksson ◽  
Paul Erhart

Abstract Linear models, such as force constant (FC) and cluster expansions, play a key role in physics and materials science. While they can in principle be parametrized using regression and feature selection approaches, the convergence behavior of these techniques, in particular with respect to thermodynamic properties, is not well understood. Here, we therefore analyze the efficacy and efficiency of several state-of-the-art regression and feature selection methods, in particular in the context of FC extraction and the prediction of different thermodynamic properties. Generic feature selection algorithms such as recursive feature elimination with ordinary least squares (OLS), automatic relevance determination regression, and the adaptive least absolute shrinkage and selection operator can yield physically sound models for systems with a modest number of degrees of freedom. For large unit cells with low symmetry and/or high-order expansions, however, they come with a non-negligible computational cost that can be more than two orders of magnitude higher than that of OLS. In such cases, OLS with cutoff selection provides a viable route, as demonstrated here for both second-order FCs in large low-symmetry unit cells and high-order FCs in low-symmetry systems. While regression techniques are thus very powerful, they require well-tuned protocols. The present work establishes guidelines for the design of protocols that are readily usable, e.g., in high-throughput and materials discovery schemes. Since the underlying algorithms are not specific to FC construction, the general conclusions drawn here also have a bearing on the construction of other linear models in physics and materials science.
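As an illustration of the trade-offs discussed above, the following minimal sketch compares OLS, recursive feature elimination, and automatic relevance determination on a synthetic sparse linear problem standing in for a force-constant fit. The problem sizes, noise level, and sparsity are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: sparse regression approaches on a synthetic linear
# problem A x = f, standing in for a force-constant sensitivity matrix A
# and forces f. Data are synthetic, not an actual FC expansion.
import numpy as np
from sklearn.linear_model import LinearRegression, ARDRegression
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n_obs, n_params = 500, 80
A = rng.normal(size=(n_obs, n_params))
x_true = np.zeros(n_params)
x_true[:15] = rng.normal(size=15)          # only a few parameters are nonzero
f = A @ x_true + 0.01 * rng.normal(size=n_obs)

# Ordinary least squares: fast, but keeps every parameter.
ols = LinearRegression().fit(A, f)

# Recursive feature elimination with OLS: prunes irrelevant parameters
# at the price of repeated refits.
rfe = RFE(LinearRegression(), n_features_to_select=15).fit(A, f)

# Automatic relevance determination: Bayesian sparsity, no manual cutoff.
ard = ARDRegression().fit(A, f)

for name, coef in [("OLS", ols.coef_), ("RFE", rfe.estimator_.coef_),
                   ("ARD", ard.coef_)]:
    print(name, "nonzero:", np.sum(np.abs(coef) > 1e-6))
```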

Author(s):  
C. M. Sung ◽  
D. B. Williams

Researchers have tended to use high-symmetry zone axes (e.g., <111>, <114>) for high-order Laue zone (HOLZ) line analysis since Jones et al. reported the origin of HOLZ lines and described some of their applications. However, it is not always easy to find HOLZ lines from a specific high-symmetry zone axis during microscope operation, especially from second phases on a scale of tens of nanometers. It would therefore be very convenient if HOLZ lines from low-symmetry zone axes could be used, and their patterns simulated, in order to measure lattice parameter changes through HOLZ line shifts. HOLZ patterns of high-index, low-symmetry zone axes are shown in Fig. 1; these were obtained from pure Al at -186°C using a double-tilt cooling holder. The corresponding simulated HOLZ line patterns are shown along with ten other low-symmetry orientations in Fig. 2. The simulations were based upon kinematical diffraction conditions.
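For orientation, here is a minimal sketch of the kinematical relation that underlies such measurements: each HOLZ line corresponds to a Bragg condition, so a small lattice-parameter change shifts the line by a calculable amount. The voltage (200 kV), reflection, and room-temperature Al lattice parameter below are illustrative assumptions, not the paper's values, and the function names are ours.

```python
# Kinematical picture behind HOLZ-line shifts for a cubic crystal:
# sin(theta) = lambda / (2 d_hkl), so a small lattice-parameter change
# moves the line. Values below (200 kV, a for Al) are illustrative.
import math

def electron_wavelength_A(V):
    """Relativistic electron wavelength in Angstrom for voltage V (volts)."""
    return 12.2639 / math.sqrt(V + 0.97845e-6 * V**2)

def bragg_angle(a, hkl, lam):
    d = a / math.sqrt(sum(i * i for i in hkl))    # cubic d-spacing
    return math.asin(lam / (2.0 * d))

lam = electron_wavelength_A(200e3)                # 200 kV
a0 = 4.0496                                       # Al, room temp (Angstrom)
g = (8, 6, 2)                                     # a high-index HOLZ reflection

t0 = bragg_angle(a0, g, lam)
t1 = bragg_angle(a0 * (1 - 1e-3), g, lam)         # 0.1% lattice contraction
print(f"theta = {math.degrees(t0):.4f} deg, "
      f"shift = {math.degrees(t1 - t0) * 1e3:.3f} mdeg")
```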


1999 ◽  
Vol 604 ◽  
Author(s):  
Rosa E. Meléndez ◽  
Andrew J. Carn ◽  
Kazuki Sada ◽  
Andrew D. Hamilton

Abstract The use of organic molecules as gelators in certain organic solvents has been the target of recent research in materials science. The types of structures formed in the gel matrix have potential applications as porous solids that can be used as absorbents or in catalysis. We will present and discuss the organogelation properties of a family of bis-ureas. The studies presented will include molecular structure-activity relationships, thermodynamic properties, comparison to X-ray crystallographic data, and potential functionalization of the gels formed by this class of compounds.


2021 ◽  
Author(s):  
Joel C. Najmon ◽  
Homero Valladares ◽  
Andres Tovar

Abstract Multiscale topology optimization (MSTO) is a numerical design approach to optimally distribute material within coupled design domains at multiple length scales. Due to the substantial computational cost of performing topology optimization at multiple scales, MSTO methods often feature subroutines such as homogenization of parameterized unit cells and inverse homogenization of periodic microstructures. Parameterized unit cells are of great practical use but limit the design to a pre-selected cell shape. On the other hand, inverse homogenization provides a physical representation of an optimal periodic microstructure at every discrete location but does not necessarily embody a manufacturable structure. To address these limitations, this paper introduces a Gaussian process regression model-assisted MSTO method that features the optimal distribution of material at the macroscale and topology optimization of a manufacturable microscale structure. In the proposed approach, a macroscale optimization problem is solved using a gradient-based optimizer. The design variables are defined as the homogenized stiffness tensors of the microscale topologies. As such, analytical sensitivities are not available, so the sensitivity coefficients are approximated using finite differences after each microscale topology is optimized. The computational cost of optimizing each microstructure is dramatically reduced by using Gaussian process regression models to approximate the homogenized stiffness tensor. The capability of the proposed MSTO method is demonstrated with two three-dimensional numerical examples. The correlation of the Gaussian process regression models is presented along with the final multiscale topologies for the two examples: a cantilever beam and a 3-point bending beam.
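A hedged sketch of the surrogate idea described above: a Gaussian process maps a microstructure parameter to a homogenized stiffness entry, and sensitivities are then obtained by finite differences on the cheap surrogate. The training data below are synthetic; in the actual method they would come from numerical homogenization of optimized microstructures, and all names and constants here are ours.

```python
# Train a Gaussian process surrogate for a homogenized stiffness entry,
# then take macroscale sensitivities by central finite differences.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)
X = rng.uniform(0.1, 0.9, size=(200, 1))          # e.g., volume fraction
# Fake "homogenized" C11 values: monotone in density with mild curvature.
C11 = 100.0 * X[:, 0] ** 1.8 + 0.5 * rng.normal(size=200)

gpr = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=0.2),
    normalize_y=True).fit(X, C11)

def dC11_drho(rho, h=1e-4):
    """Central finite-difference sensitivity through the surrogate."""
    up = gpr.predict(np.array([[rho + h]]))[0]
    dn = gpr.predict(np.array([[rho - h]]))[0]
    return (up - dn) / (2.0 * h)

print("C11(0.5) ~", gpr.predict(np.array([[0.5]]))[0])
print("dC11/drho(0.5) ~", dC11_drho(0.5))
```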


1991 ◽  
Vol 24 (6) ◽  
pp. 987-993 ◽  
Author(s):  
A. Boultif ◽  
D. Louër

The dichotomy method for indexing powder diffraction patterns of low-symmetry lattices is studied in terms of an optimization of the bound relations used in comparing observed data with the calculated patterns generated at each level of the analysis. A rigorous mathematical treatment is presented for the monoclinic and triclinic cases. A new program, DICVOL91, has been written, working from the cubic end of the symmetry sequence down to triclinic lattices. The search for unit cells is exhaustive within the input parameter limits, although a few restrictions on the hkl indices of the first two diffraction lines have been introduced in the study of triclinic symmetry. The efficiency of the method has been checked by means of a large number of accurate powder data, with a very high success rate. Calculation times appeared to be quite reasonable for the majority of examples, down to monoclinic symmetry, but were less predictable for triclinic cases. Applications to all symmetries, including cases with a dominant zone, are discussed.
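The following toy sketch illustrates the dichotomy principle in the simplest (cubic) case only; the bound relations for monoclinic and triclinic symmetry, which are the subject of the paper, are considerably more involved. Tolerances, parameter ranges, and the test data are illustrative assumptions.

```python
# Dichotomy principle, cubic case: an interval [a_lo, a_hi] of cell
# edges survives if every observed Q = 1/d^2 can be matched by some
# Q(hkl) = (h^2 + k^2 + l^2) / a^2 for an a in the interval; surviving
# intervals are halved and re-tested.
M = sorted({h*h + k*k + l*l
            for h in range(7) for k in range(7) for l in range(7)} - {0})

def indexable(Q_obs, a_lo, a_hi, eps=2e-4):
    # Q(hkl) sweeps [m/a_hi^2, m/a_lo^2] as a spans the interval.
    return any(m / a_hi**2 - eps <= Q_obs <= m / a_lo**2 + eps for m in M)

def dichotomy(Q_list, a_lo, a_hi, depth=12):
    if not all(indexable(Q, a_lo, a_hi) for Q in Q_list):
        return []                                 # prune this interval
    if depth == 0:
        return [(a_lo, a_hi)]
    mid = 0.5 * (a_lo + a_hi)
    return dichotomy(Q_list, a_lo, mid, depth - 1) + \
           dichotomy(Q_list, mid, a_hi, depth - 1)

# Observed lines for a cubic cell with a = 4.05 (fcc-like m sequence).
a_true = 4.05
Q_obs = [m / a_true**2 for m in (3, 4, 8, 11, 12)]
print(dichotomy(Q_obs, 2.0, 8.0))
```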


2020 ◽  
Vol 11 ◽  
Author(s):  
Shuhei Kimura ◽  
Ryo Fukutomi ◽  
Masato Tokuhisa ◽  
Mariko Okada

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these methods also have the useful ability to analyze both time-series and static gene expression data. However, they only rank all of the candidate regulations by assigning confidence values to them; none has been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses the outputs of the feature selection methods to adjust the confidence values of all of the candidate regulations computed by the random-forest-based inference method. Numerical experiments showed that the combined application of the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. Moreover, the combined application of the feature selection methods increases the computational cost. While a bigger improvement at a lower computational cost would be ideal, we see no impediment to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
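A loose sketch of the overall workflow: random-forest importances supply the initial confidence values (in the spirit of GENIE3-type inference), and a feature selection pass then zeroes out unpromising candidates and renormalizes the rest. The data and the simple mean-importance threshold below are illustrative stand-ins for the paper's series of feature selection methods.

```python
# Score candidate regulators of one target gene with random-forest
# importances, then prune unpromising candidates and rescale the
# confidences of the survivors. Data and threshold are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n_samples, n_genes = 80, 20
X = rng.normal(size=(n_samples, n_genes))         # candidate regulators
y = 1.5 * X[:, 3] - 0.8 * X[:, 7] + 0.1 * rng.normal(size=n_samples)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
confidence = rf.feature_importances_              # initial ranking

# Feature selection pass (toy): keep regulators above mean importance,
# zero the rest, and renormalize the remaining confidence values.
keep = confidence > confidence.mean()
adjusted = np.where(keep, confidence, 0.0)
adjusted /= adjusted.sum()

print("kept regulators:", np.flatnonzero(keep))
```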


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Background: Model building is a crucial part of omics-based biomedical research, used to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing the error between model predictions and a given classification (maximizing accuracy). Human ratings/classifications, however, may be error-prone, with discordance rates between experts of 5–15%. We therefore evaluate whether a feature pre-filtering step might improve the identification of features associated with the true underlying groups.

Methods: Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as the classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment, and cross-validation with the dependent (classification) and independent (features) variables switched. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic for identifying prognostically different G-CIMP-negative glioblastoma tumors in the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy, UMAP-based, rough and erroneous separation.

Results: V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (range: 0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs <= 0.50 for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range: 0.59–1.00) for V1 and 0.54 (range: 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random-forest-based method.

Conclusions: The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify the (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
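A loose Python paraphrase of the V1 prefilter described above (the reference implementation is the R package modelBuildR): each feature is tested univariately with the roles of classification and feature switched, p-values are Benjamini-Hochberg adjusted, and a final model is trained on the surviving features. Sizes, effect strength, and the final classifier below are illustrative assumptions, not the package's defaults.

```python
# V1-style prefilter, paraphrased: regress each feature on the
# error-prone class labels (switched roles; for a binary rating this
# reduces to a two-sample t-test), BH-adjust, then fit a final model.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p, p_true = 100, 1000, 100
labels = rng.integers(0, 2, size=n)               # noisy binary rating
X = rng.normal(size=(n, p))
X[:, :p_true] += 0.8 * labels[:, None]            # informative features

# Univariate test per feature with the label as explanatory variable.
pvals = np.array([stats.ttest_ind(X[labels == 0, j],
                                  X[labels == 1, j]).pvalue
                  for j in range(p)])
keep = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]

clf = LogisticRegression(max_iter=1000).fit(X[:, keep], labels)
print(f"{keep.sum()} features kept, train acc = "
      f"{clf.score(X[:, keep], labels):.2f}")
```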


2018 ◽  
Vol 13 (3) ◽  
pp. 323-336 ◽  
Author(s):  
Naeimeh Elkhani ◽  
Ravie Chandren Muniyandi ◽  
Gexiang Zhang

Computational cost is a major challenge for almost all intelligent algorithms run on CPUs. Our proposed kernel P system multi-objective binary particle swarm optimization method for feature selection and classification must therefore run in an efficient time, which we aimed to achieve by exploiting the parallelism and nondeterminism of membrane computing. Moreover, GPUs perform best on latency-tolerant, highly parallel, and independent tasks. In this study, to realize the potential of a membrane-inspired model, particularly its parallelism, and to improve the time cost, the feature selection method was implemented on a GPU. A comparison of the time cost of the proposed method on CPU, multicore, and GPU shows a significant improvement for the GPU implementation.
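For readers unfamiliar with the underlying optimizer, the sketch below shows a plain, single-objective binary PSO feature selection loop on CPU in NumPy; the paper's method is a multi-objective, kernel P system variant executed on GPU, so this is only the conceptual core, with all constants and the dataset assumed.

```python
# Binary PSO feature selection: a sigmoid transfer function maps
# velocities to bit-flip probabilities; fitness rewards accuracy and
# lightly penalizes larger feature subsets.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X, y = load_breast_cancer(return_X_y=True)
n_particles, n_feat, iters = 20, X.shape[1], 30

pos = rng.integers(0, 2, size=(n_particles, n_feat)).astype(float)
vel = np.zeros_like(pos)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum() / n_feat       # subset-size penalty

pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()].copy()
for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random(pos.shape) < 1 / (1 + np.exp(-vel))).astype(float)
    f = np.array([fitness(p) for p in pos])
    better = f > pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmax()].copy()

print("selected features:", int(gbest.sum()), "of", n_feat)
```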


Author(s):  
Awder Mohammed Ahmed ◽  
Adnan Mohsin Abdulazeez

Multi-label classification addresses problems in which more than one class label is assigned to each instance. Many real-world multi-label classification tasks are high-dimensional due to digital technologies, which degrades the performance of traditional multi-label classifiers. Feature selection is a common and successful approach to tackling this problem: it reduces dimensionality by retaining relevant features and eliminating redundant ones. Several feature selection methods have been successfully applied in multi-label learning. Most of them are wrapper methods that employ a multi-label classifier in their search, running a classifier at each step; this entails a high computational cost, so they suffer from scalability issues. Filter methods address this issue by evaluating feature subsets with information-theoretic criteria instead of running classifiers. Most existing studies and review papers deal with feature selection for single-label data, while multi-label classification has recently found a wide range of real-world applications such as image classification, emotion analysis, text mining, and bioinformatics. Moreover, researchers have recently focused on applying swarm intelligence methods to selecting prominent features of multi-label data. To the best of our knowledge, there is no review paper covering swarm-intelligence-based methods for multi-label feature selection. In this paper, we therefore provide a comprehensive review of the different swarm intelligence and evolutionary computing methods for feature selection that have been presented for multi-label classification tasks. To this end, we investigate most of the well-known and state-of-the-art methods and categorize them from different perspectives. We then summarize the main characteristics of existing multi-label feature selection techniques and compare them analytically. We also introduce benchmarks, evaluation measures, and standard datasets to facilitate research in this field. Moreover, we perform experiments comparing existing works, and at the end of this survey we outline some challenges, issues, and open problems of the field to be considered by researchers in the future.
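As a minimal concrete instance of the filter-method idea discussed above, the sketch below scores each feature by its average mutual information with the individual label columns and keeps the top k. Published multi-label filters use more refined information measures; the synthetic dataset and the choice of k here are assumptions.

```python
# Filter-style multi-label feature scoring: average the mutual
# information of each feature with every label column, then rank.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import mutual_info_classif

X, Y = make_multilabel_classification(n_samples=300, n_features=40,
                                      n_classes=5, random_state=0)

# Mean MI of each feature over all label columns.
scores = np.mean([mutual_info_classif(X, Y[:, j], random_state=0)
                  for j in range(Y.shape[1])], axis=0)

top_k = np.argsort(scores)[::-1][:10]
print("top-10 features:", top_k)
```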


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Jia Guo ◽  
Huajun Zhu ◽  
Zhen-Guo Yan ◽  
Lingyan Tang ◽  
Songhe Song

By introducing a hybrid technique into the high-order CPR (correction procedure via reconstruction) scheme, a novel hybrid WCNS-CPR scheme is developed for efficient supersonic simulations. First, a shock detector based on nonlinear weights is used to identify grid cells with high gradients or discontinuities throughout the whole flow field. Then, the WCNS (weighted compact nonlinear scheme) is adopted to capture shocks in these areas, while the smooth areas are calculated by CPR. A strategy to treat the interfaces between the two schemes is developed which maintains high-order accuracy. The convergence order of accuracy and the shock-capturing ability are tested in several numerical experiments, the results of which show that the hybrid scheme achieves the expected high-order accuracy and high resolution, is robust in shock capturing, and has a lower computational cost than the pure WCNS.
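A 1D illustration of a nonlinear-weight shock detector of the kind described above: WENO-type smoothness indicators yield nonlinear weights, and a cell is flagged as troubled when these deviate strongly from the linear weights. The threshold and test profile are assumptions; in the hybrid scheme, flagged cells would be handled by WCNS and smooth ones by CPR.

```python
# Flag "troubled" cells where the nonlinear weights built from
# Jiang-Shu smoothness indicators deviate from the linear weights.
import numpy as np

def troubled_cells(u, threshold=0.4, eps=1e-6):
    n = len(u)
    flags = np.zeros(n, dtype=bool)
    d = np.array([0.1, 0.6, 0.3])                 # 5th-order linear weights
    for i in range(2, n - 2):
        b0 = (13/12) * (u[i-2] - 2*u[i-1] + u[i])**2 \
             + (1/4) * (u[i-2] - 4*u[i-1] + 3*u[i])**2
        b1 = (13/12) * (u[i-1] - 2*u[i] + u[i+1])**2 \
             + (1/4) * (u[i-1] - u[i+1])**2
        b2 = (13/12) * (u[i] - 2*u[i+1] + u[i+2])**2 \
             + (1/4) * (3*u[i] - 4*u[i+1] + u[i+2])**2
        a = d / (eps + np.array([b0, b1, b2]))**2
        w = a / a.sum()                           # nonlinear weights
        flags[i] = np.abs(w - d).max() > threshold
    return flags

x = np.linspace(0, 1, 200)
u = np.where(x < 0.5, np.sin(2 * np.pi * x), 0.2) # smooth region + jump
print("troubled cells:", np.flatnonzero(troubled_cells(u)))
```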

