Data analytics accelerates the experimental discovery of new thermoelectric materials with extremely high figure of merit

Author(s):  
Sergey Levchenko ◽  
Yaqiong Zhong ◽  
Xiaojuan Hu ◽  
Debalaya Sarker ◽  
Qingrui Xia ◽  
...  

Abstract Thermoelectric (TE) materials are among the very few sustainable yet feasible energy solutions of the present time. This huge promise of energy harvesting is contingent on identifying or designing materials with higher efficiency than those presently available. However, owing to the vastness of the chemical space of materials, only a small fraction of it has been scanned experimentally and/or computationally so far. Employing compressed-sensing-based symbolic regression in an active-learning framework, we have not only identified a trend in materials’ compositions for superior TE performance, but have also predicted and experimentally synthesized several extremely high-performing novel TE materials. Among these, we found polycrystalline p-type Cu0.45Ag0.55GaTe2 to possess an experimental figure of merit as high as ~2.8 at 827 K. This is a breakthrough in the field, because all previously known thermoelectric materials with a comparable figure of merit are either unstable or much more difficult to synthesize, rendering them unusable in large-scale applications. The presented methodology demonstrates the importance and tremendous potential of physically informed descriptors in materials science, in particular for the relatively small data sets typically available from experiments under well-controlled conditions.

2018 ◽  
Author(s):  
Jan H. Jensen

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimisation of logP values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property. The molecules found by GB-GA bear little resemblance to the molecules used to construct the initial mating pool, indicating that the GB-GA approach can traverse a relatively large distance in chemical space using relatively few (50) generations. The paper also introduces a new non-ML graph-based generative model (GB-GM) that can be parameterized using very small data sets and combined with a Monte Carlo tree search (MCTS) algorithm. The results are comparable to previously published results (Sci. Technol. Adv. Mater. 2017, 18, 972-976) using a recurrent neural network (RNN) generative model, while the GB-GM-based method is orders of magnitude faster. The MCTS results seem more dependent on the composition of the training set than the GA approach for this particular property. Our results suggest that the performance of new ML-based generative models should be compared to more traditional, and often simpler, approaches such as a GA.


2019 ◽  
Author(s):  
Jan H. Jensen

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimisation of logP values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property. The molecules found by GB-GA bear little resemblance to the molecules used to construct the initial mating pool, indicating that the GB-GA approach can traverse a relatively large distance in chemical space using relatively few (50) generations. The paper also introduces a new non-ML graph-based generative model (GB-GM) that can be parameterized using very small data sets and combined with a Monte Carlo tree search (MCTS) algorithm. The results are comparable to previously published results (Sci. Technol. Adv. Mater. 2017, 18, 972-976) using a recurrent neural network (RNN) generative model, while the GB-GM-based method is orders of magnitude faster. The MCTS results seem more dependent on the composition of the training set than the GA approach for this particular property. Our results suggest that the performance of new ML-based generative models should be compared to more traditional, and often simpler, approaches such as a GA.


Author(s):  
Tiziano Flati ◽  
Silvia Gioiosa ◽  
Nicola Spallanzani ◽  
Ilario Tagliaferri ◽  
Maria Angela Diroma ◽  
...  

Abstract Background: RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides, and it can increase both transcriptome and proteome diversity. The automatic detection of RNA editing from RNA-seq data is computationally intensive and limited to small data sets, thus preventing a reliable genome-wide characterisation of this process. Results: In this work we introduce HPC-REDItools, an upgraded tool for the accurate discovery of RNA-editing events from large dataset repositories. Availability: https://github.com/BioinfoUNIBA/REDItools2. Conclusions: HPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of an MPI-based implementation and scaling almost linearly with the number of available cores.


2021 ◽  
Vol 72 (2) ◽  
pp. 603-617
Author(s):  
Moulay Zaidan Lahjouji-Seppälä ◽  
Achim Rabus

Abstract Quantitative, corpus-based research on the spontaneous spoken Carpathian Rusyn language can cause several data-related problems: speakers use ambivalent forms in different quantities, resulting in a biased data set, while a stricter data-cleaning process would lead to large-scale data loss. On top of that, polytomous categorical dependent variables are hard to analyze due to methodological limitations. This paper provides several approaches for dealing with unbalanced and biased data sets containing variation in the conjugational forms of the verbs maty ‘to have’ and (po-)znaty ‘to know’ in the Carpathian Rusyn language. Using resampling-based methods such as Cross-Validation, Bootstrapping and Random Forests, we provide a strategy for circumventing possible methodological pitfalls and gaining the most information from our precious data, without trying to p-hack the results. By calculating the predictive power of several sociolinguistic factors on linguistic variation, we can make valid statements about the (sociolinguistic) status of Rusyn and the stability of the old dialect continuum of Rusyn varieties.


Author(s):  
Jianping Ju ◽  
Hong Zheng ◽  
Xiaohang Xu ◽  
Zhongyuan Guo ◽  
Zhaohui Zheng ◽  
...  

Abstract Although convolutional neural networks have achieved success in the field of image classification, there are still challenges in the field of agricultural product quality sorting, such as machine vision-based jujube defect detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting the features often fails to meet the requirements of practical application. In this paper, a jujube sorting model for small data sets based on a convolutional neural network and transfer learning is proposed to meet the actual demand of jujube defect detection. Firstly, the original images collected from an actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, the deep model pre-trained on the ImageNet image data set was trained on the jujube defects data set, so that the parameters of the pre-trained model could fit the parameter distribution of the jujube defect images, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of the jujube defects. The classification results are visualized with heatmaps, and classification accuracy and confusion matrices are analyzed against the comparison models. The experimental results show that the SE-ResNet50-CL model optimizes the fine-grained classification problem of jujube defect recognition, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 8-9
Author(s):  
Zahra Karimi ◽  
Brian Sullivan ◽  
Mohsen Jafarikia

Abstract Previous studies have shown that the accuracy of Genomic Estimated Breeding Value (GEBV) as a predictor of future performance is higher than that of the traditional Estimated Breeding Value (EBV). The purpose of this study was to estimate the potential advantage of selection on GEBV for litter size (LS) compared to selection on EBV in the Canadian swine dam line breeds. The study included 236 Landrace and 210 Yorkshire gilts born in 2017 which had their first farrowing after 2017. GEBV and EBV for LS were calculated with data that were available at the end of 2017 (GEBV2017 and EBV2017, respectively). De-regressed EBV for LS in July 2019 (dEBV2019) was used as an adjusted phenotype. The average dEBV2019 for the top 40% of sows based on GEBV2017 was compared to the average dEBV2019 for the top 40% of sows based on EBV2017. The standard error of the estimated difference for each breed was estimated by comparing the average dEBV2019 for repeated random samples of two sets of 40% of the gilts. In comparison to the top 40% ranked based on EBV2017, ranking based on GEBV2017 resulted in an extra 0.45 (±0.29) and 0.37 (±0.25) piglets born per litter in Landrace and Yorkshire replacement gilts, respectively. The estimated Type I errors of the GEBV2017 gain over EBV2017 were 6% and 7% in Landrace and Yorkshire, respectively. Selecting both replacement boars and replacement gilts using GEBV instead of EBV can translate into an increased annual genetic gain of 0.3 extra piglets per litter, which would more than double the rate of gain observed from typical EBV-based selection. The permutation test for validation used in this study appears effective with relatively small data sets and could be applied to other traits, other species and other prediction methods.
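The validation procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the number of repetitions, and the choice to draw the two random 40% sets independently of each other are all assumptions, since the abstract does not specify them.

```python
import random
import statistics


def top_fraction_mean(scores, outcome, fraction=0.4):
    """Mean outcome of the animals ranked in the top `fraction` by `scores`."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return statistics.fmean(outcome[i] for i in ranked[:k])


def random_selection_null(outcome, fraction=0.4, n_rep=2000, seed=0):
    """Differences in mean outcome between two randomly drawn subsets.

    Assumption: the two subsets are sampled independently, so they may overlap.
    """
    rng = random.Random(seed)
    k = max(1, round(len(outcome) * fraction))
    diffs = []
    for _ in range(n_rep):
        a = rng.sample(outcome, k)
        b = rng.sample(outcome, k)
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    return diffs


def compare_rankings(gebv, ebv, outcome, fraction=0.4, n_rep=2000, seed=0):
    """Observed gain of ranking on `gebv` over `ebv`, with a resampling-based
    standard error and a one-sided empirical p-value."""
    observed = (top_fraction_mean(gebv, outcome, fraction)
                - top_fraction_mean(ebv, outcome, fraction))
    null = random_selection_null(outcome, fraction, n_rep, seed)
    std_error = statistics.stdev(null)
    # Share of random differences at least as large as the observed gain.
    p_one_sided = sum(d >= observed for d in null) / len(null)
    return observed, std_error, p_one_sided
```

Here `outcome` would play the role of dEBV2019 and `gebv`/`ebv` the roles of GEBV2017 and EBV2017, each passed as a list with one entry per gilt; the empirical p-value corresponds only loosely to the "Type I error" reported in the abstract.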


Author(s):  
Jungeui Hong ◽  
Elizabeth A. Cudney ◽  
Genichi Taguchi ◽  
Rajesh Jugulum ◽  
Kioumars Paryani ◽  
...  

The Mahalanobis-Taguchi System is a diagnostic and predictive method for analyzing patterns in multivariate cases. The goal of this study is to compare the ability of the Mahalanobis-Taguchi System and a neural network to discriminate using small data sets. We examine the discriminant ability as a function of data set size using an application area where reliable data are publicly available. The study uses the Wisconsin Breast Cancer study with nine attributes and one class.
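The distance measure at the core of the Mahalanobis-Taguchi System can be illustrated with a short sketch. The abstract gives no formulas, so everything below is an assumption: the code computes the squared Mahalanobis distance of an observation from a reference ("normal") group on standardized variables and divides it by the number of variables, a scaling commonly described for this method. It omits the later stages of the method (such as variable screening), and all function names are hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def fit_reference(samples):
    """Means, standard deviations, and correlation matrix of the reference group."""
    cols = list(zip(*samples))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]
    z = [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in samples]
    n, k = len(samples), len(cols)
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(k)]
            for i in range(k)]
    return means, stds, corr


def scaled_md(x, means, stds, corr):
    """Squared Mahalanobis distance of x from the reference group, divided by k."""
    z = [(v - mu) / sd for v, mu, sd in zip(x, means, stds)]
    y = solve(corr, z)  # y = corr^{-1} z without forming the inverse
    return sum(zi * yi for zi, yi in zip(z, y)) / len(z)
```

With this scaling, observations from the reference group have an average distance close to 1, so values well above 1 flag observations that differ from that group. A threshold for classification would still have to be chosen separately.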


2005 ◽  
Vol 297-300 ◽  
pp. 875-880
Author(s):  
Cheol Ho Lim ◽  
Ki Tae Kim ◽  
Yong Hwan Kim ◽  
Dong Choul Cho ◽  
Young Sup Lee ◽  
...  

P-type Bi0.5Sb1.5Te3 compounds doped with 3 wt% Te were fabricated by spark plasma sintering, and their mechanical and thermoelectric properties were investigated. Sintered compounds with a bending strength of more than 50 MPa and a figure-of-merit of 2.9×10⁻³/K were obtained by controlling the mixing ratio of large powders (PL) and small powders (PS). Compared with conventionally prepared single-crystal thermoelectric materials, the bending strength increased more than threefold, and the figure-of-merit Z was similar to those of single crystals. It is expected that the mechanical properties could be improved by using hybrid powders without degradation of thermoelectric properties.

