Data analytics accelerates the experimental discovery of new thermoelectric materials with extremely high figure of merit

Graph-based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space

10.26434/chemrxiv.7240751.v1 ◽

2018 ◽

Author(s):

Jan H. Jensen

Keyword(s):

Genetic Algorithm ◽

Monte Carlo ◽

Chemical Space ◽

Generative Models ◽

Generative Model ◽

Small Data ◽

Data Sets ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Small Data Sets

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimisation of logP values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property. The molecules found by GB-GA bear little resemblance to the molecules used to construct the initial mating pool, indicating that the GB-GA approach can traverse a relatively large distance in chemical space using relatively few (50) generations. The paper also introduces a new non-ML graph-based generative model (GB-GM) that can be parameterized using very small data sets and combined with a Monte Carlo tree search (MCTS) algorithm. The results are comparable to previously published results (Sci. Technol. Adv. Mater. 2017, 18, 972-976) using a recurrent neural network (RNN) generative model, while the GB-GM-based method is orders of magnitude faster. The MCTS results seem more dependent on the composition of the training set than the GA approach for this particular property. Our results suggest that the performance of new ML-based generative models should be compared to more traditional, and often simpler, approaches such as a GA.
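To make the generational loop behind a genetic algorithm concrete, here is a minimal sketch. It is not the paper's GB-GA: it evolves plain bit strings rather than molecular graphs, and the fitness function (number of ones), population size, mutation rate, and the name `evolve` are all illustrative assumptions. Only the 50 generations echo the abstract.

```python
import random


def evolve(fitness, genome_len=20, pop_size=30, generations=50,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm on bit strings (hypothetical stand-in for molecules).

    Returns the best genome found and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        mating_pool = pop[: pop_size // 2]   # truncation selection
        children = [pop[0][:]]               # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(mating_pool, 2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Assumed toy objective: maximise the number of ones in the genome.
best, history = evolve(fitness=sum)
```

Because the best individual is copied into the next generation unchanged, the recorded best fitness can never decrease. In a graph-based variant, the crossover and mutation steps would act on molecular graphs and the fitness would be a property score.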

Graph-based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space

10.26434/chemrxiv.7240751 ◽

2019 ◽

Author(s):

Jan H. Jensen

Keyword(s):

Genetic Algorithm ◽

Monte Carlo ◽

Chemical Space ◽

Generative Models ◽

Generative Model ◽

Small Data ◽

Data Sets ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Small Data Sets

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimisation of logP values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property. The molecules found by GB-GA bear little resemblance to the molecules used to construct the initial mating pool, indicating that the GB-GA approach can traverse a relatively large distance in chemical space using relatively few (50) generations. The paper also introduces a new non-ML graph-based generative model (GB-GM) that can be parameterized using very small data sets and combined with a Monte Carlo tree search (MCTS) algorithm. The results are comparable to previously published results (Sci. Technol. Adv. Mater. 2017, 18, 972-976) using a recurrent neural network (RNN) generative model, while the GB-GM-based method is orders of magnitude faster. The MCTS results seem more dependent on the composition of the training set than the GA approach for this particular property. Our results suggest that the performance of new ML-based generative models should be compared to more traditional, and often simpler, approaches such as a GA.

Graph-based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space

10.26434/chemrxiv.7240751.v2 ◽

2019 ◽

Author(s):

Jan H. Jensen

Keyword(s):

Genetic Algorithm ◽

Monte Carlo ◽

Chemical Space ◽

Generative Models ◽

Generative Model ◽

Small Data ◽

Data Sets ◽

Tree Search ◽

Monte Carlo Tree Search ◽

Small Data Sets

This paper presents a comparison of a graph-based genetic algorithm (GB-GA) and machine learning (ML) results for the optimisation of logP values with a constraint for synthetic accessibility and shows that the GA is as good as or better than the ML approaches for this particular property. The molecules found by GB-GA bear little resemblance to the molecules used to construct the initial mating pool, indicating that the GB-GA approach can traverse a relatively large distance in chemical space using relatively few (50) generations. The paper also introduces a new non-ML graph-based generative model (GB-GM) that can be parameterized using very small data sets and combined with a Monte Carlo tree search (MCTS) algorithm. The results are comparable to previously published results (Sci. Technol. Adv. Mater. 2017, 18, 972-976) using a recurrent neural network (RNN) generative model, while the GB-GM-based method is orders of magnitude faster. The MCTS results seem more dependent on the composition of the training set than the GA approach for this particular property. Our results suggest that the performance of new ML-based generative models should be compared to more traditional, and often simpler, approaches such as a GA.

HPC-REDItools: a Novel HPC-aware Tool for Improved Large Scale RNA-editing Analysis

10.1101/2020.04.30.069732 ◽

2020 ◽

Cited By ~ 2

Author(s):

Tiziano Flati ◽

Silvia Gioiosa ◽

Nicola Spallanzani ◽

Ilario Tagliaferri ◽

Maria Angela Diroma ◽

...

Keyword(s):

Rna Editing ◽

Large Scale ◽

Big Data Analysis ◽

Small Data ◽

Data Sets ◽

Rna Seq ◽

Large Dataset ◽

Rna Sequences ◽

Genome Wide ◽

Small Data Sets

Background: RNA editing is a widespread co-/post-transcriptional mechanism that alters primary RNA sequences through the modification of specific nucleotides, and it can increase both transcriptome and proteome diversity. The automatic detection of RNA editing from RNA-seq data is computationally intensive and limited to small data sets, thus preventing a reliable genome-wide characterisation of this process. Results: In this work we introduce HPC-REDItools, an upgraded tool for accurate discovery of RNA-editing events from large dataset repositories. Availability: https://github.com/BioinfoUNIBA/REDItools2. Conclusions: HPC-REDItools is dramatically faster than the previous version, REDItools, enabling big-data analysis by means of an MPI-based implementation and scaling almost linearly with the number of available cores.

A Robust Approach to Variation in Carpathian Rusyn: Resampling-Based Methods for Small Data Sets

Journal of Linguistics/Jazykovedný casopis ◽

10.2478/jazcas-2021-0055 ◽

2021 ◽

Vol 72 (2) ◽

pp. 603-617

Author(s):

Moulay Zaidan Lahjouji-Seppälä ◽

Achim Rabus

Keyword(s):

Large Scale ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Large Scale Data ◽

Dependent Variables ◽

Small Data Sets ◽

The Stability ◽

Sociolinguistic Status ◽

Scale Data

Quantitative, corpus-based research on the spontaneous spoken Carpathian Rusyn language can run into several data-related problems: speakers use ambivalent forms in different quantities, resulting in a biased data set, while a stricter data-cleaning process would lead to large-scale data loss. On top of that, polytomous categorical dependent variables are hard to analyze due to methodological limitations. This paper provides several approaches for dealing with unbalanced and biased data sets containing variation in the conjugational forms of the verbs maty ‘to have’ and (po-)znaty ‘to know’ in the Carpathian Rusyn language. Using resampling-based methods such as cross-validation, bootstrapping and random forests, we provide a strategy for circumventing possible methodological pitfalls and gaining the most information from our precious data, without trying to p-hack the results. By calculating the predictive power of several sociolinguistic factors on linguistic variation, we can make valid statements about the (sociolinguistic) status of Rusyn and the stability of the old dialect continuum of Rusyn varieties.
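As an illustration of one of the resampling methods named above, the following is a minimal sketch of a percentile bootstrap confidence interval for the share of one verb form in a small sample. The data (30 tokens of one form, 20 of another), the helper names `bootstrap_ci` and `proportion`, and the resampling settings are hypothetical and are not taken from the paper.

```python
import random


def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` computed on `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def proportion(xs):
    """Share of tokens coded 1 (the variant of interest)."""
    return sum(xs) / len(xs)


# Hypothetical sample: 1 = variant A observed, 0 = variant B observed.
tokens = [1] * 30 + [0] * 20
lo, hi = bootstrap_ci(tokens, proportion)
print(f"observed share: {proportion(tokens):.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The width of the interval gives a direct impression of how much a small sample can support: with only 50 tokens the interval around the observed share is wide.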

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽

10.1007/s00521-021-05715-2 ◽

2021 ◽

Author(s):

Jianping Ju ◽

Hong Zheng ◽

Xiaohang Xu ◽

Zhongyuan Guo ◽

Zhaohui Zheng ◽

...

Keyword(s):

Transfer Learning ◽

Loss Function ◽

Training Model ◽

Parameter Distribution ◽

Test Accuracy ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Small Data Sets

Although convolutional neural networks have achieved success in the field of image classification, there are still challenges in the field of agricultural product quality sorting, such as machine vision-based jujube defect detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting features often fails to meet the requirements of practical application. In this paper, a jujube sorting model for small data sets, based on a convolutional neural network and transfer learning, is proposed to meet the actual demand of jujube defect detection. First, the original images collected from an actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, the deep model pre-trained on the ImageNet image data set was trained on the jujube defects data set, so that the parameters of the pre-trained model could fit the parameter distribution of the jujube defect images, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of the jujube defects. The classification results are visualized with heatmaps, and classification accuracy and confusion matrices are analyzed against the comparison models. The experimental results show that the SE-ResNet50-CL model optimizes the fine-grained classification problem of jujube defect recognition, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.

45 A permutation test for validation of genomic estimated breeding values

Journal of Animal Science ◽

10.1093/jas/skaa278.016 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 8-9

Author(s):

Zahra Karimi ◽

Brian Sullivan ◽

Mohsen Jafarikia

Keyword(s):

Permutation Test ◽

Breeding Value ◽

Small Data ◽

Data Sets ◽

Type I ◽

Future Performance ◽

Small Data Sets ◽

Estimated Breeding Value ◽

Estimated Breeding Values ◽

Top 40

Previous studies have shown that the accuracy of Genomic Estimated Breeding Value (GEBV) as a predictor of future performance is higher than that of the traditional Estimated Breeding Value (EBV). The purpose of this study was to estimate the potential advantage of selection on GEBV for litter size (LS) compared to selection on EBV in the Canadian swine dam line breeds. The study included 236 Landrace and 210 Yorkshire gilts born in 2017 which had their first farrowing after 2017. GEBV and EBV for LS were calculated with data that was available at the end of 2017 (GEBV2017 and EBV2017, respectively). De-regressed EBV for LS in July 2019 (dEBV2019) was used as an adjusted phenotype. The average dEBV2019 for the top 40% of sows based on GEBV2017 was compared to the average dEBV2019 for the top 40% of sows based on EBV2017. The standard error of the estimated difference for each breed was estimated by comparing the average dEBV2019 for repeated random samples of two sets of 40% of the gilts. In comparison to the top 40% ranked based on EBV2017, ranking based on GEBV2017 resulted in an extra 0.45 (±0.29) and 0.37 (±0.25) piglets born per litter in Landrace and Yorkshire replacement gilts, respectively. The estimated Type I errors of the GEBV2017 gain over EBV2017 were 6% and 7% in Landrace and Yorkshire, respectively. Considering selection of both replacement boars and replacement gilts using GEBV instead of EBV can translate into increased annual genetic gain of 0.3 extra piglets per litter, which would more than double the rate of gain observed from typical EBV-based selection. The permutation test for validation used in this study appears effective with relatively small data sets and could be applied to other traits, other species and other prediction methods.
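The comparison described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' code: the function names, the synthetic data, the number of random draws, and the choice to draw the two random 40% subsets independently (so they may overlap) are all assumptions.

```python
import random
from statistics import mean


def top_fraction_mean(scores, outcome, fraction=0.4):
    """Mean outcome of the animals ranked in the top `fraction` by `scores`."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return mean(outcome[i] for i in ranked[:k])


def random_subset_test(gebv, ebv, outcome, fraction=0.4, n_draws=2000, seed=0):
    """Compare the observed top-fraction gain with gains from random subsets."""
    rng = random.Random(seed)
    observed = (top_fraction_mean(gebv, outcome, fraction)
                - top_fraction_mean(ebv, outcome, fraction))
    n = len(outcome)
    k = max(1, round(fraction * n))
    null = []
    for _ in range(n_draws):
        # Two random subsets of the same size, chosen without regard to ranking.
        a = rng.sample(range(n), k)
        b = rng.sample(range(n), k)
        null.append(mean(outcome[i] for i in a) - mean(outcome[i] for i in b))
    # Share of random differences at least as large as the observed one.
    p_value = sum(d >= observed for d in null) / n_draws
    return observed, p_value


# Synthetic example: both predictors track the outcome, one with less noise.
rng = random.Random(1)
outcome = [rng.gauss(0, 1) for _ in range(200)]
gebv = [y + rng.gauss(0, 0.5) for y in outcome]
ebv = [y + rng.gauss(0, 2.0) for y in outcome]
observed, p_value = random_subset_test(gebv, ebv, outcome)
print(f"observed gain: {observed:.3f}, estimated Type I error: {p_value:.3f}")
```

The spread of the random differences plays the role of the standard error quoted in the abstract, and the share of random differences that match or exceed the observed gain corresponds to the estimated Type I error.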

A Comparison Study of Mahalanobis-Taguchi System and Neural Network for Multivariate Pattern Recognition

Design Engineering, Parts A and B ◽

10.1115/imece2005-80029 ◽

2005 ◽

Cited By ~ 10

Author(s):

Jungeui Hong ◽

Elizabeth A. Cudney ◽

Genichi Taguchi ◽

Rajesh Jugulum ◽

Kioumars Paryani ◽

...

Keyword(s):

Neural Network ◽

Small Data ◽

Data Sets ◽

Comparison Study ◽

Data Set ◽

Set Size ◽

Breast Cancer Study ◽

Discriminant Ability ◽

Small Data Sets ◽

Multivariate Pattern

The Mahalanobis-Taguchi System is a diagnostic and predictive method for analyzing patterns in multivariate cases. The goal of this study is to compare the ability of the Mahalanobis-Taguchi System and a neural network to discriminate using small data sets. We examine the discriminant ability as a function of data set size using an application area where reliable data is publicly available. The study uses the Wisconsin Breast Cancer study with nine attributes and one class.
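For orientation, the distance measure at the core of the Mahalanobis-Taguchi System is usually written in the scaled form below. The abstract does not spell it out, so the notation is an assumption based on the standard formulation: $x_{ij}$ is variable $i$ of observation $j$, $\bar{x}_i$ and $s_i$ are the mean and standard deviation of variable $i$ in the reference ("normal") group, $\mathbf{C}$ is the correlation matrix of that group, and $k$ is the number of variables (nine attributes in this study).

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i},
\qquad
\mathrm{MD}_j = \frac{1}{k}\, \mathbf{z}_j^{\top} \mathbf{C}^{-1} \mathbf{z}_j
```

With this scaling, observations from the reference group have an average distance close to 1, so markedly larger values indicate departure from the reference pattern.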

Ensemble CNN in Transform Domains for Image Super-resolution from Small Data Sets

2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽

10.1109/icmla51294.2020.00068 ◽

2020 ◽

Author(s):

Yingnan Liu ◽

Randy Clinton Paffenroth

Keyword(s):

Super Resolution ◽

Small Data ◽

Data Sets ◽

Small Data Sets ◽

Image Super Resolution ◽

Transform Domains

Fabrication of Bi-Te Based Thermoelectric Semiconductors by Using Hybrid Powders

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.297-300.875 ◽

2005 ◽

Vol 297-300 ◽

pp. 875-880

Author(s):

Cheol Ho Lim ◽

Ki Tae Kim ◽

Yong Hwan Kim ◽

Dong Choul Cho ◽

Young Sup Lee ◽

...

Keyword(s):

Mechanical Properties ◽

Single Crystals ◽

Thermoelectric Properties ◽

Thermoelectric Materials ◽

Figure Of Merit ◽

Bending Strength ◽

Mixing Ratio ◽

Plasma Sintering ◽

Spark Plasma ◽

P Type

P-type Bi0.5Sb1.5Te3 compounds doped with 3 wt% Te were fabricated by spark plasma sintering, and their mechanical and thermoelectric properties were investigated. Sintered compounds with a bending strength of more than 50 MPa and a figure of merit of 2.9×10⁻³/K were obtained by controlling the mixing ratio of large powders (PL) and small powders (PS). Compared with conventionally prepared single-crystal thermoelectric materials, the bending strength increased more than threefold, while the figure of merit Z was similar to that of single crystals. It is expected that the mechanical properties could be improved by using hybrid powders without degrading the thermoelectric properties.
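For readers unfamiliar with the quantity, the thermoelectric figure of merit is conventionally defined from the Seebeck coefficient $S$, the electrical conductivity $\sigma$, and the thermal conductivity $\kappa$. The worked value below uses the reported $Z$ together with an assumed temperature of 300 K, which the abstract does not state.

```latex
Z = \frac{S^{2}\sigma}{\kappa},
\qquad
ZT = \left(2.9 \times 10^{-3}\ \mathrm{K^{-1}}\right) \times \left(300\ \mathrm{K}\right) \approx 0.87
```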
