Accelerating materials science with high-throughput computations and machine learning

The advent of machine learning (ML) techniques for solving problems in materials science and chemical engineering is raising expectations of faster predictions of material properties.

Materials science in the artificial intelligence age: high-throughput library generation, machine learning, and a pathway from correlations to the underpinning physics

MRS Communications ◽

10.1557/mrc.2019.95 ◽

2019 ◽

Vol 9 (3) ◽

pp. 821-838 ◽

Cited By ~ 13

Author(s):

Rama K. Vasudevan ◽

Kamal Choudhary ◽

Apurva Mehta ◽

Ryan Smith ◽

Gilad Kusne ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

High Throughput ◽

Materials Science ◽

Image Position

Evolution of Metastable Structures in Bimetallic Catalysts from Microscopy and Machine-Learning Molecular Dynamics

10.26434/chemrxiv.11811660.v1 ◽

2020 ◽

Author(s):

Jin Soo Lim ◽

Jonathan Vandermause ◽

Matthijs A. van Spronsen ◽

Albert Musaelian ◽

Christopher R. O’Connor ◽

...

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Large Scale ◽

Materials Science ◽

Complete Characterization ◽

Layer By Layer ◽

Surface Restructuring ◽

Metastable Structures ◽

Mechanistic Investigation ◽

Underlying Mechanisms

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different composition and morphology at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.

High Throughput Ultrasonic Multi-implant Readout Using a Machine-Learning Assisted CDMA Receiver

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) ◽

10.1109/embc44109.2020.9176480 ◽

2020 ◽

Author(s):

Sina Faraji Alamouti ◽

Mohammad Meraj Ghanbari ◽

Nathan Tessema Ersumo ◽

Rikky Muller

Keyword(s):

Machine Learning ◽

High Throughput ◽

CDMA Receiver

Accelerating organic solar cell material's discovery: high-throughput screening and big data

Energy & Environmental Science ◽

10.1039/d1ee00559f ◽

2021 ◽

Author(s):

Xabier Rodríguez-Martínez ◽

Enrique Pascual-San-José ◽

Mariano Campoy-Quiles

Keyword(s):

Machine Learning ◽

Big Data ◽

High Throughput ◽

Organic Solar Cells ◽

High Throughput Screening ◽

Organic Solar Cell ◽

State Of The Art ◽

Review Article ◽

Machine Learning Algorithms ◽

Device Optimization

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.

Catalyze Materials Science with Machine Learning

ACS Materials Letters ◽

10.1021/acsmaterialslett.1c00204 ◽

2021 ◽

pp. 1151-1171

Author(s):

Jaehyun Kim ◽

Donghoon Kang ◽

Sangbum Kim ◽

Ho Won Jang

Keyword(s):

Machine Learning ◽

Materials Science

FRI0585 HIGH-THROUGHPUT METHODOLOGY FOR EMR-BASED IDENTIFICATION OF CLINICAL SUB-PHENOTYPES IN COMPLEX PATIENT POPULATIONS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3489 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 897.2-897

Author(s):

M. Maurits ◽

T. Huizinga ◽

M. Reinders ◽

S. Raychaudhuri ◽

E. Karlson ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Dimensionality Reduction ◽

High Throughput ◽

Brain Cancer ◽

Machine Learning Techniques ◽

Summary Statistics ◽

Medical Problems ◽

Learning Techniques ◽

ICD Codes

Background: Heterogeneity in disease populations complicates discovery of risk factors. To identify risk factors for subpopulations of diseases, we need analytical methods that can deal with unidentified disease subgroups.

Objectives: Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs).

Methods: We extracted longitudinal healthcare-interaction records coded by 1,853 PheCodes [1] of the 64,819 patients from the Boston’s Partners-Biobank. Through dimensionality reduction using t-SNE [2] we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters post-t-SNE using DBscan [3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B).

Results: Summary statistics of both sets were comparable (Table 1).

Table 1. Summary statistics of the total Partners Biobank dataset and the 2 partitions.

                     Set A        Set B        Total
Entries              12,200,311   12,177,131   24,377,442
Patients             32,424       32,395       64,819
Patient-years        369,546.33   368,597.92   738,144.2
Unique ICD codes     25,056       24,953       26,305
Unique PheCodes      1,851        1,853        1,853

We found 284 clusters in set A and 295 in set B, of which 63.4% from set A could be mapped to a cluster in set B with a median (range) correlation of 0.24 (0.03 – 0.58). Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.

Figure 1. Two dimensional representation of Set A generated using dimensionality reduction (tSNE) and clustering (DBScan).

Figure 2. Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.

Conclusion: We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.

References:
[1] Denny, J.C. et al. Bioinformatics (2010)
[2] van der Maaten et al. Journal of Machine Learning Research (2008)
[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996)

Disclosure of Interests: Marc Maurits: None declared, Thomas Huizinga Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Marcel Reinders: None declared, Soumya Raychaudhuri: None declared, Elizabeth Karlson: None declared, Erik van den Akker: None declared, Rachel Knevel: None declared
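The clustering step named in the Methods of this abstract (density-based clustering of points in a 2D embedding) can be illustrated with a minimal DBSCAN sketch. This is not the authors' code: the points below are a synthetic stand-in for a t-SNE embedding, and `eps` and `min_pts` are illustrative values chosen for this toy data, not parameters reported in the abstract.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force radius query; the result includes point i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

# Synthetic stand-in for a 2D embedding: two dense 3x3 grids and one outlier.
blob_a = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
blob_b = [(5 + 0.1 * x, 5 + 0.1 * y) for x in range(3) for y in range(3)]
points = blob_a + blob_b + [(10.0, -10.0)]

labels = dbscan(points, eps=0.15, min_pts=3)
print(labels)
```

On this toy input the two grids come out as separate clusters and the isolated point is labelled as noise. A real pipeline would replace the brute-force neighbour search with a spatial index, since the query above is quadratic in the number of points.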

First-principles data integrated machine learning approach for high-throughput searching of ternary electrocatalyst toward oxygen reduction reaction

Chem Catalysis ◽

10.1016/j.checat.2021.06.001 ◽

2021 ◽

Author(s):

Hoje Chun ◽

Eunjik Lee ◽

Kyungju Nam ◽

Ji-Hoon Jang ◽

Woomin Kyoung ◽

...

Keyword(s):

Machine Learning ◽

Oxygen Reduction Reaction ◽

Oxygen Reduction ◽

High Throughput ◽

First Principles ◽

Reduction Reaction ◽

Learning Approach ◽

Machine Learning Approach

Machine learning methodology for high throughput personalized neutron dose reconstruction in mixed neutron + photon exposures

Scientific Reports ◽

10.1038/s41598-021-83575-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Igor Shuryak ◽

Helen C. Turner ◽

Monica Pujol-Canadell ◽

Jay R. Perrier ◽

Guy Garty ◽

...

Keyword(s):

Machine Learning ◽

High Throughput ◽

Mean Squared Error ◽

Ex Vivo ◽

Probability Distributions ◽

Micronucleus Assay ◽

Neutron Dose ◽

Cell Probability ◽

Scanning Imaging ◽

Automated Scanning

We implemented machine learning in the radiation biodosimetry field to quantitatively reconstruct neutron doses in mixed neutron + photon exposures, which are expected in improvised nuclear device detonations. Such individualized reconstructions are crucial for triage and treatment because neutrons are more biologically damaging than photons. We used a high-throughput micronucleus assay with automated scanning/imaging on lymphocytes from human blood ex-vivo irradiated with 44 different combinations of 0–4 Gy neutrons and 0–15 Gy photons (542 blood samples), which include reanalysis of past experiments. We developed several metrics that describe micronuclei/cell probability distributions in binucleated cells, and used them as predictors in random forest (RF) and XGBoost machine learning analyses to reconstruct the neutron dose in each sample. The probability of “overfitting” was minimized by training both algorithms with repeated cross-validation on a randomly-selected subset of the data, and measuring performance on the rest. RF achieved the best performance. Mean R² for actual vs. reconstructed neutron doses over 300 random training/testing splits was 0.869 (range 0.761 to 0.919) and root mean squared error was 0.239 (0.195 to 0.351) Gy. These results demonstrate the promising potential of machine learning to reconstruct the neutron dose component in clinically-relevant complex radiation exposure scenarios.
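The evaluation protocol in this abstract (fit on a random subset, score on the rest, repeat 300 times, report mean and range of R² and RMSE) can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: a one-variable least-squares line stands in for the random forest, the data are synthetic, and the 25% test fraction and the names `predictor` and `dose` are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept (a simple stand-in for the RF model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root mean squared error."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

def repeated_splits(xs, ys, n_splits=300, test_fraction=0.25, seed=0):
    """Score the model on n_splits independent random train/test partitions."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(len(idx) * test_fraction)
        test, train = idx[:cut], idx[cut:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(r2_and_rmse([ys[i] for i in test], preds))
    return scores

# Synthetic data: a noisy assay readout that grows with dose (0 to 4 Gy).
data_rng = random.Random(1)
dose = [data_rng.uniform(0, 4) for _ in range(200)]
predictor = [1.5 * d + data_rng.gauss(0, 0.3) for d in dose]

scores = repeated_splits(predictor, dose)
mean_r2 = sum(r2 for r2, _ in scores) / len(scores)
mean_rmse = sum(rmse for _, rmse in scores) / len(scores)
r2_min = min(r2 for r2, _ in scores)
r2_max = max(r2 for r2, _ in scores)
print(f"mean R2 {mean_r2:.3f} (range {r2_min:.3f} to {r2_max:.3f}), "
      f"mean RMSE {mean_rmse:.3f} Gy")
```

Reporting the range alongside the mean, as the abstract does, shows how much the score depends on which samples happen to land in the test set.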

Two-step machine learning enables optimized nanoparticle synthesis

npj Computational Materials ◽

10.1038/s41524-021-00520-w ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Flore Mekki-Berrada ◽

Zekun Ren ◽

Tan Huang ◽

Wai Kuan Wong ◽

Fang Zheng ◽

...

Keyword(s):

Machine Learning ◽

Optical Properties ◽

Materials Science ◽

Nanoparticle Synthesis ◽

Bayesian Optimization ◽

Absorbance Spectrum ◽

Nanomaterial Synthesis ◽

Target Spectrum ◽

Colour Palette ◽

Algorithmic Framework

In materials science, the discovery of recipes that yield nanomaterials with defined optical properties is costly and time-consuming. In this study, we present a two-step framework for a machine learning-driven high-throughput microfluidic platform to rapidly produce silver nanoparticles with the desired absorbance spectrum. Combining a Gaussian process-based Bayesian optimization (BO) with a deep neural network (DNN), the algorithmic framework is able to converge towards the target spectrum after sampling 120 conditions. Once the dataset is large enough to train the DNN with sufficient accuracy in the region of the target spectrum, the DNN is used to predict the colour palette accessible with the reaction synthesis. While remaining interpretable by humans, the proposed framework efficiently optimizes the nanomaterial synthesis and can extract fundamental knowledge of the relationship between chemical composition and optical properties, such as the role of each reactant on the shape and amplitude of the absorbance spectrum.
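The first step of the framework in this abstract, Gaussian process-based Bayesian optimization, can be sketched in one dimension. Everything specific here is an assumption for illustration: the abstract does not state the kernel, acquisition function, or loss, so this sketch uses an RBF kernel, an upper-confidence-bound rule, and a hypothetical `toy_objective` in place of a measured spectral match. The DNN step is not shown.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel (assumed; the abstract names no kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = 1.0 - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, n_init=3, n_iter=12, kappa=2.0):
    """Maximize objective on [0, 1] by repeatedly sampling the UCB maximizer."""
    grid = [i / 100 for i in range(101)]
    xs = [i / (n_init - 1) for i in range(n_init)]  # evenly spaced start
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        def ucb(g):
            m, s = gp_posterior(xs, ys, g)
            return m + kappa * s  # trade off predicted value and uncertainty
        x_next = max((g for g in grid if g not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def toy_objective(x):
    # Hypothetical stand-in for "closeness to the target spectrum".
    return -(x - 0.7) ** 2

best_x, best_y, samples = bayes_opt(toy_objective)
print(best_x, best_y, len(samples))
```

The point of the design is sample efficiency: each new condition is chosen where the surrogate model predicts either a good result or high uncertainty, which matters when every evaluation is a physical synthesis run.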
