Evolving the Materials Genome: How Machine Learning Is Fueling the Next Generation of Materials Discovery

2020 ◽  
Vol 50 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Changwon Suh ◽  
Clyde Fare ◽  
James A. Warren ◽  
Edward O. Pyzer-Knapp

Machine learning, applied to chemical and materials data, is transforming the field of materials discovery and design, yet significant work is still required to fully take advantage of machine learning algorithms, tools, and methods. Here, we review the accomplishments to date of the community and assess the maturity of state-of-the-art, data-intensive research activities that combine perspectives from materials science and chemistry. We focus on three major themes—learning to see, learning to estimate, and learning to search materials—to show how advanced computational learning technologies are rapidly and successfully used to solve materials and chemistry problems. Additionally, we discuss a clear path toward a future where data-driven approaches to materials discovery and design are standard practice.

2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we use linear regression to predict the continuous MAP values over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners.
Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
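
The evaluation steps described in this abstract (scoring continuous MAP predictions with root mean square error, then converting them into binary AHE predictions and computing precision and recall) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 65 mmHg cut-off is an assumption (the abstract does not state the AHE definition used), the function names are hypothetical, and the MAP values are made up.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def to_ahe_labels(map_values, threshold=65.0):
    """Flag an episode when MAP falls below the threshold.

    The 65 mmHg default is an illustrative assumption, not a value
    taken from the abstract.
    """
    return [value < threshold for value in map_values]


def precision_recall(predicted_labels, true_labels):
    """Precision and recall for boolean predictions."""
    pairs = list(zip(predicted_labels, true_labels))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up MAP values in mmHg, for illustration only.
observed_map = [72.0, 66.0, 63.0, 58.0, 70.0, 61.0]
predicted_map = [70.0, 64.0, 62.0, 60.0, 71.0, 67.0]

error = rmse(predicted_map, observed_map)
precision, recall = precision_recall(
    to_ahe_labels(predicted_map), to_ahe_labels(observed_map)
)
print(f"RMSE={error:.2f} mmHg, precision={precision:.2f}, recall={recall:.2f}")
```

Lowering or raising the threshold trades precision against recall, which is one reason the continuous MAP prediction carries more information than the binary label alone.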


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.


2021 ◽  
pp. medethics-2020-107095
Author(s):  
Charalampia (Xaroula) Kerasidou ◽  
Angeliki Kerasidou ◽  
Monika Buscher ◽  
Stephen Wilkinson

Artificial intelligence (AI) is changing healthcare and the practice of medicine as data-driven science and machine-learning technologies, in particular, are contributing to a variety of medical and clinical tasks. Such advancements have also raised many questions, especially about public trust. As a response to these concerns there has been a concentrated effort from public bodies, policy-makers and technology companies leading the way in AI to address what is identified as a "public trust deficit". This paper argues that a focus on trust as the basis upon which a relationship between this new technology and the public is built is, at best, ineffective, at worst, inappropriate or even dangerous, as it diverts attention from what is actually needed to actively warrant trust. Instead of agonising about how to facilitate trust, a type of relationship which can leave those trusting vulnerable and exposed, we argue that efforts should be focused on the difficult and dynamic process of ensuring reliance underwritten by strong legal and regulatory frameworks. From there, trust could emerge, not merely as a means to an end but as something to work towards in practice; that is, the deserved result of an ongoing ethical relationship where there is the appropriate, enforceable and reliable regulatory infrastructure in place for problems, challenges and power asymmetries to be continuously accounted for and appropriately redressed.


2021 ◽  
Vol 19 (2) ◽  
pp. 2056-2094
Author(s):  
Koji Oshima ◽  
Daisuke Yamamoto ◽  
Atsuhiro Yumoto ◽  
Song-Ju Kim ◽  
...  

Data-driven and feedback cycle-based approaches are necessary to optimize the performance of modern complex wireless communication systems. Machine learning technologies can provide solutions for these requirements. This study shows a comprehensive framework of optimizing wireless communication systems and proposes two optimal decision schemes that have not been well-investigated in existing research. The first one is supervised learning modeling and optimal decision making by optimization, and the second is a simple and implementable reinforcement learning algorithm. The proposed schemes were verified through real-world experiments and computer simulations, which revealed the necessity and validity of this research.
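
To make the idea of a simple, implementable reinforcement learning scheme concrete, the sketch below uses an epsilon-greedy multi-armed bandit to choose among wireless channels. This is a generic stand-in chosen for illustration: the abstract does not name the algorithm the authors propose, and the function name, the `transmit` callback, and the simulated success probabilities are all hypothetical.

```python
import random


def epsilon_greedy(transmit, n_channels, n_steps, epsilon=0.1, seed=0):
    """Learn which channel succeeds most often from feedback alone.

    `transmit(channel)` is a hypothetical callback that returns True when a
    transmission on that channel succeeds. Returns the per-channel estimated
    success rates and how often each channel was chosen.
    """
    rng = random.Random(seed)
    counts = [0] * n_channels
    estimates = [0.0] * n_channels
    for step in range(n_steps):
        if step < n_channels:
            channel = step  # try every channel once before exploiting
        elif rng.random() < epsilon:
            channel = rng.randrange(n_channels)  # explore
        else:
            channel = max(range(n_channels), key=lambda c: estimates[c])  # exploit
        reward = 1.0 if transmit(channel) else 0.0
        counts[channel] += 1
        # Incremental update of the running mean reward for this channel.
        estimates[channel] += (reward - estimates[channel]) / counts[channel]
    return estimates, counts


# Simulated environment with made-up success probabilities per channel.
environment = random.Random(1)
success_probability = [0.2, 0.5, 0.8]
estimates, counts = epsilon_greedy(
    lambda c: environment.random() < success_probability[c],
    n_channels=3,
    n_steps=2000,
)
print("estimated success rates:", [round(e, 2) for e in estimates])
print("times each channel was used:", counts)
```

The feedback cycle mentioned in the abstract corresponds to the loop body: act, observe the outcome, update the estimate. The `epsilon` parameter controls how much traffic is spent on exploration rather than on the channel currently believed to be best.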


2020 ◽  
Author(s):  
Zhengjing Ma ◽  
Gang Mei

Landslides are one of the most critical categories of natural disasters worldwide and cause severe damage to human life and the overall economic system. To reduce their negative effects, landslide prevention has become an urgent task, which includes investigating landslide-related information and predicting potential landslides. Machine learning is a state-of-the-art analytics tool that has been widely used in landslide prevention. This paper presents a comprehensive survey of relevant research on machine learning applied in landslide prevention, mainly focusing on (1) landslide detection based on images, (2) landslide susceptibility assessment, and (3) the development of landslide warning systems. Moreover, this paper discusses the current challenges and potential opportunities in the application of machine learning algorithms for landslide prevention.


2021 ◽  
Vol 42 (12) ◽  
pp. 124101
Author(s):  
Thomas Hirtz ◽  
Steyn Huurman ◽  
He Tian ◽  
Yi Yang ◽  
Tian-Ling Ren

Abstract In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure that is required to automate the fabrication and testing of semiconductor devices. This infrastructure is crucial for generating sufficient data for the use of new information technologies. This situation creates a divide between most researchers and industry. To address this issue, this paper will introduce a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves that we obtained were processed simultaneously using convolutional neural networks, which gave us the ability to predict a full set of device characteristics with a single inference. We prove the potential of this approach through two concrete examples of useful deep learning models that were trained using the generated data. We believe that this work can act as a bridge between the state-of-the-art of data-driven methods and more classical semiconductor research, such as device engineering, yield engineering or process monitoring. Moreover, this research gives anybody the opportunity to start experimenting with deep neural networks and machine learning in the field of microelectronics, without the need for expensive experimentation infrastructure.


2021 ◽  
pp. 1-18
Author(s):  
Seyed Reza Shahamiri ◽  
Fadi Thabtah ◽  
Neda Abdelhamid

BACKGROUND: Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition that is normally linked with substantial healthcare costs. Typical ASD screening techniques are time consuming, so the early detection of ASD could reduce such costs and help limit the development of the condition. OBJECTIVE: We propose an automated approach to detect autistic traits that replaces the scoring function used in current ASD screening with a more intelligent and less subjective approach. METHODS: The proposed approach employs deep neural networks (DNNs) to detect hidden patterns from previously labelled cases and controls, then applies the knowledge derived to classify the individual being screened. Specificity, sensitivity, and accuracy of the proposed approach are evaluated using ten-fold cross-validation. A comparative analysis has also been conducted to compare the DNNs’ performance with other prominent machine learning algorithms. RESULTS: Results indicate that deep learning technologies can be embedded within existing ASD screening to assist the stakeholders in the early identification of ASD traits. CONCLUSION: The proposed system will facilitate access to needed support for the social, physical, and educational well-being of the patient and family by making ASD screening more intelligent and accurate.
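
The evaluation protocol named in the METHODS section (sensitivity, specificity, and accuracy under ten-fold cross-validation) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' system: a trivial score cut-off stands in for the deep neural network, all function names are hypothetical, and the screening scores and labels are made up.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one.

    Real data should be shuffled (ideally stratified by label) first;
    that step is omitted here to keep the sketch deterministic.
    """
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def fit_threshold(scores, labels):
    """Stand-in model: choose the cut-off with the best training accuracy."""
    best_cut, best_accuracy = None, -1.0
    for cut in sorted(set(scores)):
        hits = sum(1 for s, y in zip(scores, labels) if int(s >= cut) == y)
        accuracy = hits / len(labels)
        if accuracy > best_accuracy:
            best_cut, best_accuracy = cut, accuracy
    return best_cut


def cross_validate(scores, labels, k=10):
    """Pool held-out predictions over k folds, then compute the metrics."""
    y_true, y_pred = [], []
    for train, test in k_fold_indices(len(scores), k):
        cut = fit_threshold([scores[i] for i in train], [labels[i] for i in train])
        y_true += [labels[i] for i in test]
        y_pred += [int(scores[i] >= cut) for i in test]
    return screening_metrics(y_true, y_pred)


# Made-up screening scores (1-10) and labels (1 = case, 0 = control).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 2
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] * 2
print(cross_validate(scores, labels, k=10))
```

Each sample is predicted exactly once, by a model that never saw it during fitting, which is what makes the pooled metrics an estimate of performance on unseen individuals.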


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 33-34 ◽  
Author(s):  
Yazan Rouphail ◽  
Nathan Radakovich ◽  
Jacob Shreve ◽  
Sudipto Mukherjee ◽  
Babal K. Jha ◽  
...  

Background: Multi-omic analysis can identify unique signatures that correlate with cancer subtypes. While clinically meaningful molecular subtypes of AML have been defined based on the status of single genes such as NPM1 and FLT3, such categories remain heterogeneous and further work is needed to characterize their genetic and transcriptomic diversity on a truly individualized basis. Further, patients (pts) with NPM1+/FLT3-ITD- AML have a better overall survival compared to patients with NPM1-/FLT3-ITD+, suggesting that these pts could have different transcriptomic signatures that impact phenotype, pathophysiology, and outcomes. Many current transcriptome analytic techniques use clustering analysis to aggregate samples and look at relationships on a cohort-wide basis to build transcriptomic signatures that correlate with phenotype or outcome. Such approaches can obscure the heterogeneity of the gene expression in pts with the same signatures. In this study, we took advantage of state-of-the-art machine learning algorithms to identify unique transcriptomic signatures that correlate with AML genomic phenotype. Methods: Genomic (whole exome sequencing and targeted deep sequencing) and transcriptomic data from 451 AML pts included in the Beat AML study (publicly available data) were used to build transcriptomic signatures that are specific for AML patients with NPM1+/FLT3-ITD+ compared to NPM1+/FLT3-ITD, and NPM1-/FLT3-ITD-. We chose these AML phenotypes as they have been described extensively and they correlate with clinical outcomes. Results: A total of 242 patients (54%) had NPM1-/FLT3-, 35 (8%) were NPM1+/FLT3-, and 47 (10%) were NPM1+/FLT3+. Our algorithm identified 20 genes that are highly specific for the NPM1/FLT3-ITD phenotype: HOXB-AS3, SCRN1, LMX1B, PCBD1, DNAJC15, HOXA3, NPTXq, RP11-1055B8, ABDH128, HOXB8, SOCS2, HOXB3, HOXB9, MIR503HG, FAM221B, NRP1, NDUFAF3, MEG3, CCDC136, and HIST1H2BC.
Interestingly, several of those genes were overexpressed or underexpressed in specific phenotypes. For example, SCRN1, LMX1B, RP11-1055B8, ABDH128, HOXB8, MIR503HG, and NRP1 are only overexpressed or underexpressed in patients with NPM1-/FLT3-, while PCBD1, NDUFAF3, and FAM221B are overexpressed or underexpressed in pts with NPM1+/FLT3+. These genes affect several important pathways that regulate cell differentiation, proliferation, mitochondrial oxidative phosphorylation, histone modification and lipid metabolism. All these genes had previously been reported as having altered expression in genomic studies of AML, confirming our approach's ability to identify biologically meaningful relationships. Further, our algorithm can provide a personalized explanation of overexpressed and underexpressed genes specific for a given patient, thus identifying targetable pathways for each pt. Figure 1 below shows three pts who have the same genotype (NPM1+/FLT3-ITD+) but demonstrate different transcriptomic patterns of overexpression or underexpression that affect different biological pathways. Conclusions: We describe the use of a state-of-the-art explainable machine learning approach to define transcriptomic signatures that are specific for individual pts. In addition to correctly distinguishing AML subtype based on specific transcriptomic signatures, our model was able to accurately identify upregulated and downregulated genes that affect several important biological pathways in AML and can summarize these pathways at an individual level. Such an approach can be used to provide personalized treatment options that can target the activated pathways at an individual level.
Disclosures Mukherjee: Partnership for Health Analytic Research, LLC (PHAR, LLC): Honoraria; Novartis: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; EUSA Pharma: Consultancy; Celgene/Acceleron: Membership on an entity's Board of Directors or advisory committees; Bristol Myers Squib: Honoraria; Aplastic Anemia and MDS International Foundation: Honoraria; Celgene: Consultancy, Honoraria, Research Funding. Maciejewski:Alexion, BMS: Speakers Bureau; Novartis, Roche: Consultancy, Honoraria. Sekeres:BMS: Consultancy; Takeda/Millenium: Consultancy; Pfizer: Consultancy. Nazha:Jazz: Research Funding; Incyte: Speakers Bureau; Novartis: Speakers Bureau; MEI: Other: Data monitoring Committee.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
David F. Nettleton ◽  
Dimitrios Katsantonis ◽  
Argyris Kalaitzidis ◽  
Natasa Sarafijanovic-Djukic ◽  
Pau Puigdollers ◽  
...  

Abstract Background: In this study, we compared four models for predicting rice blast disease, two operational process-based models (Yoshino and Water Accounting Rice Model (WARM)) and two approaches based on machine learning algorithms (M5Rules and Recurrent Neural Networks (RNN)), the former inducing a rule-based model and the latter building a neural network. In situ telemetry is important to obtain quality in-field data for predictive models and this was a key aspect of the RICE-GUARD project on which this study is based. According to the authors, this is the first time process-based and machine learning modelling approaches for supporting plant disease management are compared. Results: Results clearly showed that the models succeeded in providing a warning of rice blast onset and presence, thus representing suitable solutions for preventive remedial actions targeting the mitigation of yield losses and the reduction of fungicide use. All methods gave significant “signals” during the “early warning” period, with a similar level of performance. M5Rules and WARM gave the maximum average normalized scores of 0.80 and 0.77, respectively, whereas Yoshino gave the best score for one site (Kalochori 2015). The best average values of r, r² and %MAE (Mean Absolute Error) for the machine learning models were 0.70, 0.50 and 0.75, respectively, and for the process-based models the corresponding values were 0.59, 0.40 and 0.82. Thus it has been found that the ML models are competitive with the process-based models. This result has relevant implications for the operational use of the models, since most of the available studies are limited to the analysis of the relationship between the model outputs and the incidence of rice blast. Results also showed that machine learning methods approximated the performances of two process-based models used for years in operational contexts.
Conclusions: Process-based and data-driven models can be used to provide early warnings to anticipate rice blast and detect its presence, thus supporting fungicide applications. Data-driven models derived from machine learning methods are a viable alternative to process-based approaches and – in cases when training datasets are available – offer a potentially greater adaptability to new contexts.
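
The comparison metrics quoted in this abstract (r, its square, and mean absolute error) can be computed as in the sketch below. This is a generic illustration: the function names are hypothetical, the series are made up, and because the abstract does not say how its "%MAE" figure is normalized, only the plain mean absolute error is shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Made-up normalized disease-risk series, for illustration only.
observed_risk = [0.0, 0.2, 0.5, 0.9, 1.0]
predicted_risk = [0.1, 0.1, 0.6, 0.7, 0.9]

r = pearson_r(predicted_risk, observed_risk)
print(f"r={r:.2f}, r^2={r * r:.2f}, "
      f"MAE={mean_absolute_error(predicted_risk, observed_risk):.2f}")
```

Correlation and absolute error answer different questions: a model can track the timing of an outbreak closely (high r) while still being offset in magnitude (high MAE), so reporting both, as the abstract does, gives a fuller picture.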

