Communication efficient distributed learning of neural networks in Big Data environments using Spark

Author(s):  
Fouad Alkhoury ◽  
Dennis Wegener ◽  
Karl-Heinz Sylla ◽  
Michael Mock
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dipendra Jha ◽  
Vishu Gupta ◽  
Logan Ward ◽  
Zijiang Yang ◽  
Christopher Wolverton ◽  
...  

Abstract
The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to build deeper neural networks in a bid to boost model performance, but in practice simply stacking more layers leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data are available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet), composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
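
As a rough illustration of the individual-residual idea described above (a shortcut connection around every fully connected layer, rather than around a block of layers), the following Python/PyTorch sketch builds such a network for a vector-based materials representation. The class names, layer widths, and 145-dimensional input are illustrative assumptions, not the published IRNet configuration.

# Minimal sketch of individual residual learning for property prediction.
# Layer sizes and the input dimension are placeholders, not the paper's setup.
import torch
import torch.nn as nn

class IndividualResidualLayer(nn.Module):
    """One fully connected layer with its own shortcut connection."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)
        self.act = nn.ReLU()
        # Project the shortcut when input and output widths differ.
        self.proj = nn.Linear(in_dim, out_dim) if in_dim != out_dim else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.fc(x))) + self.proj(x)

class IRNetSketch(nn.Module):
    """Stack of individual-residual layers ending in a scalar property head."""
    def __init__(self, in_dim, widths=(1024, 512, 256, 128)):
        super().__init__()
        dims = (in_dim,) + tuple(widths)
        self.layers = nn.Sequential(
            *[IndividualResidualLayer(d_in, d_out) for d_in, d_out in zip(dims, dims[1:])]
        )
        self.head = nn.Linear(dims[-1], 1)  # predicted property value

    def forward(self, x):
        return self.head(self.layers(x))

# Example: a hypothetical 145-dimensional composition descriptor, batch of 32 materials.
model = IRNetSketch(in_dim=145)
y_hat = model(torch.randn(32, 145))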


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 70535-70551 ◽  
Author(s):  
Haruna Chiroma ◽  
Usman Ali Abdullahi ◽  
Shafi'i Muhammad Abdulhamid ◽  
Ala Abdulsalam Alarood ◽  
Lubna A. Gabralla ◽  
...  

Author(s):  
Vishal Babu Siramshetty ◽  
Dac-Trung Nguyen ◽  
Natalia J. Martinez ◽  
Anton Simeonov ◽  
Noel T. Southall ◽  
...  

The rise of novel artificial intelligence methods necessitates a comparison of this wave of new approaches with classical machine learning for a typical drug discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human Ether-à-go-go-Related Gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for assessment of hERG liabilities of small molecules, including recent work using deep learning methods. Here we perform a comprehensive comparison of prediction models based on classical (random forests and gradient boosting) and modern (deep neural networks and recurrent neural networks) artificial intelligence methods. The training set (~9000 compounds) was compiled by integrating hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors, including latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (> 1.5 million compounds). The models were prospectively validated on ~840 in-house compounds screened in the same thallium flux assay. The deep neural networks performed significantly better than the classical methods with the latent descriptors. The recurrent neural networks that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Further, we shed light on the potential of artificial intelligence methods to exploit big chemistry data and generate novel chemical representations useful in predictive modeling and tailoring new chemical space.
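
A minimal sketch of the kind of classical-versus-deep comparison described above, using scikit-learn on placeholder descriptor vectors. The real study's ~9000-compound training set, tuned architectures, SMILES-based recurrent models, and consensus model are not reproduced here; the data, sizes, and hyperparameters below are illustrative assumptions only.

# Compare a random forest with a small feed-forward network on synthetic
# descriptor vectors, scoring both by ROC-AUC on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data: rows are compounds, columns are molecular (or latent)
# descriptors; labels mark hERG blockers (1) vs. non-blockers (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 256))
y = rng.integers(0, 2, size=2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0),
    "deep_nn": MLPClassifier(hidden_layer_sizes=(512, 256, 128), max_iter=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC-AUC = {auc:.3f}")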


2021 ◽  
Author(s):  
Athanasios Lyras ◽  
Sotiria Vernikou ◽  
Andreas Kanavos ◽  
Spyros Sioutas ◽  
Phivos Mylonas

Author(s):  
Trevor J. Bihl ◽  
William A. Young II ◽  
Gary R. Weckman

Despite the natural advantage humans have for recognizing and interpreting patterns, large and complex datasets, as in Big Data, preclude efficient human analysis. Artificial neural networks (ANNs) provide a family of pattern recognition approaches for prediction, clustering and classification applicable to knowledge discovery in databases (KDD), with ANN model complexity ranging from simple (for small problems) to highly complex (for large ones). To provide a starting point for readers, this chapter first describes foundational concepts that relate to ANNs. A listing of commonly used ANN methods, heuristics, and criteria for initializing ANNs is then discussed. Common pre- and post-processing methods for dimensionality reduction and data quality issues are then described. The authors then provide a tutorial example of ANN analysis. Finally, the authors list and describe applications of ANNs to specific business-related endeavors for further reading.
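
A minimal sketch of the pre-processing plus ANN workflow outlined above, assuming scikit-learn: feature scaling and PCA for dimensionality reduction, followed by a small feed-forward classifier. The dataset, component count, and layer sizes are illustrative placeholders, not the chapter's tutorial example.

# Pipeline: scale features, reduce dimensionality, then fit a small ANN.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),      # data-quality step: put features on a common scale
    PCA(n_components=20),  # dimensionality reduction before the ANN
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
pipeline.fit(X_train, y_train)
print("held-out accuracy:", pipeline.score(X_test, y_test))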

