chemical dataset
Recently Published Documents


TOTAL DOCUMENTS: 8 (FIVE YEARS: 5)

H-INDEX: 3 (FIVE YEARS: 1)

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jules Leguy ◽  
Marta Glavatskikh ◽  
Thomas Cauchy ◽  
Benoit Da Mota

Abstract Chemical diversity is one of the key terms when dealing with machine learning and molecular generation. This is particularly true for quantum chemical datasets, which should be composed meticulously because the calculations are highly time-demanding. Previously we have seen that the best-known quantum chemical dataset, QM9, lacks chemical diversity. As a consequence, ML models trained on QM9 showed generalizability shortcomings. In this paper we present (i) a fast and generic method to evaluate chemical diversity, (ii) a new quantum chemical dataset of 435k molecules, OD9, that includes QM9 and new molecules generated with a diversity objective, and (iii) an analysis of the impact of diversity on unconstrained and goal-directed molecular generation, using QED optimization as the example. Our approach makes it possible to estimate individually the impact of a solution on the diversity of a set, allowing for effective incremental evaluation. In the first application, we show how the diversity constraint allows us to generate more than a million molecules that would efficiently complete the reference datasets. The compounds were calculated with DFT thanks to a collaborative effort through the QuChemPedIA@home BOINC project. With regard to goal-directed molecular generation, getting a high QED score is not complicated, but adding a little diversity can cut the number of calls to the evaluation function by a factor of ten.
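The idea of estimating one candidate's impact on the diversity of a set can be illustrated with a minimal sketch. It is not the authors' method: it assumes, purely for illustration, that diversity is the number of distinct features covered by the set and that each molecule is reduced to a set of feature strings. The class `DiversityTracker` and the feature labels are hypothetical.

```python
from typing import Iterable, Set


class DiversityTracker:
    """Tracks the distinct features covered by a growing set of items.

    Assumption (illustrative only): diversity = number of distinct features.
    """

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def diversity(self) -> int:
        # Diversity of the current set under the counting assumption above.
        return len(self.seen)

    def gain(self, features: Iterable[str]) -> int:
        # Impact of one candidate: features it would add that are not yet covered.
        # Cost depends on the candidate's size, not on the size of the set.
        return len(set(features) - self.seen)

    def add(self, features: Iterable[str]) -> None:
        # Incremental update: no need to re-scan previously added items.
        self.seen.update(features)


# Hypothetical feature labels standing in for molecular descriptors.
tracker = DiversityTracker()
tracker.add({"C-C", "C-O"})
tracker.add({"C-C", "C-N"})

candidate_a = {"C-C", "C-O"}         # brings nothing new
candidate_b = {"C-S", "C-N", "N-O"}  # brings two new features

print(tracker.diversity(), tracker.gain(candidate_a), tracker.gain(candidate_b))
```

Under this assumption, a generator could prefer candidates with a positive gain, which is one way a diversity objective might steer sampling away from redundant molecules.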


2021 ◽  
Author(s):  
Jules Leguy ◽  
Marta Glavatskikh ◽  
Thomas Cauchy ◽  
Benoit Da Mota



Data in Brief ◽  
2021 ◽  
pp. 107150
Author(s):  
Viet Tran-Khac ◽  
Pascal Perney ◽  
Laura Crépin ◽  
Philippe Quetin ◽  
Isabelle Domaizon ◽  
...  

Data in Brief ◽  
2020 ◽  
Vol 31 ◽  
pp. 106015
Author(s):  
Michele Mattioli ◽  
Michele Lustrino ◽  
Sara Ronca ◽  
Gianluca Bianchini

2019 ◽  
Vol 198 ◽  
pp. 387-397 ◽  
Author(s):  
Xing Peng ◽  
Xiaoxi Liu ◽  
Xurong Shi ◽  
Guoliang Shi ◽  
Mei Li ◽  
...  

2008 ◽  
Vol 50 (3) ◽  
pp. 208-208
Author(s):  
Petra S Kern ◽  
GY Patlewicz ◽  
RJ Dearman ◽  
CA Ryan ◽  
I Kimber ◽  
...  

2004 ◽  
Vol 50 (5) ◽  
pp. 274-288 ◽  
Author(s):  
G. Frank Gerberick ◽  
Cindy A. Ryan ◽  
Petra S. Kern ◽  
Rebecca J. Dearman ◽  
Ian Kimber ◽  
...  
