Element selection for crystalline inorganic solid discovery guided by unsupervised machine learning of experimentally explored chemistry

Abstract: The selection of the elements to combine delimits the possible outcomes of synthetic chemistry because it determines the range of compositions and structures, and thus properties, that can arise. For example, in the solid state, the elemental components of a phase field will determine the likelihood of finding a new crystalline material. Researchers make these choices based on their understanding of chemical structure and bonding. Extensive data are available on those element combinations that produce synthetically isolable materials, but it is difficult to assimilate the scale of this information to guide selection from the diversity of potential new chemistries. Here, we show that unsupervised machine learning captures the complex patterns of similarity between element combinations that afford reported crystalline inorganic materials. This model guides prioritisation of quaternary phase fields containing two anions for synthetic exploration to identify lithium solid electrolytes in a collaborative workflow that leads to the discovery of Li3.3SnS3.3Cl0.7. The interstitial site occupancy combination in this defect stuffed wurtzite enables a low-barrier ion transport pathway in hexagonal close-packing.


Feature Selection for Unsupervised Machine Learning of Accelerometer Data Physical Activity Clusters – A Systematic Review

Gait & Posture ◽

10.1016/j.gaitpost.2021.08.007 ◽

2021 ◽

Author(s):

Petra J. Jones ◽

Mike Catt ◽

Melanie J. Davies ◽

Charlotte L. Edwardson ◽

Evgeny M. Mirkes ◽

...

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Systematic Review ◽

Feature Selection ◽

Accelerometer Data ◽

Unsupervised Machine Learning ◽

Selection For


An integrated Shannon Entropy and reference ideal method for the selection of enhanced oil recovery pilot areas based on an unsupervised machine learning algorithm

Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles ◽

10.2516/ogst/2021061 ◽

2021 ◽

Vol 76 ◽

pp. 82

Author(s):

S. Mahdia Motahhari ◽

Mehdi Rafizadeh ◽

S. Mahmoud Reza Pishvaie ◽

Mohammad Ahmadi

Keyword(s):

Machine Learning ◽

Enhanced Oil Recovery ◽

Shannon Entropy ◽

Oil Recovery ◽

Oil Field ◽

Unsupervised Machine Learning ◽

Main Challenge ◽

Ideal Method ◽

Selection Of ◽

Economic Criteria

Pilot-scale enhanced oil recovery in hydrocarbon field development is often implemented to reduce investment risk due to geological uncertainties. Selection of the pilot area is important, since the result will be extended to the full field. The main challenge in choosing a pilot region is the absence of a systematic and quantitative method. In this paper, we present a novel quantitative and systematic method that combines reservoir-geology and operational-economic criteria and uses cluster analysis as an unsupervised machine learning method. The field under study is subdivided into pilot candidate areas, and the optimized pilot size is calculated using the economic objective function. Subsequently, the corresponding covariance (COV) matrix is computed for the simulated 3-D reservoir quality maps in the areas. The areas are optimally clustered to select the dominant cluster. Decisions can then be based on the operational-economic criteria together with the proximity of each area to the center of the dominant cluster, which serves as a geological-reservoir criterion. Ultimately, Shannon entropy weighting and the reference ideal method are applied to compute the pilot opportunity index in each area. The proposed method was employed for a pilot study on an oil field in south west Iran.
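The Shannon entropy weighting step mentioned in this abstract can be illustrated with a minimal sketch of the standard entropy weight method. This is not the authors' implementation: the function name `entropy_weights` and the score matrix are hypothetical, and the sketch assumes non-negative scores with rows as candidate areas and columns as criteria.

```python
import math

def entropy_weights(matrix):
    """Shannon-entropy weights for the columns (criteria) of a
    non-negative decision matrix whose rows are candidate areas.
    Assumes at least two rows and at least one non-uniform column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    k = 1.0 / math.log(n_rows)  # scales each column's entropy to [0, 1]
    diversities = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        # A criterion that varies more across areas has lower entropy
        # and therefore a higher degree of diversification.
        diversities.append(1.0 - entropy)
    d_sum = sum(diversities)
    return [d / d_sum for d in diversities]

# Hypothetical scores: 3 candidate areas x 2 criteria.
# The first criterion is identical everywhere, so it carries no information.
scores = [[1.0, 5.0], [1.0, 1.0], [1.0, 3.0]]
weights = entropy_weights(scores)
```

In this toy case, almost all of the weight falls on the second criterion, because the first does not discriminate between areas.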


Identifying Measurement Invariant Item Sets in Cross-Cultural Settings Using an Automated Item Selection Procedure

Methodology ◽

10.1027/1614-2241/a000155 ◽

2018 ◽

Vol 14 (4) ◽

pp. 177-188 ◽

Cited By ~ 2

Author(s):

Martin Schultze ◽

Michael Eid

Keyword(s):

Measurement Invariance ◽

Optimal Solution ◽

Selection Procedure ◽

Cross Cultural ◽

Item Selection ◽

Ant System ◽

Item Quality ◽

Ant System Algorithm ◽

Selection For ◽

Selection Of

Abstract. In the construction of scales intended for use in cross-cultural studies, the selection of items must be guided not only by traditional criteria of item quality but also by information about the measurement invariance of the scale. We present an approach to automated item selection which depicts the process as a combinatorial optimization problem and aims at finding a scale which fulfils predefined target criteria, such as measurement invariance across cultures. The search for an optimal solution is performed using an adaptation of the [Formula: see text] Ant System algorithm. The approach is illustrated using an application to item selection for a personality scale assuming measurement invariance across multiple countries.


MÖSSBAUER STUDY OF HYDROGEN DIFFUSION AND INTERSTITIAL SITE OCCUPANCY IN 57Co : PdHx AND 57Fe : PdHx

Le Journal de Physique Colloques ◽

10.1051/jphyscol:19792222 ◽

1979 ◽

Vol 40 (C2) ◽

pp. C2-635-C2-638

Author(s):

F. Pröbst ◽

F. E. Wagner ◽

M. Karger ◽

G. Wortmann

Keyword(s):

Hydrogen Diffusion ◽

Site Occupancy ◽

Interstitial Site ◽

Mössbauer Study


ESTIMATION OF THE HERITABILITY OF BODY WEIGHT IN PUNTEN-VARIETY COMMON CARP IN AN INDIVIDUAL SELECTION PROGRAM

Jurnal Riset Akuakultur ◽

10.15578/jra.11.3.2016.217-223 ◽

2016 ◽

Vol 11 (3) ◽

pp. 217

Author(s):

Estu Nugroho ◽

Budi Setyono ◽

Mochammad Su’eb ◽

Tri Heru Prihadi

Keyword(s):

Body Weight ◽

Selection Response ◽

Base Population ◽

Breeding Programs ◽

Male And Female ◽

Body Weight Trait ◽

Population Selection ◽

Selection For ◽

Weight Trait ◽

Selection Of

Breeding of the Punten variety of common carp was carried out by individual selection for body weight. The base population for selection was formed by mass spawning of broodstock consisting of 20 females and 21 males collected from the Punten area, eight females and six males from Kepanjen, seven females and 12 males from Kediri, 27 females and 10 males from Sragen, and 15 females and 11 males from Blitar. Ten-day-old larvae were reared for four months. The stock was then thinned by 50%, and the juveniles were reared for 14 months before selection, guided by a sample of 250 individuals from each population. Broodstock candidates were selected at 18 months of age, separately for the male and female populations, by choosing the top 10% of fish by body weight. The selected candidates were then maintained until gonad maturity, after which 150 pairs were chosen and mass spawned to produce the next generation. The response to selection for body weight was positive: 49.89 g (3.66%) in the male population and 168.47 g (11.43%) in the female population. The heritability of body weight was 0.238 (males) and 0.505 (females).
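The abstract does not state how heritability was estimated. If the reported values are read as realized heritabilities under the breeder's equation R = h² × S (an assumption made here for illustration only), the selection differential S implied by each reported response R can be sketched as follows. The function name is hypothetical.

```python
def implied_selection_differential(response_g, heritability):
    """Solve the breeder's equation R = h2 * S for S.

    response_g   -- selection response R, in grams
    heritability -- h2, assumed here to be a realized heritability
    """
    if not 0 < heritability <= 1:
        raise ValueError("heritability must lie in (0, 1]")
    return response_g / heritability

# Values reported in the abstract; treating them as R and realized h2
# is an assumption, not something the abstract confirms.
male_S = implied_selection_differential(49.89, 0.238)
female_S = implied_selection_differential(168.47, 0.505)
```

Under that reading, the implied differentials come out at roughly 210 g for males and 334 g for females.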


Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
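The encoding step described here, summing substructure vectors into one compound vector, can be sketched in a few lines. This is an illustration rather than the Mol2vec package API: the function names, the toy two-dimensional embeddings and the substructure identifiers are all hypothetical, and skipping unknown identifiers is a simplification chosen for this sketch.

```python
import math

def embed_compound(substructure_ids, embeddings):
    """Encode a compound as the sum of its substructure vectors.
    Identifiers missing from `embeddings` are skipped (a simplification)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for sid in substructure_ids:
        vec = embeddings.get(sid)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-trained embeddings keyed by substructure identifier.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.5, 2.0]}

# A compound containing substructure "a" twice and "b" once.
compound_vec = embed_compound(["a", "b", "a"], toy_embeddings)
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
```

The resulting compound vector has the same dimension as the substructure vectors, so it can be passed directly to a supervised model as a dense feature vector.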


Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion using the massive data generated from MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics simulations of excited-state energy transfer in a Frenkel-exciton model. The PCA shows clearly that two types of bath modes, those that display strong vibronic couplings and those whose frequencies are close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. This conclusion is obtained purely from the PCA of the trajectory data, without much involvement of pre-defined physical knowledge. The results show that PCA, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
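As a rough illustration of what PCA extracts from trajectory-like data, the sketch below finds the leading principal component of a small data set by power iteration on the sample covariance matrix. It is a toy example under stated assumptions, not the analysis used in the paper: the function name and data are hypothetical, rows stand for snapshots and columns for coordinates, and power iteration is used only to keep the example dependency-free.

```python
import math

def first_principal_component(data, iterations=200):
    """Return (direction, variance) of the leading principal component.

    data -- list of rows (snapshots), each a list of coordinates.
    Uses power iteration on the sample covariance matrix; assumes the
    all-ones starting vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient: variance captured along the converged direction.
    variance = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                   for i in range(d))
    return vec, variance

# Hypothetical snapshots: large motion along coordinate 0, small along 1.
snapshots = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
direction, variance = first_principal_component(snapshots)
```

The leading direction lies almost entirely along the first coordinate, which is the coordinate with the dominant motion in this toy data.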


Evaluation of Large-Sunflower Lines of Sunflower on Quantitative Morphological Traits

Scientific and Technical Bulletin of the Institute of Oilseed Crops NAAS ◽

10.36710/ioc-2020-29-05 ◽

2020 ◽

pp. 46-55

Keyword(s):

Fertility Restoration ◽

Morphological Characters ◽

Direct Selection ◽

Oilseed Crops ◽

Turkish Origin ◽

Self Pollination ◽

Special Area ◽

Pericarp Color ◽

Selection For ◽

Selection Of

Confectionery sunflower is a special area of sunflower use that requires seeds with marketable quality traits. One possible way to obtain large-fruited sunflower is to develop production hybrids and lines. Objective: to evaluate newly created large-fruited sunflower lines on a complex of morphological characters and to identify the best lines for use as parent components of large-seeded hybrids or as source material. In 2016-2019, a study was conducted at the Institute of Oilseed Crops NAAS to assess the economic characteristics of large-fruited sunflower lines. We studied a collection of 27 lines derived from large-seeded sources. The lines were created by direct selection, or by crossing and selection, from the following sources: Reyny of Argentinean origin, the Zaporizhzhya Confectionery variety, a confectionery hybrid with striped pericarp color of Israeli origin, a white-seeded form of Turkish origin, and a synthetic population that is a donor of complex resistance. Lines taken from the collection for study had passed through at least 7 generations of selection for seed size. The shortest growing season, 99 days, was observed for lines 174d and KP11, and the longest, 109 days, for line I2K670. In the studied collection, the greatest 1000-seed weight (146.47 g) was found in line KP11, which is a maternal component and has no branching. Second by 1000-seed weight (109 g) was line 168v, which has branches and pollen fertility restoration genes and will be used as a paternal form. The third largest is line ZKN51-100, which is also single-headed. The collection included lines originating from the same cross combination but differing in morphotype by the presence or absence of branching. For example, three lines were obtained from the combination KP11 x Zaporizhzhya Confectionery; their 1000-seed weight ranged from 86 to 98 g, with the branched line having the largest. Of the lines created from the combination VK678 x ZKN32, the branched line 168a had a 1000-seed weight of 95 g and the unbranched line 168b 109 g. Both lines obtained from progeny of the combination KP11 x the striped hybrid had branches, but their seeds were much smaller (1000-seed weights of 59 and 79 g). The collection also included samples created from varieties and populations (160c, 174, 175b), whose 1000-seed weights, from 83 to 99 g, proved more acceptable for large-fruited use. Summing up the study of the collection of newly created lines, lines 162d, 168v, 175b and KP11 are potentially promising for use in hybrids. The selections showed that large-fruited lines can be obtained from large-fruited varieties, by self-pollination of large-fruited hybrids, and by crossing lines with hybrids and varieties. Self-pollination and selection of large-fruited lines over several generations does not provide the variability needed for positive changes under selection. Selection for 1000-seed weight in progeny from crosses and from populations creates opportunities for developing new large-seeded sunflower.


Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽

10.2174/1574893615999201002124021 ◽

2020 ◽

Vol 15 ◽

Author(s):

Deeksha Saxena ◽

Mohammed Haris Siddiqui ◽

Rajnish Kumar

Keyword(s):

Biological Sciences ◽

Machine Learning ◽

Deep Learning ◽

Disease Diagnosis ◽

Learning Models ◽

Data Types ◽

Related Data ◽

Abstract Level ◽

Experimental Validations ◽

Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is transformed from lower levels to progressively more abstract levels. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but the two are at times also used individually. DL seems to be a better platform than machine learning, as it does not require an intermediate feature extraction step and works well with larger datasets. DL is currently one of the most discussed fields among scientists and researchers for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We have analyzed the frequently used DL methods and data types and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights about DL methods, data types, and the selection of DL models for disease diagnosis.


An unsupervised machine-learning checkpoint-restart algorithm using Gaussian mixtures for particle-in-cell simulations

Journal of Computational Physics ◽

10.1016/j.jcp.2021.110185 ◽

2021 ◽

Vol 436 ◽

pp. 110185

Author(s):

G. Chen ◽

L. Chacón ◽

T.B. Nguyen

Keyword(s):

Machine Learning ◽

Gaussian Mixtures ◽

Unsupervised Machine Learning ◽

Particle In Cell
