HIGH-RESOLUTION NATIONAL-SCALE HYDROGEOLOGIC DATASETS IN SUPPORT OF MACHINE LEARNING MODELS

2020 ◽  
Author(s):  
Kenneth Belitz ◽  
Author(s):  
So Yeon Lee ◽  
Sang Tak Lee ◽  
Sungill Suh ◽  
Bum Jun Ko ◽  
Han Bin Oh

Abstract High-resolution liquid chromatography (LC)–tandem mass spectrometry (MS-MS)-based machine learning models are constructed to address the analytical challenge of identifying unknown controlled substances and new psychoactive substances (NPSs). Using a training set of 770 LC–MS-MS barcode spectra (with binary entries 0 or 1), obtained generally with high-resolution mass spectrometers, three classification machine learning models were generated and evaluated: an artificial neural network (ANN), a support vector machine (SVM) and a k-nearest neighbor (k-NN) model. In these models, controlled substances and NPSs were classified into 13 subgroups (benzylpiperazine, opiate, benzodiazepine, amphetamine, cocaine, methcathinone, classical cannabinoid, fentanyl, 2C series, indazole carbonyl compound, indole carbonyl compound, phencyclidine and others). Using 193 LC–MS-MS barcode spectra as an external test set, the accuracies of the ANN, SVM and k-NN models were 72.5%, 90.0% and 94.3%, respectively. In addition, the hybrid similarity search (HSS) algorithm was evaluated to examine whether it can successfully identify unknown controlled substances and NPSs whose data are unavailable in the database. When only 24 representative LC–MS-MS spectra of controlled substances and NPSs were selectively included in the database, HSS was found to identify compounds with high reliability. The machine learning models and the HSS algorithm are incorporated into our in-house standalone software, an artificial intelligence screener for narcotic drugs and psychotropic substances, which is equipped with a graphical user interface. This software allows unknown controlled substances and NPSs to be identified conveniently.
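To make the k-NN step concrete, the following is a minimal sketch of majority-vote classification on binary barcode vectors. It is not the authors' implementation: the Hamming distance, the value k = 3, the 8-bit barcode length and the toy training data are all assumptions chosen for illustration, and the abstract does not state which distance metric or k was used.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions where two equal-length binary barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training barcodes nearest to query.

    train: list of (barcode, label) pairs, where barcode is a tuple of 0/1 ints.
    """
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 8-bit barcodes for illustration only (not real spectra).
train = [
    ((1, 1, 0, 0, 1, 0, 0, 0), "fentanyl"),
    ((1, 1, 0, 0, 0, 0, 0, 0), "fentanyl"),
    ((1, 0, 0, 0, 1, 0, 0, 0), "fentanyl"),
    ((0, 0, 1, 1, 0, 0, 1, 1), "amphetamine"),
    ((0, 0, 1, 1, 0, 1, 1, 0), "amphetamine"),
    ((0, 0, 0, 1, 0, 0, 1, 1), "amphetamine"),
]
query = (1, 1, 0, 0, 1, 1, 0, 0)
predicted = knn_predict(train, query, k=3)
print(predicted)
```

For binary vectors, a Jaccard or Tanimoto distance is another common choice and could be swapped in for `hamming` without changing the voting logic.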


2021 ◽  
Vol 13 (24) ◽  
pp. 5038
Author(s):  
Xianghua Li ◽  
Jinliang Hou ◽  
Chunlin Huang

Accurate high-resolution gridded livestock distribution data are of great significance for the rational utilization of grassland resources, environmental impact assessment, and the sustainable development of animal husbandry. Traditional livestock distribution data are collected at the administrative unit level, which does not provide a sufficiently detailed geographical description of livestock distribution. In this study, we propose a scheme that integrates high-resolution gridded geographic data and livestock statistics through machine learning regression models to spatially disaggregate the livestock statistics to a 1 km × 1 km spatial resolution. Three machine learning models, namely support vector machine (SVM), random forest (RF), and deep neural network (DNN), were constructed to represent the complex nonlinear relationship between various environmental factors (e.g., land use practice, topography, climate, and socioeconomic factors) and livestock density. By applying the proposed method, we generated a set of 1 km × 1 km spatial distribution maps of cattle and sheep for western China from 2000 to 2015 at five-year intervals. Our projected cattle and sheep distribution maps reveal the spatial heterogeneity and the trends of change in livestock distribution at the grid level from 2000 to 2015. Compared against the traditional census livestock density, the gridded livestock distribution based on the DNN has the highest accuracy, with a coefficient of determination (R²) of 0.75 and a root mean square error (RMSE) of 9.82 head/km² for cattle, and an R² of 0.73 and an RMSE of 31.38 head/km² for sheep. The accuracy of the RF is slightly lower than that of the DNN but higher than that of the SVM. The projection accuracy of all three machine learning models is superior to that of the published Gridded Livestock of the World (GLW) datasets. Consequently, deep learning has the potential to be an effective tool for high-resolution gridded livestock projection by combining geographic and census data.
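The two accuracy measures quoted above can be computed as in the sketch below. It assumes R² is defined as 1 minus the ratio of residual to total sum of squares; some studies instead report the squared Pearson correlation, and the abstract does not say which definition was used. The density values are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical livestock densities in head/km² (not data from the study).
census_density = [10.0, 20.0, 30.0, 40.0]
gridded_density = [12.0, 18.0, 33.0, 37.0]

print(rmse(census_density, gridded_density))
print(r_squared(census_density, gridded_density))
```

RMSE keeps the units of the target (here head/km²), which is why the cattle and sheep RMSE values are not directly comparable without reference to their typical densities, whereas R² is unitless.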


2018 ◽  
Author(s):  
Sam Ghazal ◽  
Michael Sauthier ◽  
David Brossier ◽  
Wassim Bouachir ◽  
Philippe Jouvet ◽  
...  

Abstract Clinicians who are experts in mechanical ventilation are not continuously at each patient’s bedside in an intensive care unit to adjust mechanical ventilation settings and to analyze the impact of those adjustments on gas exchange. The development of clinical decision support systems that analyze patients’ data in real time offers an opportunity to fill this gap. The objective of this study was to determine whether a machine learning predictive model could be trained on a set of clinical data and used to predict hemoglobin oxygen saturation 5 min after a ventilator setting change. Data of mechanically ventilated children admitted between May 2015 and April 2017 were extracted from a high-resolution research database. More than 7 × 10⁵ rows of data were obtained from 610 patients and discretized into 3 class labels. Because of data imbalance, four different data balancing processes were applied, and two machine learning models (an artificial neural network and bootstrap aggregation of complex decision trees) were trained and tested on the four resulting balanced datasets. The best model predicted SpO2 with accuracies of 76%, 62% and 96% for the SpO2 classes “< 84%”, “85 to 91%” and “> 92%”, respectively. This pilot study using a machine learning predictive model resulted in an algorithm with good accuracy. To obtain a robust algorithm, more data are needed, suggesting the need for multicenter pediatric intensive care high-resolution databases.
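The discretization and balancing steps can be illustrated with the sketch below. It is only an assumption-laden example: the abstract does not name the four balancing processes, so random oversampling is used here as one common option, and the exact cut points (below 85, 85 up to but not including 92, 92 and above) are a guess at how the three quoted classes were delimited. The readings are invented.

```python
import random
from collections import Counter


def spo2_class(spo2):
    """Map an SpO2 percentage to one of three labels (cut points are assumed)."""
    if spo2 < 85:
        return "low"
    if spo2 < 92:
        return "mid"
    return "high"


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match
    the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical SpO2 readings, heavily skewed toward the "high" class.
readings = [97, 96, 99, 95, 98, 94, 88, 90, 82]
balanced = random_oversample(readings, spo2_class)
counts = Counter(spo2_class(r) for r in balanced)
print(counts)
```

Balancing of this kind is normally applied to the training split only, so that the test set keeps the original class proportions.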


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus of drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, the models were implemented as a freely accessible and easy-to-use web application.

