HIGH-RESOLUTION NATIONAL-SCALE HYDROGEOLOGIC DATASETS IN SUPPORT OF MACHINE LEARNING MODELS

High-resolution liquid chromatography (LC)–tandem mass spectrometry (MS-MS)-based machine learning models are constructed to address the analytical challenge of identifying unknown controlled substances and new psychoactive substances (NPSs). Using a training set of 770 LC–MS-MS barcode spectra (with binary entries 0 or 1), generally obtained with high-resolution mass spectrometers, three classification models were generated and evaluated: an artificial neural network (ANN), a support vector machine (SVM) and a k-nearest neighbor (k-NN) model. In these models, controlled substances and NPSs were classified into 13 subgroups (benzylpiperazine, opiate, benzodiazepine, amphetamine, cocaine, methcathinone, classical cannabinoid, fentanyl, 2C series, indazole carbonyl compound, indole carbonyl compound, phencyclidine and others). Using 193 LC–MS-MS barcode spectra as an external test set, the accuracies of the ANN, SVM and k-NN models were evaluated as 72.5%, 90.0% and 94.3%, respectively. The hybrid similarity search (HSS) algorithm was also evaluated to examine whether it can identify unknown controlled substances and NPSs whose data are unavailable in the database. When only 24 representative LC–MS-MS spectra of controlled substances and NPSs were selectively included in the database, HSS identified compounds with high reliability. The machine learning models and the HSS algorithm are incorporated into our home-coded standalone software, the artificial intelligence screener for narcotic drugs and psychotropic substances, which is equipped with a graphical user interface. This software allows unknown controlled substances and NPSs to be identified conveniently.
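The k-NN classification of binary barcode spectra described in this abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure or value of k the authors used, so the Tanimoto similarity, k=3, the function names and the toy barcodes below are all assumptions made for illustration; only the class labels are taken from the subgroup list.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def knn_predict(train, query, k=3):
    """Majority vote over the k training barcodes most similar to the query.

    train is a list of (barcode, label) pairs.
    """
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy barcodes invented for illustration only (not real spectra).
train = [
    ([1, 1, 0, 0, 1, 0], "fentanyl"),
    ([1, 1, 0, 0, 0, 0], "fentanyl"),
    ([0, 0, 1, 1, 0, 1], "cocaine"),
    ([0, 0, 1, 1, 0, 0], "cocaine"),
]
print(knn_predict(train, [1, 1, 0, 0, 1, 1], k=3))
```

A similarity defined on set bits suits sparse 0/1 barcodes because shared absent peaks do not inflate the score, although other distance measures would work in the same voting scheme.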

Assessment of High‐Resolution Dynamical and Machine Learning Models for Prediction of Sea Ice Concentration in a Regional Application

Journal of Geophysical Research Oceans ◽ 10.1029/2020jc016277 ◽ 2020 ◽ Vol 125 (11)

Author(s): Sindre Fritzner ◽ Rune Graversen ◽ Kai H. Christensen

Keyword(s): Machine Learning ◽ High Resolution ◽ Sea Ice ◽ Learning Models ◽ Sea Ice Concentration ◽ Ice Concentration ◽ Machine Learning Models

Supplementary material to "High-resolution mapping of regional traffic emissions by using land-use machine learning models"

10.5194/acp-2021-281-supplement ◽ 2021

Author(s): Xiaomeng Wu ◽ Daoyuan Yang ◽ Jiajun Gu ◽ Yifan Wen ◽ Shaojun Zhang ◽ ...

Keyword(s): Machine Learning ◽ Land Use ◽ High Resolution ◽ Traffic Emissions ◽ Learning Models ◽ High Resolution Mapping ◽ Resolution Mapping ◽ Supplementary Material ◽ Machine Learning Models

High-Resolution Gridded Livestock Projection for Western China Based on Machine Learning

Remote Sensing ◽ 10.3390/rs13245038 ◽ 2021 ◽ Vol 13 (24) ◽ pp. 5038

Author(s): Xianghua Li ◽ Jinliang Hou ◽ Chunlin Huang

Keyword(s): Machine Learning ◽ High Resolution ◽ Western China ◽ Support Vector ◽ Distribution Data ◽ Learning Models ◽ Distribution Maps ◽ Livestock Density ◽ Machine Learning Models ◽ Cattle And Sheep

Accurate high-resolution gridded livestock distribution data are of great significance for the rational utilization of grassland resources, environmental impact assessment, and the sustainable development of animal husbandry. Traditional livestock distribution data are collected at the administrative unit level, which does not provide a sufficiently detailed geographical description of livestock distribution. In this study, we propose a scheme that integrates high-resolution gridded geographic data and livestock statistics through machine learning regression models to spatially disaggregate the livestock statistics to a 1 km × 1 km spatial resolution. Three machine learning models, namely support vector machine (SVM), random forest (RF), and deep neural network (DNN), were constructed to represent the complex nonlinear relationship between various environmental factors (e.g., land use practice, topography, climate, and socioeconomic factors) and livestock density. By applying the proposed method, we generated a set of 1 km × 1 km spatial distribution maps of cattle and sheep for western China from 2000 to 2015 at five-year intervals. Our projected cattle and sheep distribution maps reveal the spatial heterogeneity and the changing trends of livestock distribution at the grid level from 2000 to 2015. When compared with the traditional census livestock density, the DNN-based gridded livestock distribution has the highest accuracy, with a coefficient of determination (R²) of 0.75 and a root mean square error (RMSE) of 9.82 heads/km² for cattle, and an R² of 0.73 and an RMSE of 31.38 heads/km² for sheep. The accuracy of the RF is slightly lower than that of the DNN but higher than that of the SVM. The projection accuracy of all three machine learning models is superior to that of the published Gridded Livestock of the World (GLW) datasets. Consequently, deep learning has the potential to be an effective tool for high-resolution gridded livestock projection by combining geographic and census data.
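The two accuracy measures reported in this abstract, R² and RMSE, can be computed as in the short sketch below. The abstract does not say which definition of R² the authors used (for example, one minus the ratio of residual to total sum of squares, or the squared correlation coefficient), so the first form is assumed here; the function names and the density values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical census and gridded densities in heads/km² (not data from the study).
census = [10.0, 20.0, 30.0, 40.0]
gridded = [12.0, 18.0, 33.0, 37.0]
print(round(rmse(census, gridded), 3), round(r_squared(census, gridded), 3))
```

RMSE keeps the units of the density itself, which is why the abstract can report it in heads/km², while R² is unitless and comparable across cattle and sheep.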

Using machine learning models to predict oxygen saturation following ventilator support adjustment in critically ill children: a single center pilot study

10.1101/334896 ◽ 2018

Author(s): Sam Ghazal ◽ Michael Sauthier ◽ David Brossier ◽ Wassim Bouachir ◽ Philippe Jouvet ◽ ...

Keyword(s): Machine Learning ◽ Mechanical Ventilation ◽ Intensive Care ◽ Pilot Study ◽ High Resolution ◽ Oxygen Saturation ◽ Predictive Model ◽ Critically Ill Children ◽ Learning Models ◽ Machine Learning Models

Clinicians who are experts in mechanical ventilation are not continuously at each patient’s bedside in an intensive care unit to adjust mechanical ventilation settings and to analyze the impact of ventilator setting adjustments on gas exchange. The development of clinical decision support systems that analyze patients’ data in real time offers an opportunity to fill this gap. The objective of this study was to determine whether a machine learning predictive model could be trained on a set of clinical data and used to predict hemoglobin oxygen saturation 5 min after a ventilator setting change. Data of mechanically ventilated children admitted between May 2015 and April 2017 were included and extracted from a high-resolution research database. More than 7 × 10⁵ rows of data were obtained from 610 patients and discretized into 3 class labels. Because the data were imbalanced, four different data balancing processes were applied, and two machine learning models (an artificial neural network and bootstrap aggregation of complex decision trees) were trained and tested on the four resulting balanced datasets. The best model predicted SpO2 with accuracies of 76%, 62% and 96% for the SpO2 classes “< 84%”, “85 to 91%” and “> 92%”, respectively. This pilot study using a machine learning predictive model resulted in an algorithm with good accuracy. To obtain a robust algorithm, more data are needed, suggesting the need for multicenter pediatric intensive care high-resolution databases.
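The abstract does not name the four data balancing processes that were applied. As one common possibility only, random oversampling of the minority classes can be sketched as follows; the function name, the seed and the toy class labels are assumptions, not details from the study.

```python
import random
from collections import Counter, defaultdict


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data: 6 "high", 3 "mid" and 1 "low" rows (hypothetical labels).
rows = list(range(10))
labels = ["high"] * 6 + ["mid"] * 3 + ["low"] * 1
balanced_rows, balanced_labels = oversample(rows, labels)
print(Counter(balanced_labels))
```

Balancing should be applied to the training split only, since duplicated rows in a test set would distort the per-class accuracies.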

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽ 10.33014/issn.2640-5652.2.1.holloway.1 ◽ 2020 ◽ Vol 2 (1) ◽ pp. 3-6

Author(s): Eric Holloway

Keyword(s): Machine Learning ◽ General System ◽ Learning Models ◽ Starting Point ◽ Machine Learning Models

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system that uses Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽ 10.15866/irecos.v13i1.16863 ◽ 2018 ◽ Vol 13 (1) ◽ pp. 21

Author(s): George Anderson ◽ Oduronke T. Eyitayo

Keyword(s): Machine Learning ◽ Student Performance ◽ Computer Literacy ◽ Learning Models ◽ Machine Learning Models

Experimental Comparison of Machine Learning Models in Malware Packing Detection

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽ 10.23919/apnoms50412.2020.9237007 ◽ 2020

Author(s): Jong-Wouk Kim ◽ Juhong Namgung ◽ Yang-Sae Moon ◽ Mi-Jung Choi

Keyword(s): Machine Learning ◽ Experimental Comparison ◽ Learning Models ◽ Machine Learning Models

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽ 2021

Author(s): Norberto Sánchez-Cruz ◽ Jose L. Medina-Franco

Keyword(s): Machine Learning ◽ Small Molecules ◽ Predictive Models ◽ Large Scale ◽ Target Prediction ◽ Quantitative Measure ◽ Learning Models ◽ Discovery Research ◽ Drug Discovery Research ◽ Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
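The abstract reports mean precision without defining how it was averaged. One plausible reading, sketched below purely as an assumption, is the binary precision for each target averaged over all targets; the function names, target names and labels are hypothetical.

```python
def precision(y_true, y_pred):
    """Fraction of predicted actives (1) that are truly active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def mean_precision(per_target):
    """Unweighted average of per-target precision.

    per_target maps a target name to a (y_true, y_pred) pair of 0/1 lists.
    """
    scores = [precision(t, p) for t, p in per_target.values()]
    return sum(scores) / len(scores)


# Hypothetical activity labels for two targets (not data from the study).
results = {
    "target_A": ([1, 1, 0, 0], [1, 0, 1, 0]),
    "target_B": ([1, 0, 1], [1, 0, 1]),
}
print(mean_precision(results))
```

An unweighted average treats every target equally regardless of how many compounds it has; weighting by the number of compounds per target is an equally defensible choice and would give a different figure.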
