Prediction of Bonding Energy by Structural Descriptors of Metal Nanoalloys

The problem of predicting the binding energy of ternary metal nanoparticles, and of building learning models based on structural descriptors, is discussed. Regression dependences of the specific interatomic binding energy were constructed for the ternary Au–Ag–Cu nanosystem. Five radial features, each depending on the pairwise interatomic distances in the nanoparticle, were used as structural descriptors. To assess accuracy more reliably, cross-validation was applied, and the results obtained on the validation parts of the sample were averaged. Within a group of nanoparticles of the same composition, the resulting model predicts the specific interatomic binding energy only to a limited extent; over the entire sample, the mean absolute error is 14%. At the same time, the model identifies the composition of a nanoparticle from several candidate compositions almost without error. The highest coefficient of determination on the entire sample was obtained with the random forest ensemble algorithm. A negative correlation was found between the binding energy of the nanoalloy and the position of the first peak of the radial distribution function for copper atoms.
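The validation scheme described above (train on all folds but one, score the held-out fold, then average the fold scores) can be sketched in plain Python. This is a minimal sketch, not the authors' code: a one-feature least-squares line stands in for the random forest, the error metric is assumed to be the mean absolute relative error, and the helper names (`k_fold_indices`, `fit_line`, `cv_mean_relative_error`) and the toy data are hypothetical.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the study's random forest)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def cv_mean_relative_error(xs, ys, k=5):
    """Average, over k validation folds, of the mean absolute relative error."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(a + b * xs[i] - ys[i]) / abs(ys[i]) for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Hypothetical toy data with an exact linear relation between one descriptor
# (e.g. a first-peak position, in Å) and an energy, so the CV error is ~0.
peak_positions = [2.50 + 0.01 * i for i in range(20)]
energies = [-1.0 - 0.8 * r for r in peak_positions]
print(cv_mean_relative_error(peak_positions, energies, k=5))
```

Contiguous, unshuffled folds keep the example deterministic; in practice the samples would usually be shuffled first, and a random forest (for example scikit-learn's `RandomForestRegressor`) would replace `fit_line`.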

Hourly Ground-Level PM2.5 Estimation Using Geostationary Satellite and Reanalysis Data via Deep Learning

Remote Sensing, 2021, Vol 13 (11), pp. 2121. DOI: 10.3390/rs13112121

Author(s): Changsuk Lee, Kyunghwa Lee, Sangmin Kim, Jinhyeok Yu, Seungtaek Jeong, ...

Keyword(s): Cross Validation; Ground Level; Reanalysis Data; Training Dataset; Coefficient of Determination; Mean Bias Error; Bias Error; Air Quality Model; Quality Model; Near Surface

This study proposes an improved approach for monitoring the spatial distribution of hourly concentrations of particulate matter less than 2.5 μm in diameter (PM2.5) over the Korean Peninsula, using a deep neural network (DNN) driven by Geostationary Ocean Color Imager (GOCI) images and Unified Model (UM) reanalysis data. The DNN was optimized to find an appropriate model structure, with hyperparameter tuning, regularization, early stopping, and normalization of the input and output variables to prevent overfitting to the training dataset. Near-surface atmospheric information from the UM was also used as input to help the DNN model generalize spatially. The PM2.5 retrieved by the DNN was compared with estimates from random forest, multiple linear regression, and the Community Multiscale Air Quality model. The DNN achieved higher accuracy than these conventional methods in both hold-out validation (root mean square error (RMSE) = 7.042 μg/m³, mean bias error (MBE) = −0.340 μg/m³, coefficient of determination (R2) = 0.698) and cross-validation (RMSE = 9.166 μg/m³, MBE = 0.293 μg/m³, R2 = 0.49). Although R2 was low because high PM2.5 concentrations were underestimated, RMSE and MBE remained within reliable bounds (RMSE < 10 μg/m³ and |MBE| < 1 μg/m³) for both hold-out validation and cross-validation.
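The three accuracy measures quoted above can be written out directly. This is a sketch under two assumptions not stated in the abstract: MBE is taken as prediction minus observation (so a negative value means underestimation), and R2 is computed as 1 − SS_res/SS_tot rather than as a squared correlation. The sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mbe(observed, predicted):
    """Mean bias error, assumed here to be mean(predicted - observed)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up hourly PM2.5 values in µg/m³.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
print(rmse(obs, pred), mbe(obs, pred), r_squared(obs, pred))
```

Because MBE averages signed errors, over- and underestimates cancel; this is how a small |MBE| can coexist with a modest R2 when high concentrations are underestimated.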

Research on Crude Protein Contents in Medicago Sativa Hay Harvest During 2008-2009 Using FT-NIR Spectrometry

Bulletin of University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca Agriculture, 2011, Vol 68 (1). DOI: 10.15835/buasvmcn-agr:6426

Author(s): Laura Dale, Ioan Rotar, Vasile Florian, Roxana Vidican, André Thewis, ...

Keyword(s): Medicago sativa; Protein Content; Crude Protein; Nutritive Value; Cross Validation; External Validation; Flowering Plant; Coefficient of Determination; Research Station; Crude Protein Content

Medicago sativa (alfalfa) is a flowering plant of the pea family that is widely grown throughout the world as forage for cattle and is most often harvested as hay. Alfalfa usually has the highest nutritive value of all common hay crops. This work presents a direct, non-destructive method for analyzing crude protein content in alfalfa hay. The primary objective was to build a crude protein calibration model for alfalfa based on FT-NIR spectroscopy. Samples were collected over two experimental years (2008–2009) from field trials at the Agricultural Development research station in Cojocna. Reference values were required to construct the model, so crude protein content was determined by the classical Kjeldahl method (Kjeltec Auto Analyser, Tecator). Crude protein ranged from 12.63% to 19.12% on a dry matter basis. The regression model was built with partial least squares (PLS) computed by the SIMPLS algorithm, using different pre-processing techniques and leave-one-out cross-validation. Calibrating on both years together yielded a cross-validation coefficient of determination of R2 = 0.965. The robustness of the model was confirmed on independent samples (external validation), where R2 = 0.977 and RMSEP = 0.8. The results indicate that NIRS can be used to determine crude protein, which could serve as a quality-control criterion for alfalfa hay.
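The abstract does not name the pre-processing techniques that were compared. As one common example for NIR spectra, standard normal variate (SNV) scaling can be sketched as follows; it illustrates a typical pre-processing step, not necessarily one the authors used, and the toy spectrum is invented.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean and divide by its own
    standard deviation, which removes additive offsets and multiplicative scaling."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample (n - 1) standard deviation; some software uses n
    return [(v - mu) / sd for v in spectrum]


raw = [0.21, 0.35, 0.52, 0.48, 0.30]     # toy absorbance values
shifted = [2.0 * v + 0.1 for v in raw]  # same spectrum with scaling and an offset
print(snv(raw))
print(snv(shifted))                     # matches snv(raw) up to rounding
```

Because each spectrum is normalized by its own statistics, SNV needs no reference spectrum, which makes it convenient before PLS calibration.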

Application of FTIR Spectroscopy and Chemometrics for Halal Authentication of Beef Meatball Adulterated with Dog Meat

Indonesian Journal of Chemistry, 2018, Vol 18 (2), pp. 376. DOI: 10.22146/ijc.27159. Cited by ~2

Author(s): Wiranti Sri Rahayu, Abdul Rohman, Sudibyo Martono, Sudjadi Sudjadi

Keyword(s): FTIR Spectroscopy; Reliable Method; Cross Validation; Partial Least Square; Least Square; Coefficient of Determination; Calibration Model; Mean Square; Folch Method; Halal Authentication

Beef meatballs are among the most popular meat-based food products in the Indonesian community. Beef is currently very expensive on the Indonesian market compared with other common meats such as chicken and lamb, which has tempted some unethical meatball producers to replace or adulterate beef with lower-priced meat such as dog meat. The objective of this study was to evaluate FTIR spectroscopy combined with chemometrics for the identification and quantification of dog meat (DM) in beef meatballs (BM). Meatball samples were prepared by adding DM to BM ingredients at 0–100% wt/wt, and lipids were extracted using the Folch method. The lipid extracts were scanned with an FTIR spectrophotometer over 4000–650 cm⁻¹, and partial least squares (PLS) calibration was used to quantify DM in the meatballs. The combined frequency regions of 1782–1623 cm⁻¹ and 1485–659 cm⁻¹, with detrending treatment, gave the best prediction of DM in BM. The coefficient of determination (R2) between the actual and FTIR-predicted DM values was 0.993 for the calibration model and 0.995 for the validation model. The root mean square error of calibration (RMSEC) and the standard error of cross-validation (SECV) were 1.63% and 2.68%, respectively. FTIR spectroscopy combined with multivariate analysis can therefore serve as an accurate and reliable method for determining DM in meatballs.
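The detrending treatment mentioned above removes a baseline trend from each spectrum before PLS calibration. A minimal sketch, assuming a first-order (straight-line) trend fitted by least squares against wavenumber; chemometrics packages often fit a second-order polynomial instead, and the abstract does not say which order was used. The toy spectrum is invented.

```python
def detrend_linear(wavenumbers, absorbances):
    """Subtract the least-squares straight line a + b*x from a spectrum."""
    n = len(wavenumbers)
    mx = sum(wavenumbers) / n
    my = sum(absorbances) / n
    sxx = sum((x - mx) ** 2 for x in wavenumbers)
    sxy = sum((x - mx) * (y - my) for x, y in zip(wavenumbers, absorbances))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(wavenumbers, absorbances)]


# Toy spectrum over part of the 1782-1623 cm^-1 region: a sloping baseline plus one band.
wn = list(range(1623, 1783, 10))
baseline = [0.0005 * x - 0.5 for x in wn]
band = [0.3 if x == 1743 else 0.0 for x in wn]
spectrum = [b + p for b, p in zip(baseline, band)]
print(detrend_linear(wn, spectrum))
```

A pure straight-line baseline is reduced to zeros, while the band survives as a local excess over the fitted trend.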

Development of multistage 10-m shuttle run test for VO2max estimation in healthy adults

2021. DOI: 10.31083/jomh.2021.066

Keyword(s): Cross Validation; Age Groups; Healthy Adults; Coefficient of Determination; Validation Test; Uniform Distributions; Gender and Age; Run Test; Shuttle Run; Spatial Limitation

Background and objective: The traditional 20-m multistage shuttle run test (MST) has two disadvantages: it requires a long space for measurement, and it was not developed using a range of age groups. We therefore developed a new MST that reduces the measurement distance to 10 m, overcoming the spatial limitation, and removes the bias by using uniform distributions of gender and age. Material and methods: Study subjects were 120 healthy adults (60 males and 60 females) aged 20 to 50 years. All subjects performed a graded maximal exercise test (GXT) and a 10-m MST at five-day intervals. A regression model was developed using data from 70% of the subjects, and a cross-validation test was performed using the remaining 30%. Results: The male regression model's coefficient of determination (R2) was 58.8%, and its standard error of estimate (SEE) was 4.17 mL/kg/min. The female regression model's R2 was 69.2%, and its SEE was 3.39 mL/kg/min. VO2max from the 10-m MST correlated highly with that from the GXT (males: 0.816; females: 0.821). In the cross-validation test of the developed regression models, the SEE was 4.38 mL/kg/min for males and 4.56 mL/kg/min for females. Conclusion: The 10-m MST is an accurate and valid method for estimating VO2max. It can be used when the existing 20-m MST cannot be performed because of spatial limitations, and it can be applied to both men and women from their 20s to their 50s.
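The development/validation split and the standard error of estimate (SEE) reported above can be sketched as follows. Assumptions: subjects are assigned at random (the abstract does not say how the 70%/30% split was drawn, or whether it was done separately by sex), and SEE is taken as sqrt(SSE / (n − k − 1)) for a model with k predictors; the function names and the VO2max numbers are illustrative.

```python
import math
import random


def split_70_30(subject_ids, seed=0):
    """Randomly assign about 70% of subjects to model development and the rest to validation."""
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def standard_error_of_estimate(measured, predicted, n_predictors):
    """SEE = sqrt(SSE / (n - k - 1)) for a regression with k predictors."""
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(sse / (len(measured) - n_predictors - 1))


development, validation = split_70_30(range(60))  # e.g. the 60 male subjects
vo2_gxt = [40.0, 45.0, 50.0, 38.0]                 # made-up GXT VO2max, mL/kg/min
vo2_mst = [41.0, 44.0, 52.0, 37.0]                 # made-up 10-m MST estimates
print(len(development), len(validation))
print(standard_error_of_estimate(vo2_gxt, vo2_mst, n_predictors=1))
```

A fixed seed makes the split reproducible; the SEE denominator shrinks by one for each predictor, so the same residuals give a larger SEE for a model with more terms.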

Using Raman Spectroscopy as a Fast Tool to Classify and Analyze Bulgarian Wines—A Feasibility Study

Molecules, 2019, Vol 25 (1), pp. 170. DOI: 10.3390/molecules25010170. Cited by ~1

Author(s): Vera Deneva, Ivan Bakardzhiyski, Krasimir Bambalov, Daniela Antonova, Diana Tsobanova, ...

Keyword(s): Raman Spectroscopy; Phenolic Compounds; Cross Validation; Geographic Origin; Coefficient of Determination; Total Phenolic; Total Phenolic Compounds; Calibration Models; Rich Information; Fast Classification

Raman spectroscopy, which can provide rich information about the chemical composition of a sample, is attracting increasing interest in food applications. Here, Raman spectroscopy was used to analyze a set of red and white wine samples from rarely studied traditional Bulgarian wines. One objective of this study was to attempt fast classification of Bulgarian wines by variety and geographic origin. In addition, partial least squares (PLS) regression calibration models relating Raman spectra to phenolic content were developed and evaluated by cross-validation. Good calibration statistics were obtained for total phenolic compounds determined by the Folin–Ciocalteu method (coefficient of determination R2 = 0.81; standard error of cross-validation SECV = 474.2 mg/dm³ gallic acid equivalents), and for total phenolic compounds (R2 = 0.87; SECV = 526.6 mg/dm³ catechin equivalents) and phenolic acids (R2 = 0.81; SECV = 44.8 mg/dm³ caffeic acid equivalents) determined spectrophotometrically at 280 nm. This study demonstrates that Raman spectroscopy can be suitable for measuring phenolic compounds in both red and white wines.
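The abstract does not state which classification method was used to separate wines by variety and origin. Purely as a hypothetical illustration of classifying spectra, a nearest-centroid rule (assign a spectrum to the class whose mean spectrum is closest) can be sketched as follows; the class labels and three-point "spectra" are invented.

```python
import math


def class_centroids(spectra, labels):
    """Mean spectrum for each class label."""
    sums, counts = {}, {}
    for spectrum, label in zip(spectra, labels):
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], spectrum)]
        counts[label] += 1
    return {label: [v / counts[label] for v in sums[label]] for label in sums}


def nearest_centroid(spectrum, centroids):
    """Label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(spectrum, centroids[label]))


train = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [0.2, 1.0, 0.4], [0.1, 0.9, 0.5]]
labels = ["variety_A", "variety_A", "variety_B", "variety_B"]  # hypothetical classes
cents = class_centroids(train, labels)
print(nearest_centroid([0.85, 0.25, 0.15], cents))
```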

The Existence of A Priori Distinctions Between Learning Algorithms

Neural Computation, 1996, Vol 8 (7), pp. 1391–1420. DOI: 10.1162/neco.1996.8.7.1391. Cited by ~48

Author(s): David H. Wolpert

Keyword(s): Cross Validation; Learning Algorithm; A Priori; Learning Algorithms; Loss Functions; Average Error; Quadratic Loss; Training Set; Natural Restriction; Cross Validation Error

This is the second of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms and, in particular, discusses the a priori distinctions that do exist between them. It is shown, loosely speaking, that for loss functions other than zero-one loss (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is shown to be equivalent on average to its “randomized” version, and in this sense it still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choosing the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” would not be relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly, they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
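The two selection rules contrasted above can be made concrete. In this toy sketch (an illustration of the rules only, not of Wolpert's OTS analysis, which averages over targets or hypotheses), two hypothetical "learning algorithms" that predict a constant, the training mean or the training median, are scored by leave-one-out squared error; cross-validation picks the lower score and anti-cross-validation the higher.

```python
from statistics import mean, median


def loo_error(learner, ys):
    """Leave-one-out mean squared error of a learner mapping training targets to one prediction."""
    errors = []
    for i in range(len(ys)):
        training = ys[:i] + ys[i + 1:]
        errors.append((learner(training) - ys[i]) ** 2)
    return mean(errors)


def cross_validation_choice(cv_errors):
    """Choose the algorithm with the smallest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


def anti_cross_validation_choice(cv_errors):
    """Choose the algorithm with the largest cross-validation error."""
    return max(cv_errors, key=cv_errors.get)


targets = [1, 2, 3, 10]
cv_errors = {"mean": loo_error(mean, targets), "median": loo_error(median, targets)}
print(cv_errors)
print(cross_validation_choice(cv_errors), anti_cross_validation_choice(cv_errors))
```

With the outlier at 10, the median learner has the lower leave-one-out error, so the two rules disagree; the paper's point concerns which rule is justified on average, which no single dataset can settle.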

Estimating Spatio-Temporal Variations of PM2.5 Concentrations Using VIIRS-Derived AOD in the Guanzhong Basin, China

Remote Sensing, 2019, Vol 11 (22), pp. 2679. DOI: 10.3390/rs11222679. Cited by ~5

Author(s): Kainan Zhang, Gerrit de Leeuw, Zhiqiang Yang, Xingfeng Chen, Xiaoli Su, ...

Keyword(s): Statistical Model; Cross Validation; Temporal Variations; Coefficient of Determination; Prediction Errors; Two Stage; Temporal Models; Spatio Temporal; Guanzhong Basin; GWR Model

Aerosol optical depth (AOD) derived from satellite remote sensing is widely used to estimate surface concentrations of PM2.5 (the dry mass concentration of particles with an in situ aerodynamic diameter smaller than 2.5 µm). This research proposes a two-stage spatio-temporal statistical model for estimating daily surface PM2.5 concentrations in the Guanzhong Basin of China, using 6 km × 6 km AOD data from the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument as the main variable and meteorological factors, land cover, and population data as auxiliary variables. The model is validated by cross-validation. The linear mixed effects (LME) model used in the first stage could be improved by applying either a geographically weighted regression (GWR) model or a generalized additive model (GAM) in the second stage, and the GWR model had better predictive capability than the GAM. The two-stage LME–GWR spatio-temporal model successfully captures temporal and spatial variations. For model fitting, its coefficient of determination (R2), bias, and root-mean-square prediction error (RMSE) were 0.802, −0.378 µg/m³, and 12.746 µg/m³, respectively; in cross-validation they were 0.703, 1.451 µg/m³, and 15.731 µg/m³, respectively. The model prediction maps show that topography strongly influences the spatial distribution of PM2.5 concentrations in the Guanzhong Basin, and that PM2.5 concentrations vary with the seasons. This method can provide reliable PM2.5 predictions to reduce the bias of exposure assessment in air pollution and health research.
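The GWR stage fits a separate weighted regression at each location, with weights that decay with distance. A minimal one-predictor sketch, assuming a Gaussian kernel with a fixed bandwidth (GWR implementations also offer bi-square and adaptive kernels); the function name and toy coordinates are illustrative, and it does not reproduce the study's LME–GWR model or its variables.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least squares y = b0 + b1*x around `target`, with Gaussian weights
    w_i = exp(-0.5 * (d_i / bandwidth)**2) based on distance d_i to the target."""
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Toy data: the AOD-PM2.5 slope is 40 in a western cluster and 80 in an eastern one.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
aod = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]
pm25 = [5 + 40 * a for a in aod[:4]] + [5 + 80 * a for a in aod[4:]]
print(gwr_local_fit(coords, aod, pm25, target=(0.5, 0.5), bandwidth=1.0))
print(gwr_local_fit(coords, aod, pm25, target=(10.5, 0.5), bandwidth=1.0))
```

A small bandwidth lets the local slopes follow the two clusters; a very large bandwidth would make every location's fit converge to a single global regression.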

GBCNet: In-Field Grape Berries Counting for Yield Estimation by Dilated CNNs

Applied Sciences, 2020, Vol 10 (14), pp. 4870. DOI: 10.3390/app10144870. Cited by ~2

Author(s): Luca Coviello, Marco Cristoforetti, Giuseppe Jurman, Cesare Furlanello

Keyword(s): Deep Learning; Cross Validation; Learning Algorithms; Average Error; Yield Estimation; Grape Berries; Accuracy Level; Validation Procedure; Grape Varieties; Single Variety

We introduce the Grape Berries Counting Net (GBCNet), a tool for accurate fruit yield estimation from smartphone camera images, obtained by adapting deep learning algorithms originally developed for crowd counting. We test GBCNet with a cross-validation procedure on two original datasets, CR1 and CR2, of grape pictures taken in-field before veraison; a total of 35,668 berries were manually annotated for the task. GBCNet performs well both on CR1, which covers seven grape varieties (although accuracy differs by variety), and on CR2, which covers a single variety: the Mean Average Error (MAE) ranges from 0.85% for Pinot Gris to 11.73% for Marzemino on CR1 and reaches 7.24% on the Teroldego dataset CR2.
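Crowd-counting networks of the kind GBCNet adapts typically predict a density map whose sum is the object count. Assuming GBCNet works the same way (the abstract does not say), counting and a per-image relative count error, one plausible reading of the percentage MAE above, can be sketched as follows; the density map and counts are made up.

```python
def count_from_density(density_map):
    """Estimated number of objects: the sum over all pixels of a predicted density map."""
    return sum(sum(row) for row in density_map)


def mean_relative_count_error(true_counts, predicted_counts):
    """Mean of |predicted - true| / true over images, expressed as a percentage."""
    errors = [abs(p - t) / t for t, p in zip(true_counts, predicted_counts)]
    return 100.0 * sum(errors) / len(errors)


toy_density = [[0.5, 0.5], [1.0, 2.0]]  # a tiny made-up 2x2 density map
print(count_from_density(toy_density))
print(mean_relative_count_error([100, 200], [90, 210]))
```

Summing a density map avoids detecting each berry individually, which is why this approach suits dense, overlapping bunches.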

Modeling the Spread of COVID-19 Infection Using a Multilayer Perceptron

Computational and Mathematical Methods in Medicine, 2020, Vol 2020, pp. 1–10. DOI: 10.1155/2020/5714714. Cited by ~10

Author(s): Zlatan Car, Sandi Baressi Šegota, Nikola Anđelić, Ivan Lorencin, Vedran Mrzljak

Keyword(s): Multilayer Perceptron; Cross Validation; Search Algorithm; Activation Function; Coefficient of Determination; Time Series Dataset; Number of Patients; Patient Model; Artificial Neural Network (ANN); Deceased Patient

Coronavirus disease (COVID-19) is a highly infectious disease that has captured worldwide public attention. Modeling such diseases can be extremely important for predicting their impact. While classic statistical modeling can provide satisfactory models, it can also fail to capture the intricacies contained in the data. In this paper, the authors use a publicly available dataset containing information on infected, recovered, and deceased patients in 406 locations over 51 days (22 January 2020 to 12 March 2020). This dataset, originally a time-series dataset, is transformed into a regression dataset and used to train a multilayer perceptron (MLP) artificial neural network (ANN). The aim of training is to obtain a worldwide model of the maximal number of patients across all locations in each time unit. The MLP hyperparameters are varied using a grid search algorithm, with a total of 5376 hyperparameter combinations. Using those combinations, a total of 48384 ANNs are trained (16128 for each patient group: deceased, recovered, and infected), and each model is evaluated using the coefficient of determination (R2). Cross-validation is performed using the K-fold algorithm with 5 folds. The best models consist of 4 hidden layers with 4 neurons in each layer and use a ReLU activation function, with R2 scores of 0.98599 for confirmed, 0.99429 for deceased, and 0.97941 for recovered patient models. When cross-validation is performed, these scores drop to 0.94 for confirmed, 0.781 for recovered, and 0.986 for deceased patient models, indicating high robustness of the deceased patient model, good robustness of the confirmed patient model, and low robustness of the recovered patient model.
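A grid search enumerates the Cartesian product of candidate values for each hyperparameter. The abstract reports 5376 combinations but not the grid itself, so the grid below is hypothetical and much smaller. The quoted totals are mutually consistent: 3 × 5376 = 16128 networks per patient group and 3 × 16128 = 48384 overall; the abstract does not explain what the factor of three per group corresponds to.

```python
from itertools import product
from math import prod

# Hypothetical grid; the actual hyperparameter values are not listed in the abstract.
GRID = {
    "hidden_layers": [1, 2, 3, 4],
    "neurons_per_layer": [4, 8, 16],
    "activation": ["relu", "tanh", "logistic"],
    "solver": ["adam", "sgd"],
}


def grid_combinations(grid):
    """Yield every hyperparameter combination as a dict (the Cartesian product of the grid)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


combos = list(grid_combinations(GRID))
print(len(combos), prod(len(v) for v in GRID.values()))  # both equal the grid size
print(combos[0])
```

Each dict would then configure one training run, and every trained model would be scored (here, by R2) to pick the best combination.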

Global observation-based climatology of precipitation occurrence and peak intensity

2020. DOI: 10.5194/egusphere-egu2020-7837

Author(s): Hylke Beck, Seth Westra, Eric Wood

Keyword(s): Land Surface; Regression Models; Cross Validation; Climate Models; Daily Precipitation; State of the Art; Coefficient of Determination; Peak Intensity; Uncertainty Estimates; Fold Cross Validation

We introduce a unique set of global observation-based climatologies of daily precipitation (P) occurrence (related to the lower tail of the P distribution) and peak intensity (related to the upper tail). The climatologies were produced using random forest (RF) regression models trained with an unprecedented collection of daily P observations from 93,138 stations worldwide. Five-fold cross-validation was used to evaluate the generalizability of the approach and to quantify uncertainty globally. The RF models performed highly satisfactorily, with cross-validation coefficient of determination (R2) values ranging from 0.74 for the 15-year return-period daily P intensity to 0.86 for the >0.5 mm d⁻¹ daily P occurrence. The RF models consistently outperformed state-of-the-art reanalysis (ERA5) and satellite (IMERG) products. The highest P intensities over land were found along the western equatorial coast of Africa, in India, and along coastal areas of Southeast Asia. Using a 0.5 mm d⁻¹ threshold, P was estimated to occur on 23.2% of days on average over the global land surface (excluding Antarctica). The climatologies, including uncertainty estimates, will be released as the Precipitation DISTribution (PDIST) dataset via www.gloh2o.org/pdist. We expect the dataset to be useful for numerous purposes, such as evaluating climate models, bias-correcting gridded P datasets, and designing hydraulic structures in poorly gauged regions.
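The occurrence statistic above is the share of days whose precipitation exceeds a threshold. A minimal per-station sketch, assuming missing days are simply skipped (how PDIST handles gaps is not stated in the abstract); the record is invented.

```python
def occurrence_fraction(daily_precip_mm, threshold=0.5):
    """Fraction of valid days with precipitation strictly above `threshold` (mm/day).
    Missing values (None) are excluded from both numerator and denominator."""
    valid = [p for p in daily_precip_mm if p is not None]
    wet_days = sum(1 for p in valid if p > threshold)
    return wet_days / len(valid)


record = [0.0, 0.2, 0.6, 5.0, None, 0.5]  # made-up station record, mm/day
print(occurrence_fraction(record))
```

A day with exactly 0.5 mm counts as dry under the strict ">0.5 mm d⁻¹" reading used here.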
