NIR PLSR results obtained by calibration with noisy, low-precision reference values: Are the results acceptable?

Holzforschung ◽  
2006 ◽  
Vol 60 (4) ◽  
pp. 402-408 ◽  
Author(s):  
José Rodrigues ◽  
Ana Alves ◽  
Helena Pereira ◽  
Denilson da Silva Perez ◽  
Guillaume Chantre ◽  
...  

Abstract Both spectral noise and reference method noise affect the accuracy and the precision of NIR predicted values. The reference noise is often neglected, and the few reports dealing with it only consider random noise artificially added to originally sound reference data. A calibration for lignin content of maritime pine (Pinus pinaster Ait.) wood meal was developed, but due to low precision and accuracy in the reference data set, NIR partial least-squares regression (PLSR) yielded a slope of 0.51 and an intercept at 14% Klason lignin. We demonstrate with an independent data set for external validation, obtained with higher precision and accuracy, that the NIR PLSR model based on the noisy reference data nevertheless led to good results. The slope of the correlation between predicted and reference values was 0.89 and the intercept was 3.9% Klason lignin. Thus, the model performed much better than expected from the cross-validation results. The predictability can be explained by two facts: the loadings of the first principal component (PC) of the calibration and test samples are very similar and dominated by lignin-related bands, and most of the variation in the test set can be explained by the first PC. This only explains why the Klason lignin content could be predicted with the model without producing many spectral outliers, not the good result of the external validation. We show that the latter can be explained by the inverse calibration used for PLSR and that predicted values can be more accurate and precise than the reference values used for calibration.
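As an illustration of the workflow this abstract describes, the following is a minimal sketch (not the authors' code or data) of calibrating a PLSR model on spectra against noisy reference values and then checking the slope and intercept of predicted versus reference values on an independent, higher-precision test set; all arrays are synthetic placeholders.

```python
# A minimal sketch, assuming synthetic data: PLSR calibrated on noisy
# reference values, evaluated by cross-validation and on an independent,
# higher-precision test set. Not the authors' code or data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_cal, n_test, n_wl = 60, 40, 200
lignin_cal = rng.uniform(20, 32, n_cal)            # true Klason lignin, %
lignin_test = rng.uniform(20, 32, n_test)
band = np.exp(-0.5 * ((np.arange(n_wl) - 100) / 10) ** 2)  # a lignin-related band
X_cal = lignin_cal[:, None] * band + rng.normal(0, 0.5, (n_cal, n_wl))
X_test = lignin_test[:, None] * band + rng.normal(0, 0.5, (n_test, n_wl))
y_cal = lignin_cal + rng.normal(0, 2.0, n_cal)     # noisy, low-precision reference
y_test = lignin_test + rng.normal(0, 0.3, n_test)  # higher-precision reference

pls = PLSRegression(n_components=6)                # component count set by CV in practice
pls.fit(X_cal, y_cal)

# Cross-validation on the noisy calibration set (can look poor, as in the paper)
y_cv = cross_val_predict(pls, X_cal, y_cal, cv=10).ravel()
slope_cv, intercept_cv = np.polyfit(y_cal, y_cv, 1)

# External validation against the higher-precision reference values
y_pred = pls.predict(X_test).ravel()
slope_ext, intercept_ext = np.polyfit(y_test, y_pred, 1)
print(f"CV:       slope={slope_cv:.2f}, intercept={intercept_cv:.1f}")
print(f"external: slope={slope_ext:.2f}, intercept={intercept_ext:.1f}")
```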

2020 ◽  
Vol 13 (7) ◽  
pp. 3835-3853 ◽  
Author(s):  
Julius Polz ◽  
Christian Chwala ◽  
Maximilian Graf ◽  
Harald Kunstmann

Abstract. Quantitative precipitation estimation with commercial microwave links (CMLs) is a technique developed to supplement weather radar and rain gauge observations. It exploits the relation between the attenuation of CML signal levels and the integrated rain rate along a CML path. The opportunistic nature of this method requires sophisticated data processing with robust methods. In this study we focus on the processing step of rain event detection in the signal level time series of the CMLs, which we treat as a binary classification problem. This processing step is particularly challenging because, even when there is no rain, the signal level can show large fluctuations similar to those during rainy periods. False classifications can have a high impact on falsely estimated rainfall amounts. We analyze the performance of a convolutional neural network (CNN), which is trained to detect rainfall-specific attenuation patterns in CML signal levels, using data from 3904 CMLs in Germany. The CNN consists of a feature extraction part and a classification part with, in total, 20 layers of neurons and 1.4×10⁵ trainable parameters. With a structure inspired by the visual cortex of mammals, CNNs use local connections of neurons to recognize patterns independent of their location in the time series. We test the CNN's ability to recognize attenuation patterns from CMLs and time periods outside the training data. Our CNN is trained on 4 months of data from 800 randomly selected CMLs and validated on 2 different months of data, once for all CMLs and once for the 3104 CMLs not included in the training. No CMLs are excluded from the analysis. As a reference data set, we use the gauge-adjusted radar product RADOLAN-RW provided by the German meteorological service (DWD). The model predictions and the reference data are compared on an hourly basis. Model performance is compared to a state-of-the-art reference method, which uses the rolling standard deviation of the CML signal level time series as a detection criterion. Our results show that within the analyzed period of April to September 2018, the CNN generalizes well to the validation CMLs and time periods. A receiver operating characteristic (ROC) analysis shows that the CNN outperforms the reference method, detecting on average 76 % of all rainy and 97 % of all nonrainy periods. Of all periods with a reference rain rate larger than 0.6 mm h⁻¹, more than 90 % were detected. We also show that the improved event detection leads to a significant reduction of falsely estimated rainfall by up to 51 %. At the same time, the quality of the correctly estimated rainfall is kept at the same level with regard to the Pearson correlation with the radar rainfall. In conclusion, we find that CNNs are a robust and promising tool for detecting rainfall-induced attenuation patterns in CML signal levels from a large CML data set covering all of Germany.
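For readers unfamiliar with the architecture class, the following is a minimal sketch of a 1D CNN rain-event classifier in Keras. It is not the authors' 20-layer network; the window length, channel count, and layer sizes are illustrative assumptions.

```python
# A minimal sketch, not the authors' architecture: a small 1D CNN that
# maps a window of CML signal levels to a rain probability. Window
# length, channel count and layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

window_len = 180   # e.g. 3 h of 1-min samples (assumed)
n_channels = 2     # e.g. two sublink signal levels per CML (assumed)

model = tf.keras.Sequential([
    # Feature extraction: local filters slide along the time series, so a
    # rain-induced attenuation pattern is recognized wherever it occurs.
    layers.Conv1D(32, kernel_size=5, activation="relu",
                  input_shape=(window_len, n_channels)),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    # Classification: dense layers map the features to P(rain in window).
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="roc_auc")])
model.summary()
```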


2019 ◽  
Author(s):  
Julius Polz ◽  
Christian Chwala ◽  
Maximilian Graf ◽  
Harald Kunstmann

Abstract. Quantitative precipitation estimation with commercial microwave links (CMLs) is a technique developed to supplement weather radar and rain gauge observations. It exploits the relation between the attenuation of CML signal levels and the integrated rain rate along a CML path. The opportunistic nature of this method requires sophisticated data processing with robust methods. In this study we focus on the processing step of rain event detection in the signal level time series of the CMLs, which we treat as a binary classification problem. We analyze the performance of a convolutional neural network (CNN), which is trained to detect rainfall-specific attenuation patterns in CML signal levels, using data from 3904 CMLs in Germany. The CNN consists of a feature extraction part and a classification part with, in total, 20 layers of neurons and 1.4×10⁵ trainable parameters. With a structure inspired by the visual cortex of mammals, CNNs use local connections of neurons to recognize patterns independent of their location in the time series. We test the CNN's ability to generalize to CMLs and time periods outside the training data. Our CNN is trained on four months of data from 400 randomly selected CMLs and validated on two different months of data, once for all CMLs and once for the 3504 CMLs not included in the training. No CMLs are excluded from the analysis. As a reference data set, we use the gauge-adjusted radar product RADOLAN-RW provided by the German meteorological service (DWD). The model predictions and the reference data are compared on an hourly basis. Model performance is compared to a reference method, which uses the rolling standard deviation of the CML signal level time series as a detection criterion. Our results show that within the analyzed period of April to September 2018, the CNN generalizes well to the validation CMLs and time periods. A receiver operating characteristic (ROC) analysis shows that the CNN outperforms the reference method, detecting on average 87 % of all rainy and 91 % of all non-rainy periods. In conclusion, we find that CNNs are a robust and promising tool for detecting rainfall-induced attenuation patterns in CML signal levels from a large CML data set covering all of Germany.
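The rolling-standard-deviation baseline mentioned above is simple to state: flag a period as rainy when the rolling standard deviation of the signal level exceeds a threshold. The following is a minimal sketch of such a detector in pandas; the 60 min window and the threshold are illustrative assumptions, not the study's settings.

```python
# A minimal sketch of a rolling-standard-deviation rain detector, the
# kind of baseline method the abstract refers to. Window and threshold
# are illustrative assumptions; the data are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
sig = rng.normal(-46.0, 0.2, 1440)        # one day of 1-min signal levels, dB (synthetic)
sig[600:700] -= np.hanning(100) * 8.0     # a rain-like attenuation event
rsl = pd.Series(sig)

rolling_std = rsl.rolling(window=60, center=True).std()
is_rainy = rolling_std > 0.8              # assumed detection threshold
print(f"flagged {int(is_rainy.sum())} of {len(rsl)} samples as rainy")
```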


Holzforschung ◽  
2020 ◽  
Vol 74 (7) ◽  
pp. 655-662 ◽  
Author(s):  
Ana Alves ◽  
Rita Simões ◽  
José Luís Lousada ◽  
José Lima-Brito ◽  
José Rodrigues

Abstract Softwood lignin consists mainly of guaiacyl (G) units and low amounts of hydroxyphenyl (H) units. Even at a small percentage, the ratio of H to G units (H/G) and its intraspecific variation are crucial wood lignin properties. Analytical pyrolysis (Py) has already been successfully used as a reference method to develop a model based on near-infrared (NIR) spectroscopy for the determination of the H/G ratio in Pinus pinaster (Pnb) wood samples. The values predicted by this model for Pinus sylvestris (Psyl) samples were well correlated (R = 0.91) with the reference data (Py), but with a bias that increased with increasing H/G ratio. Partial least squares regression (PLS-R) models were therefore developed for the prediction of the H/G ratio: dedicated models for Psyl wood samples and common models based on both species (Pnb and Psyl). All the calibration models showed a high coefficient of determination and low errors. The coefficient of determination of the external validation ranged from 0.92 to 0.96 for the dedicated models and from 0.83 to 0.93 for the common models. However, comparing the predictive ability of the dedicated and common models on the Psyl external validation set yielded almost identical predicted values.
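The external-validation comparison at the end of the abstract amounts to scoring two sets of predictions against the same Py reference values. The following is a minimal sketch of that comparison; the H/G arrays are synthetic placeholders, not the study's data.

```python
# A minimal sketch: compare dedicated (Psyl-only) and common (Pnb + Psyl)
# model predictions against the same pyrolysis reference H/G ratios.
# All numbers are synthetic placeholders.
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
hg_ref = rng.uniform(0.01, 0.06, 30)                 # Py reference H/G ratios (synthetic)
pred_dedicated = hg_ref + rng.normal(0, 0.002, 30)
pred_common = hg_ref + rng.normal(0, 0.003, 30)

for name, pred in [("dedicated", pred_dedicated), ("common", pred_common)]:
    bias = np.mean(pred - hg_ref)
    print(f"{name}: R2={r2_score(hg_ref, pred):.3f}, bias={bias:+.4f}")
```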


TAPPI Journal ◽  
2018 ◽  
Vol 17 (11) ◽  
pp. 611-617 ◽
Author(s):  
Sabrina Burkhardt

The traditional kappa number method was developed in 1960 as a way to more quickly determine the level of lignin remaining in a completed or in-progress pulp. A significantly faster approach than the Klason lignin procedure, the kappa number method is based on the reaction of a strong oxidizing agent (KMnO4) with lignin and small amounts of other organic functional groups present in the pulp, such as hexenuronic acid. While the usefulness of the kappa number for providing information about bleaching requirements and pulp properties has arguably transformed the pulp and paper industry, it has mostly been developed for kraft, sulfite, and soda wood pulps. Nonwood species have a different chemical makeup than hardwood or softwood sources. These chemical differences can influence kappa and Klason measurements on the pulp and lead to wide ranges of error. Both original data from Sustainable Fiber Technologies' sulfur- and chlorine-free pulping process and kappa and Klason data from various nonwood pulp literature sources will be presented to challenge the assumption that the kappa number accurately represents lignin content in nonwood pulps.
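For context, the kappa number itself is simple titration arithmetic. The following is a hedged sketch of a TAPPI T236-style calculation with illustrative values; the approximate 0.15 lignin conversion factor is a common wood-pulp rule of thumb that, as the abstract argues, should not be trusted for nonwoods.

```python
# A minimal sketch of the kappa number arithmetic (TAPPI T236-style
# back-titration), illustrating why the number is only a proxy for
# lignin: hexenuronic acid and other oxidizable groups also consume
# KMnO4. All values and the 0.15 factor are illustrative assumptions.
def kappa_number(blank_ml, sample_ml, thio_normality, pulp_od_g, f=1.0):
    """Kappa number K = p * f / w, with p the mL of 0.1 N KMnO4 consumed.

    f corrects to 50 % permanganate consumption (set to 1.0 here for
    simplicity); w is the oven-dry pulp weight in grams.
    """
    p = (blank_ml - sample_ml) * thio_normality / 0.1   # permanganate consumed, mL
    return p * f / pulp_od_g

k = kappa_number(blank_ml=50.0, sample_ml=28.4, thio_normality=0.2, pulp_od_g=1.0)
# Rough wood-pulp rule of thumb (NOT reliable for nonwoods, as argued above):
klason_estimate = 0.15 * k
print(f"kappa = {k:.1f}, estimated Klason lignin ~ {klason_estimate:.1f} %")
```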


2019 ◽  
Vol 15 (4) ◽  
pp. 328-340 ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Napat Songtawee ◽  
Suphakit Siriwong ◽  
Supaluk Prachayasittikul ◽  
Chanin Nantasenamat ◽  
...  

Background: Human immunodeficiency virus (HIV) is an infective agent that causes acquired immunodeficiency syndrome (AIDS). Therefore, the rational design of inhibitors for preventing the progression of the disease is required. Objective: This study aims to construct quantitative structure-activity relationship (QSAR) models, perform molecular docking, and rationally design new colchicine derivatives with anti-HIV activity. Methods: A data set of 24 colchicine and colchicine derivatives with anti-HIV activity was employed to develop the QSAR models using machine learning methods (e.g. multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM)), and to study molecular docking. Results: The significant descriptors relating to the anti-HIV activity included the JGI2, Mor24u, Gm and R8p+ descriptors. The predictive performance of the models showed acceptable statistical quality as observed by the correlation coefficient (Q²) and root mean square error (RMSE) of the leave-one-out cross-validation (LOO-CV) and external sets. In particular, the ANN method outperformed the MLR and SVM methods, displaying a Q²(LOO-CV) of 0.7548 and an RMSE(LOO-CV) of 0.5735 for the LOO-CV set, and a Q²(Ext) of 0.8553 and an RMSE(Ext) of 0.6999 for the external validation set. In addition, the molecular docking of the virus-entry molecule (gp120 envelope glycoprotein) revealed the key interacting residues of the protein (cellular receptor, CD4) and the site-moiety preferences of colchicine derivatives as HIV entry inhibitors for binding to the HIV structure. Furthermore, new colchicine derivatives were rationally designed using the informative QSAR and molecular docking results. Conclusion: These findings serve as a guideline for rational drug design as well as the potential development of novel anti-HIV agents.
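The model comparison described in the Results can be reproduced in outline with standard tooling. The following is a minimal sketch of leave-one-out cross-validation of MLR, SVM, and ANN regressors, with a synthetic 24×4 descriptor matrix standing in for the JGI2, Mor24u, Gm, and R8p+ descriptors.

```python
# A minimal sketch, assuming synthetic data: LOO-CV comparison of MLR,
# SVM and ANN regressors, scored by Q2 (r2 over LOO predictions) and RMSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict, LeaveOneOut
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(24, 4))    # 24 compounds x 4 descriptors (synthetic)
y = X @ np.array([0.8, -0.5, 0.3, 0.4]) + rng.normal(0, 0.3, 24)

models = {
    "MLR": LinearRegression(),
    "SVM": SVR(kernel="rbf", C=10.0),
    "ANN": MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0),
}
for name, model in models.items():
    y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())
    q2 = r2_score(y, y_loo)                       # Q2 = 1 - PRESS / SS_total
    rmse = np.sqrt(mean_squared_error(y, y_loo))
    print(f"{name}: Q2(LOO-CV)={q2:.3f}, RMSE={rmse:.3f}")
```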


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778 ◽
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for the model development and, for the external validation of the model, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284). Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% had in-hospital mortality and unfavourable outcome, respectively, in the development data set. Multivariable logistic regression analysis of routinely collected admission characteristics revealed that, for in-hospital mortality, age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, effaced basal cisterns, and traumatic subarachnoid haemorrhage/intraventricular haematoma, and, for unfavourable outcome, age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement, and presence of hypotension were independent predictors, as the 95% confidence intervals (CIs) of their odds ratios (ORs) did not contain one. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6-month outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
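A score chart of this kind is typically built by scaling and rounding logistic-regression coefficients into integer points, with each points total mapping back to a predicted probability. The following is a minimal sketch with hypothetical coefficients, not the published model.

```python
# A minimal sketch of turning logistic-regression coefficients into a
# risk-score chart. Predictors, coefficients and intercept are
# hypothetical illustrations, not the study's fitted model.
import numpy as np

# Hypothetical coefficients (log odds ratios) for binary predictors
coefs = {"age>60": 1.1, "motor_score_1_2": 1.6,
         "pupils_nonreactive": 1.4, "hypotension": 0.9}
intercept = -2.3
scale = min(abs(c) for c in coefs.values())      # smallest effect = 1 point

points = {k: round(c / scale) for k, c in coefs.items()}
print("score chart:", points)

def predicted_risk(present):
    """Approximate probability from the points total for present predictors."""
    logit = intercept + sum(points[k] * scale for k in present)
    return 1.0 / (1.0 + np.exp(-logit))

print(f"risk with nonreactive pupils + hypotension: "
      f"{predicted_risk(['pupils_nonreactive', 'hypotension']):.0%}")
```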


2014 ◽  
Vol 44 (7) ◽  
pp. 784-795 ◽  
Author(s):  
Susan J. Prichard ◽  
Eva C. Karau ◽  
Roger D. Ottmar ◽  
Maureen C. Kennedy ◽  
James B. Cronan ◽  
...  

Reliable predictions of fuel consumption are critical in the eastern United States (US), where prescribed burning is frequently applied to forests and air quality is of increasing concern. CONSUME and the First Order Fire Effects Model (FOFEM), predictive models developed to estimate fuel consumption and emissions from wildland fires, have not been systematically evaluated for application in the eastern US using the same validation data set. In this study, we compiled a fuel consumption data set from 54 operational prescribed fires (43 pine and 11 mixed hardwood sites) to assess each model's uncertainties and application limits. Regions of indifference between measured and predicted values by fuel category and forest type represent the potential error that modelers could incur in estimating fuel consumption by category. Overall, FOFEM predictions have narrower regions of indifference than CONSUME and suggest better correspondence between measured and predicted consumption. However, both models offer reliable predictions of live fuels (shrubs and herbaceous vegetation) and 1-h fine fuels. Results suggest that CONSUME and FOFEM can be improved in their predictive capability for woody fuel, litter, and duff consumption in eastern US forests. Because of their high biomass and potential smoke management problems, refining estimates of litter and duff consumption is of particular importance.
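The evaluation style described here, comparing measured and predicted consumption per fuel category for each model, can be outlined as follows. This sketch uses simple per-category bias and RMSE as stand-ins for the paper's more formal region-of-indifference statistic, and all numbers are synthetic.

```python
# A minimal sketch, assuming synthetic data: per-fuel-category comparison
# of measured consumption against CONSUME- and FOFEM-style predictions,
# scored with simple bias and RMSE (not the paper's statistic).
import numpy as np

rng = np.random.default_rng(4)
categories = ["shrub", "herb", "1-h", "10-h", "litter", "duff"]
measured = {c: rng.uniform(1, 10, 20) for c in categories}   # Mg/ha, synthetic
pred = {
    "CONSUME": {c: measured[c] * rng.normal(1.0, 0.25, 20) for c in categories},
    "FOFEM": {c: measured[c] * rng.normal(1.0, 0.15, 20) for c in categories},
}
for model, by_cat in pred.items():
    for c in categories:
        err = by_cat[c] - measured[c]
        print(f"{model:7s} {c:6s} bias={err.mean():+5.2f}  "
              f"RMSE={np.sqrt((err ** 2).mean()):.2f} Mg/ha")
```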


Sensors ◽  
2017 ◽  
Vol 17 (3) ◽  
pp. 559 ◽  
Author(s):  
Alan Bourke ◽  
Espen Ihlen ◽  
Ronny Bergquist ◽  
Per Wik ◽  
Beatrix Vereijken ◽  
...  

2008 ◽  
Vol 178 (2) ◽  
pp. 278-281 ◽  
Author(s):  
Corrado Dimauro ◽  
Piero Bonelli ◽  
Paola Nicolussi ◽  
Salvatore P.G. Rassu ◽  
Aldo Cappio-Borlino ◽  
...  
