An inductive approach to characterize physical, chemical, and biological system interactions in a 5th order river basin

Author(s):  
Adam Ward ◽  
Jennifer Drummond ◽  
Angang Li ◽  
Anna Lupon ◽  
Marie Kurz ◽  
...  

Research in the river corridor commonly follows one of two study designs. One focuses on physical, chemical, and/or biological dynamics and feedbacks, emphasizing local variation and interaction over larger-scale context. The other focuses on gradients arising in response to non-local controls (e.g., climate, tectonic setting), emphasizing broad trends over smaller-scale “noise”. Here, we present a comprehensive set of measurements and calculated metrics describing physical, chemical, and biological conditions, collected at 62 sites in the river corridor within a 5th-order basin and including more than 150 variables at each site. The size and scope of this data set allow us to assess which variables have spatial structure in the basin using spatial semivariograms and regressions with discharge and drainage area. We ask how physical, chemical, and biological sub-systems co-vary using principal component analysis. Next, we explain both spatial structure and local variance simultaneously using support vector machine regression techniques that reveal possible nonlinear, multivariate relationships that may direct future research. Key outcomes from this study include (1) an introduction to an open-source, comprehensive characterization of the river corridor, (2) interpretations of both broad trends and local variance in the river corridor, and (3) a summary of which metrics have the most explanatory power, and why, within the study system.
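As a rough illustration of the spatial semivariogram mentioned in the abstract above, the following minimal sketch computes an empirical semivariogram from site coordinates and one measured variable. It assumes straight-line (Euclidean) distances and equal-width lag bins; the function name, the bin settings, and the five example sites are hypothetical and are not taken from the study, which may well have used along-network distances instead.

```python
import math


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return a list of (lag_center, gamma, n_pairs), one entry per distance bin.

    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)), taken over the N(h) site
    pairs whose separation distance falls into the bin around lag h.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Assumption: Euclidean distance between sites.
            dist = math.dist(coords[i], coords[j])
            b = int(dist // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    result = []
    for b in range(n_bins):
        center = (b + 0.5) * bin_width
        gamma = sums[b] / (2 * counts[b]) if counts[b] else float("nan")
        result.append((center, gamma, counts[b]))
    return result


# Hypothetical data: five sites on a transect (km) with a steadily rising variable.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
variogram = empirical_semivariogram(sites, z, bin_width=1.0, n_bins=4)
```

A semivariance that keeps rising with lag, as in this toy transect, is the kind of pattern that would indicate spatial structure; a flat curve would suggest mostly local variance.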

Author(s):  
Tobias Grundgeiger ◽  
Katharina Beckh ◽  
Oliver Happel

Anesthesiologists work in complex work environments where optimal scanning of information is critical for patient safety. The Salience, Effort, Expectancy, Value (SEEV) model can be used to model attention distributions of individuals. We used an existing data set of eye tracking data of anesthesiologists inducing general anesthesia to (1) develop a method for considering the effort parameter in the model in such an environment and (2) investigate the explanatory power of an EEV model compared to an EV model. To operationalize effort, we created a 3D model using Unreal Engine 4. We used Markov Chain Monte Carlo simulations to obtain EV and EEV model predictions. Including effort did not yield an advantage over the model that did not include it. We discuss methodological considerations for future research and suggest considering salience and effort simultaneously in order to assess the role of effort more accurately.


2019 ◽  
Vol 11 (4) ◽  
pp. 1567-1581 ◽  
Author(s):  
Adam S. Ward ◽  
Jay P. Zarnetske ◽  
Viktor Baranov ◽  
Phillip J. Blaen ◽  
Nicolai Brekenfeld ◽  
...  

Abstract. A comprehensive set of measurements and calculated metrics describing physical, chemical, and biological conditions in the river corridor is presented. These data were collected in a catchment-wide, synoptic campaign in the H. J. Andrews Experimental Forest (Cascade Mountains, Oregon, USA) in summer 2016 during low-discharge conditions. Extensive characterization of 62 sites including surface water, hyporheic water, and streambed sediment was conducted spanning 1st- through 5th-order reaches in the river network. The objective of the sample design and data acquisition was to generate a novel data set to support scaling of river corridor processes across varying flows and morphologic forms present in a river network. The data are available at https://doi.org/10.4211/hs.f4484e0703f743c696c2e1f209abb842 (Ward, 2019).


2019 ◽  
Author(s):  
Adam S. Ward ◽  
Jay P. Zarnetske ◽  
Viktor Baranov ◽  
Phillip J. Blaen ◽  
Nicolai Brekenfeld ◽  
...  

Abstract. A comprehensive set of measurements and calculated metrics describing physical, chemical, and biological conditions in the river corridor is presented. These data were collected in a catchment-wide, synoptic campaign in Lookout Creek within the H.J. Andrews Experimental Forest (Cascade Mountains, Oregon, USA) in summer 2016 during low discharge conditions. Extensive characterization of 62 sites including surface water, hyporheic water, and streambed sediment was conducted spanning 1st through 5th order reaches in the river network. The objective of the sample design and data acquisition was to generate a novel data set to support scaling of river corridor processes across varying flows and morphologic forms present in a river network. The data are available at http://www.hydroshare.org/resource/f4484e0703f743c696c2e1f209abb842 (Ward, 2019).


2018 ◽  
Vol 26 (5) ◽  
pp. 297-310 ◽  
Author(s):  
Justin Sexton ◽  
Yvette Everingham ◽  
David Donald ◽  
Steve Staunton ◽  
Ronald White

On-line near infrared (NIR) spectroscopic analysis systems play an important role in assessing the quality of sugarcane in Australia. As quality measures are used to calculate the payment made to growers, it is imperative that NIR models are both accurate and robust. Machine learning and non-linear modelling approaches have been explored as methods for developing improved NIR models in a variety of industrial settings, yet there has been little research into their application to cane quality measures. The objective of this paper was to compare chemometric models of commercial cane sugar (CCS) based on four calibration techniques. CCS was estimated using partial least squares regression (PLS), support vector regression (SVR), artificial neural networks (ANNs) and gradient boosted trees (GBTs). Model performance was assessed on an independent validation data set using root mean square error of prediction (RMSEP) and r2 values. SVR (RMSEP = 0.37%; r2 = 0.92) and ANN (RMSEP = 0.36%; r2 = 0.93) performed similarly to PLS (RMSEP = 0.37%; r2 = 0.92) on the validation data set, while GBT exhibited a much lower skill (RMSEP = 0.51%; r2 = 0.85). Analysis of important wavelengths in each model showed that PLS regression, SVR and ANN techniques emphasized the importance of similar spectral regions. Future research should consider testing model robustness over seasons and/or regions. Comparisons of chemometric models should consider reporting variable importance as a way of understanding how models use spectral information.
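The two validation statistics quoted in the abstract above can be sketched as follows. This is only an illustration: it assumes r2 is the squared Pearson correlation between observed and predicted values (the abstract does not say which definition was used), and the four observed/predicted pairs are made-up numbers, not CCS data from the paper.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction on a validation set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical validation values (not from the paper).
observed = [12.0, 13.0, 14.0, 15.0]
predicted = [12.5, 12.5, 14.5, 14.5]
error = rmsep(observed, predicted)      # 0.5 for these values
fit = r_squared(observed, predicted)    # 0.8 for these values
```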


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, for which the problem of drug resistance is becoming more serious, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims to find an efficient way to cure bacterial infections, and cell lytic enzymes are a good choice for that purpose. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our aim is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful; however, detecting the type by experimental methods is time-consuming. We therefore propose an efficient computational method for predicting the type of cell lytic enzyme. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition, and the features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: The experimental results show that the proposed method attains an overall accuracy of 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
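The tripeptide composition representation described in the Method can be sketched roughly as below. The abstract gives no exact definition, so this assumes the common form: the frequency of each of the 8000 possible tripeptides over all overlapping three-residue windows, restricted to the 20 standard amino acids. The function name and the example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 20^3 = 8000
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-element vector of tripeptide frequencies.

    Assumption: frequencies are normalised by the number of overlapping
    windows that consist only of standard residues.
    """
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for k in range(len(sequence) - 2):
        idx = INDEX.get(sequence[k:k + 3])
        if idx is not None:  # skip windows with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


# Hypothetical short sequence, for illustration only.
features = tripeptide_composition("MKKLAKKLA")
```

In a pipeline like the one the abstract outlines, a feature-selection step would then keep a small subset of these 8000 values (44 in the reported result) before training the classifier.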


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional data sets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
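The Y-randomization test mentioned in the Results can be illustrated with a minimal sketch: the response values are shuffled repeatedly, the model is refitted each time, and the resulting scores are compared with the score on the true responses. As a loudly stated simplification, the stand-in model here is a one-descriptor least-squares line, not the SVM from the study, and the data are synthetic.

```python
import random


def fit_r2(x, y):
    """R^2 of a one-descriptor least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return (R^2 on the true responses, list of R^2 on shuffled responses)."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the descriptor-response link
        shuffled_scores.append(fit_r2(x, shuffled))
    return fit_r2(x, y), shuffled_scores


# Synthetic data: a strong linear signal plus small Gaussian noise.
noise = random.Random(1)
x = [i / 10 for i in range(40)]
y = [2.0 * a + noise.gauss(0, 0.2) for a in x]
true_r2, shuffled_r2 = y_randomization(x, y)
```

If the true score is far above every shuffled score, the fitted relationship is unlikely to be a chance correlation, which is the point of the test.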


2019 ◽  
Vol 15 (4) ◽  
pp. 328-340 ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Napat Songtawee ◽  
Suphakit Siriwong ◽  
Supaluk Prachayasittikul ◽  
Chanin Nantasenamat ◽  
...  

Background: Human immunodeficiency virus (HIV) is an infective agent that causes acquired immunodeficiency syndrome (AIDS). Therefore, the rational design of inhibitors for preventing the progression of the disease is required. Objective: This study aims to construct quantitative structure-activity relationship (QSAR) models, perform molecular docking, and rationally design new colchicine derivatives with anti-HIV activity. Methods: A data set of 24 colchicine and derivatives with anti-HIV activity was employed to develop the QSAR models using machine learning methods (e.g. multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM)) and to perform a molecular docking study. Results: The significant descriptors relating to the anti-HIV activity included the JGI2, Mor24u, Gm and R8p+ descriptors. The predictive performance of the models showed acceptable statistical quality, as observed from the correlation coefficient (Q2) and root mean square error (RMSE) of the leave-one-out cross-validation (LOO-CV) and external sets. In particular, the ANN method outperformed the MLR and SVM methods, with a Q2 of 0.7548 and an RMSE of 0.5735 for the LOO-CV set, and a Q2 of 0.8553 and an RMSE of 0.6999 for external validation. In addition, the molecular docking of the virus-entry molecule (gp120 envelope glycoprotein) revealed the key interacting residues of the protein (cellular receptor, CD4) and the site-moiety preferences of colchicine derivatives as HIV entry inhibitors for binding to the HIV structure. Furthermore, a rational design of new colchicine derivatives informed by the QSAR models and molecular docking was proposed. Conclusion: These findings serve as a guideline for rational drug design as well as for the potential development of novel anti-HIV agents.
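The leave-one-out statistics reported in the Results can be sketched as follows. This is an illustration under assumptions: Q2 is taken as 1 − PRESS/SS_tot with the total sum of squares computed around the mean of all observed values (other conventions exist), the model is a one-descriptor least-squares line rather than the MLR, ANN or SVM models of the study, and the six data points are invented.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_cv(x, y):
    """Return (Q2, RMSE) from leave-one-out cross-validation."""
    predictions = []
    for i in range(len(x)):
        # Refit on all compounds except the i-th, then predict the i-th.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, predictions))
    mean_y = sum(y) / len(y)
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / ss_tot, math.sqrt(press / len(y))


# Hypothetical descriptor/activity pairs (not from the paper).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
q2, rmse_loo = loo_cv(x, y)
```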


2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: The poly(ADP-ribose) polymerases (PARP) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set and selection of the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors with the most significant contributions to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlations) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, and in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), as well as R2 and F-statistical value for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better with the GA-ANN model.


2020 ◽  
Vol 44 (8) ◽  
pp. 851-860
Author(s):  
Joy Eliaerts ◽  
Natalie Meert ◽  
Pierre Dardenne ◽  
Vincent Baeten ◽  
Juan-Antonio Fernandez Pierna ◽  
...  

Abstract. Spectroscopic techniques combined with chemometrics are a promising tool for analysis of seized drug powders. In this study, the performance of three spectroscopic techniques [Mid-InfraRed (MIR), Raman and Near-InfraRed (NIR)] was compared. In total, 364 seized powders were analyzed, consisting of 276 cocaine powders (with concentrations ranging from 4 to 99 w%) and 88 powders without cocaine. A classification model (using Support Vector Machines [SVM] discriminant analysis) and a quantification model (using SVM regression) were constructed with each spectral dataset in order to discriminate cocaine powders from other powders and to quantify cocaine in powders classified as cocaine positive. The performances of the models were compared with gas chromatography coupled with mass spectrometry (GC–MS) and gas chromatography with flame-ionization detection (GC–FID). Different evaluation criteria were used: number of false negatives (FNs), number of false positives (FPs), accuracy, root mean square error of cross-validation (RMSECV) and determination coefficients (R2). Ten colored powders were excluded from the classification data set due to fluorescence background observed in Raman spectra. For the classification, the best accuracy (99.7%) was obtained with MIR spectra. With Raman and NIR spectra, the accuracy was 99.5% and 98.9%, respectively. For the quantification, the best results were obtained with NIR spectra. The cocaine content was determined with an RMSECV of 3.79% and an R2 of 0.97. The performance of MIR and Raman in predicting cocaine concentrations was lower than that of NIR, with RMSECV of 6.76% and 6.79%, respectively, and both with an R2 of 0.90. The three spectroscopic techniques can be applied for both classification and quantification of cocaine, but some differences in performance were detected. The best classification was obtained with MIR spectra. For quantification, however, the RMSECV of MIR and Raman was about twice as high as that of NIR. Spectroscopic techniques combined with chemometrics can reduce the workload for confirmation analysis (e.g., chromatography based) and therefore save time and resources.
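The classification criteria listed in the abstract above (false negatives, false positives, accuracy) can be computed as in this small sketch. The function name and the five example labels are hypothetical; `True` stands for "cocaine positive" according to the reference method or the model.

```python
def classification_summary(actual, predicted):
    """Count false negatives and false positives and compute accuracy.

    `actual` holds the reference labels and `predicted` the model labels.
    """
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    false_positives = sum(1 for a, p in zip(actual, predicted) if p and not a)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "accuracy": correct / len(actual),
    }


# Hypothetical labels for five powders (not data from the study).
reference = [True, True, True, False, False]
model_output = [True, True, False, False, True]
summary = classification_summary(reference, model_output)
# For these labels: one false negative, one false positive, accuracy 0.6.
```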


Author(s):  
Jing Qi ◽  
Kun Xu ◽  
Xilun Ding

Abstract. Hand segmentation is the initial step for hand posture recognition. To reduce the effect of variable illumination in the hand segmentation step, a new CbCr-I component Gaussian mixture model (GMM) is proposed to detect the skin region. The hand region is selected as a region of interest from the image using the skin detection technique based on the presented CbCr-I component GMM and a new adaptive threshold. A new hand shape distribution feature described in polar coordinates is proposed to extract hand contour features to solve the false recognition problem in some shape-based methods and effectively recognize the hand posture in cases when different hand postures have the same number of outstretched fingers. A multiclass support vector machine classifier is utilized to recognize the hand posture. Experiments were carried out on our data set to verify the feasibility of the proposed method. The results showed the effectiveness of the proposed approach compared with other methods.

