Modelling the probability of capture for New Zealand's longfin eels ('Anguilla dieffenbachii') and shortfin eels ('Anguilla australis')

2021 ◽  
Author(s):  
Anthony Charsley

<p>Longfin eel and shortfin eel probability of capture models can be used to build probability of capture maps. These maps can help identify eel encounter hotspots in New Zealand and are useful for managing and conserving the species. This research models longfin eel and shortfin eel presence/absence data using regularized random forest (RRF) models, vector autoregressive spatio-temporal (VAST) models and Bayesian Gaussian random field (GRaF) models. Probability of capture maps built under VAST and GRaF remain approximately consistent with the maps built under RRF models. That is, longfin eels have high probabilities of capture around the coast of New Zealand’s North Island and low probabilities of capture throughout the centre of New Zealand’s South Island. Shortfin eels have high probabilities of capture in small isolated regions of New Zealand’s North Island and very low probabilities of capture throughout most of New Zealand’s South Island. Cross validation and spatial cross validation were used to compare the models. Cross validation results show that, compared to RRF models, VAST models improve predictive accuracy for both the longfin eel and the shortfin eel, whereas GRaF improves predictive performance only for the longfin eel. However, spatial cross validation shows no significant difference between VAST and RRF models. Hence, VAST models have higher predictive accuracy than RRF models for the longfin eel and shortfin eel when the training set is spatially correlated with the test set.</p>
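
The difference between ordinary and spatial cross validation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the function names, the square-grid blocking scheme and the block size are assumptions made here for illustration only. Ordinary folds scatter neighbouring sites across training and test sets, while block folds keep each spatial block entirely on one side.

```python
import random
from collections import defaultdict


def random_folds(n_points, k, seed=0):
    """Ordinary k-fold: shuffle the indices and deal them into k folds."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, block_size, k, seed=0):
    """Spatial k-fold (assumed scheme): group points into square blocks,
    then deal whole blocks into folds so a block never straddles two folds."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(keys):
        folds[j % k].extend(blocks[key])
    return folds


# Hypothetical sampling sites on a 6 x 6 grid (not real eel survey locations).
coords = [(x + 0.5, y + 0.5) for x in range(6) for y in range(6)]
folds = spatial_block_folds(coords, block_size=2.0, k=3)
```

A model scored on `random_folds` is tested at sites close to its training sites, which is the situation in which the abstract reports an advantage for VAST.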


2021 ◽  
Author(s):  
Leopoldo M. Ruiz Maraggi ◽  
Larry W. Lake ◽  
Mark P. Walsh

Abstract A common industry practice is to select one model from a set of candidates, history match oil production with it, and estimate reserves by extrapolation. Future production is usually forecast in this deterministic way. However, this approach neglects (a) model uncertainty and (b) the quantification of uncertainty in future production forecasts. The current study evaluates the predictive accuracy of rate-time models for forecasting production over a set of tight oil wells in West Texas. We present the application of an accuracy metric that evaluates the uncertainty of our models' estimates: the expected log predictive density (elpd). This work assesses the predictive performance of two empirical models, the Arps hyperbolic and the logistic growth models, and two physics-based models, the scaled slightly compressible single-phase and the scaled two-phase (oil and gas) solutions of the diffusivity equation. These models are arbitrarily selected for the purpose of illustrating the statistical procedure shown in this paper. First, we perform classical regression with the models and evaluate their predictive performance using frequentist (point estimate) metrics such as R², the Akaike information criterion (AIC), and hindcasting. Second, we generate probabilistic production forecasts using Bayesian inference for each model. Third, we evaluate the predictive accuracy of the models using the elpd accuracy metric, which measures out-of-sample predictive performance. We apply both adjusted within-sample and cross-validation techniques. The adjusted within-sample method is the widely applicable information criterion (WAIC). The cross-validation techniques are hindcasting and the leave-one-out (LOO-CV) method. The results of this research are the following. First, we illustrate that the assessment of a model's predictive accuracy depends on whether we use frequentist or Bayesian approaches. This is an important finding of this work. The frequentist approach relies on point estimates, while the Bayesian approach considers the uncertainty of our models' estimates. From a frequentist or classical standpoint, all of the models under study yielded very similar results, which made it difficult to determine which model yielded the best predictive performance. From a Bayesian standpoint, however, we determined that the logistic growth model yielded the best match in 81 of the 130 wells in our sample play and the two-phase physics-based model yielded the best match in 39 of the wells. In addition, we show that WAIC and LOO-CV present similar results for each model, as expected given their asymptotic equivalence. Finally, our observations regarding the different models are subject to the dataset under study, in which a majority of the wells are in transient flow. The present study provides tools to evaluate the predictive accuracy of models used to forecast (extrapolate) production of tight oil wells. The elpd is an accuracy metric useful for evaluating the uncertainty of our models' estimates and comparing their predictive performance, since it assesses distributions instead of point estimates. To our knowledge, the proposed approach is a novel and appropriate technique for evaluating the predictive accuracy of models that forecast hydrocarbon production.
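
As a rough illustration of how an elpd estimate is obtained with WAIC, the sketch below computes it from a matrix of pointwise log-likelihoods evaluated at posterior draws. It uses the standard definition (log pointwise predictive density minus the summed posterior variance of the log-likelihood). The function names and the toy Gaussian data are assumptions made here and are not taken from the paper.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def sample_variance(values):
    """Unbiased sample variance across posterior draws."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def elpd_waic(log_lik):
    """log_lik[s][i] is the log-likelihood of observation i under posterior draw s."""
    n_obs = len(log_lik[0])
    columns = [[draw[i] for draw in log_lik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)      # fit to the observed data
    p_waic = sum(sample_variance(col) for col in columns)  # effective number of parameters
    return lppd - p_waic


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


# Hypothetical toy data and made-up posterior draws of a mean parameter.
y = [1.0, 2.0, 3.0]
draws_mu = [1.8, 2.0, 2.2]
log_lik = [[normal_logpdf(yi, mu, 1.0) for yi in y] for mu in draws_mu]
score = elpd_waic(log_lik)
```

A higher `score` indicates better expected out-of-sample predictive performance. Multiplying it by -2 gives WAIC on the deviance scale.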


2001 ◽  
Vol 6 (2) ◽  
pp. 15-28 ◽  
Author(s):  
K. Dučinskas ◽  
J. Šaltytė

The problem of classifying a realisation of a stationary univariate Gaussian random field into one of two populations with different means and different factorised covariance matrices is considered. In this case, the optimal classification rule, in the sense of minimum probability of misclassification, is associated with a non-linear (quadratic) discriminant function. The unknown means and covariance matrices of the feature vector components are estimated from spatially correlated training samples using the maximum likelihood approach, assuming the spatial correlations to be known. An explicit formula for the Bayes error rate and the first-order asymptotic expansion of the expected error rate associated with the quadratic plug-in discriminant function are presented. A set of numerical calculations for the spherical spatial correlation function is performed, and two different spatial sampling designs are compared.
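
To show why different covariances lead to a quadratic rule, the following sketch gives the discriminant for the simplest possible case: a single scalar observation from one of two Gaussian populations with known means and variances. This is a deliberate simplification that ignores the spatial correlation treated in the paper. The function names and the example parameters are assumptions made here.

```python
import math


def quadratic_discriminant(x, mu1, var1, mu2, var2, prior1=0.5):
    """Log posterior odds of population 1 versus population 2 for a scalar x.

    The (x - mu)^2 / var terms do not cancel when var1 != var2,
    which is what makes the rule quadratic in x.
    """
    log_lik1 = -0.5 * math.log(var1) - (x - mu1) ** 2 / (2 * var1)
    log_lik2 = -0.5 * math.log(var2) - (x - mu2) ** 2 / (2 * var2)
    return log_lik1 - log_lik2 + math.log(prior1 / (1 - prior1))


def classify(x, mu1, var1, mu2, var2, prior1=0.5):
    """Assign x to population 1 when the discriminant is non-negative."""
    return 1 if quadratic_discriminant(x, mu1, var1, mu2, var2, prior1) >= 0 else 2
```

With hypothetical parameters `mu1=0, var1=1, mu2=3, var2=4`, an observation far to the left of both means (for example `x = -10`) is assigned to population 2, whose larger variance makes extreme values more plausible. A linear rule could not produce that behaviour.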


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Mohammad Haekal ◽  
Henki Bayu Seta ◽  
Mayanda Mega Santoni

To predict the water quality of the Ciliwung river, data from online monitoring were processed using a data mining method. In this method, the monitoring data are first tabulated in Microsoft Excel and then processed into a decision tree with the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand and has a very high level of accuracy. A total of 5,476 Ciliwung river water quality monitoring records were processed. In the decision tree classification of these 5,476 records, 1,059 records (19.3242%) indicate that the Ciliwung river is Not Polluted and 4,417 records (80.6758%) indicate that it is Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options showed a very high level of accuracy, above 99%. From these research data it can be predicted that the Ciliwung river is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia Number 82 of 2001. It is also found that using the WEKA application with the Decision Tree algorithm to process the monitoring data on three parameters (pH, DO and nitrate) is highly accurate and appropriate. Keywords: river water quality, data mining, Decision Tree algorithm, WEKA application.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lisha Yu ◽  
Yang Zhao ◽  
Hailiang Wang ◽  
Tien-Lung Sun ◽  
Terrence E. Murphy ◽  
...  

Abstract Background Poor balance has been cited as one of the key causal factors of falls. Timely detection of balance impairment can help identify elderly people prone to falls and trigger early interventions to prevent them. The goal of this study was to develop a surrogate approach for assessing functional balance in the elderly based on the Short Form Berg Balance Scale (SFBBS) score. Methods Data were collected from a waist-mounted tri-axial accelerometer while participants performed a timed up and go test. Clinically relevant variables were extracted from the segmented accelerometer signals for fitting SFBBS predictive models. Regularized regression together with random-shuffle-split cross-validation was used to develop the predictive models for automatic balance estimation. Results Eighty-five community-dwelling older adults (72.12 ± 6.99 years) participated in our study. Our results demonstrated that combined clinical and sensor-based variables, together with regularized regression and cross-validation, achieved moderate-to-high predictive accuracy for SFBBS scores (mean MAE = 2.01 and mean RMSE = 2.55). Step length, gender, gait speed and linear acceleration variables describing motor coordination were identified as significant contributors to balance estimation. The predictive model also showed moderate-to-high discrimination in classifying the risk levels in the performance of three balance assessment motions, with AUC values of 0.72, 0.79 and 0.76, respectively. Conclusions The study presented a feasible option for quantitatively accurate, objectively measured and unobtrusively collected functional balance assessment at the point of care or in the home environment. It also provided clinicians and the elderly with stable and sensitive biomarkers for long-term monitoring of functional balance.
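
The evaluation scheme named in the abstract above (repeated random shuffle splits scored by MAE and RMSE) can be sketched as follows. This is a minimal illustration and not the study's pipeline: the function names, the 20% test fraction and the number of splits are assumptions made here. Only the cohort size of 85 comes from the abstract.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def shuffle_split(n_samples, test_fraction, n_splits, seed=0):
    """Yield (train_indices, test_indices) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# 85 participants as in the abstract; the split settings are assumed.
splits = list(shuffle_split(n_samples=85, test_fraction=0.2, n_splits=5))
```

Averaging `mae` and `rmse` over the test portion of each split gives mean values of the kind the abstract reports.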


Cancers ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 375
Author(s):  
Manish Kohli ◽  
Winston Tan ◽  
Bérengère Vire ◽  
Pierre Liaud ◽  
Mélina Blairvacq ◽  
...  

Precise management of kidney cancer requires the identification of prognostic factors. hPG80 (circulating progastrin) is a tumor-promoting peptide present in the blood of patients with various cancers, including renal cell carcinoma (RCC). In this study, we evaluated the prognostic value of plasma hPG80 in 143 prospectively collected patients with metastatic RCC (mRCC). The prognostic impact of hPG80 levels on overall survival (OS) in mRCC patients, after controlling for hPG80 levels in non-cancer age-matched controls, was determined and compared to the International Metastatic Database Consortium (IMDC) risk model (good, intermediate, poor). ROC curves were used to evaluate the diagnostic accuracy of hPG80 using the area under the curve (AUC). Our results showed that plasma hPG80 was detected in 94% of mRCC patients. hPG80 levels displayed high predictive accuracy, with an AUC of 0.93 and 0.84 when compared to 18–25 year old controls and 50–80 year old controls, respectively. mRCC patients with high hPG80 levels (>4.5 pM) had significantly lower OS than patients with low hPG80 levels (<4.5 pM) (12 versus 31.2 months, respectively; p = 0.0031). Adding hPG80 levels (a score of 1 for patients with hPG80 levels > 4.5 pM) to the six variables of the IMDC risk model showed a greater and significant difference in OS between the newly defined good-, intermediate- and poor-risk groups (p = 0.0003 compared to p = 0.0076). Finally, when patients in the IMDC intermediate-risk group were further divided into two groups based on hPG80 levels, increased OS was observed in patients with low hPG80 levels (<4.5 pM). In conclusion, our data suggest that hPG80 could be used as a prognostic biomarker for survival in mRCC patients, either alone or integrated into the IMDC score (by adding a variable to the IMDC score or by substratifying the IMDC risk groups).
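
For readers unfamiliar with the AUC figures quoted above, the sketch below computes the area under the ROC curve directly from its pairwise interpretation: the probability that a randomly chosen patient has a higher marker level than a randomly chosen control, with ties counted as one half. The function name and the example values are hypothetical and are not data from the study.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Made-up marker levels for illustration only.
patients = [2.0, 5.0]
controls = [1.0, 3.0]
example_auc = auc(patients, controls)
```

An AUC of 0.5 corresponds to a marker that does not separate the groups at all, and 1.0 to perfect separation.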


2019 ◽  
Vol 76 (7) ◽  
pp. 2349-2361
Author(s):  
Benjamin Misiuk ◽  
Trevor Bell ◽  
Alec Aitken ◽  
Craig J Brown ◽  
Evan N Edinger

Abstract Species distribution models are commonly used in the marine environment as management tools. The high cost of collecting marine data for modelling limits their availability, especially in remote locations. Underwater image datasets from multiple surveys were leveraged to model the presence–absence and abundance of Arctic soft-shell clam (Mya spp.) to support the management of a local small-scale fishery in Qikiqtarjuaq, Nunavut, Canada. These models were combined to predict Mya abundance, conditional on presence, throughout the study area. Results suggested that water depth was the primary environmental factor limiting Mya habitat suitability, yet seabed topography and substrate characteristics influence their abundance within suitable habitat. Ten-fold cross-validation and spatial leave-one-out cross-validation (LOO CV) were used to assess the accuracy of combined predictions and to test whether this was inflated by the spatial autocorrelation of transect sample data. Results demonstrated that four different measures of predictive accuracy were substantially inflated due to spatial autocorrelation, and the spatial LOO CV results were therefore adopted as the best estimates of performance.
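
One common way to implement spatial leave-one-out cross-validation is to hold out each sample in turn and also drop every training sample within a buffer distance of it, so that nearby autocorrelated samples cannot inflate the score. The sketch below assumes that buffered formulation. The abstract does not state which variant the authors used, and the function name, coordinates and buffer radius are hypothetical.

```python
import math


def spatial_loo_splits(coords, buffer_radius):
    """Yield (test_index, train_indices), excluding training points
    that lie within buffer_radius of the held-out point."""
    for i, (xi, yi) in enumerate(coords):
        train = [
            j
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) > buffer_radius
        ]
        yield i, train


# Hypothetical transect positions: two close together, one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
splits = list(spatial_loo_splits(coords, buffer_radius=2.0))
```

Setting `buffer_radius` to zero recovers ordinary leave-one-out, which makes the two schemes easy to compare on the same data.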


1996 ◽  
Vol 84 (6) ◽  
pp. 1288-1297 ◽  
Author(s):  
James M. Bailey ◽  
Christina T. Mora ◽  
Stephen L. Shafer ◽  

Background Propofol is increasingly used for cardiac anesthesia and for perioperative sedation. Because pharmacokinetic parameters vary among distinct patient populations, rational drug dosing in the cardiac surgery patient depends on characterizing the drug's pharmacokinetic parameters in patients actually undergoing cardiac procedures and cardiopulmonary bypass (CPB). In this study, the pharmacokinetics of propofol was characterized in adult patients undergoing coronary revascularization. Methods Anesthesia was induced and maintained by computer-controlled infusions of propofol and alfentanil, or sufentanil, in 41 adult patients undergoing coronary artery bypass graft surgery. Blood samples for determination of plasma propofol concentrations were collected during the predefined study periods and assayed by high-pressure liquid chromatography. Three-compartment model pharmacokinetic parameters were determined by nonlinear extended least-squares regression of pooled data from patients receiving propofol throughout the perioperative period. The effect of CPB on propofol pharmacokinetics was modeled by allowing the parameters to change with the institution and completion of extracorporeal circulation and selecting the optimal model on the basis of the logarithm of the likelihood. Predicted propofol concentrations were calculated by convolving the infusion rates with unit disposition functions using the estimated parameters. The predictive accuracy of the parameters was evaluated by cross-validation and by a prospective comparison of predicted and measured levels in a subset of patients. Results Optimal pharmacokinetic parameters were: central compartment volume = 6.0 l; second compartment volume = 49.5 l; third compartment volume = 429.3 l; Cl1 (elimination clearance) = 0.68 l/min; Cl2 (distribution clearance) = 1.97 l/min; and Cl3 (distribution clearance) = 0.70 l/min. The effects of CPB were optimally modeled by step changes in V1 and Cl1 to values of 15.9 and 1.95, respectively, with the institution of CPB. Median absolute prediction error was 18% in the cross-validation assessment and 19% in the prospective evaluation. There was no evidence for nonlinear kinetics. Previously published propofol pharmacokinetic parameter sets poorly predicted the observed concentrations in cardiac surgical patients. Conclusions The pharmacokinetics of propofol in adult patients undergoing cardiac surgery with CPB differ from those reported for other adult patient populations. The effect of CPB was best modeled by an increase in V1 and Cl1. Predictive accuracy of the derived pharmacokinetic parameters was excellent as measured by cross-validation and a prospective test.
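
As a rough sketch of how predicted concentrations follow from the reported parameters, the code below simulates a standard three-compartment mammillary model with the pre-CPB volumes and clearances from the abstract. It uses simple Euler integration of the compartment equations rather than the convolution with unit disposition functions described by the authors. For a linear model the two approaches agree up to discretisation error. The function name, the step size and the constant infusion rate are assumptions made here.

```python
def simulate_three_compartment(infusion_rate, t_end, dt=0.01,
                               v1=6.0, v2=49.5, v3=429.3,
                               cl1=0.68, cl2=1.97, cl3=0.70):
    """Return central-compartment concentrations at each Euler step.

    Volumes are in litres and clearances in litres per minute (from the abstract).
    infusion_rate(t) is a caller-supplied function giving drug amount per minute.
    """
    # Micro rate constants derived from clearances and volumes.
    k10, k12, k13 = cl1 / v1, cl2 / v1, cl3 / v1
    k21, k31 = cl2 / v2, cl3 / v3
    a1 = a2 = a3 = 0.0  # drug amounts in the three compartments
    concentrations = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        da1 = infusion_rate(t) - (k10 + k12 + k13) * a1 + k21 * a2 + k31 * a3
        da2 = k12 * a1 - k21 * a2
        da3 = k13 * a1 - k31 * a3
        a1, a2, a3 = a1 + dt * da1, a2 + dt * da2, a3 + dt * da3
        concentrations.append(a1 / v1)
    return concentrations


# Hypothetical constant infusion of 10 units per minute for one hour.
conc = simulate_three_compartment(lambda t: 10.0, t_end=60.0)
```

Under a constant infusion the concentration rises towards the steady-state value of infusion rate divided by Cl1. The large third compartment means that this limit is approached only slowly.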


2021 ◽  
pp. 1-10
Author(s):  
I. Krug ◽  
J. Linardon ◽  
C. Greenwood ◽  
G. Youssef ◽  
J. Treasure ◽  
...  

Abstract Background Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.


2004 ◽  
Vol 1 (1) ◽  
pp. 131-142
Author(s):  
Ljupčo Todorovski ◽  
Sašo Džeroski ◽  
Peter Ljubič

Both equation discovery and regression methods aim at inducing models of numerical data. While the equation discovery methods are usually evaluated in terms of comprehensibility of the induced model, the emphasis of the regression methods evaluation is on their predictive accuracy. In this paper, we present Ciper, an efficient method for discovery of polynomial equations and empirically evaluate its predictive performance on standard regression tasks. The evaluation shows that polynomials compare favorably to linear and piecewise regression models, induced by the existing state-of-the-art regression methods, in terms of degree of fit and complexity.

