data points
Recently Published Documents





K E Hales, C A Coppin, Z K Smith, Z S McDaniel, L O Tedeschi

Abstract Reliable predictions of metabolizable energy (ME) from digestible energy (DE) are necessary to prescribe nutrient requirements of beef cattle accurately. A previously developed database that included 87 treatment means from 23 respiration calorimetry studies has been updated to evaluate the efficiency of converting DE to ME by adding 47 treatment means from 11 additional studies. Diets were fed to growing-finishing cattle under individual feeding conditions. A citation-adjusted linear regression equation was developed in which dietary ME concentration (Mcal/kg of dry matter [DM]) was the dependent variable and dietary DE concentration (Mcal/kg) was the independent variable: ME = 1.0001 × DE − 0.3926 (r² = 0.99; root mean square prediction error [RMSPE] = 0.04; P < 0.01 for the intercept and slope). The slope did not differ from unity (95% CI = 0.936 to 1.065); therefore, the intercept (95% CI = −0.567 to −0.218) defines the value of ME predicted from DE. For practical use, we recommend ME = DE − 0.39. Based on the relationship between DE and ME, we calculated the citation-adjusted loss of methane, which yielded a value of 0.2433 Mcal/kg of DMI (SE = 0.0134). This value was also adjusted for the effects of dry matter intake (DMI) above maintenance, yielding a citation-adjusted relationship: CH4 (Mcal/kg) = 0.3344 − 0.05639 × multiple of maintenance (r² = 0.536; RMSPE = 0.0245; P < 0.01 for the intercept and slope). Both the 0.2433 value and the result of the intake-adjusted equation can be multiplied by DMI to yield an estimate of methane production. These two approaches were evaluated using a second, independent database comprising 129 data points from 29 published studies. Four equations in the literature that used DMI or intake energy to predict methane production were also evaluated with the second database.
The mean bias was substantially greater for the two new equations, but slope bias was substantially less than noted for the other DMI-based equations. Our results suggest that ME for growing and finishing cattle can be predicted from DE across a wide range of diets, cattle types, and intake levels by simply subtracting a constant from DE. Mean bias associated with our two new methane emission equations suggests that further research is needed to determine whether coefficients to predict methane from DMI could be developed for specific diet types, levels of DMI relative to body weight, or other variables that affect the emission of methane.
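The recommended equations above are simple enough to apply directly. The sketch below implements ME = DE − 0.39 and the two methane estimates; the diet and intake figures in the example are illustrative, not values from the paper:

```python
def me_from_de(de_mcal_per_kg):
    """Predict dietary ME (Mcal/kg DM) from DE via ME = DE - 0.39."""
    return de_mcal_per_kg - 0.39

def methane_mcal_per_kg_dmi(multiple_of_maintenance):
    """Intake-adjusted methane loss per kg DMI:
    CH4 = 0.3344 - 0.05639 x multiple of maintenance."""
    return 0.3344 - 0.05639 * multiple_of_maintenance

def methane_production(dmi_kg, multiple_of_maintenance=None):
    """Daily methane (Mcal/d): DMI times either the constant loss of
    0.2433 Mcal/kg or the intake-adjusted coefficient."""
    if multiple_of_maintenance is None:
        return dmi_kg * 0.2433
    return dmi_kg * methane_mcal_per_kg_dmi(multiple_of_maintenance)

# Illustrative finishing diet: DE = 3.60 Mcal/kg, 9 kg/d DMI, ~2.5x maintenance.
print(me_from_de(3.60))              # ~3.21 Mcal/kg DM
print(methane_production(9.0))       # ~2.19 Mcal/d (constant coefficient)
print(methane_production(9.0, 2.5))  # ~1.74 Mcal/d (intake-adjusted)
```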

2022, Vol 7 (4), pp. 712-716
S K Prabhakar, Oshin Middha, Feba Mary George, Uditi Pankaj Kothak, Prashansa Yadav

The study of steepening, flattening, and clockwise and counter-clockwise torque effects is indispensable for understanding and designing a surgically induced astigmatism (SIA) calculator. Hence, in this study, a novel Microsoft Office Excel 2007-based astigmatism calculator for cataract surgery was constructed and its accuracy and predictability were evaluated. Post-cataract-surgery patients treated from May 2019 to January 2020 at a tertiary medical institution were recruited. The MS Excel calculator was designed using the Pythagorean theorem and the law of cosines to calculate the vector magnitude and axis, respectively. Manual keratometry measurements of the pre- and postoperative horizontal (Kh) and vertical (Kv) curvatures were obtained, and the resultant SIA magnitude and axis were analyzed statistically with MedCalc software against the existing SIA calculator, version 2.1. A total of 29 eyes of 25 patients were studied, with a mean age of 62.55 (±8.08) years; males contributed 14 (56%) patients, and the right eye was involved in 17 (58%) cases. The MS Excel and SIA 2.1 calculators yielded mean SIA magnitudes of 0.66 (±0.47) D and 0.64 (±0.55) D, respectively. A Pearson correlation (r = -0.16, p = 0.40), a paired two-sample test (t = 0.11, p = 0.91), and an ROC curve analysis (AUC = 0.75, p = 0.34, 95% CI = 0.25 to 0.99) were calculated. The regression equation (y = 0.75 - 0.14x) and the limits of agreement (95% CI -0.29 to 0.31) were analyzed, and 95% of data points fell within ±1.96 SD of the line of equality on Bland-Altman difference plots. The present calculator showed acceptable accuracy and agreement, predicting a 0.61-diopter change for every unit change in the SIA 2.1 magnitude, supporting interchangeability.
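For readers who want to reproduce the vector arithmetic, SIA can be computed by subtracting the pre- from the postoperative astigmatism on the double-angle plane, which is where the Pythagorean and law-of-cosines relations the abstract mentions enter. This is a generic sketch under standard vector conventions, not the paper's Excel implementation:

```python
import math

def sia(pre_mag, pre_axis_deg, post_mag, post_axis_deg):
    """Surgically induced astigmatism by double-angle vector subtraction.
    Magnitudes in dioptres, axes in degrees (0-180)."""
    # Map each astigmatism to a vector on the double-angle plane.
    pre = (pre_mag * math.cos(math.radians(2 * pre_axis_deg)),
           pre_mag * math.sin(math.radians(2 * pre_axis_deg)))
    post = (post_mag * math.cos(math.radians(2 * post_axis_deg)),
            post_mag * math.sin(math.radians(2 * post_axis_deg)))
    dx, dy = post[0] - pre[0], post[1] - pre[1]
    magnitude = math.hypot(dx, dy)                     # Pythagoras
    axis = (math.degrees(math.atan2(dy, dx)) / 2) % 180
    return magnitude, axis

# 1.00 D at 90 deg preop steepening to 1.50 D at 90 deg postop
# corresponds to 0.50 D of induced astigmatism at 90 deg.
print(sia(1.0, 90, 1.5, 90))
```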

2022, pp. 1-47
Mohammad Mohammadi, Peter Tino, Kerstin Bunte

Abstract The presence of manifolds is a common assumption in many applications, including astronomy and computer vision. For instance, in astronomy, low-dimensional stellar structures, such as streams, shells, and globular clusters, can be found in the neighborhood of large galaxies such as the Milky Way. Since these structures are often buried in very large data sets, an algorithm that can not only recover the manifold but also remove the background noise (or outliers) is highly desirable. Other works try to recover manifolds either by pushing all points toward the manifold or by downsampling from dense regions, each addressing only one of these problems; they generally fail to suppress the noise on manifolds and remove background noise simultaneously. Inspired by the collective behavior of biological ants in the food-seeking process, we propose a new algorithm that employs several random walkers equipped with a local alignment measure to detect and denoise manifolds. During the walking process, the agents release pheromone on data points, which reinforces future movements. Over time the pheromone concentrates on the manifolds, while it fades in the background noise due to an evaporation procedure. We use the Markov chain (MC) framework to provide a theoretical analysis of the convergence of the algorithm and its performance. Moreover, an empirical analysis, based on synthetic and real-world data sets, is provided to demonstrate its applicability in different areas, such as improving the performance of t-distributed stochastic neighbor embedding (t-SNE) and spectral clustering using the underlying MC formulas, recovering astronomical low-dimensional structures, and improving the performance of the fast Parzen window density estimator.
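A toy version of the pheromone dynamics can be sketched on a k-nearest-neighbour graph. This omits the local alignment measure and most other essentials of the actual algorithm; it only illustrates how deposition plus evaporation concentrates pheromone on a dense manifold (synthetic data, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a dense 1-D manifold (noisy sine curve) plus sparse background noise.
t = rng.uniform(0, 2 * np.pi, 200)
manifold = np.c_[t, np.sin(t) + 0.05 * rng.normal(size=200)]
noise = rng.uniform([0, -2], [2 * np.pi, 2], size=(60, 2))
X = np.vstack([manifold, noise])
n = len(X)

# k-nearest-neighbour graph (brute force for the sketch).
k = 8
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
nbrs = np.argsort(d2, axis=1)[:, :k]

pheromone = np.ones(n)
evaporation = 0.99
walkers = rng.integers(0, n, size=20)

for step in range(2000):
    pheromone *= evaporation              # global evaporation
    for w in range(len(walkers)):
        i = walkers[w]
        p = pheromone[nbrs[i]]            # move preferentially toward
        j = nbrs[i][rng.choice(k, p=p / p.sum())]  # pheromone-rich neighbours
        pheromone[j] += 1.0               # deposit on the visited point
        walkers[w] = j

# Pheromone should concentrate on the dense manifold, not the background.
print(pheromone[:200].mean(), pheromone[200:].mean())
```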

2022, Vol 73, pp. 209-229
Chong Liu, Yu-Xiang Wang

Large-scale labeled datasets are the indispensable fuel that ignites the AI revolution we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk, which provide noisy labels from non-experts at a fair price. The sheer size of such datasets mandates that it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of estimation can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need a much lower labeling cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points.
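The doubly robust idea can be sketched in a few lines: a learner imitating workers predicts every item's voting score, and a correction term computed from the small labeled subset removes the learner's bias. The data below are synthetic and illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# True per-item probability that a random crowd worker votes "positive".
p_true = rng.uniform(0.1, 0.9, size=n)

# A supervised learner imitating workers: a biased, noisy approximation.
p_model = np.clip(p_true + rng.normal(0.1, 0.15, size=n), 0.0, 1.0)

# Labels are expensive, so only a small random subset gets a worker vote.
labeled = rng.random(n) < 0.2
votes = (rng.random(n) < p_true).astype(float)

# Doubly robust estimate of the mean voting score: use the model everywhere,
# then correct its bias with the residuals on the labeled subset.
dr = p_model.mean() + (votes[labeled] - p_model[labeled]).mean()

print(round(p_true.mean(), 3), round(p_model.mean(), 3), round(dr, 3))
```

Even though `p_model` is a poor, biased approximation, the residual correction keeps the estimate centered near the true mean, which is the double-robustness property the abstract exploits.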

2022
Jan Řezáč

The Non-Covalent Interactions Atlas has been extended with two data sets of benchmark interaction energies in complexes dominated by London dispersion. The D1200 data set of equilibrium geometries provides a thorough sampling of an extended chemical space, while the D442×10 set features dissociation curves for selected complexes. In total, they provide 5,178 new CCSD(T)/CBS data points of the highest quality. The new data have been combined with previous NCIA data sets in a comprehensive test of dispersion-corrected DFT methods, identifying the ones that achieve high accuracy for all types of non-covalent interactions in a broad chemical space. Additional tests of dispersion-corrected MP2 and semiempirical QM methods are also reported.

2022, pp. 1-22
Salem Al-Gharbi, Abdulaziz Al-Majed, Abdulazeez Abdulraheem, Zeeshan Tariq, Mohamed Mahmoud

Abstract The age of easy oil is ending, and the industry has started drilling in remote, unconventional conditions. To help produce safer, faster, and more effective operations, the utilization of artificial intelligence and machine learning (AI/ML) has become essential. Unfortunately, due to the harsh drilling environment and the data-transmission setup, a significant amount of the real-time data can be defective. The quality and effectiveness of AI/ML models are directly related to the quality of the input data; only if the input data are good will the AI/ML-generated analytical and prediction models be good. Improving real-time data is therefore critical to the drilling industry. The objective of this paper is to propose an automated approach that applies eight statistical data-quality improvement algorithms to real-time drilling data. These techniques are Kalman filtering, moving average, kernel regression, median filter, exponential smoothing, LOWESS, wavelet filtering, and polynomial fitting. A dataset of more than 150,000 rows is fed into the algorithms, and their customizable parameters are calibrated to achieve the best improvement result. An evaluation methodology based on real-time drilling data characteristics is developed, and the strengths and weaknesses of each algorithm are highlighted. Based on the evaluation criteria, the best results were achieved using exponential smoothing, the median filter, and the moving average. Exponential smoothing and the median filter improved data quality by removing most of the invalid data points, while the moving average removed even more invalid data points but trimmed the data range.
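For illustration, the three best-performing techniques (moving average, median filter, and exponential smoothing) can be written in a few lines of NumPy; the window sizes and smoothing factor are illustrative choices, not the calibrated parameters from the paper:

```python
import numpy as np

def moving_average(x, window=5):
    """Moving average; note it shortens the series by (window - 1) points,
    the range-trimming drawback noted for this filter."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def median_filter(x, window=5):
    """Sliding-window median: robust to isolated invalid spikes."""
    pad = window // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + window]) for i in range(len(x))])

def exponential_smoothing(x, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    s = np.empty(len(x), dtype=float)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

# A toy real-time signal with one invalid spike (250.0).
x = np.array([10.0, 10.2, 9.9, 250.0, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9])
print(median_filter(x, 5))  # the 250.0 spike is removed
```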

2022, pp. 1-23
Zhenghang Cui, Issei Sato

Abstract Noisy pairwise comparison feedback has been incorporated to improve the overall query complexity of interactively learning binary classifiers. The positivity comparison oracle is extensively used to provide feedback on which of a pair of data points is more likely to be positive. Because it is impossible to determine accurate labels using this oracle alone without knowing the classification threshold, existing methods still rely on the traditional explicit labeling oracle, which answers the label of a given data point. The current method sorts all data points and then uses the explicit labeling oracle to find the classification threshold. However, it has two drawbacks: (1) it performs unnecessary sorting for label inference, and (2) it naively adapts quicksort to noisy feedback. To avoid these inefficiencies while acquiring information about the classification threshold at the same time, we propose a new pairwise comparison oracle concerning uncertainties. This oracle answers which of a pair of data points has higher uncertainty. We then propose an efficient adaptive labeling algorithm that takes advantage of the proposed oracle. In addition, we address the situation in which the labeling budget is insufficient relative to the data set size. Finally, we confirm the feasibility of the proposed oracle and the performance of the proposed algorithm both theoretically and empirically.

2022, Vol 6 (1)
Qing Li, Xiaojian Xia, Zibo Pei, Xuequn Cheng, Dawei Zhang

Abstract In this work, the atmospheric corrosion of carbon steels was monitored at six different sites (and hence, six different atmospheric conditions) using Fe/Cu-type atmospheric corrosion monitoring technology over a period of 12 months. After analyzing over 3 million data points, the sensor data were interpretable as the instantaneous corrosion rate, and the atmospheric "corrosivity" of each exposure environment showed highly dynamic changes from the C1 to CX level (according to the ISO 9223 standard). A random forest model was developed to predict the corrosion rate and investigate the impacts of ten "corrosive factors" in dynamic atmospheres. The results reveal that the rust layer, wind speed, rainfall rate, relative humidity (RH), and chloride concentration played significant roles in the corrosion process.

Peter Wagstaff, Pablo Minguez Gabina, Ricardo Mínguez, John C Roeske

Abstract A shallow neural network was trained to accurately calculate the microdosimetric parameters <z1> and <z1²> (the first and second moments of the single-event specific energy spectra, respectively) for use in alpha-particle microdosimetry calculations. The regression network of four inputs and two outputs was created in MATLAB and trained on a data set consisting of both previously published microdosimetric data and recent Monte Carlo simulations. The input data consisted of the alpha-particle energies (3.97–8.78 MeV), cell nuclei radii (2–10 µm), cell radii (2.5–20 µm), and eight different source-target configurations. These configurations included both single cells in suspension and cells in geometric clusters. The mean square error (MSE) was used to measure the performance of the network. The sizes of the hidden layers were chosen to minimize MSE without overfitting. The final neural network consisted of two hidden layers with 13 and 20 nodes, respectively, each with tangential sigmoid transfer functions, and was trained on 1932 data points. The overall training/validation resulted in an MSE of 3.71×10⁻⁷. A separate testing data set included input values that were not seen by the trained network. The final test on 892 separate data points resulted in an MSE of 2.80×10⁻⁷. The 95th percentile testing errors were within ±1.4% for the <z1> outputs and ±2.8% for the <z1²> outputs. Cell survival predicted using actual versus neural-network-generated microdosimetric moments showed overall agreement within ±3.5%. In summary, this trained neural network can accurately produce the microdosimetric parameters used for the study of alpha-particle emitters. The network can be exported and shared for tests on independent data sets and new calculations.
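The architecture described above (4 inputs, hidden layers of 13 and 20 tansig nodes, 2 linear outputs) can be sketched as a plain forward pass. The random weights below merely stand in for the trained MATLAB network, so the outputs are meaningless; the sketch only fixes the shapes and transfer functions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the paper: 4 inputs -> 13 -> 20 -> 2 outputs.
sizes = [4, 13, 20, 2]
# Random weights as placeholders for the trained network (illustrative only).
weights = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.normal(scale=0.1, size=m) for m in sizes[1:]]

def forward(x):
    """Forward pass: tansig (tanh) hidden layers, linear output layer,
    mapping (alpha energy, nucleus radius, cell radius, configuration)
    to the two microdosimetric moments."""
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)           # tangential sigmoid transfer
    return weights[-1] @ h + biases[-1]  # linear regression output

y = forward([5.5, 4.0, 8.0, 1.0])  # energy (MeV), radii (um), configuration id
print(y.shape)
```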

2022, Vol 15 (1), pp. 26
Feng Han, Xiaojuan Ma, Jiheng Zhang

Financial data are expensive and highly sensitive, with limited access. We aim to generate abundant datasets from the original prices while preserving their statistical features. We introduce the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) to the stock, futures, and cryptocurrency markets. We train our model on various datasets, including the Hong Kong stock market, Hang Seng Index composite stocks, precious metal futures contracts listed on the Chicago Mercantile Exchange and the Japan Exchange Group, and cryptocurrency spot and perpetual contracts on Binance, at various minute-level intervals. We quantify the difference between the generated results (836,280 data points) and the original data using MAE, MSE, RMSE, and K-S distance. The results show that WGAN-GP can simulate asset prices, demonstrating its potential as a market simulator for trading analysis. We may be the first to examine multiple asset classes systematically at minute intervals across the stock, futures, and cryptocurrency markets. We also contribute a quantitative-analysis methodology for assessing the quality of generated versus original price data.
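The four comparison metrics named above are straightforward to compute. The sketch below evaluates a toy generated price series against a toy real one (the numbers are illustrative, not from the paper's datasets):

```python
import numpy as np

def quality_metrics(real, generated):
    """MAE, MSE, RMSE, and the Kolmogorov-Smirnov (K-S) distance between
    the empirical distributions of real and generated series."""
    real = np.asarray(real, dtype=float)
    generated = np.asarray(generated, dtype=float)
    err = generated - real
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    rmse = np.sqrt(mse)

    # K-S distance: largest gap between the two empirical CDFs.
    def ecdf(sample, grid):
        return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

    grid = np.concatenate([real, generated])
    ks = np.abs(ecdf(real, grid) - ecdf(generated, grid)).max()
    return mae, mse, rmse, ks

real = [100.0, 101.0, 99.5, 102.0]
generated = [100.5, 100.0, 99.0, 103.0]
print(quality_metrics(real, generated))  # MAE 0.75, MSE 0.625, RMSE ~0.79, K-S 0.25
```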
