Forecast and error analysis of vegetable production in Haryana by various modeling techniques

Crop forecasting is a formidable challenge for every nation, and the Government of India has developed a number of forecasting systems. National and state governments need pre-harvest forecasts for policy decisions on storage, distribution, pricing, marketing, import-export and more. In this paper, univariate forecasting models, namely random walk, random walk with drift, moving average, simple exponential smoothing and Autoregressive Integrated Moving Average (ARIMA) models, are analyzed for their efficiency in forecasting vegetable production in Haryana state. The state's annual data on vegetable production were divided into a training data set from 1966-67 to 2013-14 and a test data set from 2014-15 to 2018-19. Suitable models were selected on the basis of error analysis on the training data and a percent error deviation test on the test data. Model diagnostic checking was carried out on the ACF and PACF of the residuals, together with runs above and below the median, runs up and down, and Ljung-Box tests. ARIMA (2,1,1) was found to be optimal, and the forecasts based on this model for the years 2019-20 to 2023-24 were 7.82, 8.23, 8.72, 9.2 and 9.72 million tonnes, respectively. Forecasts from this best-fit model are important to policymakers and other government agencies for policy decisions regarding food security.
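
Two of the simpler baselines named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the smoothing constant and the toy series are all assumptions, and the signed percent error used here is only one plausible reading of the "percent error deviation" check.

```python
from typing import List, Sequence


def random_walk_with_drift(train: Sequence[float], horizon: int) -> List[float]:
    """Extend the last observation by the mean historical step (the drift)."""
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + drift * h for h in range(1, horizon + 1)]


def simple_exponential_smoothing(train: Sequence[float], alpha: float,
                                 horizon: int) -> List[float]:
    """Flat forecast at the final smoothed level (alpha is an assumed constant)."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def percent_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Signed percent deviation of each forecast from the observed value."""
    return [100.0 * (f - a) / a for a, f in zip(actual, forecast)]


# Hypothetical toy series, NOT the Haryana production data.
train = [2.0, 2.5, 3.0, 3.5, 4.0]
test = [4.4, 5.0]

rw_forecast = random_walk_with_drift(train, len(test))
ses_forecast = simple_exponential_smoothing(train, alpha=0.5, horizon=len(test))
rw_errors = percent_errors(test, rw_forecast)
ses_errors = percent_errors(test, ses_forecast)
```

On a trending series the flat exponential-smoothing forecast lags behind the drift forecast, which is the kind of difference a held-out test period is meant to expose.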

Synthetic Sonic Log Generation With Machine Learning: A Contest Summary From Five Methods

Petrophysics – The SPWLA Journal of Formation Evaluation and Reservoir Description ◽

10.30632/pjv62n4-2021a4 ◽

2021 ◽

Vol 62 (4) ◽

pp. 393-406

Author(s):

Yanxiang Yu ◽

Chicheng Xu ◽

Siddharth Misra ◽

Weichang Li ◽

...

Keyword(s):

Machine Learning ◽

Test Data ◽

Short Term Memory ◽

Rock Physics ◽

Training Data ◽

Machine Learning Techniques ◽

Blind Test ◽

Data Set ◽

Benchmark Model ◽

Sonic Log

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. From March 1, 2020, to May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs (caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density), as well as two sonic logs (DTC and DTS) as the targets. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below:

$$\mathrm{RMSE} = \sqrt{\frac{1}{2}\cdot\frac{1}{m}\sum_{i=1}^{m}\left[\left(\mathrm{DTC}_{\mathrm{pred}}^{i}-\mathrm{DTC}_{\mathrm{true}}^{i}\right)^{2}+\left(\mathrm{DTS}_{\mathrm{pred}}^{i}-\mathrm{DTS}_{\mathrm{true}}^{i}\right)^{2}\right]}$$

In the benchmark model (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing on the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we review these five solutions, including preprocessing techniques and different machine-learning models, including neural network, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving the performance in all models.
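
The contest metric pools the squared errors of both sonic logs and averages over the 2m predicted values. A minimal sketch follows, assuming the four logs arrive as equal-length sequences; the function name and the sample values are hypothetical, not contest data.

```python
import math
from typing import Sequence


def contest_rmse(dtc_pred: Sequence[float], dtc_true: Sequence[float],
                 dts_pred: Sequence[float], dts_true: Sequence[float]) -> float:
    """Joint RMSE over DTC and DTS: sqrt of the summed squared errors / (2 * m)."""
    m = len(dtc_true)
    if m == 0 or not (len(dtc_pred) == len(dts_pred) == len(dts_true) == m):
        raise ValueError("all four logs must be non-empty and of equal length")
    total = 0.0
    for i in range(m):
        total += (dtc_pred[i] - dtc_true[i]) ** 2
        total += (dts_pred[i] - dts_true[i]) ** 2
    return math.sqrt(total / (2 * m))


# Hypothetical two-sample example: squared errors are 9, 0 (DTC) and 0, 16 (DTS).
score = contest_rmse([60.0, 70.0], [63.0, 70.0], [120.0, 130.0], [120.0, 134.0])
```

Because both logs share one score, a model that predicts DTC well but DTS poorly is penalized as heavily as the reverse, at least when the two logs have similar error scales.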

Data Analysis With Shapley Values For Automatic Subject Selection in Alzheimer's Disease Data Sets Using Interpretable Machine Learning

10.21203/rs.3.rs-245707/v1 ◽

2021 ◽

Author(s):

Louise Bloch ◽

Christoph M. Friedrich

Keyword(s):

Alzheimer's Disease ◽

Test Data ◽

Noisy Data ◽

Training Data ◽

Data Sets ◽

Data Set ◽

Model Interpretation ◽

Percentage Points ◽

Shapley Values

Background: Predicting whether subjects with Mild Cognitive Impairment (MCI) will go on to develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) is suitable to improve early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained on a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. The validation was executed for an independent ADNI test data set and for the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions. Results: On the independent ADNI test data set, the XGBoost models trained on the entire training data set reached a mean classification accuracy of 58.54 %. The XGBoost models that excluded 116 of the 467 subjects from the training data set based on their Logistic Regression (LR) data Shapley values outperformed them by 14.13 % (8.27 percentage points). On the AIBL data set, the XGBoost models trained on the entire training data set reached a mean accuracy of 60.35 %. An improvement of 24.86 % (15.00 percentage points) could be reached for the XGBoost models if the 72 subjects with the smallest RF data Shapley values were excluded from the training data set. Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data was associated with the number of ApoE ε4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
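
The results quote each gain twice, once as a relative change and once in percentage points. The short check below shows how the two figures relate. The helper names are assumptions, and the improved accuracies are derived here by adding the reported percentage-point gains to the reported baselines; they are not stated in the abstract.

```python
def percentage_point_gain(baseline_pct: float, improved_pct: float) -> float:
    """Absolute difference between two accuracies given in percent."""
    return improved_pct - baseline_pct


def relative_gain_pct(baseline_pct: float, improved_pct: float) -> float:
    """Difference expressed as a percentage of the baseline accuracy."""
    return 100.0 * (improved_pct - baseline_pct) / baseline_pct


# Baselines and percentage-point gains as reported; improved values are derived.
adni_baseline, adni_improved = 58.54, 58.54 + 8.27
aibl_baseline, aibl_improved = 60.35, 60.35 + 15.00

adni_relative = relative_gain_pct(adni_baseline, adni_improved)  # about 14.13
aibl_relative = relative_gain_pct(aibl_baseline, aibl_improved)  # about 24.86
```

Both relative figures agree with the abstract to two decimals, which confirms that the percentages in the text are relative to the baseline accuracy rather than absolute differences.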

Personal Adaptive Method to Assess Mental Tension during Daily Life Using Heart Rate Variability

Methods of Information in Medicine ◽

10.3414/me11-01-0027 ◽

2012 ◽

Vol 51 (01) ◽

pp. 39-44 ◽

Cited By ~ 11

Author(s):

K. Matsuoka ◽

K. Yoshino

Keyword(s):

Heart Rate ◽

Heart Rate Variability ◽

Linear Regression ◽

Multiple Linear Regression ◽

Test Data ◽

Daily Life ◽

Pearson Correlation ◽

Multiple Linear Regression Model ◽

Training Data ◽

Data Set

Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized to every individual on the basis of heart rate variability (HRV) data which, to eliminate the influence of inter-individual variability, are measured over a long period during daily life. Methods: HRV and body accelerations were recorded from nine normal subjects for two months of normal daily life. Fourteen HRV indices were calculated from the HRV data in the 512 seconds prior to every mental tension level report. Data to be analyzed were limited to those with body accelerations of 30 mG (0.294 m/s²) and lower. Further, the differences from the reference values in the same time zone were calculated for both the mental tension score (Δtension) and the HRV index values (ΔHRVI). A multiple linear regression model that estimates Δtension from the scores for principal components of ΔHRVI was then constructed for each individual. The data were divided into a training data set and a test data set in accordance with the twofold cross validation method. Multiple linear regression coefficients were determined using the training data set, and the generalization capability of the optimized model was checked using the test data set. Results: The subjects’ mean Pearson correlation coefficient was 0.52 with the training data set and 0.40 with the test data set. The subjects’ mean coefficient of determination was 0.28 with the training data set and 0.11 with the test data set. Conclusion: We proposed a method of assessing psychological tension that is optimized to every individual based on HRV data measured over a long period of daily life.
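
The evaluation scheme described here (fit on one half, score on the other, then swap) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single predictor instead of the study's principal-component scores, the helper names are hypothetical, and the data are a made-up toy series.

```python
import math
from typing import List, Sequence, Tuple


def _sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return means of x and y plus the centered sums Sxy, Sxx, Syy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, on held-out data."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(pred, actual))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def twofold_cv(x: List[float], y: List[float]) -> List[Tuple[float, float]]:
    """Fit on one half, score (Pearson r, R^2) on the other, then swap."""
    half = len(x) // 2
    first, second = slice(0, half), slice(half, None)
    results = []
    for train, test in ((first, second), (second, first)):
        slope, intercept = fit_line(x[train], y[train])
        pred = [slope * v + intercept for v in x[test]]
        results.append((pearson_r(pred, y[test]), r_squared(pred, y[test])))
    return results


# Hypothetical toy data, not HRV measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
fold_scores = twofold_cv(xs, ys)
```

On held-out data the coefficient of determination can fall below the squared correlation, because it also penalizes offset and scale errors in the predictions, so the two test-set figures need not move together.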

Towards improving the accuracy of aortic transvalvular pressure gradients: rethinking Bernoulli

Medical & Biological Engineering & Computing ◽

10.1007/s11517-020-02186-w ◽

2020 ◽

Vol 58 (8) ◽

pp. 1667-1679

Author(s):

Benedikt Franke ◽

J. Weese ◽

I. Waechter-Stehle ◽

J. Brüning ◽

T. Kuehne ◽

...

Keyword(s):

Test Data ◽

Ground Truth ◽

Training Data ◽

Patient Specific ◽

Pressure Gradients ◽

Bernoulli Model ◽

Bernoulli Equation ◽

Data Set ◽

Non Invasive ◽

Adjusted Model

The transvalvular pressure gradient (TPG) is commonly estimated using the Bernoulli equation. However, the method is known to be inaccurate. Therefore, an adjusted Bernoulli model for accurate TPG assessment was developed and evaluated. Numerical simulations were used to calculate TPG_CFD in patient-specific geometries of aortic stenosis as ground truth. Geometries, aortic valve areas (AVA), and flow rates were derived from computed tomography scans. Simulations were divided into a training data set (135 cases) and a test data set (36 cases). The training data were used to fit an adjusted Bernoulli model as a function of AVA and flow rate. The model-predicted TPG_Model was evaluated using the test data set and also compared against the common Bernoulli equation (TPG_B). TPG_B and TPG_Model both correlated well with TPG_CFD (r > 0.94), but significantly overestimated it. The average difference between TPG_Model and TPG_CFD was much lower: 3.3 mmHg vs. 17.3 mmHg between TPG_B and TPG_CFD. Also, the standard error of estimate was lower for the adjusted model: SEE_Model = 5.3 mmHg vs. SEE_B = 22.3 mmHg. The adjusted model was more accurate than the conventional Bernoulli equation. The model might help to improve non-invasive assessment of TPG.
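
For orientation, the simplified Bernoulli relation widely used in clinical practice estimates the pressure drop as four times the squared velocity (velocity in m/s, result in mmHg). The sketch below assumes that this simplified form is the "common Bernoulli equation" meant here and, as a crude stand-in, takes the velocity as flow rate divided by AVA. The abstract does not give the adjusted model's functional form, so it is not reproduced. All names and input values are hypothetical.

```python
def mean_velocity_m_per_s(flow_ml_per_s: float, ava_cm2: float) -> float:
    """Mean velocity through the valve area; mL/s over cm^2 gives cm/s."""
    if ava_cm2 <= 0:
        raise ValueError("aortic valve area must be positive")
    return flow_ml_per_s / ava_cm2 / 100.0  # convert cm/s to m/s


def simplified_bernoulli_mmhg(velocity_m_per_s: float) -> float:
    """Simplified Bernoulli estimate: pressure drop in mmHg = 4 * v^2."""
    return 4.0 * velocity_m_per_s ** 2


# Hypothetical case: 250 mL/s through a 1.0 cm^2 valve gives 2.5 m/s.
velocity = mean_velocity_m_per_s(250.0, 1.0)
tpg_estimate = simplified_bernoulli_mmhg(velocity)
```

Because the estimate grows with the square of velocity, small errors in AVA or flow rate are amplified, which is one motivation for fitting a correction against simulated ground truth.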

Diagnostic assessment of a deep learning system for detecting atrial fibrillation in pulse waveforms

Heart ◽

10.1136/heartjnl-2018-313147 ◽

2018 ◽

Vol 104 (23) ◽

pp. 1921-1928 ◽

Cited By ~ 36

Author(s):

Ming-Zher Poh ◽

Yukkee Cheung Poh ◽

Pak-Hei Chan ◽

Chun-Ka Wong ◽

Louise Pun ◽

...

Keyword(s):

Atrial Fibrillation ◽

Deep Learning ◽

Test Data ◽

Predictive Value ◽

Characteristic Curve ◽

Performance Comparison ◽

Learning System ◽

Training Data ◽

Validation Data ◽

Data Set

Objective: To evaluate the diagnostic performance of a deep learning system for automated detection of atrial fibrillation (AF) in photoplethysmographic (PPG) pulse waveforms. Methods: We trained a deep convolutional neural network (DCNN) to detect AF in 17 s PPG waveforms using a training data set of 149 048 PPG waveforms constructed from several publicly available PPG databases. The DCNN was validated using an independent test data set of 3039 smartphone-acquired PPG waveforms from adults at high risk of AF at a general outpatient clinic against ECG tracings reviewed by two cardiologists. Six established AF detectors based on handcrafted features were evaluated on the same test data set for performance comparison. Results: In the validation data set (3039 PPG waveforms) consisting of three sequential PPG waveforms from 1013 participants (mean (SD) age, 68.4 (12.2) years; 46.8% men), the prevalence of AF was 2.8%. The area under the receiver operating characteristic curve (AUC) of the DCNN for AF detection was 0.997 (95% CI 0.996 to 0.999) and was significantly higher than all the other AF detectors (AUC range: 0.924–0.985). The sensitivity of the DCNN was 95.2% (95% CI 88.3% to 98.7%), specificity was 99.0% (95% CI 98.6% to 99.3%), positive predictive value (PPV) was 72.7% (95% CI 65.1% to 79.3%) and negative predictive value (NPV) was 99.9% (95% CI 99.7% to 100%) using a single 17 s PPG waveform. Using the three sequential PPG waveforms in combination (<1 min in total), the sensitivity was 100.0% (95% CI 87.7% to 100%), specificity was 99.6% (95% CI 99.0% to 99.9%), PPV was 87.5% (95% CI 72.5% to 94.9%) and NPV was 100% (95% CI 99.4% to 100%). Conclusions: In this evaluation of PPG waveforms from adults screened for AF in a real-world primary care setting, the DCNN had high sensitivity, specificity, PPV and NPV for detecting AF, outperforming other state-of-the-art methods based on handcrafted features.
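
The four single-waveform metrics all follow from one confusion matrix. The sketch below uses hypothetical counts chosen only because they sum to 3039 and reproduce the reported percentages after rounding; the abstract does not state the actual counts, and the function name is an assumption.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true AF waveforms detected
        "specificity": tn / (tn + fp),  # share of non-AF waveforms cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are AF
        "npv": tn / (tn + fn),          # share of negative calls that are not AF
    }


# Hypothetical counts consistent with the reported figures (80 + 4 + 30 + 2925 = 3039).
metrics = diagnostic_metrics(tp=80, fn=4, fp=30, tn=2925)
rounded = {name: round(100.0 * value, 1) for name, value in metrics.items()}
```

With a prevalence near 2.8%, even a specificity of about 99% leaves false positives at a noticeable fraction of all positive calls, which is why the PPV is much lower than the sensitivity and specificity.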

Political regimes and foreign aid effectiveness in Ghana

International Journal of Development Issues ◽

10.1108/ijdi-02-2018-0029 ◽

2019 ◽

Vol 18 (1) ◽

pp. 15-33 ◽

Cited By ~ 2

Author(s):

Vincent Konadu Tawiah ◽

Evans John Barnes ◽

Prince Acheampong ◽

Ofori Yaw

Keyword(s):

Economic Growth ◽

Foreign Aid ◽

Political Ideology ◽

Aid Effectiveness ◽

Political Agenda ◽

Annual Data ◽

Political Regimes ◽

Data Set ◽

Content Type ◽

The Government

Purpose: This paper examines the effectiveness of foreign aid on the Ghanaian economy under different political regimes. Design/methodology/approach: Using vector error correction and co-integration models on an annual data set covering a period of 35 years, the authors demonstrate that foreign aid has had varied impacts on economic growth depending on the political ideology of the government in power. Findings: Under a capitalist political philosophy, foreign aid improves private sector growth through infrastructural development. On the other hand, a government with a socialist philosophy applies most of its foreign aid to direct social interventions with a view to improving human capital. Thus, each political party is likely to seek foreign aid or grants that will support its political agenda. Overall, the results show that foreign aid has a positive impact on the growth of the Ghanaian economy when there is a good macroeconomic environment. Practical implications: The country experiences economic growth when there are sound economic policies for applying foreign aid. Originality/value: The practical implication of the findings of this paper is that donor countries and agencies should consider the philosophy of the government in power when granting aid to recipient countries, especially in Africa. The results are robust to different proxies and models.

DEPTH ESTIMATION OF SHALLOW WATER USING MULTISPECTRAL SATELLITE IMAGERY SENTINEL-2A

Jurnal Segara ◽

10.15578/segara.v16i3.8562 ◽

2020 ◽

Vol 16 (3) ◽

Author(s):

Arip Rahman

Keyword(s):

Shallow Water ◽

Test Data ◽

Remote Sensing Data ◽

Depth Estimation ◽

Training Data ◽

Coefficient Of Determination ◽

Support Vector ◽

Data Set ◽

Svm Algorithm ◽

Sentinel 2A

Shallow water bathymetry estimation from remote sensing data has become increasingly widespread as an alternative to traditional bathymetric measurement, which is hampered by technical and logistical problems. Bathymetry data were derived from Sentinel-2A images at visible wavelengths (blue, green and red) with 10 m spatial resolution for the waters around Kemujan Island, Karimunjawa National Park, Central Java. A total of 1280 sounding points were used as the training data set and 854 sounding points as the test data set. Dark Object Subtraction (DOS) was used to atmospherically correct the Sentinel-2A images. Several algorithms were applied to derive bathymetry data: linear transform, ratio transform and support vector machine (SVM). The highest correlation between predicted and observed depth resulted from the SVM algorithm, with a coefficient of determination (R²) of 0.71 (training data) and 0.56 (test data). In the accuracy assessment of the three methods using RMSE and MAE values, the SVM algorithm had the smallest values (< 1 m). This indicates that the SVM algorithm has higher accuracy than the other two methods. The bathymetry map derived from Sentinel-2A imagery cannot be used as a reference for navigation.
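
One widely used log-ratio form of the ratio transform models depth as m1 * ln(n * R_blue) / ln(n * R_green) - m0, with m1 and m0 fitted to sounding data. The sketch below assumes that form; the abstract does not say which variant the study applied. The constant n, the reflectance values and the function names are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def ratio_feature(blue: float, green: float, n: float = 1000.0) -> float:
    """Log-ratio of two band reflectances; n keeps both logarithms positive."""
    return math.log(n * blue) / math.log(n * green)


def fit_ratio_transform(blue: Sequence[float], green: Sequence[float],
                        depth: Sequence[float], n: float = 1000.0) -> Tuple[float, float]:
    """Least-squares fit of depth = m1 * ratio - m0 against sounding depths."""
    x = [ratio_feature(b, g, n) for b, g in zip(blue, green)]
    mx, my = sum(x) / len(x), sum(depth) / len(depth)
    sxy = sum((a - mx) * (d - my) for a, d in zip(x, depth))
    sxx = sum((a - mx) ** 2 for a in x)
    m1 = sxy / sxx
    m0 = m1 * mx - my
    return m1, m0


def predict_depth(blue: float, green: float, m1: float, m0: float,
                  n: float = 1000.0) -> float:
    return m1 * ratio_feature(blue, green, n) - m0


# Hypothetical reflectances; depths are synthesized from known coefficients
# so that the fit can be checked, and are not survey data.
blue = [0.05, 0.06, 0.07, 0.08]
green = [0.04, 0.04, 0.05, 0.05]
depth = [20.0 * ratio_feature(b, g) - 18.0 for b, g in zip(blue, green)]
m1, m0 = fit_ratio_transform(blue, green, depth)
```

In practice the fitted coefficients would come from the training soundings and be scored on the held-out test soundings with RMSE and MAE.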

Validating the genomic signature of pediatric septic shock

Physiological Genomics ◽

10.1152/physiolgenomics.00025.2008 ◽

2008 ◽

Vol 34 (1) ◽

pp. 127-134 ◽

Cited By ~ 59

Author(s):

Natalie Cvijanovich ◽

Thomas P. Shanley ◽

Richard Lin ◽

Geoffrey L. Allen ◽

Neal J. Thomas ◽

...

Keyword(s):

Septic Shock ◽

Test Data ◽

Training Data ◽

Lymphocyte Function ◽

Data Set ◽

Class Prediction ◽

Expression Signature ◽

Genome Wide ◽

Genome Wide Expression ◽

Normal Controls

We previously generated genome-wide expression data (microarray) from children with septic shock having the potential to lead the field into novel areas of investigation. Herein we seek to validate our data through a bioinformatic approach centered on a validation patient cohort. Forty-two children with a clinical diagnosis of septic shock and 15 normal controls served as the training data set, while 30 separate children with septic shock and 14 separate normal controls served as the test data set. Class prediction modeling using the training data set and the previously reported genome-wide expression signature of pediatric septic shock correctly identified 95–100% of controls and septic shock patients in the test data set, depending on the class prediction algorithm and the gene selection method. Subjecting the test data set to the same filtering strategy as that used for the training data set demonstrated 75% concordance between the two gene lists. Subjecting the test data set to a purely statistical filtering strategy, with highly stringent correction for multiple comparisons, demonstrated <50% concordance with the previous gene filtering strategy. However, functional analysis of this statistics-based gene list demonstrated functional annotations and signaling pathways similar to those seen in the training data set. In particular, we validated that pediatric septic shock is characterized by large-scale repression of genes related to zinc homeostasis and lymphocyte function. These data demonstrate that the previously reported genome-wide expression signature of pediatric septic shock is applicable to a validation cohort of patients.

ARTIFICIAL NEURAL NETWORK BASED APPROACH TO EEG SIGNAL SIMULATION

International Journal of Neural Systems ◽

10.1142/s0129065712500086 ◽

2012 ◽

Vol 22 (03) ◽

pp. 1250008 ◽

Cited By ~ 12

Author(s):

NIKOLA M. TOMASEVIC ◽

ALEKSANDAR M. NESKOVIC ◽

NATASA J. NESKOVIC

Keyword(s):

Moving Average ◽

Training Data ◽

Autoregressive Moving Average ◽

Eeg Signal ◽

Data Set ◽

Eeg Data ◽

Simulation Based ◽

Signal Simulation ◽

Artificial Neural ◽

The One

In this paper, a new approach to electroencephalogram (EEG) signal simulation based on artificial neural networks (ANN) is proposed. The aim was to simulate the spontaneous human EEG background activity based solely on experimentally acquired EEG data. Therefore, an EEG measurement campaign was conducted on a healthy awake adult in order to obtain an adequate ANN training data set. As a demonstration of the performance of the ANN-based approach, comparisons were made against an autoregressive moving average (ARMA) filtering-based method. Comprehensive quantitative and qualitative statistical analysis showed clearly that the EEG process obtained by the proposed method was in satisfactory agreement with the one obtained by measurements.

Pathogenic Variation in Colletotrichum gloeosporioides Infecting Stylosanthes spp. in a Center of Diversity in Brazil

Phytopathology ◽

10.1094/phyto.2002.92.5.553 ◽

2002 ◽

Vol 92 (5) ◽

pp. 553-562 ◽

Cited By ~ 12

Author(s):

S. Chakraborty ◽

C. D. Fernandes ◽

M. J. d' A. Charchar ◽

M. R. Thomas

Keyword(s):

Test Data ◽

Colletotrichum Gloeosporioides ◽

Germ Plasm ◽

Training Data ◽

Data Set ◽

Linear Discriminant ◽

Pathogenic Variation ◽

Wild Host ◽

Center Of Diversity ◽

New Races

Pathogenic variation in Colletotrichum gloeosporioides infecting species of the tropical pasture legume Stylosanthes at its center of diversity was determined from 296 isolates collected from wild host populations and selected germ plasm of S. capitata, S. guianensis, S. scabra, and S. macrocephala in Brazil. A putative host differential set comprising 11 accessions was selected from a bioassay of 18 isolates on 19 host accessions using principal component analysis. A similar analysis of anthracnose severity data for a subset of 195 isolates on the 11 differentials indicated that an adequate summary of pathogenic variation could be obtained using only five of these differentials. Of the five differentials, S. seabrana ‘Primar’ was resistant and S. scabra ‘Fitzroy’ was susceptible to most isolates. A cluster analysis was used to determine eight natural race clusters using the 195 isolates. Linear discriminant functions were developed for the eight race clusters using the 195 isolates as the training data set, and these were applied to classify a test data set of the remaining 101 isolates. All except 11 isolates of the test data set were classified into one of the eight race clusters. Over 10% of the 296 isolates were weakly pathogenic to all five differentials and another 40% were virulent on just one differential. The unclassified isolates represent six new races with unique virulence combinations, of which one isolate is virulent on all five differentials. The majority of isolates came from six field sites, and Shannon's index of diversity indicated considerable variation between sites. Pathogenic diversity was extensive at three sites where selected germ plasm was under evaluation, and complex race clusters and unclassified isolates representing new races were more prevalent at these sites than at sites containing wild Stylosanthes populations.
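
Shannon's index of diversity for a site is H = -Σ p_i ln p_i, where p_i is the share of isolates falling in race cluster i. A minimal sketch follows, assuming isolate counts per race cluster are available for each site; the counts shown are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """H = -sum(p_i * ln p_i) over the non-empty classes."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one isolate is required")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical sites: isolates spread evenly over four race clusters,
# versus all isolates belonging to a single cluster.
even_site = shannon_index([10, 10, 10, 10])
single_race_site = shannon_index([40, 0, 0, 0])
```

The index is zero when every isolate belongs to one race and reaches its maximum, ln k, when isolates are spread evenly across k races, so higher values indicate a more diverse pathogen population at a site.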
