Sequential Sampling for Estimation and Classification of the Incidence of Hop Powdery Mildew II: Cone Sampling

Sequential sampling models for estimation and classification of the incidence of powdery mildew (caused by Podosphaera macularis) on hop (Humulus lupulus) cones were developed using parameter estimates of the binary power law derived from the analysis of 221 transect data sets (model construction data set) collected from 41 hop yards sampled in Oregon and Washington from 2000 to 2005. Stop lines, models that determine when sufficient information has been collected to estimate mean disease incidence and stop sampling, for sequential estimation were validated by bootstrap simulation using a subset of 21 model construction data sets and simulated sampling of an additional 13 model construction data sets. Achieved coefficient of variation (C) approached the prespecified C as the estimated disease incidence, [Formula: see text], increased, although achieving a C of 0.1 was not possible for data sets in which [Formula: see text] < 0.03 with the number of sampling units evaluated in this study. The 95% confidence interval of the median difference between [Formula: see text] of each yard (achieved by sequential sampling) and the true p of the original data set included 0 for all 21 data sets evaluated at levels of C of 0.1 and 0.2. For sequential classification, operating characteristic (OC) and average sample number (ASN) curves of the sequential sampling plans obtained by bootstrap analysis and simulated sampling were similar to the OC and ASN values determined by Monte Carlo simulation. Correct decisions of whether disease incidence was above or below prespecified thresholds (pt) were made for 84.6 or 100% of the data sets during simulated sampling when stop lines were determined assuming a binomial or beta-binomial distribution of disease incidence, respectively. 
However, the higher proportion of correct decisions obtained by assuming a beta-binomial distribution of disease incidence required, on average, sampling 3.9 more plants per sampling round to classify disease incidence compared with the binomial distribution. Use of these sequential sampling plans may aid growers in deciding the order in which to harvest hop yards to minimize the risk of a condition called “cone early maturity” caused by late-season infection of cones by P. macularis. Also, sequential sampling could aid in research efforts, such as efficacy trials, where many hop cones are assessed to determine disease incidence.
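
The stopping idea behind sequential estimation can be sketched as follows. This is a minimal illustration, not the authors' stop lines: it assumes the variance of unit-level incidence follows the binary power law, v_obs = A · [p(1 − p)/n]^b, and the names `achieved_cv`, `sample_until_precise`, `A`, and `b` are hypothetical. The defaults A = b = 1 reduce to the binomial case; the paper's fitted parameter values are not given in the abstract and are not reproduced here.

```python
import math


def achieved_cv(p_hat, units, n, A=1.0, b=1.0):
    """Coefficient of variation of the estimated incidence after `units`
    sampling units of n cones each.

    Assumption: unit-level variance follows the binary power law
    v_obs = A * (p(1-p)/n)**b, so A = b = 1 is the binomial case.
    """
    if p_hat <= 0.0:
        return math.inf  # no diseased cones yet: precision is undefined
    v_unit = A * (p_hat * (1.0 - p_hat) / n) ** b
    return math.sqrt(v_unit / units) / p_hat


def sample_until_precise(counts, n, target_cv, A=1.0, b=1.0):
    """Walk through per-unit diseased counts and stop at the first unit
    where the achieved CV is at or below the target.

    Returns (units_sampled, p_hat), or None if the target is never met.
    """
    total = 0
    for units, diseased in enumerate(counts, start=1):
        total += diseased
        p_hat = total / (n * units)
        if achieved_cv(p_hat, units, n, A, b) <= target_cv:
            return units, p_hat
    return None
```

With 3 diseased cones in every unit of 25 (incidence 0.12) and binomial variance, `sample_until_precise([3] * 60, 25, 0.1)` stops after 30 units. At low incidence the target may never be reached within the available units, which mirrors the difficulty the abstract reports for a C of 0.1.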

Sequential Sampling for Estimation and Classification of the Incidence of Hop Powdery Mildew I: Leaf Sampling

Plant Disease ◽

10.1094/pdis-91-8-1002 ◽

2007 ◽

Vol 91 (8) ◽

pp. 1002-1012 ◽

Cited By ~ 13

Author(s):

David H. Gent ◽

William W. Turechek ◽

Walter F. Mahaffee

Keyword(s):

Monte Carlo ◽

Powdery Mildew ◽

Monte Carlo Simulations ◽

Binomial Distribution ◽

Disease Incidence ◽

Sequential Sampling ◽

Data Sets ◽

Sampling Plans ◽

Binomial Approximation

Hop powdery mildew (caused by Podosphaera macularis) is an important disease of hops (Humulus lupulus) in the Pacific Northwest. Sequential sampling models for estimation and classification of the incidence of powdery mildew on leaves of hop were developed based on the beta-binomial distribution, using parameter estimates of the binary power law determined in previous studies. Stop lines, models that indicate that enough information has been collected to estimate disease incidence and cease sampling, for sequential estimation were validated by bootstrap simulations of a select group of 18 data sets (out of a total of 198 data sets) from the model-construction data, and through simulated sampling of 104 data sets collected independently (i.e., validation data sets). The achieved coefficient of variation (C) approached prespecified C values as the achieved disease incidence ([Formula: see text]) increased. Achieving a C of 0.1 was not possible for data sets in which [Formula: see text] < 0.10. The 95% confidence interval of the median difference between the true p and [Formula: see text] included zero for 16 of 18 data sets evaluated at C = 0.2 and all data sets when C = 0.1. For sequential classification, Monte-Carlo simulations were used to determine the probability of classifying mean disease incidence as less than a threshold incidence, pt (operating characteristic [OC]), and average sample number (ASN) curves for 16 combinations of candidate stop lines and error levels (α and β). Four pairs of stop lines were selected for further evaluation based on the results of the Monte-Carlo simulations. Bootstrap simulations of the 18 selected data sets indicated that the OC and ASN curves of the sequential sampling plans for each of the four sets of stop lines were similar to OC and ASN values determined by Monte Carlo simulation. 
Correct classification of disease incidence as being above or below preselected thresholds was 2.0 to 7.7% higher when stop lines were determined by the beta-binomial approximation than when stop lines were calculated using the binomial distribution. Correct decision rates differed depending on the location where sampling was initiated in the hop yard; however, in all instances were greater than 86% when stop lines were determined using the beta-binomial approximation. The sequential sampling plans evaluated in this study should allow for rapid and accurate estimation and classification of the incidence of hop leaves with powdery mildew, and aid in sampling for pest management decision making.
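
Sequential classification against a threshold can be illustrated with Wald's sequential probability ratio test. The sketch below assumes a plain binomial model, so it corresponds to the simpler of the two cases compared in the abstract, not the beta-binomial stop lines. The function name `sprt_classify` and the bracketing incidences `p0` and `p1` (chosen below and above the threshold) are assumptions made for illustration. Tracking the log-likelihood ratio against two constant bounds is equivalent to comparing the cumulative diseased count against a pair of parallel stop lines.

```python
import math


def sprt_classify(counts, n, p0, p1, alpha=0.1, beta=0.1):
    """Wald's SPRT for binomial incidence data.

    counts : diseased individuals found in each successive sampling unit
    n      : individuals examined per sampling unit
    p0, p1 : hypothesised incidences below and above the threshold (p0 < p1)
    Returns (decision, units_used); decision is "above", "below",
    or "undecided" if the data run out first.
    """
    upper = math.log((1 - beta) / alpha)   # cross upward: classify as above
    lower = math.log(beta / (1 - alpha))   # cross downward: classify as below
    w_diseased = math.log(p1 / p0)
    w_healthy = math.log((1 - p1) / (1 - p0))

    llr = 0.0
    units = 0
    for units, diseased in enumerate(counts, start=1):
        llr += diseased * w_diseased + (n - diseased) * w_healthy
        if llr >= upper:
            return "above", units
        if llr <= lower:
            return "below", units
    return "undecided", units
```

For example, with `p0=0.05`, `p1=0.15`, and units of 10 leaves, two consecutive disease-free units are enough to classify incidence as below the threshold, while a single unit with 3 diseased leaves classifies it as above.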

Spatial Heterogeneity of the Incidence of Powdery Mildew on Hop Cones

Plant Disease ◽

10.1094/pd-90-1433 ◽

2006 ◽

Vol 90 (11) ◽

pp. 1433-1440 ◽

Cited By ~ 17

Author(s):

David H. Gent ◽

Walter F. Mahaffee ◽

William W. Turechek

Keyword(s):

Powdery Mildew ◽

Spatial Heterogeneity ◽

Power Law ◽

Binomial Distribution ◽

Disease Incidence ◽

Disease Assessment ◽

Cluster Sampling ◽

Data Sets ◽

Ratio Test ◽

Frequency Distributions

The spatial heterogeneity of the incidence of hop cones with powdery mildew (Podosphaera macularis) was characterized from transect surveys of 41 commercial hop yards in Oregon and Washington from 2000 to 2005. The proportion of sampled cones with powdery mildew (p) was recorded for each of 221 transects, where N = 60 sampling units of n = 25 cones were assessed in each transect according to a cluster sampling strategy. Disease incidence ranged from 0 to 0.92 among all yards and dates. The binomial and beta-binomial frequency distributions were fit to the N sampling units in a transect using maximum likelihood. The estimation procedure converged for 74% of the data sets where p > 0, and a log-likelihood ratio test indicated that the beta-binomial distribution provided a better fit to the data than the binomial distribution for 46% of the data sets, indicating an aggregated pattern of disease. Similarly, the C(α) test indicated that 54% could be described by the beta-binomial distribution. The heterogeneity parameter of the beta-binomial distribution, θ, a measure of variation among sampling units, ranged from 0.01 to 0.20, with a mean of 0.037 and a median of 0.015. Estimates of the index of dispersion ranged from 0.79 to 7.78, with a mean of 1.81 and a median of 1.37, and were significantly greater than 1 for 54% of the data sets. The binary power law provided an excellent fit to the data, with slope and intercept parameters significantly greater than 1, which indicated that heterogeneity varied systematically with the incidence of infected cones. A covariance analysis indicated that the geographic location (region) of the yards and the type of hop cultivar had little effect on heterogeneity; however, the year of sampling significantly influenced the intercept and slope parameters of the binary power law. 
Significant spatial autocorrelation was detected in only 11% of the data sets, with estimates of first-order autocorrelation, r1, ranging from -0.30 to 0.70, with a mean of 0.06 and a median of 0.04; however, correlation was detected in only 20 and 16% of the data sets by median and ordinary runs analysis, respectively. Together, these analyses suggest that the incidence of powdery mildew on cones was slightly aggregated among plants, but patterns of aggregation larger than the sampling unit were rare (20% or less of data sets). Knowledge of the heterogeneity of diseased cones was used to construct fixed sampling curves to precisely estimate the incidence of powdery mildew on cones at varying disease intensities. Use of the sampling curves developed in this research should help to improve sampling methods for disease assessment and management decisions.
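
The index of dispersion reported above is the ratio of the observed variance among sampling units to the variance expected under a binomial distribution. A minimal sketch of that calculation follows; the function name `index_of_dispersion` is hypothetical, and the code works on the number of diseased cones per sampling unit.

```python
def index_of_dispersion(counts, n):
    """Return (p_hat, D) for cluster-sampled incidence data.

    counts : diseased individuals in each of N sampling units
    n      : individuals examined per sampling unit
    D is the sample variance of the counts divided by the binomial
    variance n * p_hat * (1 - p_hat); values near 1 suggest a random
    pattern and values above 1 suggest aggregation.
    """
    N = len(counts)
    if N < 2:
        raise ValueError("need at least two sampling units")
    mean_count = sum(counts) / N
    p_hat = mean_count / n
    if p_hat <= 0.0 or p_hat >= 1.0:
        raise ValueError("D is undefined when incidence is 0 or 1")
    sample_var = sum((x - mean_count) ** 2 for x in counts) / (N - 1)
    binomial_var = n * p_hat * (1.0 - p_hat)
    return p_hat, sample_var / binomial_var
```

A common way to test whether D exceeds 1 is to compare (N − 1)·D with a chi-square distribution on N − 1 degrees of freedom; that step is left out here to keep the sketch within the standard library.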

Spatial Pattern of Strawberry Powdery Mildew (Podosphaera aphanis) and Airborne Inoculum

Plant Disease ◽

10.1094/pdis-10-12-0946-re ◽

2014 ◽

Vol 98 (1) ◽

pp. 43-54 ◽

Cited By ~ 11

Author(s):

H. Van der Heyden ◽

M. Lefebvre ◽

L. Roberge ◽

L. Brodeur ◽

O. Carisse

Keyword(s):

Powdery Mildew ◽

Power Law ◽

Negative Binomial Distribution ◽

Binomial Distribution ◽

Negative Binomial ◽

Disease Incidence ◽

Data Sets ◽

Incidence Data ◽

Airborne Inoculum ◽

The Relationship

The relationship between strawberry powdery mildew and airborne conidium concentration (ACC) of Podosphaera aphanis was studied using data collected from 2006 to 2009 in 15 fields, and spatial pattern was described using 2 years of airborne inoculum and disease incidence data collected in fields planted with the June-bearing strawberry (Fragaria × ananassa) cultivar Jewel. Disease incidence, expressed as the proportion of diseased leaflets, and ACC were monitored in fields divided into 3 × 8 grids containing 24 quadrats of 100 m² each. Variance-to-mean ratio, index of dispersion, negative binomial distribution, Poisson distribution, and binomial and beta-binomial distributions were used to characterize the level of spatial heterogeneity. The relationship between percent leaf area diseased and daily ACC was linear, while the relationship between ACC and disease incidence followed an exponential growth curve. The V/M ratios were significantly greater than 1 for 100 and 96% of the sampling dates for ACC sampled at 0.35 m from the ground (ACC0.35m) and for ACC sampled at 1.0 m from the ground (ACC1.0m), respectively. For disease incidence, the index of dispersion D was significantly greater than 1 for 79% of the sampling dates. The negative binomial distribution fitted 86% of the data sets for both ACC1.0m and ACC0.35m. For disease incidence data, the beta-binomial distribution provided a good fit to 75% of the data sets. Taylor's power law indicated that, for ACC at both sampling heights, heterogeneity increased with increasing mean ACC, whereas the binary form of the power law suggested that heterogeneity was not dependent on the mean for disease incidence. When the spatial location of each sampling location was taken into account, Spatial Analysis by Distance Indices showed low aggregation indices for both ACCs and disease incidence, and weak association between ACC and disease incidence. 
Based on these analyses, it was found that the distribution of strawberry powdery mildew was weakly aggregated. Although a higher level of heterogeneity was observed for airborne inoculum, the heterogeneity was low with no distinct foci, suggesting that epidemics are induced by well-distributed inoculum. This low level of heterogeneity allows mean airborne inoculum concentration to be estimated using only one sampler per field with an overall accuracy of at least 0.841. The results obtained in this study could be used to develop a sampling scheme that will improve strawberry powdery mildew risk estimation.
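
Taylor's power law relates variance to mean as variance = a · mean^b, which becomes a straight line on log-log axes. The sketch below fits that line by ordinary least squares; the function name `fit_power_law` is hypothetical, and the study's own fitting procedure is not described in the abstract. The same routine serves the binary form of the power law if the first argument holds the binomial variances p(1 − p)/n instead of the means.

```python
import math


def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = log(a) + b * log(mean).

    Returns (a, b). A slope b greater than 1 indicates that
    heterogeneity increases with the mean.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b
```

All means and variances must be strictly positive, so sampling dates with zero mean or zero variance would have to be excluded before fitting.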

Analysis of Fire Blight Shoot Infection Epidemics on Apple

Plant Disease ◽

10.1094/pdis-92-9-1349 ◽

2008 ◽

Vol 92 (9) ◽

pp. 1349-1356 ◽

Cited By ~ 7

Author(s):

Alan R. Biggs ◽

William W. Turechek ◽

Tim R. Gottwald

Keyword(s):

Power Law ◽

Binomial Distribution ◽

Fire Blight ◽

Disease Incidence ◽

High Rate ◽

Parameter Estimates ◽

Data Sets ◽

Full Data ◽

Data Set ◽

Shoot Blight

Fire blight incidence and spread of the shoot blight phase of the disease were studied in four apple cultivars in replicated blocks over 4 years (1994 to 1997). Cv. York was highly susceptible, followed by ‘Fuji’ and ‘Golden Delicious,’ which were moderately susceptible, and ‘Liberty,’ which was least susceptible. On York, the first appearance of shoot blight was within 48 h of its predicted appearance according to the Maryblyt model in 3 of the 4 years studied. Shoot blight epidemics in York in 1995 and 1996, and Fuji in 1995, were best described with a logistic model that showed apparent infection rates ranging from 0.05 to 0.20, indicating a low to moderately high rate of disease increase. The spatial positions (row and column) of all infected plants in each subplot were recorded on plot maps on each sampling date. The binomial and β-binomial distributions were fit to the data to test for spatial aggregation of disease incidence for each cultivar plot. Maximum likelihood estimation was possible for 92 (43.6%) of the 211 data sets subjected to this analysis. Of these, 35 data sets were better fit by the β-binomial distribution than the binomial distribution. The binary power law was used to characterize the relationship between the variance among quadrats within each plot and the variance expected for that plot given the observed level of disease incidence. The binary power law provided an excellent fit to the full data set and to nearly all of the subsets and, with b > 1, indicated that heterogeneity changed systematically with disease incidence. A covariance analysis was conducted to determine the effect of the factors ‘year,’ ‘cultivar,’ ‘orchard plot,’ and ‘observation date’ on the intercept and slope parameters of the binary power law. 
In general, plot followed by year had the greatest impact on parameter estimates and is an indication that location and seasonal factors impact heterogeneity of disease, although the specifics could not be ascertained from this study. Ordinary runs analysis was used to analyze the pattern of diseased trees within rows and detected significant nonrandom patterns of disease incidence in 63.5% of the orchard plots over the 4-year study. From these data sets, 68.7% had significantly fewer runs, particularly at disease incidences greater than 0.1. The fewer-than-expected runs at incidences greater than 0.10 provide strong evidence of localized spread.
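
Ordinary runs analysis counts the runs of consecutive diseased or healthy trees along a row and compares that count with the number expected under a random arrangement. The sketch below uses the standard Wald-Wolfowitz mean and variance without a continuity correction, which may differ from the exact variant used in the study; the function name `ordinary_runs` is hypothetical.

```python
import math


def ordinary_runs(sequence):
    """Runs test for a row of trees coded 1 (diseased) or 0 (healthy).

    Returns (observed_runs, expected_runs, z). A strongly negative z
    (for example z < -1.64 in a one-sided test at the 5% level) means
    fewer runs than expected, that is, clustering of diseased trees.
    """
    N = len(sequence)
    m = sum(1 for s in sequence if s)  # number of diseased trees
    if N < 2 or m == 0 or m == N:
        raise ValueError("need both diseased and healthy trees in the row")
    # A new run starts wherever the status changes between neighbours.
    runs = 1 + sum(
        1 for a, b in zip(sequence, sequence[1:]) if bool(a) != bool(b)
    )
    expected = 1 + 2 * m * (N - m) / N
    variance = (
        2 * m * (N - m) * (2 * m * (N - m) - N) / (N ** 2 * (N - 1))
    )
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z
```

A row such as `[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]` has only 2 runs against 6 expected, giving z of roughly -2.68, the pattern the abstract interprets as localized spread.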

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽

10.1007/s00521-021-05715-2 ◽

2021 ◽

Author(s):

Jianping Ju ◽

Hong Zheng ◽

Xiaohang Xu ◽

Zhongyuan Guo ◽

Zhaohui Zheng ◽

...

Keyword(s):

Transfer Learning ◽

Loss Function ◽

Training Model ◽

Parameter Distribution ◽

Test Accuracy ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Small Data Sets

Although convolutional neural networks have achieved success in the field of image classification, there are still challenges in the field of agricultural product quality sorting such as machine vision-based jujube defects detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting the features often fails to meet the requirements of practical application. In this paper, a jujube sorting model in small data sets based on convolutional neural network and transfer learning is proposed to meet the actual demand of jujube defects detection. Firstly, the original images collected from the actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, the depth pre-training model on the ImageNet image data set was used to conduct training on the jujube defects data set, so that the parameters of the pre-training model could fit the parameter distribution of the jujube defects image, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of the jujube defects. The classification results are visualized by heatmap through the analysis of classification accuracy and confusion matrix compared with the comparison models. The experimental results show that the SE-ResNet50-CL model optimizes the fine-grained classification problem of jujube defect recognition, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.

Massive Data Classification of Neural Responses

Advances in Medical Technologies and Clinical Practice - Biomedical Diagnostics and Clinical Technologies ◽

10.4018/978-1-60566-280-0.ch009 ◽

2010 ◽

pp. 278-298

Author(s):

Pedro Tomás ◽

IST TU Lisbon ◽

Aleksandar Ilic ◽

Leonel Sousa

Keyword(s):

Execution Time ◽

Data Parallelism ◽

Data Sets ◽

Neural Responses ◽

Neuronal Responses ◽

Data Set ◽

Web Interfaces ◽

Mass Classification ◽

Neuronal Code

When analyzing the neuronal code, neuroscientists usually perform extracellular recordings of neuronal responses (spikes). Since the size of the microelectrodes used to perform these recordings is much larger than the size of the cells, responses from multiple neurons are recorded by each microelectrode. Thus, the obtained response must be classified and evaluated in order to identify how many neurons were recorded and to assess which neuron generated each spike. A platform for the mass classification of neuronal responses is proposed in this chapter, employing data parallelism to speed up the classification of neuronal responses. The platform is built in a modular way, supporting multiple web interfaces, different back-end environments for parallel computing, and different algorithms for spike classification. Experimental results on the proposed platform show that even for an unbalanced data set of neuronal responses the execution time was reduced by about 45%. For balanced data sets, the platform may achieve a reduction in execution time equal to the inverse of the number of back-end computational elements.

Bagging Approach for Medical Plants Recognition Based on Their DNA Sequences

International Journal of Social Ecology and Sustainable Development ◽

10.4018/ijsesd.2018100103 ◽

2018 ◽

Vol 9 (4) ◽

pp. 45-60

Author(s):

Mohamed Elhadi Rahmani ◽

Abdelmalek Amine ◽

Reda Mohamed Hamou

Keyword(s):

Dna Sequences ◽

Majority Vote ◽

Data Sets ◽

Data Set ◽

Drug Production ◽

Medical Plants

Many drugs in modern medicine originate from plants, and the first step in drug production is the recognition of the plants needed for this purpose. This article presents a bagging approach for medical plants recognition based on their DNA sequences. In this work, the authors developed a system that recognizes DNA sequences of 14 medical plants. They first divided the 14-class data set into two-class sub-data sets; then, instead of using an algorithm to classify the 14-class data set directly, they used the same algorithm to classify each sub-data set. By doing so, they reduced the problem of classifying 14 plants to a set of two-class sub-problems. To construct the subsets, the authors extracted all possible pairs of the 14 classes, which gave each class more chances to be predicted correctly. This approach also allows the similarity between the DNA sequences of one plant and those of every other plant to be studied. In terms of results, accuracy nearly doubled (from 45% to almost 80%). A new sequence was classified by majority vote.
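
The pair-wise decomposition with majority voting described above can be sketched in a few lines. This is an illustration only: `pairwise_majority_vote` and the toy rule `nearest_of_two` are hypothetical names, and the toy rule stands in for whatever two-class classifier the authors trained on DNA sequences. Decomposing into all class pairs is often called a one-vs-one scheme.

```python
from collections import Counter
from itertools import combinations


def pairwise_majority_vote(classes, binary_predict, x):
    """Classify x by running one two-class decision per pair of classes
    and returning the class that wins the most pair-wise duels.

    binary_predict(a, b, x) must return either a or b.
    """
    votes = Counter(
        binary_predict(a, b, x) for a, b in combinations(classes, 2)
    )
    return votes.most_common(1)[0][0]


def nearest_of_two(a, b, x):
    """Toy two-class rule (an assumption for this example): each class is
    represented by its own integer label, and x goes to the closer one."""
    return a if abs(x - a) <= abs(x - b) else b
```

With 14 classes there are 91 pairs, so each class takes part in 13 duels. In the toy setting, `pairwise_majority_vote(range(14), nearest_of_two, 6.2)` returns class 6, which wins all 13 of its duels.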

Sampling for Plant Disease Incidence

Phytopathology ◽

10.1094/phyto.1999.89.11.1088 ◽

1999 ◽

Vol 89 (11) ◽

pp. 1088-1103 ◽

Cited By ~ 76

Author(s):

L. V. Madden ◽

G. Hughes

Keyword(s):

Power Law ◽

Binomial Distribution ◽

Disease Incidence ◽

Sequential Sampling ◽

Cluster Sampling ◽

Diseased Plant ◽

Ratio Test ◽

Wide Range ◽

Sequential Probability ◽

The Relationship

Knowledge of the distribution of diseased plant units (such as leaves, plants, or roots) or of the relationship between the variance and mean incidence is essential to efficiently sample for diseased plant units. Cluster sampling, consisting of N sampling units of n individuals each, is needed to determine whether the binomial or beta-binomial distribution describes the data or to estimate parameters of the binary power law for disease incidence. The precision of estimated disease incidence can then be evaluated under a wide range of settings including the hierarchical sampling of groups of individuals, the various levels of spatial heterogeneity of disease, and the situation when all individuals are disease free. Precision, quantified with the standard error or the width of the confidence interval for incidence, is directly related to N and inversely related to the degree of heterogeneity (characterized by the intracluster correlation, ρ). Based on direct estimates of ρ (determined from the θ parameter of the beta-binomial distribution or from the observed variance) or a model predicting ρ as a function of incidence (derived from the binary power law), one can calculate, before a sampling bout, the value of N needed to achieve a desired level of precision. The value of N can also be determined during a sampling bout using sequential sampling methods, either to estimate incidence with desired precision or to test a hypothesis about true disease incidence. In the latter case, the sequential probability ratio test is shown here to be useful for classifying incidence relative to a hypothesized threshold when the data follows the beta-binomial distribution with either a fixed ρ or a ρ that depends on incidence.
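
The advance calculation of N can be sketched with the usual design-effect form of the variance under cluster sampling, var(p̂) = p(1 − p)[1 + ρ(n − 1)]/(nN). The sketch expresses precision as a coefficient of variation, which is one choice among the measures the abstract mentions, and it converts θ to ρ with ρ = θ/(1 + θ). The function names `rho_from_theta` and `required_units` are hypothetical.

```python
import math


def rho_from_theta(theta):
    """Intracluster correlation implied by the beta-binomial
    heterogeneity parameter theta."""
    return theta / (1.0 + theta)


def required_units(p, n, target_cv, rho=0.0):
    """Number of sampling units N (of n individuals each) needed so that
    the standard error of estimated incidence is target_cv * p.

    Uses var(p_hat) = p(1-p)[1 + rho(n-1)] / (nN); rho = 0 is the
    binomial (random pattern) case.
    """
    design_effect = 1.0 + rho * (n - 1)
    variance_times_N = p * (1.0 - p) * design_effect / n
    return math.ceil(variance_times_N / (target_cv * p) ** 2)
```

For an expected incidence of 0.1, units of 25 individuals, and a target CV of 0.1, an intracluster correlation of 0.05 raises the requirement to 80 units, a little more than twice the number needed under a random pattern. This illustrates why precision falls as heterogeneity increases.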

Modified Deep Neural Networks for Dog Breeds Identification

10.20944/preprints201812.0232.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Aydin Ayanzadeh ◽

Sahand Vahidnia

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

The State ◽

Fine Tuning ◽

Test Accuracy ◽

Data Sets ◽

Data Set

In this paper, we leverage state-of-the-art models trained on ImageNet data sets. We use the pre-trained models and learned weights to extract features from the dog breeds identification data set. Afterwards, we applied fine-tuning and data augmentation to increase test accuracy in the classification of dog breeds. The performance of the proposed approaches is compared with state-of-the-art ImageNet models such as ResNet-50, DenseNet-121, DenseNet-169, and GoogleNet. We achieved 89.66%, 85.37%, 84.01%, and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous work on the Stanford dog breeds data set.

Supernova Host Galaxy Association and Photometric Classification of over 10,000 Light Curves from the Zwicky Transient Facility

Research Notes of the AAS ◽

10.3847/2515-5172/ac416e ◽

2021 ◽

Vol 5 (12) ◽

pp. 283

Author(s):

Braden Garretson ◽

Dan Milisavljevic ◽

Jack Reynolds ◽

Kathryn E. Weil ◽

Bhagya Subrayan ◽

...

Keyword(s):

Value Added ◽

Light Curves ◽

Host Galaxy ◽

Massive Data ◽

Data Sets ◽

Data Set ◽

Scale Modeling ◽

Final Data ◽

Type Ia

Here we present a catalog of 12,993 photometrically classified supernova-like light curves from the Zwicky Transient Facility, along with candidate host galaxy associations. By training a random forest classifier on spectroscopically classified supernovae from the Bright Transient Survey, we achieve an accuracy of 80% across four supernova classes, resulting in a final data set of 8208 Type Ia, 2080 Type II, 1985 Type Ib/c, and 720 SLSN. Our work represents a pathfinder effort to supply massive data sets of supernova light curves with value-added information that can be used to enable population-scale modeling of explosion parameters and investigate host galaxy environments.
