subsample size
Recently Published Documents


Total documents: 20 (five years: 2)

H-index: 6 (five years: 1)

2019 ◽  
Vol 40 (4) ◽  
pp. 2309-2341 ◽  
Author(s):  
Stefania Bellavia ◽  
Nataša Krejić ◽  
Nataša Krklec Jerinkić

Abstract This paper deals with the minimization of a large sum of convex functions by inexact Newton (IN) methods employing subsampled functions, gradients and Hessian approximations. The conjugate gradient method is used to compute the IN step and global convergence is enforced by a nonmonotone line-search procedure. The aim is to obtain methods with affordable costs and fast convergence. Assuming strongly convex functions, R-linear convergence and worst-case iteration complexity of the procedure are investigated when functions and gradients are approximated with increasing accuracy. A set of rules for the forcing parameters and subsample Hessian sizes is derived that ensures local q-linear/q-superlinear convergence of the proposed method. The random choice of the Hessian subsample is also considered and convergence in the mean square, both for finite and infinite sums of functions, is proved. Finally, the analysis of global convergence with asymptotic R-linear rate is extended to the case of the sum of convex functions and strongly convex objective function. Numerical results on well-known binary classification problems are also given. Adaptive strategies for selecting forcing terms and Hessian subsample size, stemming from the theoretical analysis, are employed and the numerical results show that they yield effective IN methods.
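As a rough illustration of the kind of method the abstract describes, the sketch below runs an inexact Newton iteration with a subsampled Hessian on a regularized logistic-regression problem. It is not the authors' algorithm: it assumes exact function and gradient values, a fixed forcing term `eta`, a fixed Hessian subsample size, and a plain (monotone) Armijo backtracking line search in place of the nonmonotone one. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def loss(w, data, lam):
    # Average logistic loss plus a strongly convex L2 term.
    total = 0.0
    for x, y in data:
        m = y * dot(x, w)
        total += math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))
    return total / len(data) + 0.5 * lam * dot(w, w)

def gradient(w, data, lam):
    g = [lam * wi for wi in w]
    for x, y in data:
        c = -y * sigmoid(-y * dot(x, w)) / len(data)
        for j, xj in enumerate(x):
            g[j] += c * xj
    return g

def hessian_vector(w, v, sample, lam):
    # Hessian-vector product built from a subsample only (the regularizer is exact).
    hv = [lam * vi for vi in v]
    for x, _ in sample:
        p = sigmoid(dot(x, w))
        c = p * (1.0 - p) * dot(x, v) / len(sample)
        for j, xj in enumerate(x):
            hv[j] += c * xj
    return hv

def conjugate_gradient(hess_vec, g, eta, max_iter=50):
    # Approximately solve H s = -g; stop once ||H s + g|| <= eta * ||g||.
    s = [0.0] * len(g)
    r = [-gi for gi in g]
    p = r[:]
    rr = dot(r, r)
    tol = eta * math.sqrt(dot(g, g))
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        hp = hess_vec(p)
        alpha = rr / dot(p, hp)
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return s

def subsampled_newton(data, lam=0.1, hessian_sample_size=20, eta=0.1,
                      iterations=8, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(iterations):
        g = gradient(w, data, lam)
        sample = rng.sample(data, hessian_sample_size)
        s = conjugate_gradient(lambda v: hessian_vector(w, v, sample, lam), g, eta)
        # Armijo backtracking (a simplification of the nonmonotone rule).
        f0, slope, t = loss(w, data, lam), dot(g, s), 1.0
        while (t > 1e-8 and
               loss([wi + t * si for wi, si in zip(w, s)], data, lam)
               > f0 + 1e-4 * t * slope):
            t *= 0.5
        w = [wi + t * si for wi, si in zip(w, s)]
    return w

# Hypothetical synthetic binary classification data with labels in {-1, +1}.
_rng = random.Random(1)
data = []
for _ in range(200):
    x = [_rng.gauss(0, 1), _rng.gauss(0, 1), 1.0]
    y = 1.0 if x[0] - 0.5 * x[1] + _rng.gauss(0, 0.5) > 0 else -1.0
    data.append((x, y))

w_final = subsampled_newton(data)
print(loss([0.0, 0.0, 0.0], data, 0.1), loss(w_final, data, 0.1))
```

The forcing term `eta` controls how accurately each Newton system is solved, and `hessian_sample_size` controls the cost of every Hessian-vector product; these are the two quantities for which the abstract reports convergence rules and adaptive strategies.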


Author(s):  
Jan Žižka ◽  
Arnošt Svoboda

Customers of various services are often invited to type a summarizing review via an Internet portal. Such reviews, written in natural language, are typically unstructured and are accompanied by a numeric rating on a scale from “bad” to “good.” The more reviews there are, the better the feedback that can be acquired for improving the service. However, once massive data have accumulated, the processing complexity, which grows non-linearly, may exceed the computational capacity available for analyzing the text contents. Decision tree inducers like c5 can reveal understandable knowledge from data but they need the data as a whole. This article describes an application of windowing, which is a technique for generating dataset subsamples that provide enough information for an inducer to train a classifier and get results similar to those achieved by training a model from the entire dataset. The windowing results, which significantly reduce the complexity of the learning problem, are demonstrated using hundreds of thousands of reviews written in English by hotel-service customers. A user obtains knowledge represented by significant words. The results report classification errors, training and testing times, tree sizes, and the words relevant to review meaning as functions of the training subsample size. Finally, a method for estimating a suitable training-set size is suggested.
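The abstract does not spell out the windowing procedure, so the following is only a sketch of the classic scheme, assumed here for illustration: train on a small random window, classify the remaining examples, move the misclassified ones into the window, and repeat. A one-feature threshold classifier stands in for the decision tree inducer, and the dataset, function names, and parameters are all hypothetical.

```python
import random

def train_stump(examples):
    """Fit a threshold classifier that predicts True when x > threshold."""
    best_threshold, best_errors = 0.0, len(examples) + 1
    for candidate, _ in examples:
        errors = sum((x > candidate) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

def windowing(dataset, initial_size, max_rounds=20, seed=0):
    """Grow a training window until the rest of the data is classified correctly."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    window, remaining = shuffled[:initial_size], shuffled[initial_size:]
    threshold = train_stump(window)
    for _ in range(max_rounds):
        misclassified = [(x, y) for x, y in remaining if (x > threshold) != y]
        if not misclassified:
            break
        # Move the misclassified examples into the window, then retrain.
        window.extend(misclassified)
        remaining = [(x, y) for x, y in remaining if (x > threshold) == y]
        threshold = train_stump(window)
    return threshold, window

# Hypothetical noiseless data: the label is True exactly when x > 0.5.
dataset = [(i / 1000, i / 1000 > 0.5) for i in range(1000)]
threshold, window = windowing(dataset, initial_size=20)
errors = sum((x > threshold) != label for x, label in dataset)
print(len(window), errors)
```

The point of the design is that the final window is much smaller than the full dataset while the classifier trained on it behaves like one trained on everything, which is the trade-off the article measures against training subsample size.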


2014 ◽  
Vol 22 (5) ◽  
pp. 1229-1244 ◽  
Author(s):  
Jonathon K. Parker ◽  
Lawrence O. Hall

2006 ◽  
Vol 89 (4) ◽  
pp. 1004-1011 ◽  
Author(s):  
Guner Ozay ◽  
Ferda Seyhan ◽  
Aysun Yilmaz ◽  
Thomas B Whitaker ◽  
Andrew B Slate ◽  
...  

Abstract The variability associated with the aflatoxin test procedure used to estimate aflatoxin levels in bulk shipments of hazelnuts was investigated. Sixteen 10 kg samples of shelled hazelnuts were taken from each of 20 lots that were suspected of aflatoxin contamination. The total variance associated with testing shelled hazelnuts was estimated and partitioned into sampling, sample preparation, and analytical variance components. Each variance component increased as aflatoxin concentration (either B1 or total) increased. With the use of regression analysis, mathematical expressions were developed to model the relationship between aflatoxin concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific aflatoxin concentration. The sampling, sample preparation, and analytical variances associated with estimating aflatoxin in a hazelnut lot at a total aflatoxin level of 10 ng/g and using a 10 kg sample, a 50 g subsample, dry comminution with a Robot Coupe mill, and a high-performance liquid chromatographic analytical method are 174.40, 0.74, and 0.27, respectively. The sampling, sample preparation, and analytical steps of the aflatoxin test procedure accounted for 99.4, 0.4, and 0.2% of the total variability, respectively.
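The percentages in the last sentence can be checked from the three reported variances, assuming the total variance is simply their sum (the abstract says the total was partitioned into these components). The short sketch below does that arithmetic; the variable names are illustrative only.

```python
# Variance components reported for a hazelnut lot at 10 ng/g total aflatoxin
# (10 kg sample, 50 g subsample, HPLC analysis).
components = {
    "sampling": 174.40,
    "sample preparation": 0.74,
    "analytical": 0.27,
}

# Assumption: the total variance is the sum of the three components.
total_variance = sum(components.values())

# Share of the total contributed by each step, in percent.
shares = {step: 100.0 * var / total_variance for step, var in components.items()}

for step, pct in shares.items():
    print(f"{step}: {pct:.1f}%")
```

Rounded to one decimal place, the shares come out as 99.4, 0.4, and 0.2%, matching the figures quoted in the abstract.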


2006 ◽  
Vol 21 (2) ◽  
pp. 349-351 ◽  
Author(s):  
Michael J. Chimney ◽  
James A. Bowers

2004 ◽  
Vol 32 (5) ◽  
pp. 1981-2027 ◽  
Author(s):  
Soumendra N. Lahiri ◽  
Daniel J. Nordman

2004 ◽  
Vol 87 (4) ◽  
pp. 884-891 ◽  
Author(s):  
Eugenia A Vargas ◽  
Thomas B Whitaker ◽  
Eliene A Santos ◽  
Andrew B Slate ◽  
Francisco B Lima ◽  
...  

Abstract The variability associated with testing lots of green coffee beans for ochratoxin A (OTA) was investigated. Twenty-five lots of green coffee were tested for OTA contamination. The total variance associated with testing green coffee was estimated and partitioned into sampling, sample preparation, and analytical variances. All variances increased with an increase in OTA concentration. Using regression analysis, mathematical expressions were developed to model the relationship between OTA concentration and the total, sampling, sample preparation, and analytical variances. The expressions for these relationships were used to estimate the variance for any sample size, subsample size, and number of analyses for a specific OTA concentration. Testing a lot with 5 μg/kg OTA using a 1 kg sample, Romer RAS mill, 25 g subsamples, and liquid chromatography analysis, the total, sampling, sample preparation, and analytical variances were 10.75 (coefficient of variation [CV] = 65.6%), 7.80 (CV = 55.8%), 2.84 (CV = 33.7%), and 0.11 (CV = 6.6%), respectively. The sampling, sample preparation, and analytical variances accounted for 73, 26, and 1% of the total variance, respectively.
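The coefficients of variation quoted above appear to follow from the variances if the CV is taken as the standard deviation divided by the lot concentration. That definition is standard, but applying it here is an assumption, since the abstract does not state the formula. The sketch below recomputes the CVs and the variance shares; it agrees with the reported CVs to within roughly 0.1 percentage point.

```python
import math

def coefficient_of_variation(variance, concentration):
    """Return the CV in percent: standard deviation relative to the concentration."""
    return 100.0 * math.sqrt(variance) / concentration

ota_concentration = 5.0  # lot concentration in ug/kg, from the abstract

# Variances reported for a 1 kg sample, 25 g subsample, and one LC analysis.
variances = {
    "total": 10.75,
    "sampling": 7.80,
    "sample preparation": 2.84,
    "analytical": 0.11,
}

cvs = {name: coefficient_of_variation(var, ota_concentration)
       for name, var in variances.items()}

# Share of the total variance contributed by each step, in percent.
shares = {name: 100.0 * var / variances["total"]
          for name, var in variances.items() if name != "total"}

for name in variances:
    print(f"{name}: variance {variances[name]:.2f}, CV {cvs[name]:.1f}%")
```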


2004 ◽  
Vol 87 (4) ◽  
pp. 943-949 ◽  
Author(s):  
Mary W Trucksess ◽  
Thomas B Whitaker ◽  
Andrew B Slate ◽  
Kristina M Williams ◽  
Vickery A Brewer ◽  
...  

Abstract Peanuts contain proteins that can cause severe allergic reactions in some sensitized individuals. Studies were conducted to determine the percentage of recovery by an enzyme-linked immunosorbent assay (ELISA) method in the analysis for peanuts in energy bars and milk chocolate and to determine the sampling, subsampling, and analytical variances associated with testing energy bars and milk chocolate for peanuts. Food products containing chocolate were selected because their composition makes sample preparation for subsampling difficult. Peanut-contaminated energy bars, noncontaminated energy bars, incurred milk chocolate containing known levels of peanuts, and peanut-free milk chocolate were used. A commercially available ELISA kit was used for analysis. The sampling, sample preparation, and analytical variances associated with each step of the test procedure to measure peanut protein were determined for energy bars. The sample preparation and analytical variances were determined for milk chocolate. Variances were found to be functions of peanut concentration. Sampling and subsampling variability associated with energy bars accounted for 96.6% of the total testing variability. Subsampling variability associated with powdered milk chocolate accounted for >60% of the total testing variability. The variability among peanut test results can be reduced by increasing sample size, subsample size, and number of analyses. For energy bars the effect of increasing sample size from 1 to 4 bars, subsample size from 5 to 20 g, and number of aliquots quantified from 1 to 2 on reducing the sampling, sample preparation, and analytical variance was demonstrated. For powdered milk chocolate, the effects of increasing subsample size from 5 to 20 g and number of aliquots quantified from 1 to 2 on reducing sample preparation and analytical variances were demonstrated. 
This study serves as a template for application to other foods, and for extrapolation to different sizes of samples and subsamples as well as numbers of analyses.
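To make the effect of larger samples concrete, the sketch below computes the factor by which each variance component would shrink for the size changes named in the abstract. It assumes each component is inversely proportional to its corresponding size (number of bars, subsample mass, number of aliquots); that proportionality is an assumption of this sketch and is not stated in the abstract, and the function name is hypothetical.

```python
def variance_scale(old_size, new_size):
    """Factor applied to a variance component when its size changes,
    assuming the variance is inversely proportional to the size."""
    return old_size / new_size

# Size changes described for energy bars in the abstract.
sampling_factor = variance_scale(1, 4)       # sample size: 1 -> 4 bars
preparation_factor = variance_scale(5, 20)   # subsample size: 5 -> 20 g
analytical_factor = variance_scale(1, 2)     # aliquots quantified: 1 -> 2

print(sampling_factor, preparation_factor, analytical_factor)
```

Under this assumption the sampling and sample preparation variances would each fall to a quarter of their original values and the analytical variance to a half; the actual reductions reported in the study may differ.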

