A Quadratic Classification Rule with Equicorrelated Training Vectors for Non Random Samples

Summary: The availability of factor VIII concentrates is frequently a limitation in the management of classical hemophilia. Such concentrates are prepared from fresh or fresh-frozen plasma. A significant volume of plasma in the United States becomes “indated”, i.e., in contact with red blood cells for 24 hours at 4 °C, and is therefore not used to prepare factor VIII concentrates. To evaluate this possible resource, partially purified factor VIII was prepared from random samples of fresh-frozen, indated and outdated plasma. The yield of factor VIII protein and procoagulant activity from indated plasma was about the same as that from fresh-frozen plasma; the yield from outdated plasma was substantially lower. After further purification, factor VIII from all three sources gave a single subunit band when reduced and analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis. These results indicate that the approximately 287,000 liters of indated plasma processed annually by the American National Red Cross (ANRC) could be used to prepare factor VIII concentrates of good quality. This resource alone could quadruple the supply of factor VIII available for therapy.

Comparison of Nonlinear Spatial Correlation Models by the Influence of the Data Augmentation to the Classification Risk

Nonlinear Analysis Modelling and Control · 10.15388/na.2002.7.1.15200 · 2002 · Vol 7 (1) · pp. 31-42

Author(s): J. Šaltytė, K. Dučinskas

Keyword(s): Spatial Correlation, Random Fields, Data Augmentation, Gaussian Random Fields, Classification Rule, Numerical Comparison, First Order, Bayesian Risk, Correlation Models

The Bayesian classification rule for classifying observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of augmenting the observed data on the Bayesian risk is examined for three widely applicable nonlinear spatial correlation models. An explicit expression for the Bayesian risk in the classification of augmented data is derived. The models are then compared numerically by the variability of the Bayesian risk for the first-order neighbourhood scheme.
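To make the dependence of the Bayesian risk on data augmentation concrete, the following Python sketch computes the textbook Bayes error Φ(-Δ/2) for two equally likely Gaussian populations with a common covariance matrix, where Δ is the Mahalanobis distance between the population means. It is not the explicit expression derived in the paper: the exponential correlation model, the constant mean shift at every site, the equal priors and all function names are assumptions made for illustration. Augmenting one site with its four first-order neighbours on a unit grid can only increase Δ, so the risk goes down.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def exponential_correlation(d: float, alpha: float) -> float:
    """Assumed correlation model rho(d) = exp(-d / alpha), with range parameter alpha > 0."""
    return math.exp(-d / alpha)


def solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bayes_risk(sites: Sequence[Point], mean_shift: float, sigma2: float, alpha: float) -> float:
    """Bayes error Phi(-Delta/2) for two equally likely Gaussian populations that share the
    covariance sigma2 * R and differ by the same mean shift at every observed site."""
    cov = [[sigma2 * exponential_correlation(math.dist(p, q), alpha) for q in sites]
           for p in sites]
    shift = [mean_shift] * len(sites)
    w = solve(cov, shift)  # Sigma^{-1} (mu1 - mu2)
    delta = math.sqrt(sum(s * wi for s, wi in zip(shift, w)))  # Mahalanobis distance
    return 0.5 * (1.0 + math.erf(-delta / 2.0 / math.sqrt(2.0)))  # standard normal CDF


single = [(0.0, 0.0)]
first_order = single + [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(bayes_risk(single, 1.0, 1.0, 1.0))       # about 0.3085
print(bayes_risk(first_order, 1.0, 1.0, 1.0))  # smaller: augmentation lowers the risk
```

Swapping `exponential_correlation` for another model (for example a spherical or Gaussian correlation function) is the natural way to compare models in the spirit of the abstract.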

Quadratic Discriminant Analysis of Spatially Correlated Data

Nonlinear Analysis Modelling and Control · 10.15388/na.2001.6.1.15212 · 2001 · Vol 6 (2) · pp. 15-28

Cited By ~ 2

Author(s): K. Dučinskas, J. Šaltytė

Keyword(s): Discriminant Function, Error Rate, Gaussian Random Field, Correlated Data, Covariance Matrices, Classification Rule, Spatial Correlations, Linear Quadratic, Spatial Correlation Function, Spatially Correlated

The problem of classifying a realisation of a stationary univariate Gaussian random field into one of two populations with different means and different factorised covariance matrices is considered. In this case, the classification rule that is optimal in the sense of minimum probability of misclassification is associated with a non-linear (quadratic) discriminant function. The unknown means and covariance matrices of the feature vector components are estimated from spatially correlated training samples by maximum likelihood, assuming the spatial correlations are known. An explicit formula for the Bayes error rate and the first-order asymptotic expansion of the expected error rate associated with the quadratic plug-in discriminant function are presented. Numerical calculations are performed for the spherical spatial correlation function, and two different spatial sampling designs are compared.
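The plug-in quadratic rule can be sketched for a scalar feature observed at a few training sites. In this Python sketch, which is an illustration under stated assumptions and not the paper's derivation, each population's mean and variance are estimated by maximum likelihood under a known correlation matrix (generalised least squares for the mean), and the estimates are substituted into the Gaussian quadratic discriminant function W(x); W(x) ≥ 0 assigns x to population 1. The correlation values, training data, equal priors and helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cholesky_solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b for a symmetric positive-definite matrix a via Cholesky (a = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def ml_estimates(sample: Sequence[float], corr: List[List[float]]) -> Tuple[float, float]:
    """ML mean and variance of a constant-mean Gaussian field with a known correlation matrix."""
    n = len(sample)
    r_inv_1 = cholesky_solve(corr, [1.0] * n)
    mean = sum(z * w for z, w in zip(sample, r_inv_1)) / sum(r_inv_1)  # GLS mean
    resid = [z - mean for z in sample]
    r_inv_e = cholesky_solve(corr, resid)
    var = sum(e * w for e, w in zip(resid, r_inv_e)) / n  # ML (biased) variance
    return mean, var


def quadratic_discriminant(x, m1, v1, m2, v2, p1=0.5, p2=0.5) -> float:
    """W(x) = log(p1 f1(x)) - log(p2 f2(x)) for univariate Gaussians; W >= 0 -> population 1."""
    return (math.log(p1 / p2) - 0.5 * math.log(v1 / v2)
            - (x - m1) ** 2 / (2 * v1) + (x - m2) ** 2 / (2 * v2))


# Hypothetical AR(1)-type correlation between three equally spaced training sites.
corr = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
m1, v1 = ml_estimates([0.1, -0.2, 0.3], corr)
m2, v2 = ml_estimates([2.9, 3.4, 2.6], corr)
for x in (0.0, 3.0):
    label = 1 if quadratic_discriminant(x, m1, v1, m2, v2) >= 0 else 2
    print(x, label)  # 0.0 -> 1, 3.0 -> 2
```

Because the variances differ, the log-variance term and the unequal denominators make W(x) quadratic in x; with equal variances it would reduce to the familiar linear rule.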

gbt-HIPS: Explaining the Classifications of Gradient Boosted Tree Ensembles

Applied Sciences · 10.3390/app11062511 · 2021 · Vol 11 (6) · pp. 2511

Author(s): Julian Hatwell, Mohamed Medhat Gaber, R. Muhammad Atif Azad

Keyword(s): State Of The Art, Heuristic Method, Good Explanation, Classification Rule, Data Sets, Classification Models, Boundary Values, Class Label, Input Space, Boosted Tree

This research presents Gradient Boosted Tree High Importance Path Snippets (gbt-HIPS), a novel heuristic method for explaining gradient boosted tree (GBT) classification models by extracting a single classification rule (CR) from the ensemble of decision trees that make up the GBT model. This CR contains the most statistically important boundary values of the input space as antecedent terms. The CR represents a hyper-rectangle of the input space inside which the GBT model very reliably classifies all instances with the same class label as the explanandum instance. In a benchmark test using nine data sets and five competing state-of-the-art methods, gbt-HIPS offered the best trade-off between coverage (0.16–0.75) and precision (0.85–0.98). Unlike competing methods, gbt-HIPS is also demonstrably guarded against under- and over-fitting. A further distinguishing feature of our method is that, unlike much prior work, our explanations also provide counterfactual detail, in accordance with widely accepted recommendations for what makes a good explanation.
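Coverage and precision, the two measures reported in the benchmark, can be computed for any hyper-rectangle rule. The Python sketch below assumes the usual definitions from rule-based explanation work: coverage is the fraction of instances that fall inside the rule, and precision is the fraction of covered instances to which the model assigns the explanandum's class. The rule format, `toy_model` and the data are hypothetical stand-ins; the code evaluates a given rule and does not implement the gbt-HIPS extraction procedure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical rule format: feature index -> (lower, upper), read as lower < x[j] <= upper.
Rule = Dict[int, Tuple[float, float]]


def covers(rule: Rule, x: Sequence[float]) -> bool:
    """True if instance x lies inside the rule's hyper-rectangle."""
    return all(lo < x[j] <= hi for j, (lo, hi) in rule.items())


def coverage_and_precision(rule: Rule, target: int, data: List[Sequence[float]],
                           predict: Callable[[Sequence[float]], int]) -> Tuple[float, float]:
    """Coverage: share of data inside the rule. Precision: share of covered
    instances that the model labels with the target (explanandum) class."""
    covered = [x for x in data if covers(rule, x)]
    coverage = len(covered) / len(data) if data else 0.0
    if not covered:
        return coverage, 0.0
    hits = sum(1 for x in covered if predict(x) == target)
    return coverage, hits / len(covered)


def toy_model(x: Sequence[float]) -> int:
    """Stand-in for a trained GBT model's predict function."""
    return 1 if x[0] > 0.5 else 0


data = [(0.1,), (0.4,), (0.6,), (0.9,)]
print(coverage_and_precision({0: (0.5, math.inf)}, 1, data, toy_model))  # (0.5, 1.0)
print(coverage_and_precision({0: (0.3, math.inf)}, 1, data, toy_model))  # (0.75, 2/3)
```

Widening the rule raises coverage but admits instances of the other class, which is the trade-off the benchmark figures summarize.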

Estimating epidemiologic dynamics from cross-sectional viral load distributions

Science · 10.1126/science.abh0635 · 2021 · pp. eabh0635

Author(s): James A. Hay, Lee Kennedy-Shaffer, Sanjat Kanjilal, Niall J. Lennon, Stacey B. Gabriel, ...

Keyword(s): Population Distribution, Quantitative Polymerase Chain Reaction, Cross Sectional, Outbreak Management, Viral Loads, Time Estimates, Load Distributions, Random Samples, Combining Data, Polymerase Chain

Estimating an epidemic’s trajectory is crucial for developing public health responses to infectious diseases, but case data used for such estimation are confounded by variable testing practices. We show that the population distribution of viral loads observed under random or symptom-based surveillance, in the form of cycle threshold (Ct) values obtained from reverse-transcription quantitative polymerase chain reaction testing, changes during an epidemic. Thus, Ct values from even limited numbers of random samples can provide improved estimates of an epidemic’s trajectory. Combining data from multiple such samples improves the precision and robustness of such estimation. We apply our methods to Ct values from surveillance conducted during the SARS-CoV-2 pandemic in a variety of settings and offer alternative approaches for real-time estimates of epidemic trajectories for outbreak management and response.
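The observation that the distribution of Ct values changes during an epidemic can be explored with simple per-sample summaries. The following Python sketch computes the median and the moment-based skewness of each cross-sectional sample of Ct values (a lower Ct corresponds to a higher viral load). It is an illustration under assumed inputs, not the authors' estimation method, which models viral kinetics; the day labels and Ct values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def ct_summary(ct_values: List[float]) -> Tuple[float, float]:
    """Return (median, skewness) of one cross-sectional sample of Ct values.
    Skewness is the moment coefficient g1 = m3 / m2**1.5 using population moments."""
    if not ct_values:
        raise ValueError("empty sample")
    median = statistics.median(ct_values)
    mean = statistics.fmean(ct_values)
    n = len(ct_values)
    m2 = sum((c - mean) ** 2 for c in ct_values) / n
    m3 = sum((c - mean) ** 3 for c in ct_values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return median, skew


def summarize_by_day(samples: Dict[int, List[float]]) -> Dict[int, Tuple[float, float]]:
    """Apply ct_summary to each sampling day (hypothetical integer day labels)."""
    return {day: ct_summary(cts) for day, cts in sorted(samples.items())}


print(summarize_by_day({0: [20.0, 25.0, 30.0], 7: [20.0, 20.0, 30.0]}))
```

Tracking such summaries across repeated random samples is one simple way to see the distribution shift that the abstract describes; turning them into growth-rate estimates requires a model such as the one the authors develop.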

Classification rule learning using subgroup discovery of cross-domain attributes responsible for design-silicon mismatch

Proceedings of the 47th Design Automation Conference on - DAC '10 · 10.1145/1837274.1837368 · 2010

Cited By ~ 9

Author(s): Nicholas Callegari, Dragoljub (Gagi) Drmanac, Li-C. Wang, Magdy S. Abadir

Keyword(s): Rule Learning, Classification Rule, Subgroup Discovery, Cross Domain

Non-random sampling and association tests on realized returns and risk proxies

Review of Accounting Studies · 10.1007/s11142-021-09581-0 · 2021

Author(s): Frank Ecker, Jennifer Francis, Per Olsson, Katherine Schipper

Keyword(s): Random Sampling, Reference Sample, Positive Association, Cost Of Equity, Association Tests, Random Samples, Distribution Matching, Matched Samples, Data Requirements, Selection Of

Abstract: This paper investigates how data requirements often encountered in archival accounting research can produce a data-restricted sample that is a non-random selection of observations from the reference sample to which the researcher wishes to generalize results. We illustrate the effects of non-random sampling on the results of association tests in a setting with data on one variable of interest for all observations and frequently missing data on another variable of interest. We develop and validate a resampling approach that uses only observations from the data-restricted sample to construct distribution-matched samples that approximate randomly drawn samples from the reference sample. Our simulation tests provide evidence that distribution-matched samples yield generalizable results. We demonstrate the effects of non-random sampling in tests of the association between realized returns and five implied cost of equity metrics. In this setting, the reference sample has full information on realized returns, while on average only 16% of reference sample observations have data on cost of equity metrics. Consistent with prior research (e.g., Easton and Monahan, The Accounting Review 80, 501–538, 2005), analysis using the unadjusted (non-random) cost of equity sample reveals weak or negative associations between realized returns and cost of equity metrics. In contrast, using distribution-matched samples, we find reliable evidence of the theoretically predicted positive association. We also compare distribution matching, conceptually and empirically, with multiple imputation and selection models, two other approaches to dealing with non-random samples.
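A minimal Python sketch of the distribution-matching idea, assuming that matching is done on quantile bins of the variable observed for every reference-sample observation (here `x`, for example realized returns) and that the data-restricted sample supplies `(x, y)` pairs. Within each bin, restricted-sample observations are drawn with replacement so that the bin shares match those of the reference sample. The authors' actual procedure may match differently; the bin count, sample size, data and function names are assumptions.

```python
import random
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, float]


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges at the empirical quantiles of the reference values."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]


def distribution_matched_sample(reference_x: Sequence[float], restricted: Sequence[Pair],
                                n_bins: int = 5, size: Optional[int] = None,
                                seed: int = 0) -> List[Pair]:
    """Resample restricted (x, y) pairs so that the bin shares of x match the reference sample."""
    rng = random.Random(seed)
    edges = quantile_edges(reference_x, n_bins)
    ref_counts = [0] * n_bins
    for x in reference_x:
        ref_counts[bisect_right(edges, x)] += 1
    pools: List[List[Pair]] = [[] for _ in range(n_bins)]
    for x, y in restricted:
        pools[bisect_right(edges, x)].append((x, y))
    if size is None:
        size = len(restricted)
    sample: List[Pair] = []
    for b in range(n_bins):
        k = round(size * ref_counts[b] / len(reference_x))  # target count for this bin
        if k and not pools[b]:
            raise ValueError(f"bin {b} has no restricted-sample observations")
        sample.extend(rng.choice(pools[b]) for _ in range(k))  # draw with replacement
    return sample


reference_x = [float(v) for v in range(100)]
# Non-random restriction: high x values are heavily over-represented.
restricted = ([(float(v), 2.0 * v) for v in range(0, 100, 5)]
              + [(float(v), 2.0 * v) for v in range(60, 100)])
matched = distribution_matched_sample(reference_x, restricted, n_bins=5, size=50)
print(len(matched))  # 50
```

Because bin targets are rounded, the matched sample can differ from `size` by a few observations when the reference shares do not divide evenly; repeating the draw with different seeds gives the multiple matched samples over which association tests could be averaged.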

Diagnosis of Diseases: Classification Rule Discovery from Medical Data using Genetic Algorithm with Suppressor Mutation

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) · 10.1109/icscan49426.2020.9262429 · 2020

Author(s): E. Thamizhselvi, Geetha Vaithianathan

Keyword(s): Genetic Algorithm, Medical Data, Suppressor Mutation, Classification Rule, Rule Discovery, Diagnosis Of Diseases

Different Hatching Rates of Floodwater Mosquitoes Aedes sticticus, Aedes rossicus and Aedes cinereus from Different Flooded Environments

Insects · 10.3390/insects12040279 · 2021 · Vol 12 (4) · pp. 279

Author(s): Anders Lindström, Disa Eklöf, Tobias Lilja

Keyword(s): Tap Water, Soil Samples, Mosquito Larvae, Retaining Structures, Retaining Structure, Large Numbers, Random Samples, Nuisance Species, Flooding Events, Hatching Rates

In the lower Dalälven region, floodwater mosquitoes cause recurring problems. The main nuisance species is Aedes (Ochlerotatus) sticticus, but large numbers of Aedes (Aedes) rossicus and Aedes (Aedes) cinereus also hatch during flooding events. To better understand which environments in the area give rise to mosquito nuisance, soil samples were taken from 20 locations in four environmental categories: grazed meadows, mowed meadows, unkept open grassland areas and forest areas. At each location, 20 soil samples were taken: 10 from random locations and 10 from moisture-retaining structures such as tussocks, shrubs, piles of leaves, logs and roots. The soil samples were soaked with tap water in the lab, and the mosquito larvae were collected and allowed to develop into adult mosquitoes for species identification. Fewer larvae hatched from mowed areas, and more larvae hatched from moisture-retaining structure samples than from random samples. Aedes cinereus hatched mostly from grazed and unkept areas, and as much from random samples as from structures, whereas Aedes sticticus and Aedes rossicus hatched from open unkept and forest areas and significantly more from structure samples. Identifying the moisture-retaining structures in the open unkept areas where Aedes sticticus hatched showed clearly that they hatched predominantly from willow shrubs that offered shade. The results suggest that Ae. sticticus and Ae. cinereus favor different flooded environments for oviposition.
