The Influence of Outliers on Discrimination of Chronic Obturative Lung Disease

1988 ◽  
Vol 27 (04) ◽  
pp. 167-176 ◽  
Author(s):  
Ewa Krusińska ◽  
Jerzy Liebhart

Summary: The paper discusses the influence of outliers on the results of linear and canonical discrimination used to assist medical diagnosis in chronic obturative lung disease. Outliers were detected by χ²-plots based on either unweighted sample means and covariances or their weighted analogues with Huber or Hampel weights. With Hampel weights, the outliers found differed from those identified by the other two methods. After trimming the 10 percent most distant individuals, discrimination was performed for the training sample collected earlier (N′ = 305) and for the test sample (N″ = 53), using the functions obtained from the training sample. The discrimination was carried out on subsets of the most discriminative variables. When the sample size was sufficiently large (training sample), the goodness of reclassification was similar for the classical functions and the functions calculated after trimming; for small samples they differed. For classification of the test data, the results obtained after trimming (especially with Hampel weights) were much better. The method can be recommended for use in a computerized respiratory disease consulting unit.
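
A minimal sketch of the χ²-plot screening step described above, using Huber-type weights on robust Mahalanobis distances and trimming the 10% most distant individuals; the weighting constant, iteration count, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import stats

def huber_weights(d, c=2.0):
    """Huber-type weights: 1 inside the threshold c, decaying as c/d beyond it."""
    w = np.ones_like(d)
    far = d > c
    w[far] = c / d[far]
    return w

def robust_mahalanobis(X, n_iter=10, c=2.0):
    """Iteratively reweighted mean/covariance and squared Mahalanobis distances."""
    w = np.ones(len(X))
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        Xc = X - mu
        cov = (w[:, None] * Xc).T @ Xc / w.sum()
        d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(cov), Xc)
        w = huber_weights(np.sqrt(d2), c=c)
    return d2

rng = np.random.default_rng(0)
X = rng.normal(size=(305, 5))                      # stand-in for the training sample
d2 = robust_mahalanobis(X)

# chi-square plot: ordered squared distances against chi-square quantiles (df = number of variables)
q = stats.chi2.ppf((np.arange(1, len(d2) + 1) - 0.5) / len(d2), df=X.shape[1])
plot_points = np.column_stack([np.sort(d2), q])

# trim the 10% most distant individuals before refitting the discriminant functions
keep = d2 <= np.quantile(d2, 0.90)
X_trimmed = X[keep]
```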

Author(s):  
Zhigang Wei ◽  
Limin Luo ◽  
Burt Lin ◽  
Dmitri Konson ◽  
Kamran Nikbin

Good durability/reliability performance of products can be achieved by properly constructing and implementing design curves, which are usually obtained by analyzing test data such as fatigue S-N data. A good design curve construction approach should account for sample size, failure probability, and confidence level, and these considerations are especially critical when the test sample size is small. The authors have developed a design S-N curve construction method based on the tolerance limit concept. However, recent studies have shown that analytical solutions based on the tolerance limit approach may not be accurate for very small sample sizes because of the assumptions and approximations introduced into the analytical approach. In this paper, a Monte Carlo simulation approach is used to construct design curves for test data with an assumed underlying normal (or lognormal) distribution. The factor K, which measures the confidence level of the test data, is compared between the analytical solution and the Monte Carlo simulation solutions. Finally, the design curves constructed with these methods are demonstrated and compared using fatigue S-N data with a small sample size.
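
A hedged sketch of the idea of obtaining the one-sided tolerance factor K (design value = mean − K·standard deviation) by Monte Carlo simulation and comparing it with a noncentral-t analytical value; the sample sizes, coverage, and confidence levels below are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy import stats

def k_monte_carlo(n, p=0.95, conf=0.95, n_sim=200_000, rng=None):
    """Simulate samples of size n from N(0,1); K is the conf-quantile of
    (xbar - z_lower) / s, so that P(xbar - K*s <= z_lower) = conf."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal((n_sim, n))
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    z_lower = stats.norm.ppf(1 - p)          # population lower percentile covered by the design curve
    return np.quantile((xbar - z_lower) / s, conf)

def k_analytical(n, p=0.95, conf=0.95):
    """Standard noncentral-t expression for the one-sided normal tolerance factor."""
    delta = stats.norm.ppf(p) * np.sqrt(n)
    return stats.nct.ppf(conf, df=n - 1, nc=delta) / np.sqrt(n)

# compare the two solutions for small sample sizes
for n in (5, 8, 15, 30):
    print(n, round(k_monte_carlo(n), 3), round(k_analytical(n), 3))
```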


Connectivity ◽  
2020 ◽  
Vol 148 (6) ◽  
Author(s):  
A. P. Kozyryatsʹkyy ◽  
V. V. Zhebka ◽  
L. O. Dʹomina ◽  
D. O. Tarasenko

The article investigates the effectiveness of a machine learning algorithm for the classification of Internet traffic. The random forest (RF) algorithm, which works by constructing many decision trees, is considered. The efficiency of the RF algorithm is evaluated for application classification problems in the presence and absence of background network traffic. A laboratory network of several computers was set up to collect the data needed for analysis. One of the computers was connected to the Internet, and a wireless access point was set up on its base. On the same computer, all traffic passing through it was captured using Wireshark. Various applications were run on the other computers connected to the access point: web pages were viewed in the Google Chrome and Opera browsers, video calls were made using Skype, files were downloaded with the µTorrent torrent client, the Steam digital game distribution service was used, etc. The captured data were stored in PCAP format and then pre-processed to meet the requirements of the problem. In the experiment, a random forest was constructed and the quality of classification on a given sample was assessed; the most suitable parameters of the algorithm were selected experimentally, yielding a forest of 5 trees with the maximum possible depth. The algorithm is most effective for data related to DNS traffic. In addition to checking the operation of the algorithm on a test sample with the same class composition as the training sample, its quality was also assessed in the presence of background traffic, i.e. when the test sample contained instances of classes absent from the training sample.
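
A minimal sketch of the experiment described above: a 5-tree random forest with unrestricted depth classifying per-flow traffic features. The synthetic features below (packet count, mean packet length, flow duration, destination port) stand in for statistics extracted from the captured PCAP files; they are illustrative assumptions, not the article's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
apps = np.array(['dns', 'browser', 'skype', 'torrent', 'steam'])
y = rng.choice(apps, size=2000)

# crude per-class feature profiles: [packet count, mean packet length, duration (s), dst port]
profiles = {'dns': [2, 80, 0.05, 53], 'browser': [40, 900, 3, 443],
            'skype': [500, 300, 60, 3478], 'torrent': [800, 1200, 120, 6881],
            'steam': [300, 1000, 30, 27015]}
X = np.array([profiles[a] for a in y]) * rng.lognormal(0, 0.3, size=(len(y), 4))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# parameters selected experimentally in the article: a forest of 5 trees, maximum possible depth
rf = RandomForestClassifier(n_estimators=5, max_depth=None, random_state=0)
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))
```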


Author(s):  
Yuri I. Ingster ◽  
Christophe Pouet ◽  
Alexandre B. Tsybakov

We study the problem of classification of d-dimensional vectors into two classes (one of which is 'pure noise') based on a training sample of size m. The main specific feature is that the dimension d can be very large. We suppose that the difference between the distribution of the population and that of the noise is only in a shift, which is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d that tends to infinity, we obtain the sharp classification boundary, i.e. the necessary and sufficient conditions for the possibility of successful classification. We propose classifiers attaining this boundary. We also give extensions of the result to the case where the sample size m depends on d and satisfies the condition , 0 ≤ γ < 1, and to the case of non-Gaussian noise satisfying the Cramér condition.
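
An illustrative sketch of the setting only, not the authors' optimal classifier or the sharp boundary: one class is pure d-dimensional Gaussian noise, the other is noise plus a sparse shift, and a plug-in Fisher-type rule built from the training-sample mean separates the two. All numerical values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, k = 1_000, 50, 30                   # dimension, training sample size, sparsity of the shift
shift = np.zeros(d)
shift[rng.choice(d, size=k, replace=False)] = 1.5     # sparse shift vector

train = shift + rng.standard_normal((m, d))            # training sample from the shifted class
mu_hat = train.mean(axis=0)
threshold = np.linalg.norm(mu_hat) / 2                 # plug-in Fisher-type cut-off

def classify(x):
    """Return 1 for the shifted class, 0 for pure noise, by projecting onto the estimated mean."""
    return int(x @ mu_hat / np.linalg.norm(mu_hat) > threshold)

print(classify(rng.standard_normal(d)))                # a pure-noise vector
print(classify(shift + rng.standard_normal(d)))        # a vector carrying the sparse shift
```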


2017 ◽  
Author(s):  
Benjamin O. Turner ◽  
Erick J. Paul ◽  
Michael B. Miller ◽  
Aron K. Barbey

Despite a growing body of research suggesting that task-based functional magnetic resonance imaging (fMRI) studies often suffer from a lack of statistical power due to too-small samples, the proliferation of such underpowered studies continues unabated. Using large independent samples across eleven distinct tasks, we demonstrate the impact of sample size on replicability, assessed at different levels of analysis relevant to fMRI researchers. We find that the degree of replicability for typical sample sizes is modest, and that even sample sizes much larger than typical (e.g., N = 100) produce results that fall well short of perfect replicability. Thus, our results join the existing line of work advocating for larger sample sizes. Moreover, because we test sample sizes over a fairly large range and use intuitive metrics of replicability, our hope is that our results will be more understandable and convincing to researchers who may have found previous results advocating for larger samples inaccessible.


Author(s):  
Vladimir I. Volchikhin ◽  
Aleksandr I. Ivanov ◽  
Alexander V. Bezyaev ◽  
Evgeniy N. Kupriyanov

Introduction. The aim of the work is to reduce the test sample size required when testing the hypothesis of normality. Materials and Methods. A neural network generalization of three well-known statistical criteria is used: the chi-square criterion, the Anderson–Darling criterion in its ordinary form, and the Anderson–Darling criterion in logarithmic form. Results. The neural network combination of the chi-square criterion and the Anderson–Darling criterion reduces the sample size requirements by about 40%. Adding a third neuron that reproduces the logarithmic version of the Anderson–Darling test yields a further small decrease in the probability of errors, of about 2%. The article deals with single-layer and multilayer neural networks that generalize many currently known statistical criteria. Discussion and Conclusion. It is assumed that an artificial neuron can be assigned to each of the known statistical criteria. The attitude toward the synthesis of new statistical criteria that prevailed in the 20th century needs to change: there is no longer a need to strive for statistical criteria of high power. It is much more advantageous to ensure that newly synthesized statistical criteria are weakly correlated with the many criteria already created.
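
A hedged sketch of the underlying idea of treating each normality criterion as one input to an artificial neuron: the chi-square statistic, the Anderson–Darling statistic, and its logarithm are combined in a weighted sum and thresholded. The weights, threshold, and bin count below are illustrative assumptions, not the calibrated values from the article, which would be fitted on simulated normal and non-normal samples.

```python
import numpy as np
from scipy import stats

def chi_square_stat(x, bins=8):
    """Chi-square goodness-of-fit statistic against a fitted normal, equiprobable bins."""
    mu, sigma = x.mean(), x.std(ddof=1)
    edges = stats.norm.ppf(np.linspace(0, 1, bins + 1), mu, sigma)
    edges[0], edges[-1] = x.min() - 1.0, x.max() + 1.0   # close the open tail bins
    observed, _ = np.histogram(x, bins=edges)
    expected = len(x) / bins
    return np.sum((observed - expected) ** 2 / expected)

def neuron_normality_decision(x, weights=(0.4, 0.4, 0.2), threshold=4.0):
    """Single 'neuron': weighted sum of the three criteria; fires (True) for suspected non-normality."""
    ad = stats.anderson(x, dist='norm').statistic
    inputs = np.array([chi_square_stat(x), ad, np.log(ad + 1e-9)])
    return float(np.dot(weights, inputs)) > threshold

rng = np.random.default_rng(0)
print(neuron_normality_decision(rng.normal(size=21)))        # small sample from a normal law
print(neuron_normality_decision(rng.exponential(size=21)))   # small sample from a non-normal law
```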


Physiotherapy ◽  
2013 ◽  
Vol 21 (3) ◽  
Author(s):  
Natalia Uścinowicz ◽  
Wojciech Seidel ◽  
Paweł Zostawa ◽  
Sebastian Klich

Abstract: The recent Olympic Games in London generated much interest in competition among disabled athletes. Various people connected with swimming, including coaches and athletes, have speculated about the fairness of competitions for disabled athletes. A constant problem is the subjective methods of classification in disabled sport. Originally, athletes with disabilities were classified according to medical diagnosis. Because of the injustice that still affected competitors, functional classification was created shortly afterwards. In the present review, the authors show anomalies in the structure of the classification. These findings led to the suggestion to introduce objective methods, thanks to which it would no longer be necessary to rely on the subjective assessment of the classifier. According to the authors, while the use of objective methods does not completely rule out the possibility of fraud by disabled athletes during the classification process, it would certainly reduce its incidence. Objective methods useful for the classification of disabled athletes include posturography, evaluation of muscle parameters, electrogoniometric assessment, surface electromyography, and analysis of kinematic parameters. These methods provide objective evaluation in the diagnostic sense, but only if they are used in tandem. The authors demonstrate the undeniable benefits of using objective methods. Unfortunately, such a solution has not only advantages but also several drawbacks. The article concludes with the authors' statement that it is right to use objective methods, which help uphold the most important rule in sport: fair play.


Author(s):  
Les Beach

To test the efficacy of the Personal Orientation Inventory in assessing growth in self-actualization in relation to encounter groups, and to provide a more powerful measure of such changes, pre- and posttest data from 3 highly comparable encounter groups (N = 43) were combined for analysis. Results indicated that the Personal Orientation Inventory is a sensitive instrument for assessing personal growth in encounter groups and that a larger total sample size provides more significant results than those reported for small samples (e.g., fewer than 15 participants).


Author(s):  
Lam Dang Pham ◽  
Huy Phan ◽  
Ramaswamy Palaniappan ◽  
Alfred Mertins ◽  
Ian McLoughlin

2012 ◽  
Vol 532-533 ◽  
pp. 1445-1449
Author(s):  
Ting Ting Tong ◽  
Zhen Hua Wu

The EM algorithm is a common method for estimating mixture model parameters in the statistical classification of remote sensing images. This paper presents an EM algorithm based on fuzzification, in which each training sample is represented by a fuzzy set. Through the weighted degrees of membership, different samples have different influence during the iterations, which decreases the impact of noise on parameter learning and increases the convergence rate of the algorithm. As a result, the classification of image data can be performed with improved accuracy.
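
A minimal sketch of the fuzzified EM idea: each training sample carries a membership weight that scales its responsibilities, so samples suspected of being noise (low membership) contribute less to the parameter updates. This is an illustrative one-dimensional Gaussian-mixture implementation, not the authors' remote-sensing code.

```python
import numpy as np

def weighted_em(x, w, n_components=2, n_iter=50):
    """EM for a 1-D Gaussian mixture in which sample i is weighted by its fuzzy membership w[i]."""
    rng = np.random.default_rng(0)
    mu = rng.choice(x, n_components)                  # crude initialisation from the data
    sigma = np.full(n_components, x.std())
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities, scaled by the fuzzy membership of each sample
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        resp *= w[:, None]
        # M-step: membership-weighted parameter updates
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / w.sum()
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])
w = np.ones_like(x)            # memberships; values below 1 would down-weight suspected noise
print(weighted_em(x, w))
```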

