ALGORITHM OF ε-SVR BASED ON A LARGE-SCALE SAMPLE SET: STEP-BY-STEP SEARCH

Author(s):  
SHAOHUA ZENG ◽  
Y. Y. TANG ◽  
YAN WEI ◽  
YONG WANG

Because the support vectors of ε-SVR are not distributed inside the ε-tube but lie only on its outskirts, this paper proposes a novel algorithm for constructing an ε-SVR from a large-scale training sample set. The algorithm first computes the ε-SVR hyperplane of a small training sample set and the distance d of every sample to that hyperplane, then deletes the samples outside the region ε ≤ d ≤ d_max, searches for support vectors gradually within ε ≤ d ≤ d_max, and trains the final ε-SVR step by step. Finally, the paper analyzes the time complexity of the algorithm, proves its convergence in theory, and tests its efficiency by simulation.
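The abstract describes the procedure only at a high level. As a minimal sketch of the screening loop, assuming scikit-learn's SVR as the base learner (the subset size, batch size, and value of d_max below are illustrative assumptions, not figures from the paper):

```python
# Sketch of the step-by-step SV search: train on a small subset, then
# repeatedly pull in samples whose distance to the current model lies in
# [eps, d_max], where the true support vectors are expected to be found.
import numpy as np
from sklearn.svm import SVR

def stepwise_svr(X, y, eps=0.1, d_max=0.5, init_size=200, batch=200, rounds=10):
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=min(init_size, len(X)), replace=False)
    work = set(idx.tolist())
    model = SVR(epsilon=eps).fit(X[idx], y[idx])
    for _ in range(rounds):
        # Distance of every sample to the current regression hyperplane.
        d = np.abs(y - model.predict(X))
        # Candidate SVs lie outside the eps-tube but within d_max.
        cand = np.where((d >= eps) & (d <= d_max))[0]
        new = [i for i in cand if i not in work][:batch]
        if not new:
            break
        work.update(new)
        sel = np.fromiter(work, dtype=int)
        model = SVR(epsilon=eps).fit(X[sel], y[sel])
    return model
```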

2013 ◽  
Vol 816-817 ◽  
pp. 512-515
Author(s):  
Fang Zhu ◽  
Jun Fang Wei

The use of support vector machines (SVMs) suffers from a bottleneck: slow learning, large memory requirements, low generalization performance, and so on. These problems are caused by large-scale training sample sets and by outlier data mixed into the other class. To address them, this paper proposes a new reduction strategy for large-scale training sample sets, derived from an analysis of the structure of the training sample set based on point-set theory. Using fuzzy clustering, the new strategy identifies the potential support vectors and removes the non-boundary outlier data mixed into the other class. By greatly reducing the scale of the training sample set, it improves the generalization performance of the SVM and effectively avoids over-learning. Experimental results show that the reduction strategy not only shrinks the SVM training set and speeds up training, but also preserves classification accuracy.
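As a rough illustration of this reduction idea, the sketch below clusters each class and keeps only boundary samples while dropping points that sit closer to the opposite class's centroids. The paper uses fuzzy clustering; hard k-means from scikit-learn is substituted here for brevity (a fuzzy C-means routine would yield soft memberships instead), and the cluster count and quantile threshold are illustrative assumptions.

```python
# Keep boundary samples as potential SVs; drop outliers "immixed" in the
# other class (points closer to the opposite class's centroids).
import numpy as np
from sklearn.cluster import KMeans

def reduce_training_set(X, y, k=5, boundary_q=0.3):
    centers = {}
    for label in np.unique(y):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[y == label])
        centers[label] = km.cluster_centers_
    keep = []
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        own = np.linalg.norm(X[idx, None] - centers[label][None], axis=2).min(axis=1)
        rest = np.vstack([c for l, c in centers.items() if l != label])
        opp = np.linalg.norm(X[idx, None] - rest[None], axis=2).min(axis=1)
        margin = opp - own
        ok = margin > 0                         # margin <= 0: immixed outlier, removed
        thr = np.quantile(margin[ok], boundary_q)
        keep.extend(idx[ok & (margin <= thr)])  # smallest margins = boundary points
    return np.asarray(keep)

# Usage: train the SVM only on the reduced set, e.g.
#   keep = reduce_training_set(X, y); clf = SVC().fit(X[keep], y[keep])
```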


Author(s):  
Fang Zhu ◽  
Junfang Wei ◽  
Tao Gao



2009 ◽  
Vol 29 (10) ◽  
pp. 2736-2740 ◽  
Author(s):  
Fang ZHU ◽  
Jun-hua GU ◽  
Xin-wei YANG ◽  
Rui-xia YANG

2012 ◽  
Vol 248 ◽  
pp. 521-526
Author(s):  
An Na Wang ◽  
Mo Sha ◽  
Li Mei Liu ◽  
Mao Xiang Chu

Faults in the process industry can cause heavy casualties, severe property losses, and even irreversible pollution. Combining a large-scale sample set reduction strategy with SVM, this paper proposes an improved support vector machine (SVM) training algorithm based on support vectors to address the difficulty of fault diagnosis in the process industry. A new reduction algorithm with a constraint threshold is presented to shrink the large-scale sample set. The training algorithm eliminates the samples corresponding to non-support vectors, resolving the poor generalization ability caused by large-scale training sample sets. A new evaluation indicator for large-scale sample set reduction strategies is also proposed. The improved SVM algorithm is applied to two typical fault-diagnosis tasks in the process industry. Experimental results show that it greatly reduces the cost of SVM learning and markedly increases classification speed, without degrading the accuracy of fault diagnosis.
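As a sketch of the general idea of eliminating samples that correspond to non-support vectors, assuming binary classification and modelling the paper's constraint threshold as a cut on chunk-local decision values (an assumption for illustration; the abstract does not specify the exact criterion):

```python
# Train on chunks, keep only samples near each chunk-local boundary
# (candidate SVs), then fit the final SVM on the reduced set.
import numpy as np
from sklearn.svm import SVC

def sv_based_reduction(X, y, tau=1.5, chunk=500, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))      # shuffle so chunks contain both classes
    X, y = X[perm], y[perm]
    keep = []
    for start in range(0, len(X), chunk):
        sl = slice(start, start + chunk)
        clf = SVC(kernel="rbf").fit(X[sl], y[sl])
        # |decision value| <= tau marks boundary samples; confidently
        # separated interior points are dropped (binary case assumed).
        f = np.abs(clf.decision_function(X[sl]))
        keep.extend(start + np.flatnonzero(f <= tau))
    keep = np.asarray(keep)
    return SVC(kernel="rbf").fit(X[keep], y[keep])
```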


2013 ◽  
Vol 421 ◽  
pp. 701-705
Author(s):  
Fang Zhu ◽  
Jun Fang Wei

The use of support vector machines (SVMs) suffers from a bottleneck: slow learning, large memory requirements, low generalization performance, and so on. These problems are caused by large-scale training sample sets and by outlier data mixed into the other class. To address them, this paper proposes a new reduction strategy for large-scale training sample sets, derived from an analysis of the structure of the training sample set based on point-set theory. Using fuzzy clustering, the new strategy identifies the potential support vectors and removes the non-boundary outlier data mixed into the other class. By greatly reducing the scale of the training sample set, it improves the generalization performance of the SVM and effectively avoids over-learning. The proposed method was then applied to bus passenger flow counting. Experimental results show that it achieves higher classification accuracy.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Yufeng Zhang ◽  
Wenzhi Cui ◽  
Lu Liu ◽  
Jian Xu ◽  
...  

Background: Because their experimental and computational costs are much lower than those of metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used to predict the functional profiles of microbiomes via software tools such as PICRUSt 2. However, owing to potential PCR bias and gene-profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, producing misleading results.

Results: Here we present Meta-Apo, which greatly reduces or even eliminates this deviation and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on >5000 16S rRNA amplicon human microbiome samples from four body sites showed that the deviation between the two strategies is significantly reduced using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, greatly improving 16S-based microbiome diagnosis: for example, the accuracy of gingivitis diagnosis via 16S-derived functional profiles rose from 65% to the 95% achieved by WGS-based classification. Therefore, at the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS.

Conclusions: This suggests that large-scale, function-oriented microbiome sequencing projects can likely benefit from the lower cost of the 16S-amplicon strategy without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training and outputs calibrated functional profiles for the much larger number of 16S-amplicon samples.
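Meta-Apo itself is the C++ tool linked above, and this abstract does not describe its internal algorithm. Purely as an illustration of the calibration workflow (a few paired WGS:16S profiles used to correct many amplicon-only profiles), a hypothetical per-function log-ratio calibration could look like the sketch below; all names and the method itself are assumptions, not Meta-Apo's actual procedure.

```python
# Hypothetical per-function calibration: learn the mean log-ratio between
# paired WGS and amplicon functional profiles, then apply it to the
# remaining amplicon samples. NOT Meta-Apo's actual algorithm.
import numpy as np

def fit_calibration(amp_train, wgs_train, eps=1e-12):
    # amp_train, wgs_train: (n_pairs, n_functions) relative abundances.
    return (np.log(wgs_train + eps) - np.log(amp_train + eps)).mean(axis=0)

def apply_calibration(amp, log_factor, eps=1e-12):
    cal = np.exp(np.log(amp + eps) + log_factor)
    return cal / cal.sum(axis=1, keepdims=True)   # renormalize each profile
```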


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4436
Author(s):  
Mohammad Al Ktash ◽  
Mona Stefanakis ◽  
Barbara Boldrini ◽  
Edwin Ostertag ◽  
Marc Brecht

A laboratory prototype for hyperspectral imaging in the ultraviolet (UV) region from 225 to 400 nm was developed and used to rapidly characterize active pharmaceutical ingredients (APIs) in tablets. The APIs are ibuprofen (IBU), acetylsalicylic acid (ASA) and paracetamol (PAR). Two sample sets were used for comparison: sample set one comprises tablets of 100% API, and sample set two consists of commercially available painkiller tablets. Reference measurements were performed on the pure APIs in liquid solution (transmission) and in the solid phase (reflection) using a commercial UV spectrometer. The spectroscopic part of the prototype is based on a pushbroom imager containing a spectrograph and a charge-coupled device (CCD) camera. The tablets were scanned on a conveyor belt positioned inside a tunnel made of polytetrafluoroethylene (PTFE) to increase the homogeneity of illumination at the sample position. Principal component analysis (PCA) was used to differentiate the hyperspectral data of the drug samples; the first two PCs are sufficient to completely separate all samples. The rugged design of the prototype opens new possibilities for further development of this technique towards real large-scale applications.
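As a minimal sketch of the chemometric step, assuming each row of the input is one tablet's mean reflectance spectrum over the 225-400 nm bands (the SNV preprocessing shown is a common choice for reflectance spectra, not necessarily the paper's):

```python
# PCA on tablet spectra: two components are enough to separate the APIs
# in the paper; plot scores[:, 0] against scores[:, 1] per API class.
import numpy as np
from sklearn.decomposition import PCA

def pca_scores(spectra, n_components=2):
    # Standard normal variate (SNV) per spectrum to suppress scatter effects.
    s = (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)
    pca = PCA(n_components=n_components)
    return pca.fit_transform(s), pca.explained_variance_ratio_
```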

