DATABASE CLASSIFICATION BY INTEGRATING A CASE-BASED REASONING AND SUPPORT VECTOR MACHINE FOR INDUCTION

2010 ◽  
Vol 19 (01) ◽  
pp. 31-44 ◽  
Author(s):  
YEN-WEN WANG ◽  
PEI-CHANN CHANG ◽  
CHIN-YUAN FAN ◽  
CHIUNG-HUA HUANG

Database classification suffers from two common problems: high dimensionality and nonstationary variations within large historical datasets. This paper presents a hybrid classification model that integrates a case-based reasoning (CBR) technique, a Support Vector Machine (SVM), and Genetic Algorithms to construct a decision-making system for data classification in various database applications. The model is based mainly on the idea that the historical database can be transformed into a smaller case base together with a group of SVM models. As a result, the model can respond more accurately to the data currently being classified, using the inductions of the SVM models generated from these smaller case bases. Hit rate is used as the performance measure, and the effectiveness of the proposed model is demonstrated through experimental comparison with other approaches on several database classification applications. The proposed model achieves the highest average hit rate among the compared approaches.
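To make the "smaller case base plus hit rate" idea concrete, here is a minimal Python sketch under stated assumptions: similarity between cases is plain Euclidean distance, and the per-case-base SVM induction step is replaced by a majority vote over the k retrieved cases (a k-nearest-neighbour stand-in) so that the example needs no external libraries. All function names and the toy data are hypothetical; this is not the authors' CBR-SVM-GA implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance between two feature vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_cases(case_base, query, k):
    """Return the k stored (features, label) cases closest to `query`.

    These retrieved cases form the small, query-specific case base.
    """
    return sorted(case_base, key=lambda case: euclidean(case[0], query))[:k]


def classify_by_cases(case_base, query, k=3):
    # Stand-in for training an SVM on the retrieved cases:
    # a simple majority vote over their labels.
    neighbours = retrieve_cases(case_base, query, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def hit_rate(predicted, actual):
    """Fraction of predictions that match the true class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


case_base = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(classify_by_cases(case_base, (0.2, 0.3)))  # nearest cases are mostly "A"
print(hit_rate(["A", "B", "A"], ["A", "B", "B"]))
```

In the full model, the vote inside `classify_by_cases` would be replaced by an SVM trained on the retrieved cases, with a genetic algorithm tuning its parameters.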

2020 ◽  
Vol 23 (4) ◽  
pp. 274-284 ◽  
Author(s):  
Jingang Che ◽  
Lei Chen ◽  
Zi-Han Guo ◽  
Shuaiqun Wang ◽  
Aorigele

Background: Identification of drug-target interactions is essential in drug discovery and helps predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are fast and low-cost compared with traditional wet-lab experiments. Methods: In this study, we approached this problem differently. Drugs were classified into several groups according to their target proteins, as defined in KEGG. A multi-label classification model was then built to assign drugs to their correct target groups. To make full use of known drug properties, five networks were constructed, each representing drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from these networks, and several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm, and a Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816, and a Hamming loss of 0.037, indicating good performance. The contribution of each network was also analyzed. Furthermore, the model built on multiple networks was found to be superior to models built on a single network and to the classic model, indicating the superiority of the proposed model.
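The three reported figures can be made concrete with a short sketch. Assuming the standard example-based multi-label definitions (accuracy as the mean Jaccard overlap between true and predicted label sets, exact match as the fraction of drugs whose label set is predicted exactly, and Hamming loss as the fraction of misassigned drug-label pairs), the Python below computes them from label sets. It also shows the Label Powerset transformation, which maps each distinct label set to one class so that a single-label learner such as an SVM can be used. Function names and toy data are hypothetical; this is not the authors' code.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Return (accuracy, exact_match, hamming_loss) for lists of label sets."""
    n = len(true_sets)
    # Example-based accuracy: |Y & Z| / |Y | Z| averaged over samples.
    accuracy = sum(
        len(t & p) / len(t | p) if (t | p) else 1.0
        for t, p in zip(true_sets, pred_sets)
    ) / n
    # Exact match (subset accuracy): whole label set must be correct.
    exact_match = sum(t == p for t, p in zip(true_sets, pred_sets)) / n
    # Hamming loss: wrong labels (symmetric difference) over all sample-label pairs.
    hamming_loss = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets)) / (n * n_labels)
    return accuracy, exact_match, hamming_loss


def label_powerset(label_sets):
    """Encode each distinct label set as one integer class (Label Powerset)."""
    classes = {}
    encoded = []
    for s in label_sets:
        key = frozenset(s)
        if key not in classes:
            classes[key] = len(classes)
        encoded.append(classes[key])
    decode = {cls: labels for labels, cls in classes.items()}
    return encoded, decode


# Toy example with three hypothetical target groups labelled 0, 1, 2.
truth = [{0, 1}, {2}, {0}]
preds = [{0}, {2}, {0, 2}]
print(multilabel_metrics(truth, preds, n_labels=3))
print(label_powerset([{0, 1}, {2}, {0, 1}]))
```

RAKEL applies the same powerset idea to many small random subsets of labels and combines the resulting votes, which keeps the number of powerset classes manageable.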


2020 ◽  
Vol 15 ◽  
Author(s):  
Chun Qiu ◽  
Sai Li ◽  
Shenghui Yang ◽  
Lin Wang ◽  
Aihui Zeng ◽  
...  

Aim: To identify genes related to the mechanisms of glioma occurrence and to build a prediction model for glioblastomas. Background: Glioblastomas have very high morbidity and mortality and seriously endanger human health. At present, many investigations of gliomas aim mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis or treatment measures. Methods: First, gene expression profiles were downloaded from GEO. Then, differentially expressed genes (DEGs) between the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key DEGs. Finally, a classification model distinguishing the glioblastoma samples from the controls was built with a Support Vector Machine (SVM) based on the selected key genes. Results and Discussion: Using the CFS method, 36 DEGs (17 upregulated and 19 downregulated) were selected as feature genes to build the classification model between the glioma samples and the control samples. The classification model achieved an accuracy of 76.25% in a 10-fold cross-validation test and 70.3% in an independent set test. In addition, PPP2R2B and CYBB were among the top 5 hub genes identified from the protein–protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool for identifying key genes in glioblastomas. We also predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.
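The CFS step rewards gene subsets whose members correlate strongly with the class but weakly with each other. A common form of this heuristic scores a subset S of k features as Merit(S) = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The Python sketch below assumes absolute Pearson correlation as the correlation measure and a greedy forward search that stops when the merit no longer improves; Weka's implementation, for example, uses symmetrical uncertainty on discretized data and a best-first search instead. Names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, target, subset):
    """CFS merit of `subset` (indices into `features`, a list of gene columns)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features, target):
    """Greedily add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        merit, idx = max((cfs_merit(features, target, selected + [i]), i) for i in remaining)
        if merit <= best:
            break
        best = merit
        selected.append(idx)
        remaining.remove(idx)
    return selected, best


# Toy data: class labels (0 = control, 1 = tumour) and two hypothetical gene columns.
target = [0, 0, 0, 1, 1, 1]
genes = [[0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 1, 0]]
print(forward_cfs(genes, target))
```

The selected columns would then be passed to the SVM as its input features.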


2014 ◽  
Vol 24 (2) ◽  
pp. 397-404 ◽  
Author(s):  
Baozhen Yao ◽  
Ping Hu ◽  
Mingheng Zhang ◽  
Maoqing Jin

Automated Incident Detection (AID) is an important part of Advanced Traffic Management and Information Systems (ATMISs). An automated incident detection system can provide timely information about an incident, which helps initiate the measures required to reduce its impact. To detect incidents on expressways accurately, this paper uses a Support Vector Machine (SVM). Because selecting optimal SVM parameters can improve prediction accuracy, a tabu search algorithm is employed to optimize them. The proposed model is evaluated with data from two freeways in China. The results show that the tabu search algorithm can effectively find better parameter values for the SVM, and that SVM models outperform Artificial Neural Networks (ANNs) in freeway incident detection.
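Tabu search over SVM parameters can be sketched as a local search on a grid of (log2 C, log2 gamma) values that always moves to the best neighbour not on a short-term tabu list, unless that neighbour beats the best score found so far (the aspiration criterion). In the Python below, the grid, the step size, the tenure, and the `score` function are all assumptions: in practice `score` would be the cross-validated detection accuracy of an SVM trained with C = 2**p[0] and gamma = 2**p[1], but here a hypothetical smooth function peaking at (3, -2) stands in for it.

```python
from collections import deque


def tabu_search(score, start, bounds, iterations=50, tenure=5):
    """Maximise `score` over an integer grid with a fixed-tenure tabu list."""
    current = start
    best, best_score = start, score(start)
    tabu = deque([start], maxlen=tenure)  # recently visited points
    for _ in range(iterations):
        # Neighbourhood: move one step up or down in a single dimension.
        neighbours = []
        for dim in range(len(current)):
            for step in (-1, 1):
                cand = list(current)
                cand[dim] += step
                lo, hi = bounds[dim]
                if lo <= cand[dim] <= hi:
                    neighbours.append(tuple(cand))
        # Aspiration: a tabu point is allowed only if it beats the global best.
        allowed = [n for n in neighbours if n not in tabu or score(n) > best_score]
        if not allowed:
            break
        current = max(allowed, key=score)
        tabu.append(current)
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best, best_score


def toy_score(p):
    # Hypothetical stand-in for cross-validated accuracy, best at (3, -2).
    return -((p[0] - 3) ** 2 + (p[1] + 2) ** 2)


(log2_c, log2_gamma), s = tabu_search(toy_score, (0, 0), [(-5, 10), (-10, 3)])
print(2 ** log2_c, 2 ** log2_gamma, s)
```

Because the search keeps moving to non-tabu neighbours even when they are worse, it can leave a local optimum instead of stopping there, which is what distinguishes it from plain hill climbing.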


Molecules ◽  
2012 ◽  
Vol 17 (4) ◽  
pp. 4560-4582 ◽  
Author(s):  
Khac-Minh Thai ◽  
Thuy-Quyen Nguyen ◽  
Trieu-Du Ngo ◽  
Thanh-Dao Tran ◽  
Thi-Ngoc-Phuong Huynh

2018 ◽  
Vol 141 (4) ◽  
Author(s):  
Qihong Feng ◽  
Ronghao Cui ◽  
Sen Wang ◽  
Jin Zhang ◽  
Zhe Jiang

The diffusion coefficient of carbon dioxide (CO2), a key parameter describing the mass transfer process, strongly influences the safety of CO2 storage in depleted reservoirs, saline aquifers, and marine ecosystems. However, experimentally determining the diffusion coefficient in CO2-brine systems is time-consuming and complex because the procedure requires sophisticated laboratory equipment and appropriate interpretation methods. To facilitate the acquisition of more accurate values, an intelligent model, termed MKSVM-GA, is developed using a hybrid of a support vector machine (SVM), mixed kernels (MK), and a genetic algorithm (GA). Statistical evaluation indicators confirm that the proposed model achieves high accuracy and strong robustness over a wide range of temperatures (273–473.15 K), pressures (0.1–49.3 MPa), and viscosities (0.139–1.950 mPa·s). Our results show that the proposed model is more applicable than the artificial neural network (ANN) model at this sample size, which is superior to four commonly used traditional empirical correlations. The technique presented in this study can provide fast and precise predictions of CO2 diffusivity in brine at reservoir conditions for engineering design and technical risk assessment during CO2 injection.
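The abstract does not say which kernels are mixed. A common choice, assumed here, is a convex combination of a local RBF kernel and a global polynomial kernel, K(x, z) = w·exp(−γ‖x − z‖²) + (1 − w)·(x·z + c)^d with 0 ≤ w ≤ 1. A nonnegative weighted sum of valid kernels is itself a valid kernel, so the mixture can be used directly in an SVM. In an MKSVM-GA-style setup, a genetic algorithm would search over w, γ, d, c and the SVM penalty C; the default values below are placeholders, not values from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: local, decays with squared distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, degree, coef0):
    """Polynomial kernel: global, grows with the dot product."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def mixed_kernel(x, z, weight, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of RBF and polynomial kernels (assumed form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, z, gamma) + (1.0 - weight) * poly_kernel(x, z, degree, coef0)


# Hypothetical normalised inputs, e.g. (temperature, pressure, viscosity).
x = [0.3, 0.7, 0.1]
z = [0.4, 0.5, 0.2]
print(mixed_kernel(x, z, weight=0.6))
```

For use with a library SVM, scikit-learn's `SVC` and `SVR` accept a callable `kernel` that returns the Gram matrix between two sample arrays, so a vectorised version of `mixed_kernel` could be plugged in that way.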


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Xibin Wang ◽  
Junhao Wen ◽  
Shafiq Alam ◽  
Xiang Gao ◽  
Zhuo Jiang ◽  
...  

Accurate forecasting of the sales growth rate plays a decisive role in determining the amount of advertising investment. In this study, we present a method based on preclassification followed by regression, optimized by improved particle swarm optimization (IPSO), for sales growth rate forecasting. We use a support vector machine (SVM) as the classification model. SVM efficiently represents the nonlinear relationships in sales growth rate forecasting, while IPSO optimizes the training parameters of the SVM. IPSO addresses known issues of traditional PSO, such as becoming trapped in local optima, slow convergence, and low convergence precision in the later stages of evolution. We performed two experiments. First, three classic benchmark functions were used to verify the validity of the IPSO algorithm against PSO. Having shown that IPSO outperforms PSO in convergence speed, precision, and escaping local optima, we applied IPSO to the proposed model in the second experiment, using sales growth rate forecasting cases to test its forecasting performance. Based on the requirements and industry knowledge, the sample data were first classified to determine the type of each test sample. Next, the values of the test samples were forecast using the SVM regression algorithm. The experimental results demonstrate that the proposed model has good forecasting performance.
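For reference, the baseline that IPSO improves on can be sketched as standard global-best PSO with a linearly decreasing inertia weight (0.9 to 0.4), acceleration coefficients c1 = c2 = 2, and velocity clamping, tested on the sphere function, a classic benchmark. The abstract does not name its three benchmark functions or describe IPSO's modifications, so this Python sketch is plain PSO, not the paper's IPSO; all parameter values are assumptions. In the forecasting model, `objective` would instead be a validation error of the SVM as a function of its training parameters.

```python
import random


def sphere(x):
    """Sphere benchmark: global minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, bounds, n_particles=30, iterations=200, seed=0):
    """Minimise `objective` with standard global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = 0.2 * (hi - lo)  # velocity clamp
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    c1 = c2 = 2.0
    for t in range(iterations):
        w = 0.9 - 0.5 * t / (iterations - 1)  # inertia falls linearly from 0.9 to 0.4
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull towards own best
                     + c2 * r2 * (gbest[d] - pos[i][d]))    # pull towards swarm best
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_x, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

Improved variants typically change how the inertia weight or coefficients adapt, or perturb stagnating particles, to address the weaknesses listed above.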


2019 ◽  
Vol 2 (2) ◽  
pp. 43
Author(s):  
Lalu Mutawalli ◽  
Mohammad Taufan Asri Zaen ◽  
Wire Bagye

In an era of technological disruption in mass communication, social media has become a reference point for gauging public opinion. Social media users produce digital data very rapidly as they try to express their feelings. This data consists of the statuses and comments that users post on social media. Such data production by the public generates very large datasets, which can be referred to as big data. Big data consists of very large, complex datasets that appear relatively quickly, which makes them difficult to handle. Data mining methods can be applied to big data to uncover the knowledge patterns it contains. This study analyzes the sentiments of netizens on Twitter regarding the stabbing of Mr. Wiranto. The sentiment analysis showed that 41% of comments were positive, 29% neutral, and 29% negative. In addition, the data were modeled with a support vector machine algorithm to build a system capable of classifying comments as positive, neutral, or negative. The resulting classification model was evaluated with a confusion matrix, yielding a precision of 83%, a recall of 80%, and an accuracy of 80%.
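Precision, recall, and accuracy for a three-class sentiment model are all read off a 3×3 confusion matrix. The abstract does not state how per-class precision and recall were combined, so the Python sketch below assumes macro-averaging (the unweighted mean over the positive, neutral, and negative classes); rows are actual classes and columns are predicted classes. The tweets' labels in the example are invented for illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows: actual class; columns: predicted class."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def macro_scores(m):
    """Return (macro precision, macro recall, accuracy) from a confusion matrix."""
    n = len(m)
    total = sum(map(sum, m))
    accuracy = sum(m[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for i in range(n):
        predicted_i = sum(m[r][i] for r in range(n))  # column sum
        actual_i = sum(m[i])                           # row sum
        precisions.append(m[i][i] / predicted_i if predicted_i else 0.0)
        recalls.append(m[i][i] / actual_i if actual_i else 0.0)
    return sum(precisions) / n, sum(recalls) / n, accuracy


labels = ["positive", "neutral", "negative"]
actual = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
predicted = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]
cm = confusion_matrix(actual, predicted, labels)
print(cm)
print(macro_scores(cm))
```

With a weighted average (each class weighted by its number of actual samples), the same matrix can give different precision and recall values, which is why the averaging method matters when comparing reported figures.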

