Application of ACO-SVM in Chinese Text Feature Selection

2012 ◽  
Vol 155-156 ◽  
pp. 770-775
Author(s):  
Jin Xiu Cui ◽  
Xiao Xia Huang

The algorithm proposed in this paper applies ACO in combination with support vector machine (SVM) in Chinese text feature selection. It obtains classifier models for each category at last. The experimental results show that the proposed method is feasible and lead to a considerable increase of classification accuracy.

Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant feature and improve classification accuracy. First, local correlation between features and overall correlation is calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated with analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined and the influence degree of input variables on output variables for SVM network based on MIV is calculated. The importance weights of the features described with MIV are sorted by descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combination which takes MIV order of feature as a reference. The simulation experiments are carried out with three standard data sets of UCI, and the results show that this method can not only effectively reduce the feature dimension and high classification accuracy, but also ensure good robustness.


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Jongshill Lee ◽  
Youngjoon Chee ◽  
Inyoung Kim

We propose a new method for personal identification using the derived vectorcardiogram (dVCG), which is derived from the limb leads electrocardiogram (ECG). The dVCG was calculated from the standard limb leads ECG using the precalculated inverse transform matrix. Twenty-one features were extracted from the dVCG, and some or all of these 21 features were used in support vector machine (SVM) learning and in tests. The classification accuracy was 99.53%, which is similar to the previous dVCG analysis using the standard 12-lead ECG. Our experimental results show that it is possible to identify a person by features extracted from a dVCG derived from limb leads only. Hence, only three electrodes have to be attached to the person to be identified, which can reduce the effort required to connect electrodes and calculate the dVCG.


2021 ◽  
Vol 7 ◽  
pp. e390
Author(s):  
Shafaq Abbas ◽  
Zunera Jalil ◽  
Abdul Rehman Javed ◽  
Iqra Batool ◽  
Mohammad Zubair Khan ◽  
...  

Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for a patient as they have to go through expensive and painful treatments to fight this cancer. One in eight women all over the world is affected by this disease. Almost half a million women annually do not survive this fight and die from this disease. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on the previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all with the highest accuracy rate of 99.30% followed by SVM achieving 98.60% accuracy. Experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.


2012 ◽  
Vol 532-533 ◽  
pp. 1497-1502
Author(s):  
Hong Mei Li ◽  
Lin Gen Yang ◽  
Li Hua Zou

To make feature subset which can gain the higher classification accuracy rate, the method based on genetic algorithms and the feature selection of support vector machine is proposed. Firstly, the ReliefF algorithm provides a priori information to GA, the parameters of the support vector machine mixed into the genetic encoding,and then using genetic algorithm finds the optimal feature subset and support vector machines parameter combination. Finally, experimental results show that the proposed algorithm can gain the higher classification accuracy rate based on the smaller feature subset.


2011 ◽  
Vol 201-203 ◽  
pp. 2070-2074
Author(s):  
He Xun Wang ◽  
Chong Liu ◽  
Yu Qing Sun

AdaBoost algorithm can achieve better performance by averaging over the predictions of some weak hypotheses. To improve the power of classification ability of AdaBoost, an infinite ensemble learning framework based on the Support Vector Machine was formulated. The framework can output an infinite AdaBoost through embedding infinite hypotheses into a new kernel of Support Vector Machine. The stump kernel embodies infinite decision stumps. At last, the algorithm was used in fault diagnosis for analog circuits. Experimental results show that infinite AdaBoost with Support Vector Machine is superior than finite AdaBoost with the same base hypothesis set. The purpose of enhancing classification accuracy of AdaBoost algorithm is achieved.


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzyme is a kind of highly evolved protein, which can destroy the cell structure and kill the bacteria. Compared with antibiotics, cell lytic enzyme will not cause serious problem of drug resistance of pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way for curing bacteria infectious. Compared with using antibiotics, the problem of drug resistance becomes more serious. Therefore, it is a good choice for curing bacterial infections by using cell lytic enzymes. Cell lytic enzyme includes endolysin and autolysin and the difference between them is the purpose of the break of cell wall. The identification of the type of cell lytic enzymes is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzyme is helpful for killing bacteria, so it is meaningful for study the type of cell lytic enzyme. However, it is time consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for the type of cell lytic enzyme prediction is proposed in our work. Method: We propose a computational method for the prediction of endolysin and autolysin. First, a data set containing 27 endolysins and 41 autolysins is built. Then the protein is represented by tripeptides composition. The features are selected with larger confidence degree. At last, the classifier is trained by the labeled vectors based on support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can attain 97.06%, when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9%). The performance of our proposed method is stable, when the selected feature number is from 40 to 70. The overall accuracy of tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. Conclusion: The paper proposed an efficient method for identifying endolysin and autolysin. In this paper, support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. Support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.


Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: The proposed research work is on an evolutionary enhanced method for sentiment or emotion classification on unstructured review text in the big data field. The sentiment analysis plays a vital role for current generation of people for extracting valid decision points about any aspect such as movie ratings, education institute or politics ratings, etc. The proposed hybrid approach combined the optimal feature selection using Particle Swarm Optimization (PSO) and sentiment classification through Support Vector Machine (SVM). The current approach performance is evaluated with statistical measures, such as precision, recall, sensitivity, specificity, and was compared with the existing approaches. The earlier authors have achieved an accuracy of sentiment classifier in the English text up to 94% as of now. In the proposed scheme, an average accuracy of sentiment classifier on distinguishing datasets outperformed as 99% by tuning various parameters of SVM, such as constant c value and kernel gamma value in association with PSO optimization technique. The proposed method utilized three datasets, such as airline sentiment data, weather, and global warming datasets, that are publically available. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Background: The sentiment analysis plays a vital role for current generation people for extracting valid decisions about any aspect such as movie rating, education institute or even politics ratings, etc. Sentiment Analysis (SA) or opinion mining has become fascinated scientifically as a research domain for the present environment. The key area is sentiment classification on semi-structured or unstructured data in distinguish languages, which has become a major research aspect. User-Generated Content [UGC] from distinguishing sources has been hiked significantly with rapid growth in a web environment. The huge user-generated data over social media provides substantial value for discovering hidden knowledge or correlations, patterns, and trends or sentiment extraction about any specific entity. SA is a computational analysis to determine the actual opinion of an entity which is expressed in terms of text. SA is also called as computation of emotional polarity expressed over social media as natural text in miscellaneous languages. Usually, the automatic superlative sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work used Support vector machine as classification technique and particle swarm optimization technique as feature selection purpose. In this methodology, we tune various permutations and combination parameters in order to obtain expected desired results with kernel and without kernel technique for sentiment classification on three datasets, including airline, global warming, weather sentiment datasets, that are freely hosted for research practices. Results: In the proposed scheme, The proposed method has outperformed with 99.2% of average accuracy to classify the sentiment on different datasets, among other machine learning techniques. The attained high accuracy in classifying sentiment or opinion about review text proves superior effectiveness over existing sentiment classifiers. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Conclusion: The objective of the research issue sentiment classifier accuracy has been hiked with the help of Kernel-based Support Vector Machine (SVM) based on parameter optimization. The optimal feature selection to classify sentiment or opinion towards review documents has been determined with the help of a particle swarm optimization approach. The proposed method utilized three datasets to simulate the results, such as airline sentiment data, weather sentiment data, and global warming data that are freely available datasets.


Author(s):  
Narina Thakur ◽  
Deepti Mehrotra ◽  
Abhay Bansal ◽  
Manju Bala

Objective: Since the adequacy of Learning Objects (LO) is a dynamic concept and changes in its use, needs and evolution, it is important to consider the importance of LO in terms of time to assess its relevance as the main objective of the proposed research. Another goal is to increase the classification accuracy and precision. Methods: With existing IR and ranking algorithms, MAP optimization either does not lead to a comprehensively optimal solution or is expensive and time - consuming. Nevertheless, Support Vector Machine learning competently leads to a globally optimal solution. SVM is a powerful classifier method with its high classification accuracy and the Tilted time window based model is computationally efficient. Results: This paper proposes and implements the LO ranking and retrieval algorithm based on the Tilted Time window and the Support Vector Machine, which uses the merit of both methods. The proposed model is implemented for the NCBI dataset and MAT Lab. Conclusion: The experiments have been carried out on the NCBI dataset, and LO weights are assigned to be relevant and non - relevant for a given user query according to the Tilted Time series and the Cosine similarity score. Results showed that the model proposed has much better accuracy.


Sign in / Sign up

Export Citation Format

Share Document