Lipreading Using n–Gram Feature Vector

Author(s):  
Preety Singh ◽  
Vijay Laxmi ◽  
Deepika Gupta ◽  
M. S. Gaur
2021 ◽  
Vol 14 ◽  
pp. 1-11
Author(s):  
Suraya Alias

In an age where conversation often consists merely of online chatting and texting, an automated conversational agent is needed to support repetitive tasks such as answering FAQs, customer service, and product recommendations. One of the key challenges is to identify and discover the user's intention in a social conversation, which is the focus of our work in the academic domain. Our unsupervised text feature extraction method for Intent Pattern Discovery is developed by applying text feature constraints to the FP-Growth technique. The academic corpus was developed from a chat message dataset in which conversations between students and academicians regarding undergraduate and postgraduate queries were extracted as text features for our model. We compared our new Constrained Frequent Intent Pattern (cFIP) model with the N-gram model in terms of feature-vector size reduction, descriptive intent discovery, and analysis of cFIP rules. Our findings show that significant and descriptive intent patterns were discovered, with a rule confidence value of 0.9 for cFIP of 3-sequence. We report an average feature-vector size reduction of 76% compared to the Bigram model using both undergraduate and postgraduate conversation datasets. Usability testing showed an overall mean user satisfaction score of 4.30 out of 5 for the Academic chatbot, which supported our cFIP intent discovery approach.
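
The two quantities this abstract reports, the size of a bigram feature vector and the confidence of a pattern rule, can be illustrated with a toy sketch. It is not the cFIP method and does not implement FP-Growth or any of the paper's text constraints; the messages, the function names, and the order-insensitive notion of support are assumptions made only for illustration.

```python
def bigrams(tokens):
    """Return the adjacent token pairs (bigrams) of one message."""
    return list(zip(tokens, tokens[1:]))


def rule_confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A).

    Support is counted over messages treated as unordered token sets,
    which is a simplification of sequence-aware pattern mining.
    """
    a = set(antecedent)
    ab = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ab = sum(1 for t in transactions if ab <= set(t))
    return n_ab / n_a if n_a else 0.0


# Hypothetical, already tokenized chat messages (not from the paper's corpus).
messages = [
    ["how", "apply", "postgraduate", "programme"],
    ["apply", "postgraduate", "scholarship"],
    ["apply", "undergraduate", "programme"],
    ["postgraduate", "fee", "payment"],
]

# The bigram feature vector has one dimension per distinct bigram.
bigram_vocab = {bg for m in messages for bg in bigrams(m)}

# Confidence of the rule {apply, postgraduate} -> {programme}.
conf = rule_confidence(messages, ["apply", "postgraduate"], ["programme"])

print(len(bigram_vocab), conf)
```

A pattern-based representation keeps only itemsets that pass support and confidence thresholds, which is why it can be much smaller than the full bigram vocabulary.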


2018 ◽  
Vol 7 (3.33) ◽  
pp. 15
Author(s):  
Young Man Kwon ◽  
So Hee Jun ◽  
Won Mo Gal ◽  
Myung Jae Lim

In this paper, we compare the performance of classifiers across feature vectors built with Binary BOW, Count BOW and TF-IDF for malware detection. We used opcode features extracted from PE files. For the performance comparison, we measured the AUC score of the classifiers DT, KNN, MLP, MNB and SVM. As a result, we recommend the neural network (MLP) and the instance-based model (KNN) because they show high AUC scores and accuracy regardless of the unbalanced dataset and the feature vector. Among classical classifiers, we recommend DT because it guarantees a high AUC score and accuracy under the same conditions. If you use SVM, you have to apply robust scaling to handle outliers and the unbalanced dataset. If you use MNB, you need to use the N-gram technique to improve the AUC score.
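
The three feature vectors compared in this abstract differ only in how a term's occurrences are weighted. The sketch below builds all three from opcode sequences using the standard library; the opcode samples are invented, and the unsmoothed `log(N / df)` weighting is an assumption, since the abstract does not say which TF-IDF variant was used.

```python
import math
from collections import Counter


def ngrams(opcodes, n=1):
    """Join each run of n consecutive opcodes into one term."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def vectorize(samples, n=1):
    """Return (vocab, binary, count, tfidf) feature matrices for the samples."""
    docs = [Counter(ngrams(s, n)) for s in samples]
    vocab = sorted({term for d in docs for term in d})
    n_docs = len(docs)
    # Document frequency: number of samples containing each term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    # Plain inverse document frequency without smoothing (an assumption).
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    count = [[d[t] for t in vocab] for d in docs]
    binary = [[1 if c else 0 for c in row] for row in count]
    tfidf = [[c * idf[t] for c, t in zip(row, vocab)] for row in count]
    return vocab, binary, count, tfidf


# Hypothetical opcode sequences, standing in for ones extracted from PE files.
samples = [
    ["push", "mov", "mov", "call"],
    ["push", "xor", "jmp"],
]

vocab, binary, count, tfidf = vectorize(samples, n=1)
print(vocab)
print(count[0], binary[0])
```

Calling `vectorize(samples, n=2)` switches the terms to opcode bigrams, which is one way to read the N-gram advice given for MNB.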


2018 ◽  
Vol 7 (2) ◽  
pp. 54-59
Author(s):  
Yoon Gee Ong ◽  
Seung Shik Kang ◽  

2018 ◽  
Vol 30 (12) ◽  
pp. 2311
Author(s):  
Zhendong Li ◽  
Yong Zhong ◽  
Dongping Cao

Author(s):  
Vitaly Kuznetsov ◽  
Hank Liao ◽  
Mehryar Mohri ◽  
Michael Riley ◽  
Brian Roark

2020 ◽  
Author(s):  
Grant P. Strimel ◽  
Ariya Rastrow ◽  
Gautam Tiwari ◽  
Adrien Piérard ◽  
Jon Webb

2019 ◽  
Vol 24 (34) ◽  
pp. 4007-4012 ◽  
Author(s):  
Alessandra Lumini ◽  
Loris Nanni

Background: Anatomical Therapeutic Chemical (ATC) classification of unknown compounds is of high significance for both drug development and basic research. The ATC system is a multi-label classification system proposed by the World Health Organization (WHO), which categorizes drugs into classes according to their therapeutic effects and characteristics. This system comprises five levels and includes several classes in each level; the first level includes 14 main overlapping classes. Because the ATC classification system simultaneously considers anatomical distribution, therapeutic effects, and chemical characteristics, predicting the ATC classes of an unknown compound is an essential problem: such a prediction could be used to deduce not only a compound’s possible active ingredients but also its therapeutic, pharmacological, and chemical properties. Nevertheless, automatic prediction is very challenging due to the high variability of the samples and the overlap among classes, which results in multiple predictions and makes machine learning extremely difficult. Methods: In this paper, we propose a multi-label classifier system based on deep learned features to infer the ATC classification. The system is based on a 2D representation of the samples: first, a 1D feature vector is obtained by extracting information about a compound’s chemical-chemical interaction and its structural and fingerprint similarities to other compounds belonging to the different ATC classes; then, the original 1D feature vector is reshaped to obtain a 2D matrix representation of the compound. Finally, a convolutional neural network (CNN) is trained and used as a feature extractor. Two general-purpose classifiers designed for multi-label classification are trained using the deep learned features, and the resulting scores are fused by the average rule.
Results: Experimental evaluation based on rigorous cross-validation demonstrates the superior prediction quality of this method compared to other state-of-the-art approaches developed for this problem. Conclusion: Extensive experiments demonstrate that the new predictor, based on CNN, outperforms other existing predictors in the literature in almost all the five metrics used to examine the performance for multi-label systems, particularly in the “absolute true” rate and the “absolute false” rate, the two most significant indexes. Matlab code will be available at https://github.com/LorisNanni.
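
Two steps named in the Methods, reshaping a 1D feature vector into a 2D matrix and fusing classifier scores by the average rule, can be sketched as follows. The abstract does not specify the matrix shape or any padding, so the row-major, zero-padded square layout and both function names are assumptions; the CNN feature extractor is not reproduced.

```python
import math


def reshape_to_square(vec, pad=0.0):
    """Lay a 1D vector out row by row in the smallest square that holds it.

    Unused cells are filled with `pad`. This layout is an assumption made
    for illustration, not the reshaping used by the authors.
    """
    if not vec:
        return []
    side = math.isqrt(len(vec) - 1) + 1  # ceiling of sqrt(len(vec))
    padded = list(vec) + [pad] * (side * side - len(vec))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def fuse_average(scores_a, scores_b):
    """Average rule: per-class mean of two classifiers' scores."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


matrix = reshape_to_square([1, 2, 3, 4, 5])
fused = fuse_average([0.5, 1.0], [0.0, 0.5])
print(matrix)
print(fused)
```

In a multi-label setting, the fused per-class scores would then be thresholded to decide which ATC classes to assign.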


2019 ◽  
Vol 19 (4) ◽  
pp. 216-223 ◽  
Author(s):  
Tianyi Zhao ◽  
Donghua Wang ◽  
Yang Hu ◽  
Ningyi Zhang ◽  
Tianyi Zang ◽  
...  

Background: More and more scholars are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through some specific molecular mechanisms. Objective: Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis. Materials and Methods: We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA is assigned a feature vector representing its correlation with AD. Unlike other studies, we do not randomly generate negative samples and then apply a classification method to identify AD-related miRNAs. Instead, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered as outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers). Results and Conclusion: We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we conducted case studies to show that our results are reliable.


2020 ◽  
Vol 17 (4) ◽  
pp. 271-286
Author(s):  
Chang Xu ◽  
Limin Jiang ◽  
Zehua Zhang ◽  
Xuyao Yu ◽  
Renhai Chen ◽  
...  

Background: Protein-Protein Interactions (PPIs) play a key role in various biological processes. Many methods have been developed to predict protein-protein interactions and protein interaction networks. However, many existing applications are limited because they rely on a large number of homologous proteins and interaction marks. Methods: In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a sequence-based feature representation for identifying protein-protein interactions. Our method first constructs a sequence-based feature vector to represent each pair of proteins via Multivariate Mutual Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). Then, we feed the 638-dimensional features into an integrated learning model that distinguishes interacting pairs from non-interacting pairs. This integrated model embeds Random Forest in the AdaBoost framework and turns weak classifiers into a single strong classifier. We also employ double fault detection to suppress over-adaptation during the training process. Results: To evaluate the performance of our method, we conduct several comprehensive tests for PPI prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, an accuracy increase of 0.57%. On the S. cerevisiae dataset, our method achieves 95.77% accuracy and 93.36% sensitivity, an accuracy increase of 0.76%. On the Human dataset, our method achieves 98.16% accuracy and 96.80% sensitivity, an accuracy increase of 0.6%. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. The datasets and code are available at https://github.com/guofei-tju/RF-Ada-DF.git.
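
As a rough illustration of the NMBAC descriptor named in the Methods, the sketch below computes the commonly cited form AC(lag) = 1/(n - lag) * sum_i P(i) * P(i + lag) for a single physicochemical property. Using the Kyte-Doolittle hydropathy scale, normalizing it to zero mean and unit standard deviation over the 20 amino acids, and the function names are all assumptions; the abstract does not state which properties or lags make up the 638-dimensional vector, and MMI is not shown.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here only as an example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(scale):
    """Rescale a property to zero mean and unit standard deviation."""
    values = list(scale.values())
    mu, sigma = mean(values), pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return one autocorrelation value per lag in 1..max_lag."""
    if len(sequence) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    norm = normalize(scale)
    p = [norm[aa] for aa in sequence]
    n = len(p)
    return [
        sum(p[i] * p[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Hypothetical short peptide; real proteins are far longer.
features = nmbac("ACDEFGHIKL", KD, max_lag=3)
print(features)
```

Repeating this over several properties and concatenating the results for both proteins of a pair would give a fixed-length vector regardless of sequence length.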

