Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

Support Vector ◽

Data Generation ◽

Analysis Techniques ◽

Learning Techniques ◽

Document Level

Social media platforms, blogs, and e-commerce sites are now commonly used to express opinions on politics, movies, products, and education, and these opinions feed election forecasting, business growth, and the improvement of teaching and learning. As a result, data generation has become easier, producing big data that requires appropriate techniques and tools to analyse easily, accurately, and in a timely manner. This makes sentiment analysis a highly demanding research area. This study investigates on what basis (sentiment classification level) or in which area of application (data source) supervised machine learning approaches, particularly the Support Vector Machine (SVM), Naïve Bayes, and Maximum Entropy algorithms, and the alternative lexicon-based approach give the best results in sentiment analysis. The literature review reveals contradictory findings on whether SVM produces the best result when analysing student sentiment at the document level. The study also finds that sentiment analysis differs from system to system depending on polarity (the classes to predict: positive or negative, subjective or objective), the level of classification (sentence, phrase, or document), and the language processed. The research produces a taxonomy that serves as a guide for choosing techniques in sentiment analysis. The taxonomy covers the sentiment classification levels and the data preprocessing stages. It organises sentiment analysis techniques into three groups: machine learning, lexicon-based, and hybrid (combined). The machine learning techniques are sub-grouped into supervised and unsupervised. The supervised techniques are organised into classification and regression; the unsupervised techniques include clustering and association, and the clustering techniques include k-means. The decision tree, a classification-based supervised machine learning technique, includes random forest (Akinkunmi, 2019), while the rule-based classifiers consist of the confidence criterion and the support criterion. The commonly used tools are Weka, Python, and R.
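As a minimal sketch of one of the supervised approaches named in this abstract, the following pure-Python multinomial Naïve Bayes classifier assigns document-level polarity. The training sentences, the whitespace tokenizer, and the function names are illustrative assumptions, not the study's data or implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and tokens per class.

    docs is a list of (text, label) pairs. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothed token likelihood.
            numerator = word_counts[label][token] + 1
            denominator = total_words + len(vocab)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("great movie loved it", "positive"),
    ("excellent product works great", "positive"),
    ("terrible movie hated it", "negative"),
    ("poor product broke quickly", "negative"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "loved this great product"))
print(predict(model, "terrible poor movie"))
```

A sentence-level or phrase-level classifier would use the same scoring with smaller text units as the documents.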

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
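The comparison described in this abstract rests on how well predicted affinities correlate with experimental ones. A minimal sketch of that check, assuming the Pearson coefficient is the statistic of interest (the abstract does not name one) and using made-up affinity values:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only: experimental affinities
# (for example, negative log inhibition constants) and the outputs of
# two imaginary scoring functions for the same five complexes.
experimental = [5.1, 6.3, 7.0, 7.8, 8.4]
score_a = [5.0, 6.0, 7.2, 7.5, 8.6]   # tracks the experiment closely
score_b = [7.9, 5.2, 8.1, 5.5, 6.9]   # tracks it poorly

print(round(pearson_r(experimental, score_a), 3))
print(round(pearson_r(experimental, score_b), 3))
```

A rank-based statistic such as Spearman's coefficient could be substituted when only the ordering of ligands matters.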

Local mortality estimates during the COVID-19 pandemic in Italy

Journal of Population Economics ◽

10.1007/s00148-021-00857-y ◽

2021 ◽

Author(s):

Augusto Cerqua ◽

Roberta Di Stefano ◽

Marco Letta ◽

Sara Miccoli

Keyword(s):

Machine Learning ◽

Excess Mortality ◽

Control Method ◽

Local Level ◽

Mortality Data ◽

Official Method ◽

Learning Techniques ◽

Mortality Estimates

Estimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, Italy being no exception. Mortality estimates at the local level are even more uncertain as they require stringent conditions, such as granularity and accuracy of the data at hand, which are rarely met. The “official” approach adopted by public institutions to estimate the “excess mortality” during the pandemic draws on a comparison between observed all-cause mortality data for 2020 and averages of mortality figures in the past years for the same period. In this paper, we apply the recently developed machine learning control method to build a more realistic counterfactual scenario of mortality in the absence of COVID-19. We demonstrate that supervised machine learning techniques outperform the official method by substantially improving the prediction accuracy of the local mortality in “ordinary” years, especially in small- and medium-sized municipalities. We then apply the best-performing algorithms to derive estimates of local excess mortality for the period between February and September 2020. Such estimates allow us to provide insights about the demographic evolution of the first wave of the pandemic throughout the country. To help improve diagnostic and monitoring efforts, our dataset is freely available to the research community.
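The "official" baseline described in this abstract can be sketched in a few lines: excess mortality is the observed death count minus the average of past years for the same period. The number of past years, the death counts, and the function names below are assumptions made for illustration; the machine learning control method itself is not reproduced here.

```python
from statistics import mean


def excess_mortality(observed_deaths, past_deaths):
    """Observed deaths minus the historical average for the same period."""
    if not past_deaths:
        raise ValueError("need at least one past year")
    return observed_deaths - mean(past_deaths)


def relative_excess(observed_deaths, past_deaths):
    """Excess mortality as a percentage of the historical average."""
    baseline = mean(past_deaths)
    if baseline == 0:
        raise ValueError("baseline is zero; relative excess is undefined")
    return 100.0 * (observed_deaths - baseline) / baseline


# Hypothetical municipality: deaths in the same period of five past
# years, and the observed count in the pandemic year.
past = [100, 110, 90, 105, 95]
observed = 130

print(excess_mortality(observed, past))
print(relative_excess(observed, past))
```

In a small municipality the historical average is noisy, which is the setting where the abstract reports the largest gains from a learned counterfactual.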

Malicious URL Detection Using Supervised Machine Learning Techniques

13th International Conference on Security of Information and Networks ◽

10.1145/3433174.3433592 ◽

2020 ◽

Author(s):

Vara Vundavalli ◽

Farhat Barsha ◽

Mohammad Masum ◽

Hossain Shahriar ◽

Hisham Haddad

Keyword(s):

Machine Learning ◽

Research Paper Classification using Supervised Machine Learning Techniques

2020 Intermountain Engineering, Technology and Computing (IETC) ◽

10.1109/ietc47856.2020.9249211 ◽

2020 ◽

Author(s):

Shovan Chowdhury ◽

Marco P. Schoen

Keyword(s):

Machine Learning ◽

Research Paper ◽

Learning Techniques ◽

Paper Classification

Effectuating Supervised Machine Learning Techniques for Multiclass Classification of Problematic Internet and Mobile Usage

2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) ◽

10.1109/icccis51004.2021.9397062 ◽

2021 ◽

Author(s):

Sneha Sarkar ◽

Samanyu Bhandary ◽

Arti Arya

Keyword(s):

Machine Learning ◽

Multiclass Classification ◽

Supervised Machine Learning Techniques: An Overview with Applications to Banking

International Statistical Review ◽

10.1111/insr.12448 ◽

2021 ◽

Author(s):

Linwei Hu ◽

Jie Chen ◽

Joel Vaughan ◽

Soroush Aramideh ◽

Hanyu Yang ◽

...

Keyword(s):

Machine Learning ◽

Content-Based Image Retrieval using Local Patterns and Supervised Machine Learning Techniques

2019 Amity International Conference on Artificial Intelligence (AICAI) ◽

10.1109/aicai.2019.8701255 ◽

2019 ◽

Cited By ~ 3

Author(s):

Maher Alrahhal ◽

K.P. Supreethi

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Content Based Image Retrieval ◽

Learning Techniques ◽

Local Patterns

Empirical Analysis of Supervised Machine Learning Techniques for Cyberbullying Detection

International Conference on Innovative Computing and Communications - Lecture Notes in Networks and Systems ◽

10.1007/978-981-13-2354-6_24 ◽

2018 ◽

pp. 223-230

Author(s):

Akshi Kumar ◽

Shashwat Nayak ◽

Navya Chandra

Keyword(s):

Machine Learning ◽

Empirical Analysis ◽

Learning Techniques ◽

Cyberbullying Detection

Toward Palmprint Recognition Methodology Based Machine Learning Techniques

European Journal of Electrical Engineering and Computer Science ◽

10.24018/ejece.2020.4.4.225 ◽

2020 ◽

Vol 4 (4) ◽

Author(s):

M. M. Ata ◽

K. M. Elgamily ◽

M. A. Mohamed

Keyword(s):

Machine Learning ◽

Hough Transform ◽

Recognition Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Palmprint Recognition ◽

Feature Vectors ◽

The presented paper proposes an algorithm for palmprint recognition using seven different machine learning algorithms. First, we propose a region of interest (ROI) extraction methodology based on a two-key-point technique. Second, we apply image enhancement techniques such as edge detection and morphological operations to make the ROI image more suitable for the Hough transform. We then apply the Hough transform to extract all possible principal lines from the ROI images, and we extract the most salient morphological features of those lines: slope and length. Furthermore, we apply the invariant moments algorithm to produce 7 appropriate hues of interest. Finally, after forming the complete hybrid feature vectors, we apply different machine learning algorithms to recognize palmprints effectively. Recognition performance has been tested by calculating precision, sensitivity, specificity, accuracy, Dice and Jaccard coefficients, correlation coefficients, and training time. Seven different supervised machine learning algorithms have been implemented and utilized. The effect of forming the proposed hybrid feature vectors from the Hough transform and invariant moments has been tested. Experimental results show that the feed-forward neural network with backpropagation achieved about 99.99% recognition accuracy, the best among all tested machine learning techniques.
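Several of the evaluation measures listed in this abstract follow directly from confusion-matrix counts. A minimal sketch, assuming a binary (or one-vs-rest) setting and made-up counts; the function name and the zero-denominator convention are choices of this sketch, not of the paper:

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. A ratio with a zero denominator
    is reported as 0.0, which is a convention chosen for this sketch.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),      # also called recall
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multiclass recognizer, the same function could be applied per class in a one-vs-rest fashion and the results averaged.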